How to Make Data Protection and High Availability for Analytics Fast and Easy
August 27, 2018
Author: Steven Pelley | Contributing Authors: Michael Nixon
Engineering, Snowflake Technology
When moving enterprise data warehouse analytic workloads to the cloud, it's important to consider data protection and high availability (HA) services that will keep your valuable data preserved and your analytics running. Events such as human error, infrastructure failure, or an act of nature, or any other activity that places your data at risk, can't be ignored.
All the while, data protection and HA should be fast and easy. It shouldn’t take days, weeks or months, nor an army of technical specialists or a big budget, to make these critical safety measures happen.
What data-driven companies depend on
With real-time dashboards, organizations depend on data analytics to communicate the status of business operations. Increasingly, companies embed self-service analytics into customer-facing applications. Therefore, enterprises spend enormous amounts of effort, energy and resources to gather and cultivate data about their customers. With all this activity around data, loss of any data or processing capability could result in catastrophic consequences for an organization.
How Snowflake protects your data and services
For these reasons, Snowflake innovates and integrates data protection and HA as core capabilities of our cloud-built data warehouse-as-a-service. Figure 1 shows what makes Snowflake different: our protection capabilities are all built-in and orchestrated with metadata across your entire service. The figure also illustrates how Snowflake resilience is automatically distributed across three availability zones.
- Built-in data protection: Over and above standard cloud data protection, Snowflake Time Travel enables you to recover data from any point in time, up to 90 days in the past. In addition, it's all accomplished automatically. Other than specifying the number of days for Time Travel retention at setup (the default is 24 hours; retention beyond that requires Snowflake Enterprise edition or above), you do not have to initiate a thing or manage snapshots.
This brings significant advantages for business analysts performing scenario-based analytics on changed datasets, or for data scientists who want to train new models and algorithms on old data.
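As a concrete sketch of what this looks like in practice, Time Travel is driven entirely through SQL. The table name `orders` below is hypothetical, and retention beyond one day assumes an Enterprise-or-above account:

```sql
-- Extend the retention window (up to 90 days on Enterprise edition and above).
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 90;

-- Query the table as it existed 24 hours ago (offset is in seconds).
SELECT * FROM orders AT(OFFSET => -60*60*24);

-- Or query it as of a specific point in time.
SELECT * FROM orders AT(TIMESTAMP => '2018-08-20 12:00:00'::timestamp_tz);

-- Recover an accidentally dropped table, as long as it is within the retention window.
UNDROP TABLE orders;
```

No snapshot scheduling or restore jobs are involved; the historical versions are maintained automatically and addressed directly in the query.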
- Built-in service protection against node failures: The impact of node failures can be tricky to figure out with different cloud implementations offered by different cloud data warehouse vendors. While other cloud data warehouse or querying services may provide some level of redundancy for current data, mechanisms to protect against data corruption or data loss in the event of a node failure vary.
In most cases, the burden is on you to create a cluster (i.e., a system with a node count greater than one) to protect against node failures. Typically, this means added cost (hardware, storage and software instances), as well as added complexity, to account for the additional nodes. Some competing services may also impose a performance penalty on data writes because, under the covers, data is replicated to redundant nodes using the cluster's own compute resources. We see this most frequently with on-premises data warehouse environments retrofitted for the cloud. Moreover, there can be hidden costs in the form of your cluster going down and not being accessible for queries or updates while a failed node is being reconstructed.
Because the Snowflake architecture separates the compute, storage and service layers, Snowflake assures resiliency and data consistency in the event of node failures. Depending on the severity of a failure, Snowflake may automatically reissue (retry) the affected queries without user involvement. There is also no impact on write (or read) performance. In addition, you can take advantage of lower-cost storage, whereas competing services may strongly encourage, or even require, premium-cost storage.
- Built-in high availability: Providing an even higher degree of data protection and service resilience, within the same deployment region, Snowflake provides standard failover protection across three availability zones (including the primary active zone). Your data and business are protected. As you ingest your data, it is synchronously and transparently replicated across availability zones. This protection is automatically extended from Snowflake to customers, at no added charge.
Further, all the metadata that powers Snowflake services is also protected.
Bottom line, within the same deployment region, you do not have to configure or struggle with manually building an HA infrastructure. Our data warehouse-as-a-service takes care of this for you, automatically. Snowflake makes data protection and high availability fast and easy. You can mitigate risks with speed, cost-effectiveness, confidence and peace of mind.