Elasticity & Separation of Compute and Storage
Author: Saqib Mustafa
Engineering, Snowflake Technology
Hopefully you had a chance to read our previous top 10 posts. As promised, we continue the series with a deeper dive into another of the Top 10 Cool Features from Snowflake:
#4 Elasticity & Separation of Storage and Compute
The innovative, patent-pending, Multi-Cluster, Shared Data Architecture in Snowflake takes full advantage of the elasticity of the cloud. With the advantage of building a data warehouse from the ground up, Snowflake has built an environment that really allows the customer to scale their data warehouse as they want. This unique environment gives the users the ability to add more compute power on demand, even in the middle of a process.
How it works
- Data is stored, once, on elastic disk storage in the cloud (for example it is currently stored AWS S3)
- Compute clusters (for example using AWS EC2 nodes) are allocated dynamically as required to do work (queries or loads)
- The compute clusters are connected to the data storage. The access to the data is based on the approved security credentials of the user that is running the query
The important point is that storage and warehouse are not tightly coupled together. This means you can grow and reduce both independently of each other.
More Power at the Push of a Button
Say you are running a series of complex SQL queries on a Small (2 node) warehouse in Snowflake. Normally you have until the close of business to run the reports and prepare the data. But today, at 8 AM, your management requests the results of that analysis for a board meeting at noon. So now you have half the time to get the reports to run. Yikes!
In a traditional data warehouse you would pretty much have only one option: kick everyone off the system and hope things run faster.
If however, you are running the reports on the Snowflake Elastic Data Warehouse, all you have to do is increase the size of the warehouse to a Medium which doubles the horsepower of the data warehouse (i.e. it doubles the number of nodes). And you can perform this change with a few clicks in our easy to use web-based interface, even if the process has already started.
Once you bump up the size of the warehouse, the next statement that starts will immediately use the additional resources. Notice that the first statement running with the Medium warehouse runs in about half the time of the prior statement, with about double the data!
You don’t need to interrupt the process, start over, or restart the service. It just takes effect automatically, at the push of a button. And once the process is done, you can reduce the warehouse back to a Small or even just turn it off.
And in this way, the unique cloud-based architecture of Snowflake allows you to truly take advantage of the elasticity of the cloud for your data warehouse.
In many other architectures, cloud and on-premises appliances, you have a certain amount of disk space associated with the compute nodes. If you need more storage; you must add more nodes. If you need more compute power, likewise you add more nodes. Plus other architectures may not allow you to scale down. Also in most of these cases, when adding more nodes, you must also shut down the environment or make it read only for a time while you redistribute the data manually across the new nodes.
Thus in the legacy data warehouses, elasticity is limited. And this forces you to plan for, acquire and manage the resources for your expected, future peak capacity and peak data demand. Otherwise you risk having your queries or your hardware fail (because you run out of disk capacity or CPU power, or both). In addition, trying to scale beyond existing capacity may lead to disruptions to the service (as you rebuild/extend the hardware), or some workloads being rejected in favor of others. And if you over-provision, and buy too much capacity, there may not be a way to scale back after the fact.
The Snowflake Advantage
For Snowflake, this is not an issue. You can grow and shrink the environment dynamically. The data storage grows and shrinks as you add or remove data, while the compute nodes can be ramped up or down, or turned off, as you require. You are not forced to pay for capacity up front, or kick other workloads off, or plan downtimes when ramping up your data warehouse capacity. That is the promise of cloud. This is why Snowflake is the real elastic Data Warehouse as a Service.
Very cool indeed.
As always, keep an eye on this blog site, our Snowflake Twitter feed (@SnowflakeDB), (@kentgraziano), and (@cloudsommelier) for more Top 10 Cool Things About Snowflake and for updates on all the action and activities here at Snowflake Computing.
Kent Graziano and Saqib Mustafa