How Snowflake Delivers a Single Data Experience Across Multiple Clouds and Regions
Jun 02, 2020 | 4 Min Read
Author: Benoit Dageville
When Thierry Cruanes and I co-founded Snowflake in the summer of 2012, we decided Snowflake Cloud Data Platform would be built on top of the infrastructure provided by the public clouds. From day one, our ambition was to build a cloud data platform with virtually unlimited elasticity, scale, and performance that was easy to use and offered a pay-as-you-go model.
The building block of our cloud data platform is a Snowflake region. There is a one-to-one relationship between a Snowflake public region and a cloud public region. The architecture of a Snowflake region is designed to take advantage of the underlying infrastructure provided by the hosting cloud region. It is highly available, scalable, and self-managed. At a high level, it is composed of three fully decoupled tiers that scale independently of one another.
At the center of a Snowflake region is the storage tier, which leverages the blob storage of the hosting cloud region: it is very cheap, has virtually unlimited capacity, and is highly available and durable. It supports multi-petabyte tables with native support for semi-structured data, such as JSON. Our storage layer enables sub-second response without explicitly partitioning the data. We also fully support ACID transactions with very fast insert/update/delete/merge operations. And finally, our storage is highly secure. All data at rest is encrypted by Snowflake.
The second tier of our architecture is the multi-cluster compute layer, which is fully decoupled from the storage layer. This layer runs any number of workloads, each with its own dedicated compute clusters, even if these workloads read or write the same underlying data. This is a very cost-effective model because compute resources are independently sized based on the needs of each workload and Snowflake charges by the second. This layer also provides virtually unlimited scalability and elasticity, since there is virtually no limit on the number of active, concurrent clusters.
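To make the cost model concrete, here is a minimal arithmetic sketch of per-second charging for independently sized clusters. The workload names, rates, and durations are hypothetical, and the calculation is plain per-second proration; it ignores any minimums or other details of actual Snowflake pricing.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    credits_per_hour: float  # hypothetical rate for the cluster size chosen
    seconds_active: int      # how long the dedicated cluster actually ran


def credits_used(workload: Workload) -> float:
    """Prorate the hourly rate down to the seconds the cluster was active."""
    return workload.credits_per_hour * workload.seconds_active / 3600


# Two workloads over the same data, each sized for its own needs.
workloads = [
    Workload("etl", credits_per_hour=8.0, seconds_active=900),
    Workload("dashboards", credits_per_hour=2.0, seconds_active=5400),
]

total = sum(credits_used(w) for w in workloads)
for w in workloads:
    print(f"{w.name}: {credits_used(w):.2f} credits")
print(f"total: {total:.2f} credits")
```

In this sketch, a large cluster that runs briefly and a small cluster that runs for longer are each charged only for the seconds they were active.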
The last tier of the architecture runs the cloud services for a Snowflake region. It’s really the control plane of a Snowflake region and the interface to all external clients. For each customer account, the cloud services tier manages client sessions, metadata, transactions, query planning, security/governance, and many other services. This tier is also highly scalable, with no built-in scale limit. Today, large Snowflake regions support thousands of customer accounts and handle hundreds of millions of queries per day.
Initially, Snowflake was only available on AWS. We later ported it to Azure cloud in 2018 and then to Google Cloud Platform (GCP) in 2020. Soon, Snowflake Cloud Data Platform will be available in more cloud regions than any single cloud infrastructure provider offers.
We ported our software by building a cloud-agnostic layer that abstracts the specifics of the underlying cloud infrastructure. This means any application running on Snowflake Cloud Data Platform is also cloud agnostic. This aspect is very important, since avoiding cloud lock-in is one of the key benefits of using Snowflake.
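The general idea of a cloud-agnostic layer can be sketched as an interface that higher-level code depends on, with one implementation per cloud. This is only an illustration of the pattern, not Snowflake's actual design: `BlobStore`, `InMemoryBlobStore`, and `write_table_file` are hypothetical names, and the in-memory store stands in for real cloud blob storage.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Hypothetical cloud-agnostic interface to blob storage."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore(BlobStore):
    """Stand-in for a per-cloud implementation; keeps objects in a dict."""

    def __init__(self, cloud: str) -> None:
        self.cloud = cloud
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def write_table_file(store: BlobStore, table: str, part: int, payload: bytes) -> str:
    """Higher-level code sees only the interface, never a specific cloud."""
    key = f"{table}/part-{part:05d}"
    store.put(key, payload)
    return key


# The same call works unchanged against every backend.
stores = [InMemoryBlobStore(cloud) for cloud in ("aws", "azure", "gcp")]
keys = [write_table_file(store, "orders", 0, b"rows") for store in stores]
print(keys)
```

Because `write_table_file` never names a cloud, supporting another provider in this sketch means adding one more `BlobStore` implementation and nothing else.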
However, our vision for Snowflake was much larger than just running our software independently in different cloud regions. We envisioned the ability for Snowflake regions to interact with each other to deliver global capabilities. Without these global features, our customers’ data would be siloed in the same way it is siloed in the infrastructure cloud, effectively confining applications to a single cloud region.
So, we also built what we call the global data mesh to link all of these regions with each other, no matter their origin. The global data mesh is used by our platform to efficiently and securely move very large quantities of data. This now enables us to build global features, which are by nature cross-region and cross-cloud.
The first of these global features is global account management, which makes it easy to create and manage Snowflake accounts in individual regions as one. A new type of admin role has been created for this – the org admin. For example, the org admin can simply go to the snowflake.com website and create their first account in the Snowflake AWS US West region. From this first account, the org admin can create new accounts in any other region of Snowflake Cloud Data Platform, using the “create account” DDL statement.
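As a rough illustration of the “create account” step, the sketch below assembles such a statement as a string. The parameter names (`ADMIN_NAME`, `ADMIN_PASSWORD`, `EMAIL`, `EDITION`, `REGION`) follow my reading of Snowflake's documented `CREATE ACCOUNT` syntax and should be checked against the current documentation. The account name, admin details, and region identifier are hypothetical, and the helper functions are my own.

```python
import re


def sql_string(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_account_sql(name: str, admin_name: str, admin_password: str,
                       email: str, edition: str, region: str) -> str:
    """Build a CREATE ACCOUNT statement (parameter names are an assumption)."""
    # Unquoted parts must be plain identifiers so nothing else can be injected.
    for ident in (name, admin_name, edition, region):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", ident):
            raise ValueError(f"not a plain identifier: {ident!r}")
    return (
        f"CREATE ACCOUNT {name} "
        f"ADMIN_NAME = {admin_name} "
        f"ADMIN_PASSWORD = {sql_string(admin_password)} "
        f"EMAIL = {sql_string(email)} "
        f"EDITION = {edition} "
        f"REGION = {region}"
    )


# Hypothetical values; an org admin would run the resulting statement.
statement = create_account_sql(
    name="emea_analytics",
    admin_name="emea_admin",
    admin_password="change-me-immediately",
    email="admin@example.com",
    edition="ENTERPRISE",
    region="AZURE_WESTEUROPE",
)
print(statement)
```

The point of the sketch is that the target cloud and region are just a parameter of the statement; the org admin does not need a separate signup per cloud.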
Another global feature of Snowflake is database replication. You can replicate a database between any accounts belonging to the same organization, even if these accounts reside in different clouds. Database replication has several use cases. The first is global data sharing, which enables sharing data between two Snowflake regions by replicating the data to the new region. The second use case is migrating an account to a different Snowflake region. The last use case is business continuity. In that case, Snowflake supports full failover of both data and client connections.
In addition, we’ve recently introduced a new and exciting global solution, Snowflake Data Marketplace, formerly named the public data exchange. Our data marketplace is available to all of our customers and is the place where data providers create listings to advertise their data sets to data consumers. Snowflake provider and consumer accounts can be located in different regions or clouds, so the marketplace was made to be global. All interactions between consumers and providers are performed through Snowflake Cloud Data Platform, again using the global data mesh.
These features illustrate how Snowflake Cloud Data Platform is effectively one single platform for the world, built on top of the infrastructure cloud, in which each node is a Snowflake region connected to all others via our global data mesh.
This makes Snowflake one cohesive system transcending both cloud and geographic boundaries, and enabling us, and our customers, to build amazing global and cloud-agnostic features. And of course, as you can imagine, we are only at the beginning of our global journey.