An Architecture Built for the Cloud
Snowflake’s unique architecture empowers data analysts, data engineers, data scientists, and data application developers to work on any data without the performance, concurrency, or scale limitations of other solutions. Snowflake is a single, near-zero maintenance platform delivered as a service. It features compute, storage, and cloud services layers that are logically integrated but scale independently of one another, making it an ideal platform for many workloads.
Secure and governed by design, and compatible with popular ETL, BI, and data science tools, Snowflake enables data professionals to support many data warehouse, data lake, data engineering and data science workloads with virtually unlimited concurrency. Snowflake is also a powerful query processing back-end platform for developers creating modern data-driven applications.
Metadata processing within Snowflake is also automatic and does not compete with the compute resources running your queries. This means Snowflake can scale near-infinitely as your compute resources scale out.
Beyond these attributes, what makes Snowflake different?
In One Word, Architecture
Snowflake is built on a patented, multi-cluster, shared data architecture created for the cloud to revolutionize data warehousing, data lakes, data analytics and a host of other use cases.
One Integrated Platform, Infinite Workloads, Any Cloud
Snowflake is a single platform composed of storage, compute, and services layers that are logically integrated but scale infinitely and independently of one another.
Built on scalable cloud blob storage, the storage layer holds all the diverse data, tables, and query results for Snowflake. Because the storage layer is engineered to scale completely independently of compute resources, it provides maximum scalability, elasticity, and performance capacity for data warehousing and analytics. As a result, Snowflake delivers unique capabilities such as the ability to load or unload data without affecting running queries and other workloads.
Under the covers of the storage layer, Snowflake utilizes micro-partitions to securely and efficiently store customer data. When loaded into Snowflake, data is automatically split into modest-sized micro-partitions, and metadata is extracted to enable efficient query processing. The micro-partitions are then columnar compressed and fully encrypted, using a secure key hierarchy.
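To make the role of partition metadata concrete, here is a minimal Python sketch of how per-partition minimum and maximum values can let a query skip partitions that cannot contain matching rows. It is a toy model, not Snowflake's implementation: the `MicroPartition` class, the `build_partitions` and `scan_range` helpers, and the partition size of 25 rows are all assumptions chosen for illustration, and compression and encryption are left out.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    """Hypothetical micro-partition: a chunk of rows plus min/max metadata."""
    rows: list        # list of dicts, one per row
    min_value: dict   # column name -> smallest value in this partition
    max_value: dict   # column name -> largest value in this partition


def build_partitions(rows, rows_per_partition):
    """Split rows into fixed-size chunks and record per-column min/max."""
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        columns = chunk[0].keys()
        partitions.append(MicroPartition(
            rows=chunk,
            min_value={c: min(r[c] for r in chunk) for c in columns},
            max_value={c: max(r[c] for r in chunk) for c in columns},
        ))
    return partitions


def scan_range(partitions, column, low, high):
    """Return matching rows and the number of partitions actually read."""
    scanned = 0
    matches = []
    for p in partitions:
        # The metadata proves no row in this partition can match, so skip it.
        if p.max_value[column] < low or p.min_value[column] > high:
            continue
        scanned += 1
        matches.extend(r for r in p.rows if low <= r[column] <= high)
    return matches, scanned


rows = [{"order_id": i, "amount": i * 10} for i in range(1, 101)]
partitions = build_partitions(rows, rows_per_partition=25)
matches, scanned = scan_range(partitions, "order_id", 30, 40)
print(len(matches), "rows from", scanned, "of", len(partitions), "partitions")
# prints: 11 rows from 1 of 4 partitions
```

In this sketch the query reads one partition out of four because the other three are ruled out by metadata alone, which is the general idea behind retrieving only the minimum data required.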
SNOWFLAKE ARCHITECTURE BENEFITS
Multi-cluster, Shared Data
Snowflake’s multi-cluster, shared data architecture is designed to process enormous quantities of data with maximum speed and efficiency. All data processing within Snowflake is performed by one or more clusters of compute resources. When executing a query, these clusters retrieve only the minimum data required from the storage layer. As data is retrieved, it is cached locally with the compute resources, and query results are cached as well, to improve the performance of future queries.
In addition, and unique to Snowflake, multiple compute clusters can simultaneously operate on the same data while fully enforcing global, system-wide transactional integrity with full ACID compliance. Operations always see a consistent view of the data, and write operations never block readers. Transactional integrity across compute clusters is achieved by maintaining all transaction states within the metadata services layer.
As a Service
Snowflake eliminates the administration and management demands of traditional data platforms. Snowflake is a true data platform-as-a-service running in the cloud. With built-in performance, there’s no infrastructure to manage or knobs to turn. Snowflake automatically handles infrastructure, optimization, availability, data protection, and more, so you can focus on using your data, not managing it.
Per-second, usage-based pricing for compute and storage means you only pay for the amount of data you store and the amount of compute processing you use. Say goodbye to upfront costs, over-provisioned systems or idle clusters consuming money.
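As a rough illustration of per-second, usage-based billing, the Python sketch below adds a compute charge based on seconds of running time to a storage charge based on the amount of data stored. Every number here is hypothetical: the credits-per-hour figure, the price per credit, and the storage price are placeholders chosen to make the arithmetic easy to follow, not Snowflake's actual rates, and the function names are invented for this example.

```python
def compute_cost(seconds_running, credits_per_hour, price_per_credit):
    """Compute charge when usage is metered by the second."""
    credits_used = credits_per_hour * seconds_running / 3600
    return credits_used * price_per_credit


def storage_cost(stored_tb, price_per_tb_month):
    """Storage charge for one month, based on the amount of data stored."""
    return stored_tb * price_per_tb_month


# Hypothetical rates and usage, for illustration only.
compute = compute_cost(seconds_running=5400, credits_per_hour=4, price_per_credit=3.0)
storage = storage_cost(stored_tb=2.5, price_per_tb_month=25.0)
print(compute, storage, compute + storage)
# prints: 18.0 62.5 80.5
```

Under these assumed rates, a cluster that runs for 90 minutes and then pauses is charged for 5,400 seconds of compute and nothing for the rest of the month, which is the point of paying only for what you use.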
Data Warehouse Built for Any Cloud
Separation of services from storage and compute allows multiple compute clusters to simultaneously operate on the same data. Concurrency is virtually unlimited and can instantly scale with a multi-cluster warehouse. Full ACID transactional integrity is maintained across separate compute clusters. Queries always see a consistent view of data, while transaction commits are immediately visible to new workloads running on the platform. Activity in one compute cluster has zero impact on all other compute clusters. For example, data science work in one compute cluster does not affect the performance of queries running in other compute clusters, even when they are accessing the same data.
Time travel enables any select statement or zero-copy clone to view the database in a retained, consistent, “as of” state up to 90 days in the past. The default retention period is 24 hours. Zero-copy clones of terabyte-scale databases or tables complete in a matter of seconds and without incurring extra storage cost. A clone is a fully logical replica of the original object but with an independent lifecycle.
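The following Python sketch shows one way "as of" reads and zero-copy clones can work when data is stored in immutable partitions: each commit records a list of partition references, a read as of a past time picks the latest commit at or before that time, and a clone copies only the references. This is an assumption-laden toy model, not Snowflake's implementation. The `VersionedTable` class, its methods, and the integer timestamps are hypothetical, and only the 24-hour default retention comes from the text above.

```python
import bisect


class VersionedTable:
    """Toy table: every commit is an immutable tuple of partition references."""

    def __init__(self, retention_seconds=24 * 3600):
        self.retention_seconds = retention_seconds
        self._timestamps = []  # commit times, ascending
        self._snapshots = []   # one tuple of partitions per commit

    def commit(self, timestamp, partitions):
        """Record a new table version; earlier versions stay untouched."""
        self._timestamps.append(timestamp)
        self._snapshots.append(tuple(partitions))

    def as_of(self, timestamp, now):
        """Return the table version that was current at `timestamp`."""
        if now - timestamp > self.retention_seconds:
            raise ValueError("timestamp is outside the retention window")
        index = bisect.bisect_right(self._timestamps, timestamp) - 1
        if index < 0:
            raise ValueError("no version exists at or before that time")
        return self._snapshots[index]

    def clone(self, timestamp, now):
        """Zero-copy clone: share partition references, copy no row data."""
        copy = VersionedTable(self.retention_seconds)
        copy.commit(now, self.as_of(timestamp, now))
        return copy


p1, p2, p3 = ("a", "b"), ("c",), ("d",)  # immutable partitions
table = VersionedTable()
table.commit(100, [p1, p2])
table.commit(200, [p1, p3])              # p2 replaced by p3

old = table.as_of(150, now=300)          # version from before the change
clone = table.clone(150, now=300)
clone.commit(400, [p1, p2, ("e",)])      # the clone evolves independently

print(old == (p1, p2))                          # True
print(clone.as_of(300, now=400)[0] is p1)       # True: same object, not a copy
print(table.as_of(300, now=400) == (p1, p3))    # True: original unaffected
```

Because the clone holds references to existing partitions rather than copies, creating it takes roughly the same time regardless of table size, and later writes to the clone do not change the original.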
Performance and Throughput
Snowflake outperforms traditional methods for executing data workloads. Compute resources scale linearly in Snowflake, while efficient query optimization delivers answers in a fraction of the time of legacy cloud or on-premises systems. Performance challenges can be addressed in seconds. You can specify the size of a compute cluster based on the performance you initially require, then resize it at any time, even while a workload is running.
Multi-cluster warehouses deliver a consistent SLA to a virtually unlimited number of concurrent users. Automatic clustering eliminates manual re-clustering of data when loading new data into a table. With materialized views, workloads composed of common, repeated query patterns see improved query performance. As concurrent workloads increase, Snowflake automatically adds compute clusters and distributes queries across them. Clusters pause when the workload decreases. Charges accrue only for active clusters and are billed by the second, so you pay only for what you use. Plus, you can pause compute clusters at any time.
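The scale-out behavior described above can be sketched as a simple policy: given the number of concurrent queries, decide how many clusters should be active, within a configured minimum and maximum. The Python below is a hedged illustration only. The `clusters_needed` function, the assumed capacity of 8 queries per cluster, and the cluster limits are hypothetical and do not describe Snowflake's actual scaling algorithm.

```python
import math


def clusters_needed(concurrent_queries, queries_per_cluster, min_clusters, max_clusters):
    """Active clusters for a given load, clamped to the configured range."""
    needed = math.ceil(concurrent_queries / queries_per_cluster)
    return max(min_clusters, min(max_clusters, needed))


# Hypothetical load samples, one per second.
load = [0, 3, 8, 20, 45, 8, 0]
active = [
    clusters_needed(q, queries_per_cluster=8, min_clusters=0, max_clusters=4)
    for q in load
]
print(active)       # [0, 1, 1, 3, 4, 1, 0]
print(sum(active))  # 10 cluster-seconds accrue charges
```

With a minimum of zero, idle seconds contribute nothing to the total, which mirrors the idea that paused clusters do not accrue charges.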
Storage and Support for All Data
- Storage is inexpensive and can scale almost infinitely. Snowflake is the optimal platform for warehousing data, delivering cost-effective and highly performant support for multi-petabyte databases. All storage costs are based on actual usage for compressed data and measured in TB stored per month.
- You can query both structured and machine-generated, semi-structured data (e.g., JSON, Avro, XML, Parquet) using relational SQL operators, with performance similar to querying structured data. Loading semi-structured data is painless. Schemas are dynamic and are automatically discovered during load. This support for dynamic schemas enables efficient query execution using natural extensions to SQL.
- With Snowflake, there’s no need to implement separate systems to process structured and semi-structured data. You can eliminate complex Hadoop and data warehouse pipelines. Snowflake can perform both roles much more efficiently and with better business results at a lower cost.
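To illustrate what discovering a dynamic schema during load can mean for the semi-structured data mentioned in the list above, the Python sketch below walks a few JSON records, collects every nested path along with the value types seen there, and then reads a nested field by path. It is a conceptual toy rather than a description of Snowflake's loader or its SQL extensions. The `discover_schema` and `get_path` helpers and the sample records are invented for this example.

```python
import json


def discover_schema(records):
    """Map each dotted path to the set of Python type names seen at that path."""
    schema = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            schema.setdefault(path, set()).add(type(value).__name__)

    for record in records:
        walk(record, "")
    return schema


def get_path(record, path, default=None):
    """Read a nested field by dotted path; missing fields yield `default`."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


raw = [
    '{"device": {"id": "a1", "temp": 21.5}, "ok": true}',
    '{"device": {"id": "b2"}, "ok": false, "tags": ["x"]}',
]
records = [json.loads(line) for line in raw]

schema = discover_schema(records)
print(sorted(schema))
# prints: ['device.id', 'device.temp', 'ok', 'tags']
print([get_path(r, "device.temp") for r in records])
# prints: [21.5, None]
```

No schema was declared up front, yet both records can be queried by path, and a field that is absent from one record simply comes back empty instead of causing an error.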
Availability and Security
Achieve high availability with Snowflake’s scale-out architecture, which is fully distributed across multiple Amazon, Azure, and Google availability zones. Snowflake can continue operating through hardware failures. The system is designed to tolerate failures with minimal impact to our customers.
Snowflake is secure by design. All data is encrypted in motion, over the Internet or direct links, and at rest on disks. Snowflake supports two-factor and federated authentication with single sign-on. Authorization is role-based. You can enable policies to limit access to predefined client addresses. Snowflake is SOC 2 Type 2 certified on both AWS and Azure, and support for PHI data for HIPAA customers is available with a Business Associate Agreement. Additional levels of security, such as encryption across all network communications and virtual private or dedicated isolation, are also available.
Sharing and Collaboration Across all Data
Snowflake’s Secure Data Sharing enables you to share the data within your account with other Snowflake users without having to copy or transfer data from the data provider’s account to the data consumer’s account. Instead, you grant secure, curated, read-only access to your data. Accounts that receive shared data pay only for the compute resources they use to consume the data. Data shared from a Snowflake data provider can easily be combined with data from the Snowflake data consumer’s account without laborious effort or third-party tools. Avoid the burdens and complexities of decades-old email, FTP, and EDI technologies with Snowflake. Simply decide what you want to share with your data consumers, and share the data through easy-to-use SQL commands.
We chose Snowflake to help power the web performance reporting and analytics that we provide our customers because it differentiated itself from the alternatives.
– Matt Solnit, Founder and VP of Engineering, Soasta
The whole company rests on top of Snowflake – all of our analytics. It’s the base for our entire strategy.
– Michael Bigby, CTO, Research Now
Inside the Data Platform Built for the Cloud
Snowflake's architecture departs from the limitations of traditional shared-disk and shared-nothing architectures. Read the solution brief to learn more.