Cloud-Built Data Warehouse Architecture
The Snowflake Architectural Difference
Snowflake’s unique data warehouse architecture provides complete relational database support for both structured data, such as CSV files and tables, and semi-structured data, including JSON, Avro, Parquet, etc., all within a single, logically integrated solution. Snowflake is a data warehouse-as-a-service, which requires no management and features separate compute, storage, and cloud services that can scale and change independently.
Metadata management is also automatic. What’s more, metadata processing within Snowflake does not compete with the compute resources running your queries. This means Snowflake can scale near-linearly as your compute resources scale out.
Secure by design and compatible with popular ETL and BI tools, Snowflake enables data warehouse managers to support enterprise-wide data warehouse requirements with virtually unlimited concurrency. Snowflake is also a powerful query processing back-end platform for developers creating modern data-driven applications.
Beyond these attributes, what makes Snowflake different?
In One Word, Architecture.
Snowflake is built on a patented, multi-cluster, shared data architecture born and created for the cloud to revolutionize data analytics and data warehousing.
Snowflake is a single integrated system with fully independent scaling for compute, storage and services.
Unlike shared-storage architectures that tie storage and compute together, Snowflake enables automatic scaling of storage, analytics, or workgroup resources for any job, instantly and easily.
Built on scalable cloud blob storage, the storage layer holds all the diverse data, tables and query results for Snowflake. Maximum scalability, elasticity, and performance capacity for data warehousing and analytics are assured since the storage layer is engineered to scale independently of compute resources. As a result, Snowflake delivers unique capabilities such as the ability to load or unload data without impacting running queries and other workloads.
Under the covers of the storage layer, Snowflake utilizes micro-partitions to securely and efficiently store customer data. When loaded into Snowflake, data is automatically split into modest-sized micro-partitions, and metadata is extracted to enable efficient query processing. The micro-partitions are then columnar compressed and fully encrypted using a secure key hierarchy.
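The role of micro-partition metadata in query processing can be illustrated with a small sketch. This is a toy model, not Snowflake's implementation: the `MicroPartition` class, the `load` and `scan` functions, and the three-row partition size are all assumptions chosen for illustration. It shows how minimum and maximum values recorded at load time let a range query skip partitions that cannot contain matching rows.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    rows: list    # the rows stored in this partition
    min_key: int  # smallest key value in the partition
    max_key: int  # largest key value in the partition


def load(rows, key, partition_size=3):
    """Split rows into fixed-size partitions and record min/max metadata."""
    partitions = []
    for start in range(0, len(rows), partition_size):
        chunk = rows[start:start + partition_size]
        keys = [r[key] for r in chunk]
        partitions.append(MicroPartition(chunk, min(keys), max(keys)))
    return partitions


def scan(partitions, key, low, high):
    """Return matching rows and the number of partitions actually read."""
    matches, partitions_read = [], 0
    for p in partitions:
        if p.max_key < low or p.min_key > high:
            continue  # metadata alone proves no row here can match
        partitions_read += 1
        matches.extend(r for r in p.rows if low <= r[key] <= high)
    return matches, partitions_read


# Hypothetical data: nine orders with ids 1 through 9.
orders = [{"order_id": i, "amount": i * 10} for i in range(1, 10)]
parts = load(orders, key="order_id")
rows, read = scan(parts, "order_id", low=4, high=5)
print(len(parts), read, [r["order_id"] for r in rows])  # 3 1 [4, 5]
```

In this sketch only one of the three partitions is read, because the metadata for the other two rules them out before any row is touched.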
Snowflake Data Warehouse Architecture Benefits
Transactional SQL Data Warehouse
- Separation of services from storage and compute allows multiple, virtual warehouses (compute clusters) to simultaneously operate on the same data. Concurrency is virtually unlimited and can be instantly scaled with a multi-cluster warehouse.
- Activity in one virtual warehouse has zero impact on all other virtual warehouses. For example, data loading in a virtual warehouse does not impact performance on queries running in other virtual warehouses, even when they are accessing the same data.
- Full ACID transactional integrity is maintained across separate, virtual warehouses. Queries always see a consistent view of data, while transaction commits are immediately visible to new queries running in all virtual warehouses.
- Zero-copy clones of terabyte databases or tables happen in a matter of seconds and without incurring extra storage cost. A clone is a full logical replica of the original object, but with an independent lifecycle.
- Time travel enables any select statement or zero-copy clone to view the database in a retained, consistent, “as of” state up to 90 days in the past. The default retention period is 24 hours.
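The idea behind a zero-copy clone with an independent lifecycle can be sketched in a few lines. The `Table` class below is a hypothetical toy model, not Snowflake's storage engine: it assumes a table is a list of references to immutable partitions, so a clone copies only the references and later writes to either object leave the other unchanged.

```python
class Table:
    """Toy table: an ordered list of references to immutable partitions."""

    def __init__(self, partitions=()):
        self.partitions = list(partitions)  # references only, no data copied

    def insert(self, rows):
        # New data lands in a new immutable partition.
        self.partitions.append(tuple(rows))

    def clone(self):
        # Copy the list of references, not the partitions themselves.
        return Table(self.partitions)

    def rows(self):
        return [row for part in self.partitions for row in part]


sales = Table()
sales.insert(["a", "b"])

dev_copy = sales.clone()
dev_copy.insert(["c"])  # only the clone sees this write

# Both tables still point at the very same original partition object.
shared = sales.partitions[0] is dev_copy.partitions[0]
print(sales.rows(), dev_copy.rows(), shared)  # ['a', 'b'] ['a', 'b', 'c'] True
```

Because the existing partitions are never modified in place, the clone costs no additional storage in this model until one side writes new data.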
Performance and Throughput
- Snowflake outperforms traditional methods for data analytics. Compute resources scale linearly, while efficient query optimization delivers answers in a fraction of the time of legacy cloud or on-premises systems.
- Performance challenges can be addressed instantly, in seconds. You can specify the size of a virtual warehouse based on the performance you initially require. But you can resize at any time and even while a warehouse is running.
- You only pay, by the second, for the compute resources you use. Plus, you can pause virtual warehouses at any time.
- Multi-cluster warehouses deliver a consistent SLA to an unlimited number of concurrent users. As the concurrent workload increases, Snowflake can automatically add clusters to virtual warehouses and automatically distribute queries across those clusters. When the workload decreases, the clusters are paused. Charges only accrue for active clusters, so you only pay for what you use.
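Per-second billing for active clusters comes down to simple arithmetic, sketched below. The function name `compute_credits`, the rate of 4 credits per hour, and the workload figures are illustrative assumptions, not published pricing, and the sketch ignores any minimum billing increment that may apply.

```python
def compute_credits(active_seconds_per_cluster, credits_per_hour):
    """Credits used when each active cluster is billed per second."""
    total_seconds = sum(active_seconds_per_cluster)
    return total_seconds * credits_per_hour / 3600


# Hypothetical workload: one cluster runs for 30 minutes, and a second
# cluster is added automatically for 10 of those minutes.
active = [1800, 600]
credits = compute_credits(active, credits_per_hour=4)  # rate is illustrative
print(round(credits, 2))  # 2.67
```

A paused cluster contributes zero seconds to the total, which is why charges in this model track only the time clusters are actually running.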
Storage and Support for All Data
- Storage is inexpensive and can scale virtually indefinitely. Snowflake is the optimal platform for warehousing data, delivering cost-effective and highly performant support for multi-petabyte databases. All storage costs are based on actual usage for compressed data and measured in TB stored per month.
- You can query both structured and machine-generated, semi-structured data (e.g., JSON, Avro, XML, Parquet) using relational SQL operators, with performance characteristics similar to those of queries on structured data. Loading semi-structured data is painless. Schemas are dynamic and are automatically discovered during load. This support for dynamic schemas enables efficient query execution using natural extensions to SQL.
- With Snowflake, there’s no need to implement separate systems to process structured and semi-structured data. You can eliminate complex Hadoop and data warehouse pipelines. Snowflake can perform both roles much more efficiently and with better business results at a lower cost.
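Automatic schema discovery for semi-structured data can be illustrated with a short sketch. This is an assumption-laden toy, not how Snowflake implements it: the `discover_schema` function and the sample device records are hypothetical, and the sketch simply walks each JSON document and records every dotted path together with the value types seen there.

```python
import json


def discover_schema(records):
    """Map each dotted path found in the records to the set of type names seen."""
    schema = {}

    def walk(value, path):
        if isinstance(value, dict):
            for k, v in value.items():
                walk(v, f"{path}.{k}" if path else k)
        else:
            # Leaf value (arrays are treated as leaves in this sketch).
            schema.setdefault(path, set()).add(type(value).__name__)

    for rec in records:
        walk(rec, "")
    return schema


# Hypothetical machine-generated records with differing shapes.
raw = [
    '{"device": {"id": 7, "os": "linux"}, "temp": 21.5}',
    '{"device": {"id": 8}, "temp": 19, "tags": ["lab"]}',
]
schema = discover_schema(json.loads(line) for line in raw)
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

No schema is declared up front here: fields that appear in only some records, such as `device.os` and `tags`, are simply added to the discovered schema as they are encountered.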
Availability and Security
- Achieve high availability with Snowflake’s scale-out architecture, which is fully distributed across multiple Amazon and Azure availability zones. Snowflake can continue operations and withstand the loss of availability due to hardware failure. The system is designed to tolerate failures with minimal impact to our customers.
- Snowflake is secure by design. All data is encrypted in motion, over the Internet or direct links, and at rest on disks. Snowflake supports two-factor and federation authentication with single sign-on. Authorization is role-based. You can enable policies to limit access to predefined client addresses.
- Snowflake is SOC 2 Type 2 certified on both AWS and Azure. Support for PHI data for HIPAA customers is available with a Business Associate Agreement. Additional levels of security, such as encryption across all network communications and virtual private or dedicated isolation, are also available.
Seamless Data Sharing
- Snowflake’s unique, built-for-the-cloud architecture enables you to share the data within your account with other Snowflake users.
- Snowflake Data Sharing doesn’t require copying or transferring data from the provider’s account to the consumer’s account. Instead, you grant secure and curated access to read-only copies of your data.
- Accounts that receive shared data only pay for the compute resources they use to consume the data. Data shared from a Snowflake data provider can easily be combined with data from the Snowflake data consumer’s account without laborious effort or third-party tools.
- Avoid the burdens and complexities of decades-old email, FTP, and EDI technologies. Simply decide what you want to share with your data consumers, and share the data through easy-to-use SQL functions.
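Sharing without copying can be pictured as handing the consumer a read-only view of the provider's data. The sketch below is only an analogy, not Snowflake Data Sharing itself: the table names and values are hypothetical, and Python's `MappingProxyType` stands in for the share. In this toy, only the top-level mapping is read-only.

```python
from types import MappingProxyType

# Hypothetical provider data.
provider_tables = {"daily_sales": [("2024-01-01", 120)]}

# The "share" is a read-only view of the provider's data, not a copy.
share = MappingProxyType(provider_tables)

# The consumer combines shared data with its own local data.
consumer_tables = {"regions": [("2024-01-01", "EMEA")]}
joined = [
    (day, amount, region)
    for day, amount in share["daily_sales"]
    for d, region in consumer_tables["regions"]
    if d == day
]

try:
    share["daily_sales"] = []  # the consumer cannot replace shared objects
    writable = True
except TypeError:
    writable = False

# New provider data is visible through the share without any transfer.
provider_tables["daily_sales"].append(("2024-01-02", 95))
print(joined, writable, len(share["daily_sales"]))
# [('2024-01-01', 120, 'EMEA')] False 2
```

The consumer never holds its own copy of `daily_sales`, so there is nothing to re-send when the provider's data changes.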
We chose Snowflake to help power the web performance reporting and analytics that we provide our customers because it differentiated itself from the alternatives.
– Matt Solnit, Founder and VP of Engineering, Soasta
The whole company rests on top of Snowflake – all of our analytics. It’s the base for our entire strategy.
– Michael Bigby, CTO, Research Now
Inside the Data Warehouse Built for the Cloud
Snowflake's architecture departs from the limitations of traditional shared-disk and shared-nothing architectures. Learn more in our SIGMOD whitepaper.