How We Built Snowflake on Azure

Today, we announced the general availability of Snowflake on Azure. As a part of the engineering team that built Snowflake on Azure, I’m especially excited to unveil what we’ve been working on.

Snowflake on Azure has been an ambitious project. We wanted to offer the same Snowflake service customers already use on other cloud platforms but built for Azure, including all existing and new features, with a single code base, and with the same performance characteristics. In this post, I’ll tell you a little more about how we did it, and some of the Azure features and strengths we built upon.

Leveraging Azure’s Strengths

Enabling Snowflake to run on Azure involved three big categories of work: building on top of Azure Blob Storage for all internal and customer-facing persistent storage, using Azure Compute to run workloads, and securing all access with Azure Active Directory and the security features built into Azure components.

Storage

Azure Blob Storage has a two-level hierarchy: storage accounts hold containers, and containers have a classic folder hierarchy within them. Containers can be independently secured with Shared Access Signature (SAS) tokens, which are time- and permission-scoped credentials. When accessing customer data, Snowflake uses SAS tokens scoped only to that customer’s container, which ensures that data in one customer’s container is never accessible while Snowflake is running in the context of another customer.
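
To make that isolation model concrete, here’s a minimal sketch of issuing a container-scoped SAS token with the azure-storage-blob Python SDK. This is illustrative only, not Snowflake’s actual implementation, and the account, key, and container names are placeholders.

```python
# Illustrative only: a time- and permission-scoped SAS for one container.
from datetime import datetime, timedelta, timezone

from azure.storage.blob import ContainerSasPermissions, generate_container_sas

sas_token = generate_container_sas(
    account_name="examplestorageaccount",     # hypothetical storage account
    container_name="customer-a-data",         # one customer's container
    account_key="<storage-account-key>",
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),  # time-scoped
)

# The token grants read/list access to this container only; a client holding
# it cannot reach blobs in any other customer's container.
container_url = (
    "https://examplestorageaccount.blob.core.windows.net/customer-a-data"
    f"?{sas_token}"
)
```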

Snowflake uses soft delete for Azure storage blobs to protect data from corruption and accidental deletion, and to recover data in case of a catastrophic event. Built in coordination with our team, soft delete allows us to offer data resiliency without building our own snapshotting feature.
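
For illustration, here’s a hedged sketch of the soft-delete primitive itself, again using the azure-storage-blob SDK; the connection string, container, and blob names are placeholders.

```python
# Illustrative only: enable blob soft delete and recover a deleted blob.
from azure.storage.blob import BlobServiceClient, RetentionPolicy

service = BlobServiceClient.from_connection_string("<connection-string>")

# Retain deleted blobs for a fixed window instead of removing them immediately.
service.set_service_properties(
    delete_retention_policy=RetentionPolicy(enabled=True, days=7)
)

# Within that window, an accidentally deleted blob can be restored.
blob = service.get_blob_client(container="customer-a-data", blob="table/part-0001")
blob.undelete_blob()
```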

Snowflake’s workload tends to be storage-intensive, and high-scale storage accounts give Snowflake increased capacity and higher ingress and egress limits. Accelerated networking also gives Snowflake a boost in networking performance, which is important both for communication between the machines running a query and for reads and writes to storage. These two features were critical for reaching our performance goals.

Compute

When you run your workload in Snowflake, the machines used to run your queries are dedicated to your exclusive use. To add computing power when you want it, Snowflake elastically allocates machines for your workload using Azure Resource Manager templates. Azure Compute allows us to create, manage, and deallocate those resources while ensuring a single fault in a data center, or a forced system update, doesn’t impact your ability to run your queries.
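
As a rough sketch (not our internal code), the snippet below shows how an Azure Resource Manager template deployment can be started programmatically with the azure-mgmt-resource and azure-identity Python packages. The subscription, resource group, and deployment names are hypothetical, and the template body is reduced to an empty shell.

```python
# Illustrative only: deploy an ARM template to allocate compute resources.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resource.resources.models import Deployment, DeploymentProperties

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RESOURCE_GROUP = "warehouse-rg"         # hypothetical resource group

# A real template would declare the virtual machines for a warehouse;
# this one is an empty, no-op shell.
template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [],
}

client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
poller = client.deployments.begin_create_or_update(
    RESOURCE_GROUP,
    "warehouse-deployment-001",
    Deployment(properties=DeploymentProperties(mode="Incremental", template=template)),
)
poller.result()  # block until provisioning completes
```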

Security

We take security very seriously at Snowflake, so building an airtight security model was crucial. Our security model is built upon the native security concepts within Azure. We use Azure Active Directory to manage identities and provide continuous security logging and monitoring. As I mentioned above, Snowflake uses SAS tokens to ensure one customer’s data is never accessible, even to internal Snowflake processes, while running another customer’s query. But SAS tokens allow us to secure more than just internal data within a storage container. Snowflake also dynamically creates short-lived, expirable tokens that our Snowflake drivers use to retrieve result files, or to put and retrieve data in storage areas such as table, user, and named stages. These tokens ensure that connections requesting data are secured using TLS and originate from Snowflake’s own IP addresses. SAS tokens can be used only for the specific operations and files a customer needs, following the principle of least privilege. Finally, all SAS tokens we create expire after a limited time.
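
The snippet below sketches what such a narrowly scoped, short-lived token might look like when built with the azure-storage-blob SDK; the account, blob path, and IP range are placeholders rather than Snowflake’s real values.

```python
# Illustrative only: a read-only, TLS-only, IP-restricted, short-lived blob SAS.
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

results_sas = generate_blob_sas(
    account_name="examplestorageaccount",
    container_name="query-results",
    blob_name="results/abc123/part-0.gz",        # only this file is readable
    account_key="<storage-account-key>",
    permission=BlobSasPermissions(read=True),    # least privilege: read only
    expiry=datetime.now(timezone.utc) + timedelta(minutes=30),  # short-lived
    ip="203.0.113.0-203.0.113.255",              # restrict the source IP range
    protocol="https",                            # require TLS
)
```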

Snowflake encrypts all data at all times. Data on Snowflake’s storage accounts is encrypted at rest using Azure Storage Encryption. On top of that, we apply an additional layer of AES-256 encryption to both data and keys, using Snowflake-managed keys. Like SAS tokens, the encryption keys for a customer’s account are retrievable only when running queries for that account, ensuring one customer can’t decrypt another customer’s data.

External stages allow customers to import and export data they manage in their own Azure storage accounts. Because SAS tokens offer fine-grained, scoped, expiry-based control, we use customer-created SAS tokens to access data within Azure external stages. In addition to requiring scoped tokens, we strongly encourage customers to encrypt all files on their external stages using client-side encryption with a 256-bit encryption key. Snowflake then decrypts that data on load and re-encrypts it on unload using the customer’s keys and the same AES data encryption and AES-KW key encryption supported by the Azure Storage SDK.
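
As an illustration of that envelope-encryption pattern (AES-256 for the data, AES key wrap for the data key), here is a small sketch using the Python cryptography package rather than the Azure Storage SDK itself; the keys are generated locally purely for demonstration.

```python
# Illustrative only: encrypt data with a per-file key, then wrap that key
# with a 256-bit master key (AES key wrap, RFC 3394).
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.keywrap import aes_key_unwrap, aes_key_wrap

master_key = os.urandom(32)   # 256-bit customer-managed key (demo value)
data_key = os.urandom(32)     # per-file 256-bit data encryption key
nonce = os.urandom(12)

# Encrypt the file contents with the data key (AES-256-GCM in this sketch).
ciphertext = AESGCM(data_key).encrypt(nonce, b"file contents to stage", None)

# Wrap the data key with the master key so only the key holder can recover it.
wrapped_key = aes_key_wrap(master_key, data_key)

# On load, the process runs in reverse.
recovered_key = aes_key_unwrap(master_key, wrapped_key)
plaintext = AESGCM(recovered_key).decrypt(nonce, ciphertext, None)
```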

How We Build It

Supporting multiple cloud platforms can add a significant tax to your engineering and operations teams. From the beginning, we decided we needed to build support for Azure using the same code base we use for other cloud platforms. We use layers of abstraction to encapsulate interactions with cloud-specific storage, compute and security APIs. We have a single build process and a single set of binaries that we deploy to all Snowflake regions, no matter what cloud platform they run on. As Snowflake expands to more regions, this keeps our engineering, release and maintenance processes scalable. It also means we can deploy a new release within hours in every region and monitor all regions and cloud platforms using a single set of tools.
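
A highly simplified, hypothetical sketch of what such an abstraction layer can look like is shown below; the class and method names are illustrative, not Snowflake’s internal interfaces.

```python
# Illustrative only: a cloud-neutral storage interface with a platform-specific
# implementation selected at runtime, so one binary serves every region.
from abc import ABC, abstractmethod


class CloudStorageClient(ABC):
    """Cloud-neutral interface the rest of the engine codes against."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class AzureBlobStorageClient(CloudStorageClient):
    """Azure-specific implementation backed by Azure Blob Storage."""

    def read(self, path: str) -> bytes:
        raise NotImplementedError("would call the Azure Blob Storage SDK here")

    def write(self, path: str, data: bytes) -> None:
        raise NotImplementedError("would call the Azure Blob Storage SDK here")


def storage_client_for(platform: str) -> CloudStorageClient:
    """Pick the platform-specific implementation for the region we run in."""
    if platform == "azure":
        return AzureBlobStorageClient()
    raise ValueError(f"unsupported platform: {platform}")
```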

Partnering with Microsoft

We couldn’t have built this without close coordination with Microsoft. The Snowflake on Azure project included many long phone sessions and meetings in Redmond conference rooms. Several new features were built with Snowflake in mind, including Azure storage soft delete and improvements to virtual machine provisioning. A big thank you goes out to the Azure team for helping us deliver together.

What’s Next?

We’re far from done. Every new feature for Snowflake is now built in tandem for each cloud platform we support. And, as new features are added to Azure, we will leverage them in Snowflake for Azure. Today, we call East US 2 home. In the coming months, we will branch out to new Azure regions to reach more customers. But first, maybe a short vacation for the Snowflake on Azure team.
