End-to-End Encryption in the Snowflake Data Warehouse
Apr 13, 2016
Cloud Data Security, Engineering
By Martin Hentschel and Peter Povinec.
Protecting customer data is one of the highest priorities for Snowflake. The Snowflake data warehouse encrypts all customer data by default, using the latest security standards, at no additional cost. Snowflake provides best-in-class key management, which is entirely transparent to customers. This makes Snowflake one of the easiest to use and most secure data warehouses on the market.
In previous blog posts, we explained critical components of Snowflake’s security architecture, including:
- How Snowflake manages encryption keys; including the key hierarchy, automatic key rotation, and automatic re-encryption of data (“rekeying”)
- How Snowflake uses Amazon CloudHSM to store and use Snowflake’s master keys with the highest protection
- How Amazon CloudHSM is configured to run in high-availability mode.
In this blog post, we explain Snowflake’s ability to support end-to-end encryption, including:
- How customers can upload their files to Amazon S3 using client-side encryption
- How to use Snowflake to import and export client-side encrypted data.
End-to-end encryption is a form of communication where only the end users can read the data, but nobody else. For the Snowflake data warehouse service it means that only the customer and runtime components of the Snowflake service can read the data. No third parties, including Amazon AWS and any ISPs, can see data in the clear. This makes end-to-end encryption the most secure way to communicate with the Snowflake data warehouse service.
End-to-end encryption is important because it minimizes the attack surface. In the case of a security breach of any third party (for example of Amazon S3) the data is protected because it is always encrypted, regardless whether the breach is due to the exposure of access credentials indirectly or the exposure of data files directly, whether by an insider or by an external attacker, whether inadvertent or intentional. The encryption keys are only in custody of the customer and Snowflake. Nobody else can see the data in the clear – encryption works!
Figure 1 illustrates end-to-end encryption in the Snowflake data warehouse. There are three actors involved: the customer in its corporate network, a staging area, and the Snowflake data warehouse running in a secure virtual private cloud (VPC). Staging areas are either provided by the customer (option A) or by Snowflake (option B). Customer-provided staging areas are buckets or directories on Amazon S3 that the customer owns and manages. Snowflake-provided stages are built into Snowflake and are available to every customer in their account. In both cases, Snowflake supports end-to-end encryption.
The flow of end-to-end encryption in Snowflake is the following (illustrated in Figure 1):
- The customer uploads data to the staging area. If the customer uses their own staging area (option A), the customer may choose to encrypt the data files using client-side encryption. If the customer uses Snowflake’s staging area (option B), data files are automatically encrypted by default.
- The customer copies the data from the staging area into the Snowflake data warehouse. Within the Snowflake data warehouse, the data is transformed into Snowflake’s proprietary file format and stored on Amazon S3 (“data at rest”). In Snowflake, all data at rest is always encrypted.
- Results can be copied back into the staging area. Results are (optionally) encrypted using client-side encryption in the case of customer-provided staging areas or automatically encrypted in the case of Snowflake-provided staging areas.
- The customer downloads data from the staging area and decrypts the data on the client side.
In all of these steps, all data files are encrypted. Only the customer and runtime components of Snowflake can read the data. Snowflake’s runtime components decrypt the data in memory for query processing. No third-party service can see data in the clear.
Customer-provided staging areas are an attractive option for customers that already have data stored on Amazon S3, which they want to copy into Snowflake. If customers want extra security, they may use client-side encryption to protect their data. However, client-side encryption of customer-provided stages is optional.
Client-Side Encryption on Amazon S3
Client-side encryption, in general, is the most secure form of managing data on Amazon S3. With client-side encryption, the data is encrypted on the client before it is uploaded. That means, Amazon S3 only stores the encrypted version of the data and never sees data in the clear.
Client-side encryption follows a specific protocol defined by Amazon AWS. The AWS SDK and third-party tools such as s3cmd or S3 Browser implement this protocol. Amazon S3’s client-side encryption protocol works as follows (Figure 2):
- The customer creates a secret master key, which remains with the customer.
- Before uploading a file to Amazon S3, a random encryption key is created and used to encrypt the file. The random encryption key, in turn, is encrypted with the customer’s master key.
- Both the encrypted file and the encrypted random key are uploaded to Amazon S3. The encrypted random key is stored with the file’s metadata.
When downloading data, the encrypted file and the encrypted random key are both downloaded. First, the encrypted random key is decrypted using the customer’s master key. Second, the encrypted file is decrypted using the now decrypted random key. All encryption and decryption happens on the client side. Never does Amazon S3 or any other third party (for example an ISP) see the data in the clear. Customers may upload client-side encrypted data using any clients or tools that support client-side encryption (AWS SDK, s3cmd, etc.).
Ingesting Client-Side Encrypted Data into Snowflake
Snowflake supports reading and writing to staging areas using Amazon S3’s client-side encryption protocol. In particular, Snowflake supports client-side encryption using a client-side master key.
Ingesting client-side encrypted data from a customer-provided staging area into Snowflake (Figure 3) is just as easy as ingesting any other data into Snowflake. To ingest client-side encrypted data, the customer first creates a stage object with an additional master key parameter and then copies data from the stage into their database tables.
As an example, the following SQL snippet creates a stage object in Snowflake that supports client-side encryption:
-- create encrypted stage create stage encrypted_customer_stage url='s3://customer-bucket/data/' credentials=(AWS _KEY_ID='ABCDEFGH' AWS_SECRET_KEY='12345678') encryption=(MASTER_KEY='aBcDeFgHiJkL7890=');
The master key specified in this SQL command is the Base64-encoded string of the customer’s secret master key. As with all other credentials, this master key is transmitted to Snowflake over TLS (HTTPS) and stored in a secure, encrypted way in Snowflake’s metadata storage. Only the customer and query-processing components of Snowflake know the master key and are therefore able to decrypt data stored in the staging area.
As a side note, stage objects can be granted to other users within a Snowflake account without revealing S3 access credentials and client-side encryption keys to those users. This makes a stage object an interesting security feature in itself that is a Snowflake advantage over alternatives.
After the customer creates the stage object in Snowflake, the customer may copy data into their database tables. As an example, the following SQL command creates a database table “users” in Snowflake and copies data from the encrypted stage into the “users” table:
-- create table and ingest data from stage create table users (id bigint, name varchar(500), purchases int); copy into table from @encrypted_customer_stage/users;
The data is now ready to be analyzed using the Snowflake data warehouse. Of course, data can be offloaded into the staging area as well. As a last example, the following SQL command first creates a table “most_purchases” as the result of a query that finds the top 10 users with the most purchases, and then offloads the table into the staging area:
-- find top 10 users by purchases, unload into stage create table most_purchases as select * from users order by purchases desc limit 10; copy into @encrypted_customer_stage/most_purchases from most_purchases;
Snowflake encrypts the data files copied into the customer’s staging area using the master key stored in the stage object. Of course, Snowflake adheres to the client-side encryption protocol of Amazon S3. Therefore, the customer may download the encrypted data files using any clients or tools that support client-side encryption.
All customer data in the Snowflake warehouse is encrypted at transit and at rest. By supporting encryption for all types of staging areas as well, whether customer-owned staging areas or Snowflake-owned staging areas, Snowflake supports full end-to-end encryption in all cases. With end-to-end encryption, only the customer and runtime components of the Snowflake service can read the data. No third parties in the middle, for example Amazon AWS or any ISPs, see data in the clear. Therefore, end-to-end encryption secures data communicated with the Snowflake data warehouse service.
Protecting customer data at all levels is one of the pillars of the Snowflake data warehouse service. As such, Snowflake takes great care in securing the import and export of data into the Snowflake data warehouse. When importing and exporting data in Snowflake, customers may choose to use customer-provided staging areas or Snowflake-provided staging areas. Using customer-provided staging areas, customers may choose to upload data files using client-side encryption. Using Snowflake-provided staging, data files are always encrypted by default. Thus, Snowflake supports the end-to-end encryption in both cases.