Global Snowflake: Loading Data into Snowflake from Azure Blob Storage
2月 02, 2018
Author: Torsten Grabs
Engineering, How to Use Snowflake, Snowflake Ecosystem
Snowflake is available in a number of AWS regions as we continue to expand our geographic presence to meet customer demand, reduce latency and satisfy industry compliance and country-specific regulations.
As the global adoption of Snowflake continues, we also expect customers to transfer increasingly higher data volumes between regions. There are several potential scenarios that, taken together, are driving this trend, including:
- Data sharing across regions. A data set maintained by a Snowflake account in a specific region can be easily shared and consumed by others in different regions.
- Synchronizing Snowflake data between different geographic regions for disaster recovery purposes.
- Importing and exporting raw data from storage in other cloud providers. A common request from Snowflake customers running Snowflake on AWS was to support import from and export to Blob Storage in Microsoft Azure.
Therefore, we are happy to announce additional integration capabilities with Blob Storage in Microsoft Azure. This is part of Snowflake’s commitment to becoming a global cloud provider, supporting customers no matter which cloud provider(s) they choose. The integration with Azure Blob Storage complements Snowflake’s existing functionality for data loading and unloading. From now on, the Snowflake SQL commands to define an external stage support the URL and credential specifications for Azure Blob Storage. This allows customers to import data from and export data to Azure Blob Storage containers.
This targets architectures with data sets in Azure. Customers using Microsoft Azure can now easily exchange data with Snowflake to benefit from Snowflake’s leading analytics capabilities. The following picture illustrates the overall approach:
The key integration point between Azure and Snowflake is a container in Azure Blob Storage. Snowflake expects that any data to be loaded from the Azure application into Snowflake is placed in a container. This container is then registered as an external stage in Snowflake using the following Snowflake SQL command:
CREATE STAGE azstage URL = azure://<account>.blob.core.windows.net/<container>/<path> CREDENTIALS=(AZURE_SAS_TOKEN=…)
The URL parameter of the CREATE STAGE command now supports URLs for Azure Blob Storage service endpoints. The endpoint for a given account can be found in the overview pane of the storage account in the Azure portal, as shown in the following figure:
When defining an external stage in Azure Blob Storage, the service endpoint URL should be followed by the container name and can include additional path specifications. Note that the URL in the stage definition replaces ‘https:’ from the endpoint URL with ‘azure’. Client-side encryption is supported for files that are encrypted using an Azure Storage SDK or compatible encryption library. Customers can provide their encryption key to the CREATE STAGE command. Details on creating Azure stages can be found in the Snowflake documentation here.
After defining the external stage, customers can use Snowflake’s familiar COPY syntax to refer to the stage. For example, the following statement loads a batch of data files from the Azure Blob Storage container into a target table T1 in Snowflake:
COPY INTO T1 FROM @azstage/newbatch
Similarly, the following COPY statement exports the contents of an existing table T2 in Snowflake to a set of files in the Azure external stage:
COPY INTO @azstage/t2data FROM T2
The Snowflake external stage support for Azure Blob Storage complements Snowflake’s expansion across Amazon data centers worldwide. It now provides the ability to easily access data in Azure storage using built-in Snowflake functionality.
While cloud providers do not charge for data ingress, they do charge for data egress. For Snowflake customers using the new Azure external stages, importing data from Azure Blob Storage into Snowflake will incur data egress charges for the amount of data transferred out of their Blob Storage accounts. The charges accrue to the Azure subscription that the Blob Storage accounts belong to. The rate at which these charges occur depends on the geographical location of the Blob Storage account and the volume of data transferred during a billing period. You can find more details here.
Starting on February 1, 2018, we will pass through the cost of data egress from AWS and Microsoft through external stages. This cost will appear on your bill. Egress through JDBC and other drivers will continue to be free. Since data egress is uncommon among Snowflake customers, our initial analysis shows that many customers will not be affected by this change.
The Billing & Usage page in the Snowflake web portal tracks data transfer volumes similarly to warehouse utilization and storage volumes. More information on Snowflake data transfer prices can be found here.
To give an example, exporting a 1TB table across cloud providers, from Snowflake running in US West on AWS into an Azure Blob Storage container located in the Azure East US region, costs $90. Exporting the table within AWS and into an AWS S3 bucket located in US East costs $20 (cheaper since the transfer is within AWS US regions). Data transferred into an external stage in the same region and the same cloud provider continues to be free of charge, e.g., exporting the table from the previous example into an AWS S3 bucket in US West is free of charge.
Check out the Snowflake documentation for detailed information on exchanging data between Snowflake and Azure Blob Storage, as well as the new external stage support for Azure Blob Storage.