How to Get Started with Snowflake on Azure

Snowflake is now available on Microsoft Azure for preview in the East US 2 region. Collaborating closely with the Microsoft Azure team, we ensured we could build the familiar scalability, performance and reliability into Snowflake on Azure. We leverage several new Azure features, including limitless storage accounts, accelerated networking, and storage soft delete. The goal is to provide the same Snowflake experience no matter which cloud infrastructure provider customers choose, with no barriers to entry. Snowflake on Azure will make it easier than ever for teams across an organization to become more data-driven, efficient and productive with their most valuable resources: data and people.

The Technical Details

Snowflake on Azure is architected to run on Azure, leveraging Azure compute and storage infrastructure services for data storage and query processing.

Figure 1.

As Figure 1 shows, Snowflake relies on Azure Blob Storage for data storage. To achieve scalable, highly performing data access, Snowflake stripes customer data across many storage accounts in Azure. Customer requests are processed by what we call virtual warehouses. A virtual warehouse is a set of virtual machines provisioned by Snowflake on Azure Compute. Each virtual warehouse with its virtual machines is dedicated to a single customer’s account. Snowflake receives requests via a load balancer. Requests leverage Snowflake’s cloud services and metadata layers for authentication, query compilation, transaction management, security, data sharing and other capabilities.
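Although Snowflake provisions and manages the underlying virtual machines, customers control virtual warehouses through standard Snowflake SQL. As a minimal sketch (the warehouse name and sizing below are illustrative assumptions, not part of the example scenario):

    -- Create a small virtual warehouse that suspends itself when idle
    CREATE WAREHOUSE IF NOT EXISTS demo_wh
      WITH WAREHOUSE_SIZE = 'XSMALL'
      AUTO_SUSPEND = 300      -- suspend after 5 minutes of inactivity
      AUTO_RESUME = TRUE;

    -- Resize later as query concurrency or complexity grows
    ALTER WAREHOUSE demo_wh SET WAREHOUSE_SIZE = 'LARGE';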

First Steps on Azure

The most powerful insights often come from analytics that tie together different data sets. For this blog post, we will explore a scenario that uses Snowflake on Azure to correlate clickstream data from a customer-facing website with transactional data from an order processing system and visualize the results. To do that, we’ll use the following services in Azure:

  • Snowflake on Azure: We’ll show you how to connect to the Snowflake web UI to manage your Snowflake account, provision warehouses, explore your Snowflake databases, run queries, etc.
  • Azure Blob Storage: In this example, Azure Blob Storage stages the load files from the order processing system.
  • Azure Data Lake Store: The clickstream logs in this example are stored in Azure Data Lake Store (Gen1), from where we will load them into Snowflake.
  • PowerBI: Finally, we will connect PowerBI Desktop to Snowflake on Azure to visualize the results of the analytics.   

The following paragraphs walk you through the different Azure data services and explain how to use them together with Snowflake on Azure.

Snowflake in Azure: Using Snowflake through the Web UI

Customers access their Snowflake account on Azure using a URL such as https://<accountname>.east-us-2.azure.snowflakecomputing.com/. After authenticating, you can use the familiar Snowflake web UI to manage your databases, warehouses and worksheets, and access your query history and account details. The screenshot below (Figure 2) shows a Snowflake worksheet with the object explorer on the left, the query editor in the center and the query results at the bottom.


Figure 2.

In our example scenario, we’ll use the Snowflake web UI to submit the COPY statements to load data from Azure Blob Storage into Snowflake databases – which we will explain next.
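Before running those COPY statements, the worksheet needs a context: which warehouse to run on and which database and schema to load into. A minimal sketch, where demo_wh and sales_db are placeholder names (any of these objects can also be created and selected through the UI instead):

    USE WAREHOUSE demo_wh;
    CREATE DATABASE IF NOT EXISTS sales_db;
    USE DATABASE sales_db;
    USE SCHEMA public;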

Azure Blob Storage: Loading data from Azure Blob Storage into Snowflake

In this example, we assume that you already exported the data from the transactional order processing system into load files in Azure Blob Storage. Once the data is in Azure Blob Storage, you can load it into Snowflake using the existing support for external stages in Azure Storage (https://www.snowflake.com/global-snowflake-loading-data-into-snowflake-from-azure-blob-storage/). Now, you can use these familiar steps to create a stage in Azure storage and run the COPY command to load the data:

  1. Store your load files in an Azure Blob Storage container.
  2. Create a Snowflake stage object referring to the blob storage location in its URL parameter:

    CREATE STAGE azstage
    URL = azure://<storageaccount>.blob.core.windows.net/tpch1000
    CREDENTIALS=(AZURE_SAS_TOKEN=…)
  3. Use the COPY statement to load data files from the stage into a target table (ORDERS in this example) in Snowflake:

    COPY INTO Orders
    FROM @azstage/orders
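The COPY statement above relies on the file format associated with the stage or table. If your load files need explicit format settings, you can list the staged files and pass a file format to COPY; a hedged sketch, where the delimiter and compression values are assumptions about how the files were exported:

    -- Inspect the staged files before loading
    LIST @azstage/orders;

    -- Load with an explicit CSV file format
    COPY INTO Orders
    FROM @azstage/orders
    FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' COMPRESSION = GZIP);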

You can find more information about how to configure your Azure Blob Storage account for data loading with Snowflake in the following documentation page: https://docs.snowflake.net/manuals/user-guide/data-load-azure-config.html.

Consider using Azure Data Factory (https://azure.microsoft.com/en-us/services/data-factory/) for richer orchestration scenarios. Azure Data Factory helps with extracting data from multiple Azure services and persisting it as load files in Blob Storage.

You can use these steps to load the order processing data files from Azure Blob Storage. For this example, we use TPC-H, a well-known benchmark data set, and Figure 3 shows the blob storage account with the data directories. You can run several COPY statements like the one above to populate the order processing tables in Snowflake, as sketched below.


Figure 3.
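For instance, assuming the container keeps one directory per TPC-H table as shown in Figure 3 (the directory names and target tables below are assumptions, and the target tables must already exist), the remaining loads could look like this:

    COPY INTO Customer FROM @azstage/customer;
    COPY INTO Lineitem FROM @azstage/lineitem;
    COPY INTO Supplier FROM @azstage/supplier;
    COPY INTO Part     FROM @azstage/part;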

Azure Data Lake Store: Connecting Snowflake with Azure HDInsight, Spark and Azure Data Lake Store

Many customers rely on Apache Spark as an integral part of their data analytics solutions. Snowflake natively integrates with Spark through its Spark connector. Below you can see how to use Spark in Azure HDInsight together with Snowflake. The use case implements a data pipeline that moves data from Azure Data Lake Store through HDInsight Spark into a Snowflake table.

Our running example will use this approach to make the clickstream data available in Snowflake next to the order processing data. The following screenshot (Figure 4) shows the folders with load files for the clickstream data in Azure Data Lake Store.

Figure 4.

The HDInsight cluster shown in Figure 5 below runs Spark 2.2. The cluster has been configured with a service principal so that it can access the clickstream files in the data lake account.

Figure 5.

Using the built-in Jupyter notebook capability of HDInsight, you can now run the following simple PySpark program to populate a dataframe in Spark with the clickstream data from Azure Data Lake Store and then persist the dataframe into a Snowflake table:

%%configure
{ "conf": { "spark.jars.packages": "net.snowflake:spark-snowflake_2.11:2.4.0-spark_2.2" } }

# Read the clickstream load files from Azure Data Lake Store into a Spark dataframe
df = spark.read.csv("adl://azuredemo.azuredatalakestore.net/clickstreams/20180629/*", header="true")

# Connection options for the Snowflake account (replace the placeholders)
sfOptions = {
  "sfURL": "<sfaccount>.east-us-2.azure.snowflakecomputing.com",
  "sfAccount": "<sfaccount>",
  "sfUser": "<user>",
  "sfPassword": "<password>",
  "sfDatabase": "<dbname>",
  "sfSchema": "<schemaname>",
  "sfWarehouse": "<whname>"
}

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

# Append the dataframe to the CLICKSTREAM table in Snowflake
df.write.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions).option("dbtable", "CLICKSTREAM").mode("append").save()


Note the use of the %%configure magic in the first lines of the code snippet. This loads the Snowflake Spark Connector that provides deep integration between Spark and Snowflake for highly optimized performance. After loading the clickstream data into the dataframe df, you can perform further transformations in Spark before writing the result into a Snowflake table. The last line of the code shows how to append data from the Spark dataframe to a table called CLICKSTREAM in Snowflake.

Now that we have loaded both the transactional data and the clickstream logs into Snowflake, we are ready to start analyzing the data sets. The next section explains how to do that with Snowflake on Azure and Microsoft PowerBI.
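Before moving on to PowerBI, you can also explore the combined data directly from a Snowflake worksheet. The query below is only a sketch: the clickstream column names (CUSTOMER_ID, EVENT_DATE) are assumptions about the loaded logs, while C_CUSTKEY and C_MKTSEGMENT come from the TPC-H customer table:

    -- Daily click activity per customer market segment
    SELECT c.C_MKTSEGMENT AS market_segment,
           cs.EVENT_DATE  AS activity_date,
           COUNT(*)       AS click_events
    FROM   CLICKSTREAM cs
    JOIN   Customer c ON c.C_CUSTKEY = cs.CUSTOMER_ID
    GROUP BY c.C_MKTSEGMENT, cs.EVENT_DATE
    ORDER BY activity_date, market_segment;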

PowerBI: Visualizing your Snowflake Analytics

You can connect PowerBI Desktop to Snowflake by installing the Snowflake ODBC driver on your PowerBI Desktop machine. Once installed, you can use the built-in Snowflake data source for PowerBI. Click on the ‘Get Data’ button in PowerBI and then choose the ‘Database’ section, as shown in Figure 6. You can find Snowflake at the bottom of the list of supported data sources.

Figure 6.

After connecting to Snowflake using the hostname from your Snowflake account URL, you can explore the tables and develop powerful reports and charts. In Figure 7 below, you can see a chart created from both the transactional and the clickstream data sets. It shows the distribution of daily online customer activity from the clickstream logs over customer market segments from the order processing data.

Figure 7.

You can also publish your Snowflake reports to the PowerBI service. Note that you need the PowerBI gateway to connect PowerBI service to Snowflake.

This completes our running example for Snowflake on Azure. As part of this scenario, we have seen how to load data from both Azure Blob Storage and Azure Data Lake Store into Snowflake on Azure. We also demonstrated how to use Microsoft PowerBI to visualize your analytics in Snowflake.

Resources
Webinar: Snowflake on Azure: Modern Data Analytics, August 23, 11am PT
Blog: Providing Customer Choice: Snowflake on Azure
Partner Solutions: Snowflake on Microsoft Azure

Give it a try!

We’re excited to see what you build with Snowflake on Microsoft Azure. Create your Snowflake account on Azure today. And, of course, we’d love to hear your feedback. Join the Snowflake on Azure community on the Snowflake website to share your feedback, experiences, tips & tricks and to ask questions.


Global Snowflake: Loading Data into Snowflake from Azure Blob Storage

Snowflake is available in a number of AWS regions as we continue to expand our geographic presence to meet customer demand, reduce latency and satisfy industry compliance and country-specific regulations.

As the global adoption of Snowflake continues, we also expect customers to transfer increasingly higher data volumes between regions. There are several potential scenarios that, taken together, are driving this trend, including:

  • Data sharing across regions. A data set maintained by a Snowflake account in a specific region can be easily shared and consumed by others in different regions.
  • Synchronizing Snowflake data between different geographic regions for disaster recovery purposes.
  • Importing raw data from, and exporting data to, storage in other cloud providers. A common request from Snowflake customers running on AWS was to support import from and export to Blob Storage in Microsoft Azure.

Therefore, we are happy to announce additional integration capabilities with Blob Storage in Microsoft Azure. This is part of Snowflake’s commitment to becoming a global cloud provider, supporting customers no matter which cloud provider(s) they choose. The integration with Azure Blob Storage complements Snowflake’s existing functionality for data loading and unloading. From now on, the Snowflake SQL commands to define an external stage support the URL and credential specifications for Azure Blob Storage. This allows customers to import data from and export data to Azure Blob Storage containers.

This capability targets architectures with data sets in Azure. Customers using Microsoft Azure can now easily exchange data with Snowflake to benefit from Snowflake’s leading analytics capabilities. The following picture illustrates the overall approach:


The key integration point between Azure and Snowflake is a container in Azure Blob Storage. Snowflake expects any data to be loaded from an Azure application into Snowflake to be placed in such a container. This container is then registered as an external stage in Snowflake using the following Snowflake SQL command:

CREATE STAGE azstage
URL = azure://<account>.blob.core.windows.net/<container>/<path>
CREDENTIALS=(AZURE_SAS_TOKEN=…)

The URL parameter of the CREATE STAGE command now supports URLs for Azure Blob Storage service endpoints. The endpoint for a given account can be found in the overview pane of the storage account in the Azure portal, as shown in the following figure:

When defining an external stage in Azure Blob Storage, the service endpoint URL should be followed by the container name and can include additional path specifications. Note that the URL in the stage definition replaces ‘https:’ from the endpoint URL with ‘azure’. Client-side encryption is supported for files that are encrypted using an Azure Storage SDK or compatible encryption library. Customers can provide their encryption key to the CREATE STAGE command. Details on creating Azure stages can be found in the Snowflake documentation here.
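For example, a stage over client-side encrypted files can supply the master key next to the SAS token. The sketch below assumes the AZURE_CSE encryption type described in the documentation; the account, container, path and key are placeholders:

CREATE STAGE azstage_encrypted
URL = azure://<account>.blob.core.windows.net/<container>/<path>
CREDENTIALS = (AZURE_SAS_TOKEN = '…')
ENCRYPTION = (TYPE = 'AZURE_CSE' MASTER_KEY = '<base64-encoded-key>');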

After defining the external stage, customers can use Snowflake’s familiar COPY syntax to refer to the stage. For example, the following statement loads a batch of data files from the Azure Blob Storage container into a target table T1 in Snowflake:

COPY INTO T1
FROM @azstage/newbatch

Similarly, the following COPY statement exports the contents of an existing table T2 in Snowflake to a set of files in the Azure external stage:

COPY INTO @azstage/t2data
FROM T2
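The unload can also be tuned with copy options. The following sketch writes gzip-compressed CSV files with a header row and overwrites any previous export under that path; the format choices are assumptions about what the downstream Azure application expects:

COPY INTO @azstage/t2data
FROM T2
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
HEADER = TRUE
OVERWRITE = TRUE;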

The Snowflake external stage support for Azure Blob Storage complements Snowflake’s expansion across Amazon data centers worldwide. It now provides the ability to easily access data in Azure storage using built-in Snowflake functionality.

While cloud providers do not charge for data ingress, they do charge for data egress. For Snowflake customers using the new Azure external stages, importing data from Azure Blob Storage into Snowflake will incur data egress charges for the amount of data transferred out of their Blob Storage accounts. The charges accrue to the Azure subscription that the Blob Storage accounts belong to. The rate at which these charges occur depends on the geographical location of the Blob Storage account and the volume of data transferred during a billing period. You can find more details here.   

Starting on February 1, 2018, we will pass through the cost of data egress from AWS and Microsoft through external stages. This cost will appear on your bill. Egress through JDBC and other drivers will continue to be free. Data egress is uncommon among Snowflake customers, and our initial analysis shows that many customers will not be affected by this change.

The Billing & Usage page in the Snowflake web portal tracks data transfer volumes similarly to warehouse utilization and storage volumes. More information on Snowflake data transfer prices can be found here.

To give an example, exporting a 1TB table across cloud providers, from Snowflake running in US West on AWS into an Azure Blob Storage container located in the Azure East US region, costs $90. Exporting the table within AWS and into an AWS S3 bucket located in US East costs $20 (cheaper since the transfer is within AWS US regions). Data transferred into an external stage in the same region and the same cloud provider continues to be free of charge, e.g., exporting the table from the previous example into an AWS S3 bucket in US West is free of charge.

Check out the Snowflake documentation for detailed information on exchanging data between Snowflake and Azure Blob Storage, as well as the new external stage support for Azure Blob Storage.