PLEASE NOTE: This post was originally published in 2019. It has been updated to reflect currently available features and functionality.
Azure Data Factory (ADF) is Microsoft’s fully managed, serverless data integration service.
Since September 26, 2018, when Snowflake announced the general availability of its service on the Azure public cloud, the question that I have been asked the most by Azure customers is how to best integrate ADF with Snowflake. Native ADF support for Snowflake has come primarily through two main releases, the first on June 7, 2020, and the second on March 2, 2022. And with this recent second release, Azure Snowflake customers are now able to use ADF as their end-to-end data integration tool with relative ease.
Over the years there have been many different approaches to integrating ADF with Snowflake, but they have all had their share of challenges and limitations. In fact, the original version of this blog post offered one such approach. But with this second release we can finally put all of those clunky workarounds behind us.
This updated blog post will outline the native Snowflake support offered by ADF and offer a few suggestions for getting started. For more information on ADF’s native support for Snowflake, please check out ADF’s Snowflake Connector page.
Native Snowflake connector
The native Snowflake connector for ADF currently supports these main activities:
The Copy activity is the main workhorse in an ADF pipeline. Its job is to copy data from one data source (called a source) to another data source (called a sink). The Copy activity provides more than 90 different connectors to data sources, including Snowflake. Snowflake can be used both as a source or a sink in the Copy activity. With this activity you can ingest data from almost any data source into Snowflake with ease. For the complete list of supported data sources, see the ADF Copy activities supported source/sink matrix table.
The second activity to consider in ADF is the Lookup activity. The Lookup activity is able to retrieve a small number of records from any of the supported data sources in ADF. The primary purpose of the Lookup activity is to read metadata from configuration files and tables, which can then be used in subsequent activities to create dynamic, metadata-driven pipelines. While you are able to call a stored procedure from the Lookup activity, Microsoft does not recommend that you use the Lookup activity to call a stored procedure to modify data. If you are trying to execute a stored procedure to modify data, then consider the Script activity, discussed next.
The third activity to consider in ADF is the new Script activity. This newly released Script activity in ADF provides the much anticipated capability to run a series of SQL commands against Snowflake. And importantly, this activity can be used to execute data manipulation language (DML) statements and data definition language (DDL) statements, as well as execute stored procedures. This gives users the flexibility to transform data they have loaded into Snowflake while pushing all the compute into Snowflake. It’s with this new activity that customers are now able to create end-to-end pipelines with Snowflake. For an announcement of the new Script activity, including a nice comparison of the Lookup and Script activities, see the ADF team’s recent blog post.
Getting started with ADF and Snowflake is now super easy. All you need to do is create a Linked Service to Snowflake, create a new pipeline, and then begin using one or more of the three activities outlined above to interact with Snowflake. And please note that ADF will only connect to Snowflake accounts in Azure.
As mentioned above, with this second release we can finally put all of those clunky workarounds behind us. So you can safely ignore the other blog posts on the internet which provide some custom workaround or connector for ADF and Snowflake.
I’m excited to see all the awesome ways that customers make use of the Snowflake Connector for ADF. And to get the juices flowing, here is a very cool blog post from my Snowflake colleague Chuang Zhu, who combined the new Script activity in ADF with Snowflake’s new Schema Detection capabilities to create an ADF pipeline, which can dynamically create the target table in Snowflake when loading data. As he discusses in the conclusion, this can be combined with the Lookup activity to create dynamic ADF pipelines for ingesting data! Check out Seamless migration to Snowflake using ADF Script Activity + Schema detection for all the details.