Building Enterprise Azure Data & Analytics Solutions Around Snowflake (Part 1)
Okt 04, 2018
Author: Jeremiah Hansen
With Snowflake’s recent announcement of General Availability (GA) on Azure, this is a good opportunity to share details about a project we’re working on here at Slalom. We’re building an enterprise data and analytics solution around Snowflake on Azure for a global retailer looking to replace their existing legacy on-premises data and analytics solution with a modern, cloud-based solution.
This will be a two-part blog post. The first part will focus on Snowflake and Azure Data Factory. The second part will focus on the other necessary Azure services and integrations. Written for BI solution architects and key BI decisions makers, the two posts will help readers understand what’s possible on Snowflake and Azure and provide them a pattern for getting started.
A Fully Customized Solution Architecture For Azure Enterprise Analytics
Before we jump into the details of Azure Data Factory, I want to call out two guiding principles for the solution:
- Design an Azure first solution as opposed to taking an existing AWS solution and trying to lift-and-shift it.
- Design a serverless solution using only Platform as a Service (PaaS) or higher Azure services to avoid spending time building/configuring networking and servers.
With that in mind, below is an overall view of our customized solution architecture.
Azure Data Factory
Azure Data Factory (ADF) is Microsoft’s fully managed ETL service in the cloud that’s delivered as a Platform as a Service (PaaS) offering. With the exception of the ADF Integration Runtime (to connect to on-premises data sources), there’s no need to procure software licenses, stand up servers or configure networking.
I’ve recently had a few clients express reservations about ADF based on their experience with the first version of the service. If that’s your only experience with ADF, I encourage you to check out the new version. ADF v2 provides a new, flexible and highly intuitive approach for creating custom activities.
Below is a diagram of an end-to-end pipeline that loads data from a source (In this case, Azure SQL DB) to Snowflake.
The most important part of the solution is how we’re connecting to Snowflake. In the pipeline diagram above, you’ll see that most of the activities in this pipeline are custom activities.
Microsoft has a great how-to guide on using custom activities in an Azure Data Factory pipeline that helped jumpstart our process. The how-to guide shows you how to use C# .NET in the custom activity. It also shows you how to pass inputs to the custom activity along with how to debug the output. What the documentation lacks, however, is a diagram showing how the various aspects of the custom activity work together, so I put one together for you below.
Another thing not mentioned in the how-to guide that is key to our solution is the ability to directly return values to the custom activity using an “outputs.json” file. Whenever your custom activity creates an “outputs.json” file, ADF will automatically pick it up and make it available to other downstream activities in the pipeline.
Connecting to Snowflake from our C# .NET custom activity was the last piece of the puzzle. Leveraging the Snowflake .NET driver, we used the instructions provided by the Snowflake team in their Snowflake Connector for .NET GitHub repo. After connecting directly to Snowflake, we can now run queries and pass parameters to and from the activities. Below is a screenshot of a custom activity’s extended properties settings. This illustrates how we passed parameterized values to the command.
What’s coming in part 2?
In Part 2 of the post, I’ll explain how we developed a solution for creating the remaining components of an enterprise data and analytics solution around Snowflake on Azure. This will include logging, serverless notifications, source control, reporting and security. Stay tuned.
Jeremiah Hansen is a Solution Principal at Slalom, a Snowflake Implementation Partner headquartered in Seattle. Specializing in analytical data platforms built in the cloud, he focuses on helping enterprises get the most out of their data.