Building Enterprise Azure Data & Analytics Solutions Around Snowflake (Part 2)
Okt 25, 2018
Author: Jeremiah Hansen
Welcome to the second of my two-part blog post on building enterprise Azure data & analytics solutions around Snowflake. Part 1, Building Enterprise Azure Data & Analytics Solutions Around Snowflake, focused on using Azure Data Factory (ADF) with Snowflake to begin building out a data and analytics solution for a global retailer. Part 2 will focus other key considerations for the project, including:
- Serverless Notifications
- Source Control
As I mentioned in Part 1, these posts were written with BI solution architects and key BI decisions makers in mind. The goal is to help readers understand what’s possible on Snowflake and Azure and provide them with a pattern for getting started. Let’s begin by reviewing the overall solution architecture.
Our logging solution has two parts: Part one is the native ADF logging (provided by Azure), which integrates with the Azure Monitor service. We configured the Azure Monitor service to dump ADF logs to a Blob Storage account. Using our new ADF custom activity process (described in Part 1), we then created a pipeline and loaded the logs into Snowflake using the optimized, bulk COPY INTO command. We also used Snowflake’s built-in ability to work with semi-structured JSON data to handle ADF’s native JSON logs.
Part two is our ADF framework log. This framework provides details about each pipeline execution (within a custom logging table) that we created in Snowflake. Notice that the first and last activity in the ADF pipeline shown below are labeled “Log Start” and “Log End”. Likewise, notice that all error activities start with “Log Error”. We adapted our standard ETL/ELT logging framework to capture ADF-related attributes, including the pipeline and trigger details.
Featured in the diagram below is the lightweight notification capability we developed using Microsoft’s Azure Functions offering. To connect with Snowflake, we re-used our existing C# .NET codebase within our Azure Function. Next, we queried our custom ADF logging table in Snowflake to find any pipeline failures (configurable). Lastly, we sent an email notification for each ADF failure via Office 365.
This lightweight process is helpful for getting started. However, in the future, we plan to use Azure Log Analytics/OMS (now considered part of the Azure Monitor service) for our enterprise monitoring solution. It’s important to note that the ability to connect to Snowflake from an Azure Function is viable in many other scenarios.
One of the great features of ADF is its native integration with Git—both from GitHub and from Microsoft’s newly rebranded Azure DevOps (previously known as Visual Studio Online or Visual Studio Team Services). See Microsoft’s how-to guide, Visual authoring in Azure Data Factory, for more details. Natively managing all ADF objects in source control, including the integrated collaboration around code reviews, is a great time saver. We also store the source code from the rest of the solution in the same GitHub repo.
Creating reports and dashboards against our Snowflake solution using Power BI’s native Snowflake connector was seamless. Pulling from our custom logging table (shown above), our first report and dashboard were operationally focused. In less than an hour, our basic operational dashboard showed the number of pipeline runs, failures and trends for duration, along with row counts for each pipeline —with everything connecting directly to Snowflake!
A critical aspect of any enterprise solution is securely managing credentials and other secrets. Make it a priority to ensure that all credentials and secrets are easy to rotate or change and that they are not hard coded or stored anywhere in plain text.
Our security solution includes two components: The first component is the native Azure Active Directory integration. Using the directory allows us to centrally manage users and groups across our environment. The second component is Azure Key Vault, which we use to store our credentials. All of our ADF-linked services and custom C# .NET code leverages Key Vault to store credentials (See examples below). There is a slight difference between connecting to Key Vault from a standalone console app versus an Azure Function. Specifically, when connecting from an Azure Function, you don’t need to explicitly specify the Azure Active Directory application client ID or secret.
We’re continuing our work building an enterprise Azure data and analytics solution, including how to manage our database source code and deploy it to Snowflake. We also plan to continue working on automating deployments and setting up our CI/CD pipelines. I’m looking forward to seeing more development tools evolve and mature to better support the integration between Azure and Snowflake.
Jeremiah Hansen is a Solution Principal at Slalom, a Snowflake Implementation Partner headquartered in Seattle. Specializing in analytical data platforms built in the cloud, he focuses on helping enterprises get the most out of their data.