The topic of DevOps is trending right now and most of my clients are asking about it, especially those who want to build a modern data platform in the cloud. But actually doing DevOps for a data warehouse environment is difficult and has some unique challenges compared with doing DevOps for applications. Snowflake offers some revolutionary capabilities that significantly change the approach to DevOps for data warehouses.
In my previous two-part blog post, I wrote about building enterprise Azure data and analytics solutions around Snowflake:
- Part 1: How to use Azure Data Factory (ADF) with Snowflake
- Part 2: Additional components of an enterprise Snowflake solution in Azure
This blog post will build on the previous two blog posts by focusing on two aspects of DevOps with Snowflake: schema change management and continuous data protection.
Why Use Database DevOps?
DevOps is about automating (in code) every aspect of a software solution, including operations and security. It enables operations and security engineers to use the same techniques software developers use to automate their work. DevOps is one of the key pillars that enable agile software delivery and the following business results:
- Faster delivery of software
- Greater efficiency
- Repeatable, measurable processes
- Improved quality of software
- Faster innovation
- Ability to build the right solution for a given problem
- Cheaper delivery of software
- Greater return on investment and profitability
But database DevOps (especially for data warehouses) is difficult and has some unique challenges compared with DevOps for applications:
*Adapted from Francois Delport’s blog post Continuous Deployment Of Databases: Part 1
Schema Change Management
Foundational to any release automation for databases is a schema change management (or schema migration) tool. Achieving database DevOps requires managing all database objects in code and automating their deployment.
There are two approaches to schema change management: declarative and imperative. Here’s a brief breakdown of the two approaches:
After working with both approaches, my preference is the declarative style, but it requires an advanced schema comparison tool. The current market for declarative-style tools for Snowflake is small but growing quickly.
During a previous consulting project, I quickly created an imperative-style tool using PowerShell and the SnowSQL command line tool. I modeled the solution after the Flyway database migration tool. This enabled me to manage all Snowflake objects and automate their release.
Continuous Data Protection
Traditional on-premises data warehouses have unique challenges for automation (see “Why Use Database DevOps?” above). Chief among them is the requirement to maintain and protect valuable enterprise data while ensuring no downtime to the enterprise data warehouse during the release of database schema changes, data integration code, platform objects, or other changes. Historically this meant making costly backups before a release, complicated strategies for isolating tables during the release and ETL data load, and a time-consuming process for rolling back the database after a bad release (for example, if there are bugs in the release process, bugs in the published code, bad data, lost data, or other problems).
Snowflake’s continuous data protection (CDP) features offer some revolutionary capabilities, such as the Time Travel and Fail-Safe features, that change the approach to DevOps for data warehouses:
Snowflake features such as Fail-Safe and Time Travel eliminate the need to make traditional backups. Additionally, with zero-copy clones, the way in which you load and seed different environments is greatly simplified.
Leveraging the current Time Travel and cloning functionality, you can create a lightweight DevOps framework and use it to roll back the database in the event of a failed release or to isolate tables during ETL data load. Because you can automate essentially everything in Snowflake via SQL, you can easily integrate this framework into your schema change management process.
The Snowflake platform offers some revolutionary capabilities that simplify enabling DevOps for data warehouses. I’ve touched only briefly on a few of them, but there are many other features currently available in Snowflake that make a DevOps approach much more manageable. I’m keeping an eye on schema change management tools and eagerly waiting for more tools that support the declarative-style approach to schema change management with Snowflake.