This session provides attendees with an end-to-end solution for implementing a DataOps lifecycle with Snowflake, leveraging Terraform Cloud for Infrastructure-as-Code activities and dbt Cloud for creating and deploying data pipelines.

In this lab, we use Customer and Order datasets (derived from SNOWFLAKE_SAMPLE_DATA, based on TPC-H) together with the Starschema COVID-19 dataset from the Snowflake Marketplace to determine how COVID-19 case counts might have affected total order amounts.

To accomplish this, Terraform Cloud (a free Developer account) is configured and used to provision the account-level objects for the lab. After creating a Terraform configuration in Git, we run a workflow that creates virtual warehouses, databases, schemas, external stages, and grants. The remaining database objects, in turn, are created within a dbt pipeline using dbt Cloud. The dbt pipeline reads data from two external tables (Customer and Orders) and transforms them into a curated dataset that is joined with the Starschema COVID-19 dataset on the customer's country for further analysis. We also demonstrate how to generate a data lineage graph and how to set up automated tests in dbt. Finally, we build a CI pipeline in dbt Cloud to build, test, and deploy the code into a production environment.
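To illustrate the shape of the final transformation step, here is a minimal sketch of a dbt model that joins curated orders with daily COVID-19 case counts by country. The model and column names (stg_orders, stg_customers, stg_covid_cases, order_total, new_case_count, report_date) are illustrative placeholders, not the exact names used in the lab project:

```sql
-- models/covid_orders_by_country.sql  (hypothetical model name)
-- Joins curated customer/order data with COVID-19 case counts by country and date.

with orders as (
    select * from {{ ref('stg_orders') }}        -- staging model over the Orders external table
),

customers as (
    select * from {{ ref('stg_customers') }}     -- staging model over the Customer external table
),

covid_cases as (
    select * from {{ ref('stg_covid_cases') }}   -- staging model over the Starschema COVID-19 data
)

select
    c.country,
    o.order_date,
    sum(o.order_total)     as total_order_amount,
    sum(cv.new_case_count) as covid_case_count
from orders o
join customers c
    on o.customer_id = c.customer_id
left join covid_cases cv
    on c.country = cv.country
   and o.order_date = cv.report_date
group by 1, 2
```

In the lab, a model of this kind would be materialized by dbt Cloud, surfaced in the lineage graph, and covered by the automated tests described above.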


Agenda

  • Prerequisite setup of Terraform Cloud with Git configuration, dbt Cloud, and Snowflake (prior to the session) – 30 minutes
  • DataOps overview presentation – 15 minutes
  • Terraform steps – 15 minutes
  • dbt Cloud steps – 60 minutes