Skip to content
  • AT SNOWFLAKE
  • Industry solutions
  • Partner & Customer Value
  • Product & Technology
  • Strategy & Insights
Languages
  • English
  • Français
  • Deutsch
  • Italiano
  • 日本語
  • 한국어
  • Português
  • Español
  • English
  • Français
  • Deutsch
  • Italiano
  • 日本語
  • 한국어
  • Português
  • Español
  • AT SNOWFLAKE
  • Industry solutions
  • Partner & Customer Value
  • Product & Technology
  • Strategy & Insights
  • English
  • Français
  • Deutsch
  • Italiano
  • 日本語
  • 한국어
  • Português
  • Español
  • Overview
    • Why Snowflake
    • Customer Stories
    • Partners
    • Professional Services
  • OVERVIEW
    • Platform
    • Snowflake Horizon
    • Data Cloud Explained
    • Snowflake Marketplace
    • Snowpark
    • Streamlit
    • Powered by Snowflake
    • Live Demo
    • Cross-Cloud Snowgrid
  • WORKLOADS
    • AI / ML
    • Applications
    • Collaboration
    • Cybersecurity
    • Data Engineering
    • Data Lake
    • Data Warehouse
    • Unistore
  • PRICING
    • Pricing Options
    • Cost & Performance Optimization
  • Industries
    • Advertising, Media, and Entertainment
    • Financial Services
    • Healthcare & Life Sciences
    • Manufacturing
    • Public Sector
    • Retail & Consumer Goods
    • Technology
    • Telecom
  • Departments
    • Marketing
    • IT
    • Finance
    • Cybersecurity
  • Learn
    • Resources
    • Developers
    • Quickstarts
    • Documentation
    • Hands-on Labs
    • Training
    • Certifications
    • Guides
    • Glossary
  • Connect
    • Blog
    • Community
    • Events
    • Webinars
    • Podcast
    • Support
    • Trending
  • Overview
    • About Snowflake
    • Investor Relations
    • Leadership & Board
    • Careers
    • Newsroom
    • Speakers Bureau
    • ESG at Snowflake
    • Snowflake Ventures
Author
Patrick Cuba Patrick Cuba
Share
Subscribe
May 02, 2022

Data Vault Techniques on Snowflake: Immutable Store, Virtual End Dates

  • Product and Technology
    • Data Engineering
Data Vault Techniques on Snowflake: Immutable Store, Virtual End Dates

Snowflake continues to set the standard for data in the cloud by eliminating the need to perform maintenance tasks on your data platform and giving you the freedom to choose your data model methodology for the cloud. Through this and more posts, we will discuss some Snowflake features you should consider that will enable your Data Vault to scale as dynamically as Snowflake scales.

Over the next few months, we will be publishing additional blog posts covering:

  1. Immutable store, virtual end dates
  2. Snowsight dashboards for Data Vault
  3. Point-in-time constructs and join trees
  4. Querying REALLY big satellite tables
  5. Streams and Tasks on views
  6. Conditional multi-table INSERT, and where to use it
  7. Row access policies and multi-tenancy
  8. Hub locking on Snowflake
  9. Virtual warehouses and charge-back

A reminder of the data vault table types:

An immutable versus a mutable object is the difference between an object that cannot be changed and an object that can be changed. To Snowflake these are the compressed and encrypted immutable 16MB micro-partitions that make up Snowflake mutable tables. Snowflake micro-partitions immutable files that are not visible to the Snowflake user; rather, the table acts as a container of micro-partitions and that is what the user will interact with. Some of the interactions are:

  • A SQL INSERT operation that will load new records as new micro-partitions
  • A SQL DELETE operation that will commit records and their micro-partitions to Time Travel
  • A SQL UPDATE operation that will INSERT new records to the table and commit the old state of that record to Time Travel

More on Time Travel in a bit…

Simplified data representation, hash key will be a deterministic digest value on a parent key

A data vault satellite table contains the descriptive state of a business object (based on a hub table), or the descriptive state of a unit of work (based on a link table). A satellite table appears as a Kimball Slowly Changing (SCD) Type 2 dimension complete with start and end dates, and each SQL UPDATE by the parent key (hash-key) will in fact INSERT two records into the table. As demonstrated above, HASH KEYs 3, 5 and 6 have updates to an existing record and therefore we need the new state of these records to reflect what they look like once the SQL UPDATE has completed. And remember, micro-partitions are immutable.

A SQL SELECT query before the update on January 4 returns:

  • HashKey 1, StartDate: 01-JAN-2022, EndDate: 31-DEC-9999
  • HashKey 2, StartDate: 01-JAN-2022, EndDate: 31-DEC-9999
  • HashKey 3, StartDate: 01-JAN-2022, EndDate: 31-DEC-9999
  • HashKey 4, StartDate: 02-JAN-2022, EndDate: 31-DEC-9999
  • HashKey 5, StartDate: 02-JAN-2022, EndDate: 31-DEC-9999
  • HashKey 6, StartDate: 03-JAN-2022, EndDate: 31-DEC-9999

Satellite table as a Snowflake Base Table with Time Travel and Fail-Safe

At rest, the pre- and post-SQL UPDATE records exist on disk at the same time; however, the pre-SQL UPDATE records are only available by using Time Travel. Time Travel is not the same as an SCD Type 2 dimension; rather, think of Time Travel as a live backup of the satellite table. Time Travel can be set from 0 to 90 days and retrievable using Snowflake-extended SQL Time Travel syntax. Once the records fall out of the Time Travel period, they are set to the 7-day Fail-Safe period and only retrievable by contacting Snowflake support. Beyond those seven days those records are purged from Snowflake and unretrievable.

  • Permanent tables can have Time Travel set from 0-90 days, 7 Fail-Safe cannot be changed.
  • Transient tables can have 0-1 days of Time Travel and does not have Fail-Safe.

Time Travel is a live backup of your data you configure using Snowflake extended SQL

Querying the table after the update returns the current state of the records within that table, while querying using Time Travel returns the previous state of the table. Note the storage needs for HASH KEYS 3, 5, and 6. Therefore a SQL SELECT query on January 18 returns:

  • HashKey 1, StartDate: 01-JAN-2022, EndDate: 31-DEC-9999
  • HashKey 2, StartDate: 01-JAN-2022, EndDate: 31-DEC-9999
  • HashKey 3, StartDate: 01-JAN-2022, EndDate: 15-JAN-2022
  • HashKey 3, StartDate: 16-JAN-2022, EndDate: 31-DEC-9999
  • HashKey 4, StartDate: 02-JAN-2022, EndDate: 31-DEC-9999
  • HashKey 5, StartDate: 02-JAN-2022, EndDate: 16-JAN-2022
  • HashKey 5, StartDate: 17-JAN-2022, EndDate: 31-DEC-9999
  • HashKey 6, StartDate: 03-JAN-2022, EndDate: 16-JAN-2022
  • HashKey 6, StartDate: 17-JAN-2022, EndDate: 31-DEC-9999
  • HashKey 7, StartDate: 16-JAN-2022, EndDate: 31-DEC-9999
  • HashKey 8, StartDate: 16-JAN-2022, EndDate: 31-DEC-9999

Data Vault 2.0 Satellite tables has no END-DATE column

Data Vault 2.0 recognized the expense in executing SQL UPDATE operations for some years now, and with the advancement of the SQL language itself, Data Vault 2.0 as a practice has deprecated the END-DATE column on the satellite table. Using SQL LEAD Analytical function, the end date of a parent key is now virtualized by defining an SQL VIEW over that satellite table right after the satellite table has been created.

Create View as select from table

As the data is a SQL INSERT operation with no SQL UPDATE operations needed, this means that table updates are only ever inserts of new records. This also means it does not matter at what velocity the data arrives in, Snowflake is not rapidly or slowly churning new micro-partitions trying to keep the table up to date. Nor is it creating a huge number of micro-partitions in Time Travel and Fail-Safe storage as if it were a high-churn table It also means that the Snowflake table does not need to have column clustering set, as the satellite table will naturally cluster by START-DATE. Most analytics are based on the current-date by parent-key anyway and therefore there is no need to recluster the table at all.

Cloning and Time Travel

Zero-Copy Cloning

Finally, a clone of a satellite table can be created across environments within a Snowflake account. A clone is a snapshot of the table at that point in time, and any new data added to PROD is not accessible by DEV (where the clone exists) and any new data added to DEV is not accessible by PROD. This patented Snowflake capability accelerates your DevOps processes by making instant copies of production-quality data available to develop on and no storage is replicated, only the metadata is. 

Cloning can be extended to cloning entire schema or even entire databases, and you can even perform a schema swap for blue/green testing!

For example:

STEP 1: Clone schema to a DEV schema, perform changes

STEP 2: Catch up data loads to Dev clone

STEP 3: Swap Dev and Prod Schema

In summary 

Separation of compute and storage and managing micro-partitions based entirely on metadata accelerates your DevOps processes, and with Data Vault 2.0 being an INSERT-ONLY methodology, it naturally fits and scales as Snowflake scales. There isn’t a need to track the previous state of a record when none is persisted to Time Travel. 

However, you can indeed use Snowflake’s Time Travel feature to take a clone of your Data Vault at a snapshot point in time to accelerate your DevOps development and testing—or whatever your heart desires!

Data Vault automation and orchestration
Additional references
  • Snowflake micro-partitions and clustering: https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions.html
  • Consideration for High Churn tables https://docs.snowflake.com/en/user-guide/tables-storage-considerations.html#managing-costs-for-large-high-churn-tables
  • Snowflake Time Travel https://docs.snowflake.com/en/user-guide/data-time-travel.html
  • Snowflake table types https://community.snowflake.com/s/article/Different-Types-of-Objects-in-Snowflake
  • Snowflake cloning https://docs.snowflake.com/en/user-guide/object-clone.html
Share

Related Content

  • Product and Technology
    • Data Engineering
Feb 03, 2020 7 min read

Tips for Optimizing the Data Vault Architecture on Snowflake

Data Vault is an architectural approach that includes a specific data model design pattern and…

Full Details
Read More
  • Partner & Customer Value
    • Partner Perspectives
May 09, 2022

Data Vault Automation with erwin and Snowflake: Building and Automating a Scalable Data Warehouse Based on Data Vault 2.0

PLEASE NOTE: This post was originally published in 2020. It has been updated to reflect…

Here's How
Read More
Support Multiple Data Modeling Approaches with Snowflake

Support Multiple Data Modeling Approaches with Snowflake

Discover how Snowflake supports multiple data modeling approaches equally, including data vault and DV 2.0.

Have a look
Read More
Data Mining Tools

Data Mining Tools

Data mining tools are software solutions that use computer science to process data, aggregating and analyzing data to find patterns and bring forward the most pertinent information quickly. Learn more.

See how
Read More
Data Vault Techniques on Snowflake: Out-of-Sequence Data

Data Vault Techniques on Snowflake: Out-of-Sequence Data

Snowflake continues to set the standard for data in the cloud by removing the need to perform maintenance tasks on your data...

Learn More
Read More
Java Programming: Leveraging One of the Most Popular Coding Languages

Java Programming: Leveraging One of the Most Popular Coding Languages

From mapping the human genome to helping people to stream their favorite movies and music, Java programming is powering many modern technologies. Learn more here.

More Details
Read More
Snowflake Inc.
  • Platform
    • Cloud Data Platform
    • Pricing
    • Marketplace
    • Security & Trust
  • Solutions
    • Snowflake for Financial Services
    • Snowflake for Advertising, Media, & Entertainment
    • Snowflake for Retail & CPG
    • Healthcare & Life Sciences Data Cloud
    • Snowflake for Marketing Analytics
  • Resources
    • Resource Library
    • Webinars
    • Documentation
    • Community
    • Procurement
    • Legal
  • Explore
    • News
    • Blog
    • Trending
    • Guides
    • Developers
  • About
    • About Snowflake
    • Investor Relations
    • Leadership & Board
    • Snowflake Ventures
    • Careers
    • Contact

Sign up for Snowflake Communications

Thanks for signing up!

  • Privacy Notice
  • Site Terms
  • Cookie Settings
  • Do Not Share My Personal Information

© 2023 Snowflake Inc. All Rights Reserved |  If you’d rather not receive future emails from Snowflake, unsubscribe here or customize your communication preferences