
Introducing Immutability for Snowflake Dynamic Tables

Snowflake Dynamic Tables provide a simple, declarative way to express data pipelines. You define what the table should contain, and Snowflake automatically keeps it up to date, handling incremental updates and refreshes so you can focus on insights instead of maintenance. 
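For instance, a minimal Dynamic Table over a simple transactions table might look like the following sketch (the table, warehouse, and column names here are illustrative):

-- A sketch of a basic Dynamic Table: Snowflake keeps daily_revenue within
-- 5 minutes of the underlying transactions table. Names are illustrative.
CREATE DYNAMIC TABLE daily_revenue TARGET_LAG='5 minutes' WAREHOUSE=my_wh
AS
SELECT DATE_TRUNC(DAY, txn_time) as day, SUM(amount) as total_amount
FROM transactions
GROUP BY day;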

Dynamic Tables operate with delayed view semantics: their contents match those of a view with the same definition, but possibly at an earlier point in time as configured by the target_lag parameter. This model is a powerful way to express complex transformation pipelines, but sometimes you need more control over which rows are updated or recomputed.

For example, you might want to:

  • Delete data from base tables without affecting downstream Dynamic Tables

  • Update a Dynamic Table’s definition without updating previously computed rows to match the new definition

  • Avoid reprocessing old records when a dimension table changes

  • Migrate an existing pipeline to Dynamic Tables without recomputing existing rows

To address these use cases, we are excited to introduce immutability constraints for Dynamic Tables. With the IMMUTABLE WHERE constraint, you can now lock specific portions of your Dynamic Tables so they never change during refreshes. No rows in this marked portion can be updated or deleted, and no new rows can be added to the marked portion. This means less recomputation, which can significantly lower the cost of refreshes.

For example, here’s a simple pipeline that enriches transaction data with a join and then computes time-windowed aggregates. Using this new constraint, we mark aggregates older than a day as immutable:

CREATE TABLE transactions (customer_id INT, txn_time TIMESTAMP, txn_id INT, amount INT);

CREATE TABLE customer_info (customer_id INT, location TEXT);

CREATE DYNAMIC TABLE location_statistics TARGET_LAG='1 minute' WAREHOUSE=my_wh
IMMUTABLE WHERE (hour < CURRENT_TIMESTAMP() - INTERVAL '1 day')
AS
SELECT location, DATE_TRUNC(HOUR, txn_time) as hour, SUM(amount) as total_amount
FROM transactions JOIN customer_info USING (customer_id)
GROUP BY location, hour;

If customer_info is updated, we don’t need to go back and recompute old aggregates, which can significantly reduce the cost of refreshing the Dynamic Table. We can also now safely delete old data from the transactions table, for example with a Task like the following:

CREATE TASK delete_old_txns WAREHOUSE=my_wh SCHEDULE='720 MINUTE' AS
DELETE FROM transactions WHERE txn_time < CURRENT_TIMESTAMP() - INTERVAL '1 week';
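Note that newly created Tasks start out suspended, so the Task must be resumed before it runs on its schedule:

ALTER TASK delete_old_txns RESUME;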

We no longer need to pay to store the old raw data, while the historical aggregates are still maintained in the Dynamic Table.
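For instance, the locked aggregates remain queryable even after the raw transactions behind them have been deleted:

-- Aggregates older than a week remain in the Dynamic Table even though the
-- corresponding raw transactions have been deleted by the Task above.
SELECT location, hour, total_amount
FROM location_statistics
WHERE hour < CURRENT_TIMESTAMP() - INTERVAL '1 week'
ORDER BY hour;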

Building on this concept, Dynamic Tables with IMMUTABLE WHERE specified can also be backfilled from an existing table. This unlocks two important scenarios: migrating existing pipelines to Dynamic Tables and changing definitions of Dynamic Tables without expensive recomputation.

For the first scenario, let’s say we have an existing table containing historical aggregates that we want to include in our location_statistics Dynamic Table, but we no longer have the corresponding raw transaction data.  We can create our Dynamic Table with the following:

CREATE DYNAMIC TABLE location_statistics TARGET_LAG='1 minute' WAREHOUSE=my_wh
IMMUTABLE WHERE (hour < CURRENT_TIMESTAMP() - INTERVAL '1 day')
BACKFILL FROM historical_statistics
AS
SELECT location, DATE_TRUNC(HOUR, txn_time) as hour, SUM(amount) as total_amount
FROM transactions JOIN customer_info USING (customer_id)
GROUP BY location, hour;

For the second scenario, we want to start computing average transaction amounts in addition to sums.  Without immutability and backfill, we would have had to create an entirely new Dynamic Table. With these new features, we can instead copy forward the data:

-- Clone our dynamic table into a new regular table
CREATE TABLE cloned_location_statistics CLONE location_statistics;

-- Add the new column that we want
ALTER TABLE cloned_location_statistics ADD COLUMN avg_amount FLOAT;

-- If we need to do any updating, such as manually backfilling the average for some time range, do that here by mutating `cloned_location_statistics`
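-- For instance (a hypothetical sketch, assuming the raw data for the range we
-- want to backfill is still available in `transactions`):
UPDATE cloned_location_statistics
SET avg_amount = s.avg_amount
FROM (
  SELECT location, DATE_TRUNC(HOUR, txn_time) as hour, AVG(amount) as avg_amount
  FROM transactions JOIN customer_info USING (customer_id)
  GROUP BY location, hour
) s
WHERE cloned_location_statistics.location = s.location
  AND cloned_location_statistics.hour = s.hour;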

-- Update the definition
CREATE OR REPLACE DYNAMIC TABLE location_statistics
TARGET_LAG='1 minute' WAREHOUSE=my_wh
IMMUTABLE WHERE (hour < CURRENT_TIMESTAMP() - INTERVAL '1 day')
BACKFILL FROM cloned_location_statistics
AS
SELECT location,
       DATE_TRUNC(HOUR, txn_time) as hour,
       SUM(amount) as total_amount,
       AVG(amount) as avg_amount
FROM transactions JOIN customer_info USING (customer_id)
GROUP BY location, hour;

In addition to changes to the Dynamic Table itself, upstream changes can also be made without fully reinitializing it. In the previous example, without immutability, recreating the customer_info table with CREATE OR REPLACE would trigger recomputation of all of location_statistics; with immutability, only the mutable region needs to be recomputed.
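As a rough sketch, such an upstream change might look like this (customer_info_staging is a hypothetical placeholder for wherever the corrected data comes from):

-- Hypothetical upstream rebuild of the dimension table.
CREATE OR REPLACE TABLE customer_info AS
SELECT customer_id, location FROM customer_info_staging;

-- On the next refresh, only rows of location_statistics outside the
-- IMMUTABLE WHERE region (the most recent day) are recomputed.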

Finally, immutability can also be used to avoid full recomputation for Dynamic Tables whose definitions can’t be refreshed incrementally or that perform poorly when refreshed incrementally. In the following example, we enrich a timestamped fact table with a large number of dimension tables. Queries with this many left outer joins often don’t incrementalize well, but with immutability each refresh can still be much cheaper than a complete full refresh:

CREATE DYNAMIC TABLE enriched_data TARGET_LAG='12 hours' WAREHOUSE=my_wh
IMMUTABLE WHERE (event_time < CURRENT_TIMESTAMP() - INTERVAL '1 day')
AS SELECT * FROM fact_table f
LEFT OUTER JOIN dim_table1 d1 ON f.fk1 = d1.pk
LEFT OUTER JOIN dim_table2 d2 ON f.fk2 = d2.pk
LEFT OUTER JOIN dim_table3 d3 ON f.fk3 = d3.pk
LEFT OUTER JOIN dim_table4 d4 ON f.fk4 = d4.pk
LEFT OUTER JOIN dim_table5 d5 ON f.fk5 = d5.pk;

With this definition, each refresh only needs to compute the most recent day’s worth of data, rather than the entire contents of the Dynamic Table.

In conclusion, Snowflake Dynamic Tables with immutability constraints offer a powerful, flexible way to manage data pipelines. By locking specific portions of their tables, users can reduce refresh and recomputation costs, manage the evolution of their pipelines, and easily migrate existing pipelines to Dynamic Tables. We are working on further improvements in this space, such as the ability to modify Dynamic Table definitions in place and additional ways to express immutability, so stay tuned for future announcements.
