Lakehouse Transformations: Build Production Pipelines for your Iceberg Tables
Overview
This quickstart shows how to build a bronze-to-silver Iceberg pipeline with AWS and Snowflake — without copying data into a second storage system. You prepare a bronze Iceberg landing zone in AWS (Glue catalog, S3 warehouse, and optional S3 Tables), connect Snowflake to the same catalog using Catalog Linked Databases, build Snowflake-managed Dynamic Iceberg Tables that refresh into silver storage you control, and visualize results using Streamlit in Snowflake. A final chapter queries the same silver tables from DuckDB via Snowflake's Horizon Iceberg REST Catalog.
The guide is bronze-first: each layer is verified before the next one starts, so any failure is easy to isolate.
What You'll Learn
- How to prepare a bronze layer in AWS using Glue, S3, Lake Formation, and task-driven automation
- How Snowflake uses a catalog integration and Catalog Linked Databases to query externally managed Iceberg metadata without ETL duplication
- How Dynamic Iceberg Tables transform bronze JSON into production-ready silver aggregates while preserving Iceberg format and multi-engine access
- How to make your silver data AI-ready with a Semantic View for natural-language querying via Snowflake Intelligence
- How to build a live Streamlit in Snowflake dashboard over silver Dynamic Tables
- How to query Snowflake-managed Iceberg tables from DuckDB via the Horizon Iceberg REST Catalog
What You'll Build
A repeatable lakehouse workflow: bronze Iceberg tables loaded in AWS, consumed and transformed in Snowflake via a Catalog Linked Database and Dynamic Iceberg Tables, surfaced as a Streamlit in Snowflake dashboard, and queried from DuckDB. Every layer is auditable and every file stays in open Iceberg format.
Prerequisites
- Access to a Snowflake account
- Access to an AWS account with permissions for Glue, S3, Lake Formation, and IAM
- Local workstation with required CLIs — see Local Toolchain in the next section
- A configured Snowflake CLI connection; verify with snow connection test before any snow sql steps (Snowflake CLI installation)
Use Case and Architecture
The Balloon Game
The lab uses a balloon-popping game as the sample workload. A Python generator simulates players popping balloons of different colors, producing a stream of game events. Each event is a JSON object with these fields:
| Field | Type | Description |
|---|---|---|
| player | string | Player identifier |
| balloon_color | string | Color of the popped balloon |
| score | integer | Points scored for this pop |
| page_id | string | Game page where the pop occurred |
| favorite_color_bonus | boolean | Whether a scoring bonus was applied |
| event_ts | timestamp | Event time |
Events land as raw JSON strings in a single event column in the bronze Iceberg table balloon_game_events. The silver layer uses PARSE_JSON to project and aggregate these fields into five production-ready tables.
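As an illustration of this bronze-to-silver projection, here is a minimal Python sketch (sample values invented) of what a single bronze row looks like and what the silver layer's `PARSE_JSON(event):field::TYPE` expressions do in SQL:

```python
import json

# One bronze row: the whole event is a single JSON string in the `event` column.
# All field values below are invented sample data.
event = json.dumps({
    "player": "ana",
    "balloon_color": "red",
    "score": 12,
    "page_id": "page-1",
    "favorite_color_bonus": True,
    "event_ts": "2025-01-01T00:00:03+00:00",
})

# Silver-side projection: parse the JSON string and cast fields to typed columns,
# mirroring PARSE_JSON(event):player::STRING, :score::INTEGER, and so on.
parsed = json.loads(event)
row = {
    "player": str(parsed["player"]),
    "balloon_color": str(parsed["balloon_color"]),
    "score": int(parsed["score"]),
    "favorite_color_bonus": bool(parsed["favorite_color_bonus"]),
}
```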
Architecture

Lab Layers
| Layer | Technology | What it does |
|---|---|---|
| Bronze | Glue + S3 + PyIceberg | Loads raw game events as Iceberg in AWS |
| Catalog | Snowflake Catalog Integration | Connects Snowflake to Glue Iceberg REST with SigV4 + LF vended credentials |
| CLD | Catalog-Linked Database | Mirrors Glue namespaces and tables as Snowflake schemas — no data copy |
| Silver | Dynamic Iceberg Tables | Transforms JSON bronze into 5 aggregation tables; writes Iceberg back to S3 |
| AI-Ready | Snowflake Intelligence | Semantic View over silver DTs enables natural-language querying via Cortex Analyst |
| Dashboard | Streamlit in Snowflake | Live dashboard over silver DTs; zero local server |
| Cross-engine | DuckDB via HIRC | Queries silver Iceberg tables through Snowflake's Horizon REST Catalog |
Tools and Prerequisites
Clone Repository
This quickstart's narrative lives on Snowflake Quickstarts; all automation lives in the companion repository. Clone it and use the repo root as your working directory for every task, uv run, and path reference.
git clone https://github.com/Snowflake-Labs/sfguide-lakehouse-iceberg-production-pipelines.git
cd sfguide-lakehouse-iceberg-production-pipelines
Accounts and Permissions
- AWS account with a named profile (AWS_PROFILE) that can create and update Glue databases, manage IAM roles, and access S3
- Snowflake account with ACCOUNTADMIN or a role with CREATE INTEGRATION, CREATE DATABASE, and CREATE STREAMLIT privileges
- Snowflake CLI connection configured for that account — snow connection list and snow connection test both succeed
Required Tools
This repo targets Python 3.12+. uv manages the interpreter and all dependencies.
| Tool | Role | macOS | Linux (Debian/Ubuntu) | Windows |
|---|---|---|---|---|
| Git | Clone the companion repository | brew install git | sudo apt install git | Git for Windows |
| uv | Python deps and uv run entrypoints | brew install uv | Astral installer | PowerShell installer |
| Task | task bronze:*, task check-tools | brew install go-task | Install script | scoop install task |
| AWS CLI v2 | Glue, S3, STS; S3 Tables needs v2.34+ | brew install awscli | AWS bundled installer | AWS MSI |
| Snowflake CLI | Snowflake steps; also available via uv sync | Snowflake CLI docs | Snowflake CLI docs | Snowflake CLI docs |
| envsubst | Renders IAM policy templates (gettext package) | brew install gettext | sudo apt install gettext-base | WSL2 recommended |
| jq | JSON checks at the shell | brew install jq | sudo apt install jq | scoop install jq |
After uv sync, use uv run snow … from the repo root, or add .venv/bin (macOS/Linux) or .venv\Scripts (Windows) to your PATH.
Windows note: If task check-tools fails only on envsubst, use WSL2 or run uv run bronze-cli render-iam (the Python path) instead.
Recommended Tools
| Tool | Why | macOS | Linux (Debian/Ubuntu) | Windows |
|---|---|---|---|---|
| direnv | Auto-loads .env when you cd into the repo | brew install direnv | sudo apt install direnv | WSL2 |
| curl | Scripts and health checks | pre-installed | pre-installed | curl.se |
| openssl | TLS and common crypto one-liners | pre-installed | pre-installed | OpenSSL binaries |
Verify Installation
Run the one-time bootstrap to install Python deps and create .env from the template (if it does not exist yet):
task setup
Configure your AWS profile and run the prerequisite check. The check reports errors for any missing required binaries, warns about missing recommended (but not required) tools, and uses aws sts get-caller-identity to validate your AWS session. Address any missing tools or credential issues, then rerun the check until you see All required tools are available.
export AWS_PROFILE=your-profile
task check-tools
Environment Inputs
task setup copies .env.example to .env on first run. Edit .env and fill in your values.
The .env.example is organized by lab phase. Key sections and variables:
# =============================================================
# Phase 1 — AWS (bronze landing)
# =============================================================
AWS_PROFILE=            # AWS named profile
AWS_REGION=             # e.g. us-west-2

# Shared workshop: set this for per-participant bucket/database name derivation.
# Leave BRONZE_BUCKET_NAME empty when LAB_USERNAME is set.
# LAB_USERNAME=

# =============================================================
# Phase 1 — Iceberg warehouse (Glue + PyIceberg)
# =============================================================
BRONZE_BUCKET_NAME=     # Globally unique S3 bucket name
# GLUE_DATABASE=balloon_pops       # Derived from LAB_USERNAME when unset
# BRONZE_LOAD_DURATION_MINUTES=5

# =============================================================
# Phase 1 — Amazon S3 Tables (optional, AWS CLI 2.34+)
# =============================================================
BRONZE_S3TABLES_BUCKET_NAME=
S3TABLES_NAMESPACE=balloon_pops

# =============================================================
# Phase 2 — Snowflake CLI / SQL
# =============================================================
# SNOWFLAKE_DEFAULT_CONNECTION_NAME=devrel-ent
# SNOWFLAKE_ROLE=ACCOUNTADMIN
# SNOWFLAKE_WAREHOUSE=COMPUTE_WH

# =============================================================
# Phase 2 — Dynamic Iceberg Tables (task dt:generate-sql)
# =============================================================
# SNOWFLAKE_SILVER_DATABASE=balloon_silver
# SNOWFLAKE_SILVER_SCHEMA=silver
Key variables:
| Variable | Phase | Default | Notes |
|---|---|---|---|
| AWS_PROFILE | 1 | required | AWS named profile for all bronze tasks |
| AWS_REGION | 1 | required | Keeps all API calls in one region |
| LAB_USERNAME | 1 | none | Workshop shared accounts — drives bucket/database name derivation |
| BRONZE_BUCKET_NAME | 1 | derived | S3 warehouse bucket; iceberg/ becomes the Glue warehouse URI |
| SNOWFLAKE_DEFAULT_CONNECTION_NAME | 2 | snow default | Override when using a non-default snow connection |
| SNOWFLAKE_ROLE | 2 | ACCOUNTADMIN | Role for catalog integration and CLD commands |
| SNOWFLAKE_SILVER_DATABASE | 2 | balloon_silver | Native Snowflake database for DT objects |
| SNOWFLAKE_SILVER_SCHEMA | 2 | silver | Schema for silver Dynamic Iceberg Tables |
| SNOWFLAKE_APPS_SCHEMA | 2 | apps | Schema for Streamlit app deployment (SiS chapter) |
Bronze Landing Zone
This is the first hands-on chapter. All downstream Snowflake steps assume the bronze Iceberg tables exist in AWS Glue and that the bronze ARNs and Glue metadata are ready.
Set Up and Load
Preview any setup step without making changes (optional but recommended first):
task bronze:render-iam-dry-run
task bronze:glue-setup-dry-run
task bronze:s3tables-setup-dry-run
Render the IAM policy template (optional — run first if attaching a new IAM role):
task bronze:render-iam
Create the Glue database and register the S3 warehouse:
task bronze:glue-setup
Provision the S3 Tables control plane (optional — requires AWS CLI 2.34+):
task bronze:s3tables-setup
Load sample balloon game events into the Glue Iceberg table:
task bronze:load
Or run all three setup steps and load in one shot:
task bronze:all
What Gets Created
| Glue database | Table | Schema |
|---|---|---|
| GLUE_DATABASE (e.g. ksampath_balloon_pops) | balloon_game_events | event — STRING, one JSON object per row |
Each JSON object contains: player, balloon_color, score, page_id, favorite_color_bonus, event_ts. Snowflake Dynamic Iceberg Tables use PARSE_JSON and variant paths to project these fields into typed columns.
Print a copy-paste sheet of ARNs and exports needed for Snowflake catalog integration SQL:
task bronze:snowflake-summary
Lake Formation Setup
After task bronze:load and after completing step 1 of the Snowflake CLD chapter (task snowflake:create-glue-catalog-read-role), configure Lake Formation for vended credentials.
Preview the Lake Formation setup without any AWS writes:
task bronze:lakeformation-setup-dry-run
Apply Lake Formation grants:
task bronze:lakeformation-setup
This step registers BRONZE_BUCKET_NAME with Lake Formation using a dedicated data-access IAM role (HybridAccessEnabled=false, WithFederation=false), clears the default Glue IAM-only table permissions on GLUE_DATABASE, and grants SELECT and DESCRIBE to your Snowflake SIGV4 role.
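To make the grant concrete, here is a hedged sketch of the Lake Formation permissions payload that a step like this would send (boto3-style). The role ARN and database name are placeholders, not values from the lab, and the actual AWS call is shown only as a comment:

```python
# Sketch of the Lake Formation grant applied by `task bronze:lakeformation-setup`.
# The ARN and database name below are hypothetical placeholders.
sigv4_role_arn = "arn:aws:iam::123456789012:role/snowflake-glue-catalog-read"

grant_request = {
    "Principal": {"DataLakePrincipalIdentifier": sigv4_role_arn},
    "Resource": {
        # Table wildcard: grant applies to every table in the bronze database
        "Table": {"DatabaseName": "balloon_pops", "TableWildcard": {}}
    },
    "Permissions": ["SELECT", "DESCRIBE"],
}

# A real run would pass this to the Lake Formation API, e.g.:
#   boto3.client("lakeformation").grant_permissions(**grant_request)
```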
Keep the SIGV4 and LF data-access roles separate. Using the same role causes credential vending errors — see the Troubleshooting chapter for error code 094120.
Verify in AWS Console
Use the same AWS account and AWS_REGION as your CLI profile.
Glue Data Catalog:
- Open Glue → Data catalog → Databases and confirm GLUE_DATABASE exists.
- Open that database → Tables → confirm balloon_game_events is listed.
- Open balloon_game_events and confirm Apache Iceberg as the table format.




S3 Warehouse:
- Open S3 → Buckets → BRONZE_BUCKET_NAME.
- Open the iceberg/ prefix — expect metadata/ and data/ style keys after load.
S3 Tables (optional):
Open S3 Tables → Table buckets and confirm BRONZE_S3TABLES_BUCKET_NAME appears.
Optional: Query in Athena
Use data source AwsDataCatalog, database GLUE_DATABASE, and table balloon_game_events. Do not select the s3tables/<table-bucket> federated catalog entry — that path is an empty shell until a separate writer commits metadata.
Snowflake Catalog Linked Database (CLD)
This chapter creates the Glue Iceberg REST catalog integration, tightens IAM trust, creates the catalog linked database (CLD), and runs discovery and read queries against balloon_game_events.
Before starting:
- Bronze tables are loaded; .aws-config/glue-database.json was written by task bronze:glue-setup
- snow connection test succeeds
- Lake Formation is configured for VENDED_CREDENTIALS (see Lake Formation Setup in the Bronze chapter)
Easy Path — Interactive Notebook
Open cld_lab_guide.ipynb in Snowflake Notebooks for an interactive walkthrough. The notebook uses:
- ENABLED = FALSE on the catalog integration to break the IAM chicken-egg dependency — trust policy values are generated without connecting to Glue
- A companion CloudFormation template (cfn-snowflake-cld.yaml) that deploys all IAM roles, Lake Formation registration, and permissions in a single stack
- Cortex Code prompts embedded in each step — use Cmd+K or the chat sidebar to generate SQL and CLI commands from natural language
- SYSTEM$VERIFY_CATALOG_INTEGRATION to validate connectivity before creating the CLD
Follow the Detailed Path below for step-by-step shell commands.
Detailed Path
Role requirement: The commands in this chapter require ACCOUNTADMIN or a role with CREATE INTEGRATION, CREATE DATABASE, and GRANT privileges. The lab defaults to SNOWFLAKE_ROLE = ACCOUNTADMIN set in .env. Confirm this before running any snow sql commands.
Create IAM Role
Create the Snowflake SIGV4 IAM role that Snowflake uses to sign Glue REST requests:
task snowflake:create-glue-catalog-read-role
This writes the IAM role ARN to .aws-config/snowflake-glue-catalog-iam-role-arn.txt. All subsequent lab tools read it automatically — no env var override needed.
After this step, return to the Bronze chapter and run task bronze:lakeformation-setup to grant the SIGV4 role access via Lake Formation before proceeding.
Notebook path (CloudFormation): The interactive notebook uses a single CloudFormation template instead of individual task commands for IAM and Lake Formation setup. It creates the catalog integration as ENABLED = FALSE first, extracts trust values from DESC INTEGRATION, and passes them as stack parameters — eliminating the manual trust-render-apply cycle. See the notebook for details.
Generate SQL
Generate runnable SQL from your .aws-config/ artifacts:
task snowflake:generate-lab-sql
This writes two files to snowflake/lab/generated/:
- 01_catalog_integration.generated.sql — CREATE CATALOG INTEGRATION
- 02_cld_verify.generated.sql — CREATE DATABASE + SHOW + SELECT
To preview the SQL without writing files:
task snowflake:generate-lab-sql-stdout
Create Catalog Integration
Apply the generated catalog integration SQL:
snow sql --filename snowflake/lab/generated/01_catalog_integration.generated.sql
The generated SQL creates glue_rest_catalog_int (default name) with these settings:
- CATALOG_SOURCE = ICEBERG_REST, TABLE_FORMAT = ICEBERG
- CATALOG_URI = https://glue.\<region>.amazonaws.com/iceberg
- CATALOG_API_TYPE = AWS_GLUE, ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
- CATALOG_NAME = your 12-digit AWS account ID (Glue Data Catalog default)
- CATALOG_NAMESPACE = GLUE_DATABASE
- SIGV4_IAM_ROLE = ARN from .aws-config/snowflake-glue-catalog-iam-role-arn.txt
Tip: Breaking the chicken-egg dependency. The notebook creates the catalog integration with ENABLED = FALSE. This generates API_AWS_IAM_USER_ARN and API_AWS_EXTERNAL_ID immediately without requiring the IAM role to exist yet. After deploying the CloudFormation stack with the real trust values, the integration is enabled via ALTER CATALOG INTEGRATION … SET ENABLED = TRUE. This avoids a two-pass trust policy update.
Describe and Capture Trust Fields
Print catalog integration properties including the Snowflake-generated trust fields:
task snowflake:describe-catalog-integration
Or run the SQL directly:
DESC CATALOG INTEGRATION glue_rest_catalog_int;
Note API_AWS_IAM_USER_ARN and API_AWS_EXTERNAL_ID from the output — these are needed to tighten the trust policy on the SIGV4 IAM role.
Apply IAM Trust
Render the trust document using the DESC output:
task snowflake:render-glue-catalog-trust
Apply the rendered trust policy to the SIGV4 IAM role:
task snowflake:apply-glue-catalog-trust-from-rendered
This updates the IAM role's trust policy to scope access to Snowflake's specific IAM user ARN and external ID. Alternatively, paste the rendered JSON from .aws-config/snowflake-glue-catalog-trust-policy.rendered.json directly in the IAM console under Trust relationships.
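The rendered trust document follows the standard IAM trust-policy shape for this kind of integration: allow Snowflake's IAM user to assume the role, scoped by an sts:ExternalId condition. A minimal sketch (the ARN and external ID below are placeholders standing in for your API_AWS_IAM_USER_ARN and API_AWS_EXTERNAL_ID values):

```python
import json

# Placeholder values; substitute the fields reported by DESC CATALOG INTEGRATION.
api_aws_iam_user_arn = "arn:aws:iam::111122223333:user/example-snowflake-user"
api_aws_external_id = "ABC12345_SFCRole=EXAMPLE"

# Standard IAM trust policy: only Snowflake's IAM user, and only with the
# matching external ID, may assume the SIGV4 role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": api_aws_iam_user_arn},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": api_aws_external_id}},
        }
    ],
}
rendered = json.dumps(trust_policy, indent=2)
```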
Verify Catalog Integration
After applying the trust policy, verify the integration can connect to Glue:
SELECT SYSTEM$VERIFY_CATALOG_INTEGRATION('glue_rest_catalog_int');
Expect "success": true in the JSON response. If it fails, check that the trust policy values match DESC INTEGRATION output exactly and that IAM propagation is complete (up to 30 seconds).
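If you want to script this verification, a small sketch of the success check (the sample payload is invented; only the "success" field named in the text is relied on):

```python
import json

def integration_verified(response_json: str) -> bool:
    """Return True only when the verify response reports "success": true."""
    return json.loads(response_json).get("success") is True

# Hypothetical sample payloads, for illustration only.
ok = '{"success": true}'
failed = '{"success": false}'
```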
Create Catalog-Linked Database
Apply the generated CLD and verify script:
snow sql --filename snowflake/lab/generated/02_cld_verify.generated.sql
This creates balloon_game_events as a Catalog-Linked Database and runs initial discovery. To create it manually instead:
CREATE OR REPLACE DATABASE balloon_game_events
  COMMENT = 'CLD: Glue bronze Iceberg'
  LINKED_CATALOG = (
    CATALOG = 'glue_rest_catalog_int'
  );
Note: Any change to Lake Formation permissions or IAM trust after a link failure requires re-creating the CLD. After fixing the LF/IAM settings, re-run CREATE OR REPLACE DATABASE … LINKED_CATALOG = (…) and re-apply GRANT USAGE ON INTEGRATION glue_rest_catalog_int TO ROLE <your_role>. See Privileges Lost After CLD Recreate in Troubleshooting.
Optional status checks:
SELECT SYSTEM$CATALOG_LINK_STATUS('balloon_game_events');
SELECT SYSTEM$GET_CATALOG_LINKED_DATABASE_CONFIG('balloon_game_events');
Discover and Query
List remote namespaces discovered from Glue:
SHOW SCHEMAS IN DATABASE balloon_game_events;
List Iceberg tables in the discovered namespace:
-- Replace <remote_schema> with your GLUE_DATABASE name in lowercase
SHOW ICEBERG TABLES IN SCHEMA balloon_game_events."<remote_schema>";
Read raw events and project fields using PARSE_JSON:
SELECT
  PARSE_JSON(event):player::STRING AS player,
  PARSE_JSON(event):balloon_color::STRING AS balloon_color,
  PARSE_JSON(event):score::INTEGER AS score,
  PARSE_JSON(event):event_ts::TIMESTAMP_TZ AS event_ts
FROM balloon_game_events."<remote_schema>"."balloon_game_events"
LIMIT 10;
Lake Formation Console Checks
When to check: Run these checks only if SYSTEM$CATALOG_LINK_STATUS('balloon_game_events') reports failures. A healthy link returns:
{"failureDetails":[],"executionState":"RUNNING","lastLinkAttemptStartTime":"..."}
If task bronze:lakeformation-setup ran successfully, skip this. Otherwise verify these four settings manually.
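The health criterion above (empty failureDetails, RUNNING state) can be checked in a few lines. A sketch, assuming only the fields shown in the healthy response:

```python
import json

def link_is_healthy(status_json: str) -> bool:
    """Healthy = executionState RUNNING and no entries in failureDetails."""
    status = json.loads(status_json)
    return status.get("executionState") == "RUNNING" and not status.get("failureDetails")

# The healthy payload mirrors the example above; the failing one is invented.
healthy = '{"failureDetails":[],"executionState":"RUNNING","lastLinkAttemptStartTime":"..."}'
failing = '{"failureDetails":[{"errorCode":"094120"}],"executionState":"RUNNING"}'
```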
Notebook users: The interactive notebook includes the same four checks with annotated screenshots in Step 2c: Verify Lake Formation Settings.
1. Database mode:
Open Lake Formation → Data catalog → Databases → open GLUE_DATABASE → Edit. Confirm Use only IAM access control for new tables is unchecked.

2. Data lake location:
Open Permissions → Data lake locations. Confirm s3://<BRONZE_BUCKET_NAME>/iceberg/ is registered with HybridAccessEnabled=false and WithFederation=false using a dedicated LF data-access role that is different from the SIGV4 role.

3. Data permissions:
Open Permissions → Data permissions. Confirm the SIGV4 role has DESCRIBE on the database and SELECT, DESCRIBE on the table wildcard.

4. Application integration settings:
Open Lake Formation → Administration → Application integration settings. Confirm Allow external engines to access data in Amazon S3 locations with full table access is enabled. This setting is mandatory for Snowflake's vended-credentials flow — without it, Lake Formation will not issue temporary credentials to the SIGV4 role.

Full Reference Sequence
End-to-end Snowflake command sequence (assumes tools verified and bronze loaded):
| Command | What it does |
|---|---|
| task bronze:snowflake-summary | Print bucket/DB/ARNs needed for Snowflake catalog SQL |
| task snowflake:create-glue-catalog-read-role | Create SIGV4 IAM role; write ARN to .aws-config/ |
| (return to Bronze) task bronze:lakeformation-setup | Grant SIGV4 role access via Lake Formation |
| task snowflake:generate-lab-sql | Generate 01_catalog_integration and 02_cld_verify SQL |
| snow sql --filename snowflake/lab/generated/01_catalog_integration.generated.sql | Create the catalog integration in Snowflake |
| task snowflake:describe-catalog-integration | Print trust fields (API_AWS_IAM_USER_ARN + external ID) |
| task snowflake:render-glue-catalog-trust | Render trust JSON using those fields |
| task snowflake:apply-glue-catalog-trust-from-rendered | Apply rendered trust to SIGV4 IAM role |
| SELECT SYSTEM$VERIFY_CATALOG_INTEGRATION('glue_rest_catalog_int') | Verify integration connects to Glue (expect "success": true) |
| snow sql --filename snowflake/lab/generated/02_cld_verify.generated.sql | Create CLD and run discovery queries |
Notebook alternative (ENABLED=FALSE flow):
| Step | Command | What it does |
|---|---|---|
| 2a | CREATE CATALOG INTEGRATION … ENABLED = FALSE | Generate trust values without connecting to Glue |
| 2b | DESC INTEGRATION → aws cloudformation deploy | Extract trust values and deploy IAM + LF via CFN |
| 2c | Verify Lake Formation settings in AWS Console | Confirm CFN deployed correctly |
| 2d | ALTER CATALOG INTEGRATION … SET ENABLED = TRUE | Enable and verify with SYSTEM$VERIFY_CATALOG_INTEGRATION |
Dynamic Iceberg Tables
With bronze readable through the CLD, add Dynamic Iceberg Tables that write silver Iceberg data to Snowflake Managed Storage. Refresh frequency is controlled by a declared TARGET_LAG. Five aggregation tables refresh automatically and remain readable by any Iceberg-compatible engine.
Five Silver Tables
| Table | What it aggregates |
|---|---|
| dt_player_leaderboard | Per-player total score, bonus pops, last event |
| dt_balloon_color_stats | Per-player, per-color breakdown (pops, points, bonuses) |
| dt_realtime_scores | 15-second windowed scores per player |
| dt_balloon_colored_pops | 15-second windows by player and balloon color |
| dt_color_performance_trends | Average score per pop by color over 15-second windows |
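The three windowed tables above share one pattern: floor each event timestamp to a 15-second boundary, then aggregate within that window. A Python sketch of the logic behind dt_realtime_scores (sample events invented; the real tables are defined in SQL with TARGET_LAG-driven refresh):

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 15  # matches the 15-second windows in dt_realtime_scores

def floor_to_window(ts_iso: str) -> str:
    """Floor an ISO timestamp to the start of its 15-second window (as ISO)."""
    epoch = datetime.fromisoformat(ts_iso).timestamp()
    start = epoch - (epoch % WINDOW_SECONDS)
    return datetime.fromtimestamp(start, tz=timezone.utc).isoformat()

def realtime_scores(raw_events):
    """Aggregate (player, window_start) -> total score, like dt_realtime_scores."""
    totals = defaultdict(int)
    for raw in raw_events:
        e = json.loads(raw)  # bronze rows hold one JSON string per event
        totals[(e["player"], floor_to_window(e["event_ts"]))] += e["score"]
    return dict(totals)

events = [  # invented sample events
    json.dumps({"player": "ana", "score": 10, "event_ts": "2025-01-01T00:00:03+00:00"}),
    json.dumps({"player": "ana", "score": 5,  "event_ts": "2025-01-01T00:00:14+00:00"}),
    json.dumps({"player": "ana", "score": 7,  "event_ts": "2025-01-01T00:00:16+00:00"}),
]
scores = realtime_scores(events)
```

The first two events fall in the 00:00:00 window and the third in the 00:00:15 window, so the aggregation yields two windowed rows for player "ana".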
Easy Path — Interactive Notebook
Open dt_lab_guide.ipynb in Snowflake Notebooks for an interactive walkthrough. The notebook builds on the CLD Lab Guide and includes:
- CLD recap — summarizes what was built in the previous chapter before diving into silver
- "Why Dynamic Iceberg Tables?" — callout explaining DIT vs regular Dynamic Tables, TARGET_LAG, and Iceberg interoperability
- Cortex Code prompts — generate the first DT SQL via Cmd+K, check refresh status via chat, and explore silver data with intent-driven prompts
- Teachable moment — the Cortex Code prompt for dt_player_leaderboard may generate TIMESTAMP_TZ, which Iceberg doesn't support — workshoppers debug and fix it live
- "What Just Happened?" — architecture summary of the bronze→silver pipeline
- Cleanup — a single DROP DATABASE to tear down all DTs
Use the Detailed Path below for step-by-step shell commands.
Detailed Path
Configure Environment
Set these variables in .env before generating SQL:
| Variable | Default | Notes |
|---|---|---|
| SNOWFLAKE_SILVER_DATABASE | balloon_silver | Native Snowflake database for DT objects |
| SNOWFLAKE_SILVER_SCHEMA | silver | Schema for all silver DTs |
| SNOWFLAKE_WAREHOUSE | COMPUTE_WH | Warehouse for DT refresh compute |
Print current environment hints:
task snowflake:print-env-hints
Dynamic Iceberg Tables use Snowflake Managed Storage — no external volume setup is required.
Iceberg type limitation: Iceberg tables do not support TIMESTAMP_TZ. Use TIMESTAMP_LTZ (maps to Iceberg timestamptz) or TIMESTAMP_NTZ (maps to Iceberg timestamp). If generated SQL uses TIMESTAMP_TZ, change it to TIMESTAMP_LTZ(6) or TIMESTAMP_NTZ.
Generate and Apply DT SQL
Generate the silver DT SQL from your env and .aws-config/ artifacts:
task dt:generate-sql
This writes snowflake/lab/generated/03_dt_pipelines.generated.sql.
Apply the generated SQL:
snow sql --filename snowflake/lab/generated/03_dt_pipelines.generated.sql
The unedited scaffold for manual editing lives at snowflake/lab/03_dt_pipelines.sql.
Verify
Check DT status after creation:
USE DATABASE balloon_silver;
USE SCHEMA silver;
SHOW DYNAMIC TABLES LIKE 'dt_%' IN SCHEMA;
Wait for an initial refresh (check Snowsight → Data → Dynamic Tables, or inspect SCHEDULING_STATE in the SHOW output), then query:
-- Top players by score
SELECT player, total_score, bonus_pops, last_event_ts
FROM balloon_silver.silver.dt_player_leaderboard
ORDER BY total_score DESC NULLS LAST
LIMIT 15;
-- 15-second windowed scores
SELECT player, total_score, window_start, window_end
FROM balloon_silver.silver.dt_realtime_scores
ORDER BY window_start DESC, player
LIMIT 20;
Run all verification queries at once:
snow sql --filename snowflake/lab/04_dt_verify_sample_queries.sql
Quick Teardown
To remove all silver DTs and their refresh schedules:
DROP DATABASE IF EXISTS balloon_silver;
DT refreshes stop automatically when the database is dropped. The external volume and S3 data are not deleted — remove those separately if needed.
Snowflake Intelligence
Your silver Dynamic Iceberg Tables are now AI-ready. By creating a Semantic View over the five silver tables, you enable natural-language querying via Snowflake Intelligence and the Cortex Analyst API — no additional ETL, no model training, no external tools.
A Semantic View defines the business meaning of your tables, columns, and metrics in YAML. Once created, users (and AI agents) can ask questions like:
- "Who is the top-scoring player?"
- "Which balloon color gives the highest average points?"
- "Show me score trends over the last hour"
…and get accurate SQL-backed answers grounded in your silver data.
Easy Path — Interactive Notebook
Open si_lab_guide.ipynb in Snowflake Notebooks for a guided walkthrough. The notebook uses the same environment variables as the DT Lab Guide and includes a Cortex Code prompt to auto-generate the Semantic View, plus step-by-step instructions for configuring the Intelligence agent.
Detailed Path
Create the Semantic View
Create a Semantic View that describes all five silver tables with their business context. Replace the summit26_ar103_balloon_silver database name in the SQL below with your SNOWFLAKE_SILVER_DATABASE value (default balloon_silver) if different:
CREATE OR REPLACE SEMANTIC VIEW summit26_ar103_balloon_silver.silver.balloon_game_semantic_view
  TABLES (
    player_leaderboard AS summit26_ar103_balloon_silver.silver.dt_player_leaderboard
      PRIMARY KEY (player)
      WITH SYNONYMS ('leaderboard', 'rankings', 'top players')
      COMMENT = 'Aggregated player scores: total score, bonus pops, last event timestamp per player',
    color_stats AS summit26_ar103_balloon_silver.silver.dt_balloon_color_stats
      UNIQUE (player, balloon_color)
      WITH SYNONYMS ('color breakdown', 'player colors', 'color scores')
      COMMENT = 'Per-player breakdown by balloon color: pops, points, and bonus hits',
    realtime_scores AS summit26_ar103_balloon_silver.silver.dt_realtime_scores
      WITH SYNONYMS ('live scores', 'recent scores', 'hot streaks')
      COMMENT = '15-second windowed score totals per player for time-series analysis',
    colored_pops AS summit26_ar103_balloon_silver.silver.dt_balloon_colored_pops
      WITH SYNONYMS ('detailed pops', 'player color windows')
      COMMENT = 'Most granular view: per-player, per-color pops in 15-second time windows',
    color_trends AS summit26_ar103_balloon_silver.silver.dt_color_performance_trends
      WITH SYNONYMS ('color trends', 'color performance', 'best colors')
      COMMENT = 'Average score per pop and total pops by balloon color over 15-second windows'
  )
  RELATIONSHIPS (
    color_stats_to_leaderboard AS color_stats (player) REFERENCES player_leaderboard,
    realtime_to_leaderboard AS realtime_scores (player) REFERENCES player_leaderboard,
    colored_pops_to_leaderboard AS colored_pops (player) REFERENCES player_leaderboard,
    colored_pops_to_color_stats AS colored_pops (player, balloon_color) REFERENCES color_stats
  )
  FACTS (
    player_leaderboard.total_score AS player_leaderboard.total_score
      WITH SYNONYMS = ('total score', 'overall score', 'total points')
      COMMENT = 'Cumulative score across all balloon pops',
    player_leaderboard.bonus_pops AS player_leaderboard.bonus_pops
      WITH SYNONYMS = ('bonus pops', 'bonuses', 'bonus count')
      COMMENT = 'Number of pops where player hit their favorite color',
    color_stats.balloon_pops AS color_stats.balloon_pops
      WITH SYNONYMS = ('pops', 'pop count', 'times popped')
      COMMENT = 'Number of times this player popped this color',
    color_stats.points_by_color AS color_stats.points_by_color
      WITH SYNONYMS = ('points by color', 'color points', 'color score')
      COMMENT = 'Total points earned from popping this color',
    color_stats.bonus_hits AS color_stats.bonus_hits
      WITH SYNONYMS = ('bonus hits', 'color bonuses')
      COMMENT = 'Number of favorite-color bonus pops for this color',
    realtime_scores.window_score AS realtime_scores.total_score
      WITH SYNONYMS = ('window score', 'live score', 'current score')
      COMMENT = 'Sum of scores within the 15-second window',
    colored_pops.window_pops AS colored_pops.balloon_pops
      COMMENT = 'Pop count for this player+color in this window',
    colored_pops.window_points AS colored_pops.points_by_color
      COMMENT = 'Points for this player+color in this window',
    colored_pops.window_bonus AS colored_pops.bonus_hits
      COMMENT = 'Bonus pops for this player+color in this window',
    color_trends.avg_score_per_pop AS color_trends.avg_score_per_pop
      WITH SYNONYMS = ('efficiency', 'points per pop', 'scoring rate', 'best value')
      COMMENT = 'Average points earned per pop of this color in this window',
    color_trends.total_pops AS color_trends.total_pops
      WITH SYNONYMS = ('volume', 'popularity', 'total pops')
      COMMENT = 'Total pops of this color in this window'
  )
  DIMENSIONS (
    player_leaderboard.player_name AS player_leaderboard.player
      WITH SYNONYMS = ('player', 'gamer', 'username', 'who')
      COMMENT = 'Unique player identifier',
    player_leaderboard.last_active AS player_leaderboard.last_event_ts
      WITH SYNONYMS = ('last active', 'last seen', 'last played')
      COMMENT = 'Timestamp of the most recent game event for this player',
    color_stats.cs_player AS color_stats.player
      COMMENT = 'Player identifier in color stats',
    color_stats.color AS color_stats.balloon_color
      WITH SYNONYMS = ('balloon color', 'color', 'balloon type')
      COMMENT = 'Color of the balloon (red, blue, green, yellow, etc.)',
    color_stats.cs_last_event AS color_stats.last_event_ts
      COMMENT = 'Most recent pop of this color by this player',
    realtime_scores.rs_player AS realtime_scores.player
      COMMENT = 'Player identifier in realtime scores',
    realtime_scores.window_start AS realtime_scores.window_start
      WITH SYNONYMS = ('start time', 'window start')
      COMMENT = 'Start of the 15-second time window',
    realtime_scores.window_end AS realtime_scores.window_end
      WITH SYNONYMS = ('end time', 'window end')
      COMMENT = 'End of the 15-second time window',
    colored_pops.cp_player AS colored_pops.player
      COMMENT = 'Player identifier in colored pops',
    colored_pops.cp_color AS colored_pops.balloon_color
      COMMENT = 'Balloon color in the detailed window view',
    colored_pops.cp_window_start AS colored_pops.window_start
      COMMENT = 'Start of the time window',
    colored_pops.cp_window_end AS colored_pops.window_end
      COMMENT = 'End of the time window',
    color_trends.ct_color AS color_trends.balloon_color
      WITH SYNONYMS = ('trending color', 'color trend')
      COMMENT = 'Balloon color for performance trend analysis',
    color_trends.ct_window_start AS color_trends.window_start
      COMMENT = 'Start of the trend analysis window',
    color_trends.ct_window_end AS color_trends.window_end
      COMMENT = 'End of the trend analysis window'
  )
  METRICS (
    player_leaderboard.m_total_score AS SUM(player_leaderboard.total_score)
      WITH SYNONYMS = ('total points', 'combined score')
      COMMENT = 'Total cumulative score across all players',
    player_leaderboard.m_total_bonus_pops AS SUM(player_leaderboard.bonus_pops)
      WITH SYNONYMS = ('bonus total', 'all bonuses')
      COMMENT = 'Total bonus pops across all players',
    player_leaderboard.m_player_count AS COUNT(player_leaderboard.player)
      WITH SYNONYMS = ('number of players', 'how many players')
      COMMENT = 'Count of players on the leaderboard',
    color_stats.m_total_pops_by_color AS SUM(color_stats.balloon_pops)
      WITH SYNONYMS = ('total balloon pops', 'all pops')
      COMMENT = 'Total balloon pops aggregated across players for a given color',
    color_stats.m_total_points_by_color AS SUM(color_stats.points_by_color)
      WITH SYNONYMS = ('color points total', 'total color points')
      COMMENT = 'Total points aggregated across players for a given color',
    color_stats.m_avg_points_per_pop AS AVG(color_stats.points_by_color / NULLIF(color_stats.balloon_pops, 0))
      WITH SYNONYMS = ('efficiency', 'scoring rate', 'points per pop')
      COMMENT = 'Average points per pop across colors',
    realtime_scores.m_max_window_score AS MAX(realtime_scores.window_score)
      WITH SYNONYMS = ('best window', 'peak score', 'hottest moment')
      COMMENT = 'Highest score in any single 15-second window',
    realtime_scores.m_avg_window_score AS AVG(realtime_scores.window_score)
      WITH SYNONYMS = ('average window score', 'typical window')
      COMMENT = 'Average score per 15-second window',
    color_trends.m_avg_efficiency AS AVG(color_trends.avg_score_per_pop)
      WITH SYNONYMS = ('trend efficiency', 'color efficiency')
      COMMENT = 'Weighted average score per pop across time windows',
    color_trends.m_total_pops AS SUM(color_trends.total_pops)
      WITH SYNONYMS = ('color popularity', 'total color pops')
      COMMENT = 'Total balloon pops across all colors and time windows'
  )
  COMMENT = 'AI-ready semantic layer over balloon game silver Dynamic Iceberg Tables'
  AI_SQL_GENERATION 'This is a balloon popping game. Players pop colored balloons to earn points. Some pops are bonus pops worth extra. The leaderboard has overall rankings by total_score. Color stats show which colors each player pops most and points per color. Realtime scores show 15-second windows of activity. Color trends show which balloon colors give the best points-per-pop over time. When asked about the top player, use the leaderboard total_score. When asked which color scores best, use color_trends avg_score_per_pop. When asked who is hot right now, use realtime_scores with the most recent window_start.';
Verify the Semantic View
Check the view was created and inspect its structure:
SHOW SEMANTIC VIEWS IN SCHEMA balloon_silver.silver;
DESC SEMANTIC VIEW balloon_silver.silver.balloon_game_semantic_view;
Configure Snowflake Intelligence with the Semantic View
With the Semantic View created, set up a Snowflake Intelligence agent that uses it as a tool for natural-language querying. This follows the same pattern as the official Getting Started with Snowflake Intelligence guide — but since we already have a Semantic View (not a YAML file), setup is simpler.
Required Privileges
Lab users running as ACCOUNTADMIN already have the necessary permissions — no additional grants are needed. ACCOUNTADMIN inherits all privileges including Cortex AI access and ownership of objects you create.
Production note: In a production environment, you would grant SNOWFLAKE.CORTEX_USER and REFERENCES + SELECT on the Semantic View to consumer roles:
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE <consumer_role>;
GRANT REFERENCES, SELECT ON SEMANTIC VIEW balloon_silver.silver.balloon_game_semantic_view TO ROLE <consumer_role>;
Create the Agent
1. Navigate to the Agent admin page — In Snowsight, go to AI & ML → Agents. Confirm your role is set to ACCOUNTADMIN (top-right role selector).
2. Create a new agent:
- Click + Create agent
- Agent object name: balloon_game_agent (internal identifier)
- Display name: Balloon Game Analytics (shown to users in the chat UI)
- Description: "Ask questions about balloon game player scores, color stats, and performance trends from the silver lakehouse tables."
- Click Create agent
3. Add the Cortex Analyst tool (Semantic View):
- Select the Tools tab in the agent editor
- Find Cortex Analyst and click + Add
- Choose Semantic View (not "Semantic model file" — we already have a view, not a YAML on a stage)
- Select database: balloon_silver, schema: silver, view: balloon_game_semantic_view
- For Description, click Generate with Cortex to auto-generate a tool description from your semantic metadata — or write your own, e.g.: "Queries structured balloon game data including player leaderboards, color stats, real-time scores, and performance trends. Use for any question about players, scores, colors, or time-based patterns."
- Set the Warehouse to your lab warehouse (e.g. KAMESH_DEMOS_S or your assigned warehouse)
4. (Optional) Add the Email tool:
If you want the agent to send query results or insights via email (e.g. "Email me the top 5 players leaderboard"), first create the notification integration and stored procedure in your silver database:
-- Notification integration for email delivery
CREATE OR REPLACE NOTIFICATION INTEGRATION email_integration
  TYPE = EMAIL
  ENABLED = TRUE
  DEFAULT_SUBJECT = 'Balloon Game Analytics';

-- Stored procedure that the agent calls to send emails
CREATE OR REPLACE PROCEDURE balloon_silver.silver.send_email(
  recipient_email VARCHAR,
  subject VARCHAR,
  body VARCHAR
)
RETURNS VARCHAR
LANGUAGE SQL
AS
BEGIN
  CALL SYSTEM$SEND_EMAIL(
    'email_integration',
    :recipient_email,
    :subject,
    :body,
    'text/html'
  );
  RETURN 'Email sent successfully to ' || :recipient_email;
END;

-- Grant execute to your role (ACCOUNTADMIN already has it)
GRANT USAGE ON PROCEDURE balloon_silver.silver.send_email(VARCHAR, VARCHAR, VARCHAR) TO ROLE ACCOUNTADMIN;
Source: Adapted from the official Snowflake Intelligence setup.
Then, in the agent editor:
- In the Tools tab, find Custom Tools and click + Add
- Select database: balloon_silver, schema: silver, procedure: send_email
- Configure parameter descriptions (these guide the agent on how to use the tool):
- recipient_email: "If the email is not provided, send it to the current user's email address."
- subject: "If subject is not provided, use 'Balloon Game Analytics'."
- body: "If body is not provided, summarize the last question and use that as content for the email."
- The agent can now send results via email when prompted
Note: The Email tool is optional for this lab. Skip it if you only want to explore data interactively. Your Snowflake user must have a verified email address for delivery to work.
5. Add sample questions (recommended):
- Select the Voice tab (or Instructions tab depending on your Snowsight version)
- Under Sample questions, add examples that help users get started:
- "Who are the top 5 players by total score?"
- "Which balloon color gives the best average points per pop?"
- "Show me score trends over the last few time windows"
- "How many total bonus pops have all players earned?"
- "Email me a summary of the top 3 players"
6. Set orchestration instructions:
- In the Instructions section, add the following orchestration instruction:
- "Whenever you can answer visually with a chart, always choose to generate a chart even if the user didn't specify to."
7. Save the agent — Click Save in the top-right corner. The agent is now live.
Access the Agent
Once saved, users can access the agent in two ways:
- Snowflake Intelligence chat: Navigate to AI & ML → Snowflake Intelligence, select Balloon Game Analytics from the agent picker in the chat bar, and start asking questions
- Direct URL: Go to ai.snowflake.com and select the agent
How it works: When a user asks a question, the agent routes it to Cortex Analyst, which reads the Semantic View's table/column descriptions, relationships, and primary keys to generate accurate SQL. The query executes against your silver Dynamic Iceberg Tables and returns results in the chat — no SQL knowledge required.
Try It with Snowflake Intelligence
Once configured, open your Intelligence app and ask natural-language questions about your balloon game data. Intelligence will ground its SQL generation in the semantic context you defined — table relationships, column descriptions, and metric definitions.
Example questions to try:
- "Who are the top 5 players by total score?"
- "What's the most popular balloon color across all players?"
- "Which color gives the best average points per pop?"
- "Show me how player scores trend over time windows"
- "Which players have the most bonus pops as a percentage of total pops?"
- "How many total bonus pops have all players earned?"
- "Email me a summary of the top 3 players"
Why this matters: Your silver data was already queryable by SQL users and BI tools. The Semantic View now makes it queryable by anyone — business analysts, executives, or automated agents — using plain English. The same Iceberg data powers dashboards, cross-engine queries (DuckDB), and AI-driven analytics.
SiS Dashboard
After the silver Dynamic Tables are live, deploy a Streamlit in Snowflake app that visualizes the balloon game event data. The app runs entirely in your Snowflake account next to your data.
This chapter offers two paths to the same outcome — choose one (or try both):
| Path | How | What you'll learn |
|---|---|---|
| Easy Path: Cortex Code | Describe what you want in natural language; AI builds, debugs, and deploys | Intent-driven development — how iterative prompts replace manual coding |
| Detailed Path: CLI | Write every file and run every CLI command yourself | Full control, traditional workflow |
Recommended: Try the Easy Path first. Watch Cortex Code encounter errors (missing packages, unsupported features, permission gaps) and self-heal. Then ask it to generate the "ideal prompt" that would have worked in one shot — that's the core lesson of Intent-Driven Development (IDD).
Prerequisites:
- 03_dt_pipelines applied and all five dt_* tables exist in balloon_silver.silver
Easy Path: Intent-Driven (Cortex Code)
Open Cortex Code from the Snowsight sidebar. This path has three rounds that demonstrate the progression from vague intent to precise specification.
Round 1 — State Your Intent
Tell the agent what you want, not how to build it.
Copy and paste this into Cortex Code:
First, ask me the following before you start building:

1. What database.schema has my game's Dynamic Tables?
2. Where should the Streamlit app and stage be created?
3. What warehouse should it use?

Then, build me a multi-page Streamlit in Snowflake dashboard for my balloon popper game using the Dynamic Tables from the schema above. Structure the app in its own directory with a pages/ directory:

- streamlit_app.py — home page with a game summary and key metrics
- pages/1_Leaderboard.py — top 5 player rankings, score gaps, bar chart
- pages/2_Color_Breakdown.py — pops by color, avg points per pop, player-color heatmap, player's favorite color
- pages/3_Trends.py — color value over time, player scores over time

I want to answer:

* Who is winning and by how much?
* Which colors are most popular?
* Which color is worth more points? How are scores trending?

Make it auto-refreshing, clean charts, easy to read at a glance. Handle missing data gracefully. Add a color scheme toggle. Use get_active_session(), wide layout, lowercase column names from Snowpark before passing to pandas.
When Cortex Code asks you the interactive questions, answer with:
- Dynamic Tables schema: balloon_silver.silver (or your SNOWFLAKE_SILVER_DATABASE.SNOWFLAKE_SILVER_SCHEMA)
- App schema: balloon_silver.apps
- Warehouse: your SNOWFLAKE_WAREHOUSE value
What to watch for: Cortex Code will likely hit 3–4 errors and fix each one automatically:
| Expected error | Root cause | CoCo self-heals by… |
|---|---|---|
| st.set_page_config crash | page_icon / page_title unsupported in SiS | Removing unsupported params |
| Heatmap import error | px.imshow requires scipy (not in Snowflake conda) | Switching to plotly.graph_objects.Heatmap |
| st.rerun() not found | Default Streamlit version is 1.22; st.rerun needs ≥ 1.27 | Pinning streamlit=1.35.0 in environment.yml |
| Schema does not exist | balloon_silver.apps not yet created | Creating the schema automatically |
| "No data available" on all pages | App's owner role lacks SELECT on silver DTs | Discovering and granting permissions |
That iterative error → fix loop is the demo. Each failure is an intent gap that Cortex Code closes autonomously.
Round 2 — Ask CoCo to Reflect
After the dashboard is running, paste this into Cortex Code:
Look at the errors you encountered and the fixes you made while building this dashboard. Based on that experience, write me the precise, optimized prompt that would have built this app correctly in one shot — no errors, no retries. Include every technical constraint and detail you had to discover along the way.
Save the prompt Cortex Code generates — that's your baseline prompt. This is the core practice of intent-driven development: start vague, let the agent iterate, then capture the learnings as a precise specification.
Round 3 (Optional) — Prove the Difference
Want to see the contrast? Ask Cortex Code to delete the app, start a fresh CoCo session, paste the optimized prompt from Round 2, and watch it build cleanly in one pass — zero errors, zero retries.
Detailed Path (CLI)
Follow this path for a traditional, manual deployment using the Snowflake CLI.
Additional prerequisites for Detailed Path:
- Snowflake CLI 3.14+ installed (via uv sync, then use uv run snow)
- Your role has SELECT on balloon_silver.silver.* and CREATE STREAMLIT on balloon_silver.apps
Prepare the APPS Schema
Create the schema and grant access before deploying:
CREATE SCHEMA IF NOT EXISTS balloon_silver.apps;
If your Streamlit app will run under a role other than ACCOUNTADMIN (e.g. a personal lab role), grant access to the silver data:
GRANT USAGE ON DATABASE balloon_silver TO ROLE <your_role>;
GRANT USAGE ON SCHEMA balloon_silver.silver TO ROLE <your_role>;
GRANT SELECT ON ALL DYNAMIC TABLES IN SCHEMA balloon_silver.silver TO ROLE <your_role>;
GRANT ALL ON SCHEMA balloon_silver.apps TO ROLE <your_role>;
App defaults (from snowflake/sis/snowflake.yml):
| Setting | Value |
|---|---|
| Streamlit object location | balloon_silver.apps |
| Silver data source | balloon_silver.silver |
| Query warehouse | COMPUTE_WH (override via SNOWFLAKE_WAREHOUSE) |
| Deploy task | task snowflake:sis-deploy |
The snowflake/sis/snowflake.yml is the ground truth for deployment defaults. The app's schema (SNOWFLAKE_APPS_SCHEMA = apps) is separate from the silver data schema (SNOWFLAKE_SILVER_SCHEMA = silver).
Deploy the App
task snowflake:sis-deploy creates the target schema, stage, and Streamlit object automatically. Deploy using the task wrapper (recommended — reads LAB_USERNAME, SNOWFLAKE_APPS_SCHEMA, SNOWFLAKE_SILVER_DATABASE, SNOWFLAKE_SILVER_SCHEMA, and SNOWFLAKE_WAREHOUSE from .env):
task snowflake:sis-deploy -- --open
Adding --open launches the app in a browser immediately after deploy.
Alternatively, deploy directly with Snowflake CLI:
snow streamlit deploy balloon_game_dashboard --project snowflake/sis --replace
Preview the resolved deploy config without deploying:
uv run sis-deploy show-config
Open and Share
Print the deployed app URL:
task snowflake:url
Open the deployed app in Snowsight. Grant access to analyst roles:
GRANT USAGE ON STREAMLIT balloon_silver.apps.balloon_game_dashboard TO ROLE <analyst_role>;
If the account requires it, promote the live version:
ALTER STREAMLIT balloon_silver.apps.balloon_game_dashboard ADD LIVE VERSION FROM LAST;
App Pages
| Page | What it shows |
|---|---|
| Home | Summary cards — total pops, players, top score |
| Leaderboard | Ranked player table from dt_player_leaderboard |
| Color Analysis | Balloon color preference heatmaps from dt_balloon_color_stats |
| Performance Trends | Time-series scoring from dt_color_performance_trends |
DuckDB Integration
DuckDB can read Snowflake-managed Iceberg tables directly via the Horizon Iceberg REST Catalog (HIRC), giving cross-engine access to the same silver data without copying files or converting formats.
Preview feature: HIRC is in Public Preview. It works in all Snowflake public regions except government regions. No additional charges apply during preview.
What Is HIRC
Snowflake exposes Snowflake-managed Iceberg tables via a standard Iceberg REST endpoint:
https://<account>.snowflakecomputing.com/polaris/api/catalog
DuckDB authenticates using a Programmatic Access Token (PAT). The PAT is exchanged for temporary credentials via the OAuth2 client credentials flow, scoped to a Snowflake role. DuckDB then reads Iceberg metadata and S3 data files directly — Snowflake does not proxy the data.
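The exchange above is a plain OAuth2 form post. As a minimal sketch (the helper function and host name here are hypothetical; the endpoint path follows the URL shown above), the request looks like this:

```python
from urllib.parse import urlencode

def build_token_request(account_host: str, pat: str, role: str):
    """Compose the token endpoint URL and form body for the PAT exchange."""
    url = f"https://{account_host}/polaris/api/catalog/v1/oauth/tokens"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": "",           # empty: the PAT itself acts as the secret
        "client_secret": pat,
        "scope": f"session:role:{role}",  # scopes the token to one Snowflake role
    })
    return url, body

url, body = build_token_request(
    "myorg-myacct.snowflakecomputing.com", "<your_pat>", "duckdb_silver_reader"
)
# POST `body` to `url` with Content-Type application/x-www-form-urlencoded;
# the JSON response carries the short-lived token DuckDB uses for reads.
```

DuckDB performs this exchange for you when you create the Iceberg secret later in this chapter; the sketch only makes the mechanics visible.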
Two HIRC-specific rules to keep in mind before you start:
- Warehouse name is case-sensitive — the catalog name in ATTACH must be UPPERCASE (e.g. 'BALLOON_SILVER'). Lowercase returns HTTP 404.
- GRANT ON ALL TABLES skips Dynamic Iceberg Tables — you must use ON ALL DYNAMIC TABLES and ON FUTURE DYNAMIC TABLES; the plain TABLES variant silently grants nothing.
Easy Path — Interactive Notebook
Import duckdb_lab_guide.ipynb into Google Colab or any Jupyter-compatible environment (JupyterLab, VS Code, etc.) for an interactive walkthrough of external engine integration via HIRC.
Before running the notebook, complete the HIRC setup in Snowflake using 05_duckdb_hirc_setup.ipynb in Snowflake Notebooks. This creates the service account, reader role, network policy, and PAT needed for DuckDB authentication.
The DuckDB notebook:
- Installs the DuckDB Iceberg and HTTPFS extensions
- Authenticates to Snowflake's HIRC endpoint using the PAT created in the setup notebook
- Attaches the silver database and discovers all five Dynamic Iceberg Tables
- Runs cross-engine queries against the same silver data — proving multi-engine interoperability without data copies
Tip: To open in Google Colab directly, prepend https://colab.research.google.com/github/ to the notebook path:
https://colab.research.google.com/github/Snowflake-Labs/sfguide-lakehouse-iceberg-production-pipelines/blob/main/notebooks/duckdb_lab_guide.ipynb
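The prepend rule is simple string concatenation; this hypothetical helper just makes the transformation explicit:

```python
# Build a Colab launch URL from a GitHub owner/repo/blob/... path.
COLAB_PREFIX = "https://colab.research.google.com/github/"

def colab_url(github_path: str) -> str:
    """Prepend the Colab GitHub loader to a repository-relative notebook path."""
    return COLAB_PREFIX + github_path.lstrip("/")

print(colab_url(
    "Snowflake-Labs/sfguide-lakehouse-iceberg-production-pipelines"
    "/blob/main/notebooks/duckdb_lab_guide.ipynb"
))
```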
Follow the Detailed Path below for step-by-step shell commands and raw DuckDB SQL.
Detailed Path
Quick Setup
The fastest path: set three env vars, run one task, open the notebook.
1. Add to .env (substitute your LAB_USERNAME prefix — for example ksampath):
SNOWFLAKE_ACCOUNT_URL=https://<org>-<account>.snowflakecomputing.com
SA_USER=${LAB_USERNAME}_duckdb_sa              # e.g. ksampath_duckdb_sa
SA_ROLE=${LAB_USERNAME}_duckdb_silver_reader
SNOWFLAKE_SILVER_DATABASE=${LAB_USERNAME}_balloon_silver
2. Create the service account, role, and PAT (run from the repo root):
task snowflake:pat-create
This creates the role, user, and a PAT scoped to that role, then stores the PAT in your OS keyring. No further steps needed for auth.
3. Grant Iceberg access (run as ACCOUNTADMIN — replace balloon_silver with your SNOWFLAKE_SILVER_DATABASE):
Critical: GRANT SELECT ON ALL TABLES silently skips Dynamic Iceberg Tables. Use ON ALL DYNAMIC TABLES and ON FUTURE DYNAMIC TABLES.
GRANT USAGE ON DATABASE balloon_silver TO ROLE duckdb_silver_reader;
GRANT USAGE ON SCHEMA balloon_silver.silver TO ROLE duckdb_silver_reader;
GRANT SELECT ON ALL DYNAMIC TABLES IN SCHEMA balloon_silver.silver TO ROLE duckdb_silver_reader;
GRANT SELECT ON FUTURE DYNAMIC TABLES IN SCHEMA balloon_silver.silver TO ROLE duckdb_silver_reader;
4. Open the notebook:
duckdb_lab_guide.ipynb — loads the PAT from the OS keyring automatically, installs the DuckDB Iceberg extension, attaches the silver database via HIRC, and queries all five silver DTs.
Detailed Walkthrough
This section explains each step in the Quick Setup and shows the raw DuckDB SQL if you prefer to run it outside the notebook.
Prerequisites
- Silver Dynamic Iceberg Tables created and refreshed at least once
- Snowflake role with CREATE ROLE / CREATE USER privileges for service account setup
- DuckDB available in the project Python environment — uv sync installs it automatically
What task snowflake:pat-create does
task snowflake:pat-create calls sfutils-pat create --user SA_USER --role SA_ROLE --db SNOWFLAKE_SILVER_DATABASE, which:
- Creates the role SA_ROLE if it does not exist (no hyphens — HIRC requires underscore-only role names)
- Creates the user SA_USER with DEFAULT_ROLE = SA_ROLE
- Grants SA_ROLE to SA_USER
- Creates a PAT scoped to SA_ROLE and stores it in the OS keyring under HOST:ACCOUNT:USER:SFUTILS-PAT:PAT_NAME
The PAT never touches .env or any tracked file.
Security: For CI/CD environments, inject the PAT at runtime from a vault or secrets manager.
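A minimal sketch of how a client could compose that keyring entry name; the helper and field values are illustrative, and the exact sfutils implementation may differ:

```python
def pat_keyring_key(host: str, account: str, user: str, pat_name: str) -> str:
    """Compose the keyring entry name: HOST:ACCOUNT:USER:SFUTILS-PAT:PAT_NAME."""
    return ":".join([host, account, user, "SFUTILS-PAT", pat_name])

key = pat_keyring_key(
    "myorg-myacct.snowflakecomputing.com",  # hypothetical host
    "myacct",                               # hypothetical account
    "ksampath_duckdb_sa",                   # example SA_USER from this lab
    "duckdb_pat",                           # hypothetical PAT name
)
# The notebook would then read the secret, e.g.:
#   import keyring
#   pat = keyring.get_password(key, "ksampath_duckdb_sa")
```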
Grant Iceberg access (manual SQL)
The task creates the role and user but does not grant table access. Run this once as ACCOUNTADMIN:
CREATE ROLE IF NOT EXISTS duckdb_silver_reader;
GRANT USAGE ON DATABASE balloon_silver TO ROLE duckdb_silver_reader;
GRANT USAGE ON SCHEMA balloon_silver.silver TO ROLE duckdb_silver_reader;
GRANT SELECT ON ALL DYNAMIC TABLES IN SCHEMA balloon_silver.silver TO ROLE duckdb_silver_reader;
GRANT SELECT ON FUTURE DYNAMIC TABLES IN SCHEMA balloon_silver.silver TO ROLE duckdb_silver_reader;
Connect from DuckDB
Install and load the required extensions (once per DuckDB installation):
INSTALL iceberg;
LOAD iceberg;
INSTALL httpfs;
LOAD httpfs;
Create the PAT-based Iceberg secret:
CREATE SECRET iceberg_pat_secret (
    TYPE iceberg,
    CLIENT_ID '',
    CLIENT_SECRET '<your_pat>',
    OAUTH2_SERVER_URI 'https://<account>.snowflakecomputing.com/polaris/api/catalog/v1/oauth/tokens',
    OAUTH2_GRANT_TYPE 'client_credentials',
    OAUTH2_SCOPE 'session:role:duckdb_silver_reader'
);
Attach the silver database — warehouse name must be UPPERCASE:
ATTACH 'BALLOON_SILVER' AS balloon_silver (
    TYPE iceberg,
    SECRET iceberg_pat_secret,
    ENDPOINT 'https://<account>.snowflakecomputing.com/polaris/api/catalog',
    SUPPORT_NESTED_NAMESPACES false
);
Discover tables — SHOW ALL TABLES returns empty for Iceberg REST catalogs; use USE first:
USE balloon_silver.SILVER;
SHOW TABLES;
Query the player leaderboard:
SELECT player, total_score, bonus_pops, last_event_ts
FROM balloon_silver.SILVER.DT_PLAYER_LEADERBOARD
ORDER BY total_score DESC NULLS LAST
LIMIT 10;
Limitations
- External engines can query but cannot write to Iceberg tables via HIRC
- Reads work on Iceberg v2 or earlier only
- Tables with row access policies or masking policies are not accessible via HIRC
- Only Snowflake-managed Iceberg tables are supported — not externally managed, Delta, or Parquet Direct tables
- SHOW ALL TABLES and information_schema are unavailable for attached Iceberg REST catalogs in DuckDB — use USE catalog.SCHEMA; SHOW TABLES
Case-Sensitive Identifiers
Snowflake identifiers are UPPERCASE when accessed through HIRC. Always use uppercase schema and table names in DuckDB:
-- Wrong: lowercase identifiers fail with "table does not exist"
SELECT * FROM balloon_silver.silver.dt_player_leaderboard;

-- Correct: uppercase matches Snowflake's internal representation
SELECT * FROM balloon_silver.SILVER.DT_PLAYER_LEADERBOARD;
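If you generate HIRC queries from code, one defensive pattern (a hypothetical helper, not part of the lab tooling) is to uppercase the schema and table parts before building the query string, leaving the catalog alias as you attached it:

```python
def hirc_table_ref(catalog: str, schema: str, table: str) -> str:
    """Build a DuckDB table reference with HIRC-safe uppercase identifiers."""
    return f"{catalog}.{schema.upper()}.{table.upper()}"

print(hirc_table_ref("balloon_silver", "silver", "dt_player_leaderboard"))
# balloon_silver.SILVER.DT_PLAYER_LEADERBOARD
```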
Cleanup
Remove lab resources in reverse order of creation.
Snowflake Objects
Drop Dynamic Tables and the silver database:
DROP DATABASE IF EXISTS balloon_silver;
Drop the Streamlit app (if deployed). If you used the Easy Path (Cortex Code), the app name may differ — check SHOW STREAMLITS IN SCHEMA balloon_silver.apps first:
SHOW STREAMLITS IN SCHEMA balloon_silver.apps;
DROP STREAMLIT IF EXISTS balloon_silver.apps.balloon_game_dashboard;
Drop the catalog-linked database:
DROP DATABASE IF EXISTS balloon_game_events;
Drop the catalog integration:
DROP CATALOG INTEGRATION IF EXISTS glue_rest_catalog_int;
Drop the DuckDB service account objects (if created):
Revoke the DuckDB service account PAT before dropping the user:
task snowflake:pat-revoke
DROP USER IF EXISTS duckdb_sa;
DROP ROLE IF EXISTS duckdb_silver_reader;
See Privileges Lost After CLD Recreate in Troubleshooting if you recreate the CLD during teardown and re-setup.
Bronze (AWS)
Preview what will be deleted:
task bronze:cleanup-dry-run
Remove Glue tables, the Glue database, and S3 Tables control-plane resources:
task bronze:cleanup
bronze:cleanup removes Glue and S3 Tables metadata only. It does not delete BRONZE_BUCKET_NAME or objects under iceberg/ in S3. Remove those manually:
aws s3 rm "s3://$BRONZE_BUCKET_NAME/iceberg/" --recursive
Lake Formation registrations and IAM roles created for LF are not removed by bronze:cleanup — delete those in the AWS console or via CLI as needed.
Notebook users (CloudFormation): If you used the notebook's CloudFormation template, a single command removes all IAM roles, policies, Lake Formation registration, and permissions:
aws cloudformation delete-stack --stack-name snowflake-cld-iam --region $AWS_REGION
aws cloudformation wait stack-delete-complete --stack-name snowflake-cld-iam --region $AWS_REGION

If deletion fails due to active Lake Formation dependencies, check the CloudFormation Events tab for the failed resource.
Optional: Delete SIGV4 Lab Role
If task snowflake:create-glue-catalog-read-role created the IAM role, remove it after Snowflake teardown:
task bronze:cleanup-dry-run -- --delete-snowflake-catalog-iam-role
task bronze:cleanup -- --yes --delete-snowflake-catalog-iam-role
This deletes only roles tagged project=balloon-popper-demo and purpose=snowflake-glue-catalog-read.
Troubleshooting
Credential Vending Error 094120
If SYSTEM$CATALOG_LINK_STATUS returns error code 094120 ("Failed to retrieve credentials from the Catalog"), work through this checklist in order:
- Two separate IAM roles: The SIGV4 role (Snowflake catalog signer) and the LF data-access role passed to register-resource --role-arn must be different principals. Using the same role causes credential vending failures.
- Register-resource flags: The warehouse S3 location must be registered with HybridAccessEnabled=false and WithFederation=false. Hybrid mode produces unpredictable vending behavior.
- Glue default permissions: Run aws glue update-database with empty CreateTableDefaultPermissions on GLUE_DATABASE so new tables follow Lake Formation mode, not IAM-only defaults.
- LF grants: The SIGV4 role must have DESCRIBE on the database and SELECT, DESCRIBE on the table wildcard via Lake Formation grant-permissions.
- Recreate the CLD: After fixing any LF or IAM setting, run CREATE OR REPLACE DATABASE … LINKED_CATALOG = ( … ). ALTER DATABASE … RESUME DISCOVERY only retries table/schema discovery — it does not re-establish the catalog connection.
Glue Schema Not Found
Glue database names surface as lowercase schema identifiers in the CLD. Always use double-quoted lowercase:
-- See the exact name Snowflake discovered:
SHOW SCHEMAS IN DATABASE balloon_game_events;

-- Use it in double quotes:
SHOW ICEBERG TABLES IN SCHEMA balloon_game_events."ksampath_balloon_pops";
Integration DISABLED After Trust Apply
IAM trust policy changes can take up to 30 seconds to propagate. Wait briefly and re-run DESC CATALOG INTEGRATION — the status should update. If it stays DISABLED, confirm GLUE_AWS_IAM_USER_ARN and GLUE_AWS_EXTERNAL_ID in the rendered trust JSON match the current DESC output exactly.
Use SYSTEM$VERIFY_CATALOG_INTEGRATION to test connectivity explicitly:
SELECT SYSTEM$VERIFY_CATALOG_INTEGRATION('glue_rest_catalog_int');
The JSON response includes error details when the trust policy or IAM permissions are misconfigured.
Empty Windowed DTs
dt_realtime_scores, dt_balloon_colored_pops, and dt_color_performance_trends use 15-second TIME_SLICE windows. They are empty when all bronze events fall in a single bucket or when DTs have not yet completed an initial refresh.
Load additional events:
task bronze:load-more
Wait for TARGET_LAG to elapse, then re-query.
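As a rough Python illustration of the windowing (not the actual TIME_SLICE implementation), flooring event timestamps to 15-second buckets shows why closely spaced events collapse into a single window row:

```python
from datetime import datetime, timezone

def window_start(ts: datetime, slice_seconds: int = 15) -> datetime:
    """Floor a timestamp to the start of its 15-second window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % slice_seconds, tz=timezone.utc)

a = datetime(2024, 1, 1, 12, 0, 3, tzinfo=timezone.utc)
b = datetime(2024, 1, 1, 12, 0, 14, tzinfo=timezone.utc)
c = datetime(2024, 1, 1, 12, 0, 16, tzinfo=timezone.utc)
# a and b share the 12:00:00 window; c starts the 12:00:15 window.
# If every bronze event lands in one bucket, the windowed DTs show one window
# (or none, before the first refresh completes).
```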
Missing USAGE on External Volume
If Dynamic Iceberg Table creation fails with a permissions error:
GRANT USAGE ON EXTERNAL VOLUME <volume_name> TO ROLE <your_role>;
GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE <your_role>;
Privileges Lost After CLD Recreate
After CREATE OR REPLACE DATABASE … LINKED_CATALOG, re-apply integration usage:
GRANT USAGE ON INTEGRATION glue_rest_catalog_int TO ROLE <your_role>;
See the BCR-2114 behavior change in Snowflake release notes regarding catalog integration usage requirements.
DuckDB HIRC: Role Not Found
HIRC does not support role names with hyphens. If you see "Role not found", ensure the role name uses underscores only — duckdb_silver_reader not duckdb-silver-reader.
DuckDB HIRC: Table Does Not Exist
If SHOW TABLES lists a table but a SELECT fails with "table does not exist", the identifiers are case-sensitive. Use uppercase schema and table names:
SELECT * FROM balloon_silver.SILVER.DT_PLAYER_LEADERBOARD LIMIT 5;
Task References
Quick reference for all lab tasks. Run task --list from the repo root to see all available tasks with their current status.
Root Tasks
| Task | Description |
|---|---|
| setup | Bootstrap: uv sync + copy .env.example → .env if missing |
| check-tools | Verify lab CLIs on PATH and run aws sts get-caller-identity |
| default | List all available tasks |
| dashboard-local | Optional dev: run the Streamlit dashboard locally (not the lab outcome) |
| generator-local | Run the balloon game data generator locally |
bronze:* Tasks
| Task | Description |
|---|---|
| bronze:glue-setup | Create Glue database and register S3 warehouse |
| bronze:glue-setup-dry-run | Preview Glue setup without making changes |
| bronze:s3tables-setup | Create S3 Tables table bucket, namespace, and Iceberg table |
| bronze:s3tables-setup-dry-run | Preview S3 Tables setup without creating resources |
| bronze:render-iam | Render IAM policy template to .aws-config/ |
| bronze:render-iam-dry-run | Print rendered IAM policy JSON without writing files |
| bronze:lakeformation-setup | Create LF data-access role, register S3, grant permissions to SIGV4 role |
| bronze:lakeformation-setup-dry-run | Preview Lake Formation setup without AWS writes |
| bronze:load | Load sample balloon game events into Glue Iceberg table |
| bronze:load-more | Append a second batch of events (different RNG seed) |
| bronze:snowflake-summary | Print resolved bucket, database, and ARNs for Snowflake catalog setup |
| bronze:snowflake-summary-json | Same as snowflake-summary with JSON output |
| bronze:cleanup | Delete bronze metadata: Glue tables/database and S3 Tables resources |
| bronze:cleanup-dry-run | Preview bronze cleanup without deleting anything |
| bronze:all | Run glue-setup, s3tables-setup, and load in sequence |
snowflake:* Tasks
| Task | Description |
|---|---|
| snowflake:create-glue-catalog-read-role | Create SIGV4 IAM role with Glue/LF permissions; write ARN to .aws-config/ |
| snowflake:create-glue-catalog-read-role-dry-run | Print IAM trust and permissions JSON without creating AWS resources |
| snowflake:apply-glue-catalog-trust-from-rendered | Apply rendered trust policy to the SIGV4 IAM role |
| snowflake:describe-catalog-integration | Print catalog integration properties from DESC |
| snowflake:describe-catalog-integration-json | Same as describe-catalog-integration with JSON output |
| snowflake:render-glue-catalog-trust | Write rendered trust policy JSON from DESC output |
| snowflake:render-glue-catalog-trust-dry-run | Print rendered trust JSON without writing to .aws-config/ |
| snowflake:generate-lab-sql | Write 01_catalog_integration and 02_cld_verify generated SQL files |
| snowflake:generate-lab-sql-stdout | Print catalog and CLD SQL to stdout only |
| snowflake:generate-lab-sql-all | Write all three generated SQL files in one shot |
| snowflake:print-env-hints | Print Snowflake CLD env defaults and SIGV4 hints |
| snowflake:sis-deploy | Deploy the Streamlit in Snowflake app to balloon_silver.apps |
| snowflake:url | Print deployed Streamlit in Snowflake app URL |
| snowflake:pat-create | Create PAT for duckdb_sa and store in OS keychain |
| snowflake:pat-print | Print PAT value from keychain to stdout |
| snowflake:pat-revoke | Revoke and delete PAT from keychain and Snowflake |
dt:* Tasks
| Task | Description |
|---|---|
| dt:generate-sql | Write 03_dt_pipelines.generated.sql with all five silver DTs |
| dt:generate-sql-stdout | Print DT SQL to stdout only |
| dt:extvol-help | Show sfutils-extvolumes top-level CLI help |
| dt:extvol-create-help | Show sfutils-extvolumes create subcommand help |
| dt:extvol-create-dry-run | Preview S3/IAM/Snowflake external volume creation without changes |
| dt:extvol-create | Create S3 bucket, IAM role/policy, and Snowflake external volume |
| dt:extvol-verify | Verify connectivity for an existing Snowflake external volume |
| dt:extvol-describe | Describe an existing Snowflake external volume |
| dt:extvol-update-trust | Re-sync IAM trust policy from Snowflake to the IAM role |
| dt:extvol-delete | Drop Snowflake external volume and IAM resources |
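The SQL that `dt:generate-sql` writes to `03_dt_pipelines.generated.sql` follows the standard Snowflake pattern for a Snowflake-managed Dynamic Iceberg Table. A minimal sketch of that pattern is below — the schema, warehouse, source table, and column names here are hypothetical; the real definitions come from the generated file, and the external volume is the one created by `dt:extvol-create`:

```sql
-- Hypothetical silver DT; real names live in 03_dt_pipelines.generated.sql
CREATE OR REPLACE DYNAMIC ICEBERG TABLE balloon_silver.core.readings_hourly
  TARGET_LAG      = '15 minutes'        -- declared freshness for automatic refresh
  WAREHOUSE       = transform_wh        -- compute used for refreshes
  EXTERNAL_VOLUME = 'silver_extvol'     -- created by dt:extvol-create
  CATALOG         = 'SNOWFLAKE'         -- Snowflake-managed Iceberg metadata
  BASE_LOCATION   = 'readings_hourly'   -- path under the external volume
AS
SELECT
  device_id,
  DATE_TRUNC('hour', event_ts) AS hour_ts,
  AVG(temperature)             AS avg_temp,
  COUNT(*)                     AS reading_count
FROM bronze_cld.events.raw_readings     -- bronze table via the catalog-linked database
GROUP BY device_id, DATE_TRUNC('hour', event_ts);
```

Because `CATALOG = 'SNOWFLAKE'` and the table writes to your own external volume, the silver output stays in open Iceberg format in storage you control, readable by engines such as DuckDB.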
## Conclusion And Resources
Congratulations! You have successfully built an end-to-end Iceberg lakehouse pipeline with AWS and Snowflake.
Starting from a raw event stream in AWS Glue, you connected Snowflake directly to externally managed Iceberg tables without an ETL copy and layered in Dynamic Iceberg Tables that write silver aggregates back to open Iceberg format. You then shipped a live Streamlit in Snowflake dashboard and queried the same silver tables from DuckDB via the Horizon Iceberg REST Catalog, all while keeping every layer in open format and every file in storage you control.
### What You Learned
- How to prepare a bronze Iceberg landing zone in AWS using Glue, S3, and Lake Formation with vended credentials configured for Snowflake
- How to configure a Snowflake Glue Iceberg REST catalog integration with a two-role Lake Formation setup
- How to create a catalog-linked database that reflects externally managed Iceberg tables without data duplication
- How to build Dynamic Iceberg Tables that transform bronze JSON into production-ready silver aggregates on a declared target lag
- How to make your silver data AI-ready by creating a Semantic View that enables natural-language querying via Snowflake Intelligence and Cortex Analyst
- How to deploy a Streamlit in Snowflake dashboard that reads from silver Dynamic Tables
- How to query Snowflake-managed Iceberg tables from DuckDB via the Horizon REST Catalog using a Programmatic Access Token
- How intent-driven development with Cortex Code can replace manual coding workflows — and how refining prompts iteratively captures reusable engineering specifications
### Related Resources
Documentation:
AWS Documentation:
Additional Reading:
This content is provided as is and is not maintained on an ongoing basis. It may be out of date with current Snowflake releases.