Build Autonomous SQL Pipelines for AI Agents

Snowflake for Developers/Guides/Build Autonomous SQL Pipelines for AI Agents

Quickstart

Build Autonomous SQL Pipelines for AI Agents

Transformation

Yoav Ostrinsky

Overview

What if you could build a production data pipeline by simply describing what you want? In this quickstart, you'll do exactly that — using Snowflake CoCo as your AI co-pilot to build an end-to-end Dynamic Tables pipeline from natural language prompts.

Dynamic Tables let you declare what your pipeline should produce using SQL. Snowflake CoCo lets you describe what you want in plain English and generates the SQL for you. Together, they represent a fully declarative approach to data engineering — from intent to production pipeline, no boilerplate required.

You'll work with the Tasty Bytes dataset (a fictitious food truck company) to build a three-tier pipeline that enriches raw order data, joins it into fact tables, and pre-aggregates business metrics — all by giving Snowflake CoCo prompts. Once the pipeline is running, you'll wire it up to an AI agent that answers natural language questions about your data.

Prerequisites

Basic familiarity with data engineering concepts (data loading, transformations)
No SQL writing required — Snowflake CoCo writes it for you

What You'll Learn

How to use Snowflake CoCo as a data engineering co-pilot in Snowsight
How to build Dynamic Tables by describing your pipeline in natural language
How to configure refresh strategies (TARGET_LAG, DOWNSTREAM) through prompts
How Dynamic Tables handle incremental refresh automatically
How to monitor pipeline operations conversationally
How to create semantic views and AI agents using Snowflake CoCo

What You'll Need

A Snowflake account (trial, or existing accounts) - preferably in AWS US West 2 (Oregon) region, Enterprise edition
ACCOUNTADMIN access (available in trial accounts)
Snowflake CoCo enabled in your account (setup guide)

What You'll Build

A three-tier declarative data pipeline processing ~1 billion order records
A stored procedure for generating synthetic test data
Monitoring queries for pipeline observability
A semantic view for natural language querying
A Cortex Agent for conversational data exploration

Open Snowflake CoCo

All of the work in this quickstart is driven through Snowflake CoCo — Snowflake's AI co-pilot built into Snowsight.

Open the Snowflake CoCo panel by clicking the Snowflake CoCo icon in the lower-right corner of Snowsight.

You'll see a chat interface. Throughout this quickstart, you'll paste prompts from each section directly into CoCo. CoCo will generate the SQL, explain what it's doing, and execute it for you.

Set Up the Environment

Let's create the database, warehouse, and load the Tasty Bytes dataset. Give CoCo the following prompt:

Using ACCOUNTADMIN, create a role called lab_role and grant it CREATE WAREHOUSE
and CREATE DATABASE privileges on the account. Then switch to lab_role and:
- Create a database called tasty_bytes_db with two schemas: raw and analytics
- Create a 2XL standard warehouse called tasty_bytes_wh with 60s auto-suspend
  and auto-resume enabled
- Create these tables in tasty_bytes_db.raw:
  - order_header: columns for order_id, truck_id, location_id, customer_id,
    discount_id, shift_id, shift_start_time, shift_end_time, order_channel,
    order_ts (TIMESTAMP_NTZ), served_ts, order_currency, order_amount (NUMBER),
    order_tax_amount, order_discount_amount, order_total (NUMBER)
  - order_detail: columns for order_detail_id, order_id, menu_item_id,
    discount_id, line_number, quantity, unit_price, price, order_item_discount_amount
  - menu: columns for menu_id, menu_type_id, menu_type, truck_brand_name,
    menu_item_id, menu_item_name, item_category, item_subcategory,
    cost_of_goods_usd, sale_price_usd, menu_item_health_metrics_obj (VARIANT)
- Create a CSV file format in tasty_bytes_db.public
- Create an external stage tasty_bytes_db.raw.tasty_bytes_stage pointing to
  s3://sfquickstarts/tasty-bytes-builder-education/ using the CSV file format
- Load all three tables using COPY INTO from the stage subdirectories:
  raw_pos/order_header/, raw_pos/order_detail/, raw_pos/menu/

CoCo will create all infrastructure and load ~1 billion rows from S3. The data load takes a few minutes due to the volume.

Note: When CoCo asks for permission to execute SQL, you'll see an approval prompt with three options:

Allow Once — approves only this specific statement

Allow CREATE — approves all CREATE statements for the current session

Always Allow CREATE — permanently allows CREATE statements without prompting in future sessions

Note: If you encounter permission errors, ensure you're running as ACCOUNTADMIN when creating the role and granting privileges.

Understanding Dynamic Tables

Before we build the pipeline, let's understand the key concepts.

Dynamic Tables are a declarative way of defining data transformations in Snowflake. Instead of writing stored procedures, streams, and tasks to orchestrate your pipeline, you define the desired result using a SELECT statement. Snowflake automatically:

Identifies and processes only changed rows (incremental refresh)
Refreshes tables based on a target lag setting
Manages dependencies between Dynamic Tables

Key Concepts:

TARGET_LAG: The maximum acceptable delay between source changes and the Dynamic Table reflecting them. Example: TARGET_LAG = '1 hour' means data is at most 1 hour stale.
DOWNSTREAM: A special TARGET_LAG value meaning "refresh only when downstream consumers need it." Creates an efficient pull-based refresh model.
Incremental Refresh: Dynamic Tables process only changed data rather than recomputing everything. Processing 1,000 new orders out of 1 billion is nearly instant.
Dependency Graph: Snowflake tracks dependencies between Dynamic Tables and determines refresh order automatically.
INITIALIZE = 'ON_SCHEDULE': By default (ON_CREATE), a Dynamic Table performs an immediate full refresh the moment it's created — processing all source data before returning. With ON_SCHEDULE, the table is created instantly (empty) and populated on the first scheduled or manual refresh. This is the recommended approach for production and lab pipelines because it separates definition from execution: you can create the entire pipeline structure quickly, then trigger the initial load when you're ready. It also avoids long-running CREATE statements and makes deployments faster and more predictable.

Now let's tell Snowflake CoCo to build a three-tier pipeline using these concepts.

Build the Pipeline

This is the core of the quickstart — you'll describe an entire three-tier data pipeline to Snowflake CoCo, and it will generate all the Dynamic Table DDL.

Before submitting the prompt, switch CoCo to Plan mode by pressing Shift+Tab. In Plan mode, CoCo will show you exactly what it intends to do before executing anything — useful for reviewing a large DDL change like this before it runs. Once you've reviewed the plan and are happy with it, switch back to normal mode and execute.

Give CoCo the following prompt:

Build a 3-tier dynamic table pipeline in tasty_bytes_db.analytics using
warehouse tasty_bytes_wh:

Use INITIALIZE = 'ON_SCHEDULE' for all dynamic tables.

Tier 1 - Enrichment (TARGET_LAG = DOWNSTREAM):
- orders_enriched: From tasty_bytes_db.raw.order_header. Include order_id,
  truck_id, customer_id, order_channel. Add temporal dimensions: order_date,
  day_name (DAYNAME), order_hour (HOUR), and order_ts as order_timestamp.
  Include order_amount and order_total. Cast order_discount_amount to NUMBER(10,2)
  using TRY_TO_NUMBER. Add a has_discount boolean (true when discount_id is not
  null and not empty string). Filter out null order_id and null order_ts.

- order_items_enriched: Join tasty_bytes_db.raw.order_detail with
  tasty_bytes_db.raw.menu on menu_item_id. Include order_detail_id, order_id,
  line_number, menu_item_id, menu_item_name, item_category, item_subcategory,
  truck_brand_name, menu_type, quantity, unit_price, price as line_total,
  cost_of_goods_usd, sale_price_usd. Calculate unit_profit (unit_price minus
  cost_of_goods_usd), line_profit (unit_profit times quantity), and
  profit_margin_pct (rounded to 2 decimals, handle zero unit_price). Cast
  order_item_discount_amount to NUMBER(10,2) as line_discount_amount. Add
  has_discount flag. Filter out null order_id and null menu_item_id.

Tier 2 - Fact Table (TARGET_LAG = DOWNSTREAM):
- order_fact: Inner join orders_enriched (alias o) with order_items_enriched
  (alias oi) on order_id. Include all columns from both. Rename
  o.order_discount_amount as order_level_discount and o.has_discount as
  order_has_discount, oi.has_discount as line_has_discount.

Tier 3 - Aggregated Metrics (TARGET_LAG = 1 hour):
- daily_business_metrics: Aggregate order_fact by order_date and day_name.
  Include count distinct order_id (total_orders), truck_id (active_trucks),
  customer_id (unique_customers), sum quantity (total_items_sold), sum
  order_total (total_revenue), avg order_total (avg_order_value), sum
  line_total (total_line_item_revenue), sum line_profit (total_profit),
  avg profit_margin_pct (avg_profit_margin_pct), count orders with discount,
  sum discount amounts.

- product_performance_metrics: Aggregate order_fact by menu_item_id,
  menu_item_name, item_category, item_subcategory, truck_brand_name, menu_type.
  Include count distinct order_id (order_count), sum quantity (total_units_sold),
  sum line_total (total_revenue), sum line_profit (total_profit), avg unit_price,
  avg profit_margin_pct, avg cost_of_goods_usd (avg_cogs), avg sale_price_usd
  (standard_sale_price), revenue_per_unit, profit_per_unit.

Review before executing: Verify that TARGET_LAG values match (DOWNSTREAM for Tier 1-2, 1 hour for Tier 3), INITIALIZE = 'ON_SCHEDULE' is set on all tables, join conditions are correct, and the dependency flows raw → Tier 1 → Tier 2 → Tier 3.

Why ON_SCHEDULE? With INITIALIZE = 'ON_SCHEDULE', all 5 CREATE statements return immediately — no waiting for an initial full scan of 1 billion rows. The pipeline is defined but empty. You'll trigger the first load manually in the next step, giving you full control over when the expensive initial refresh happens.

Now trigger the initial load by refreshing only the top-tier tables. Snowflake will automatically cascade through the dependency graph — refreshing Tier 1 and Tier 2 first, then Tier 3:

Refresh daily_business_metrics and product_performance_metrics in
tasty_bytes_db.analytics. Snowflake will handle the upstream dependencies.

What to observe: You only triggered Tier 3 — but Snowflake automatically refreshed orders_enriched and order_items_enriched (Tier 1), then order_fact (Tier 2), before completing the Tier 3 aggregations. This is dependency graph management in action.

View Dependency Graph

Snowflake automatically tracks dependencies between your Dynamic Tables. Let's visualize this.

Navigate to Catalog > Database Explorer in Snowsight and open tasty_bytes_db
Click on the analytics schema
Click on any Dynamic Table (e.g., order_fact)
Look for the Graph tab

You'll see:

Upstream: orders_enriched, order_items_enriched
Downstream: daily_business_metrics, product_performance_metrics

Optionally, ask CoCo:

Show me all dynamic tables in tasty_bytes_db.analytics with their target lag settings

Generate Test Data

To demonstrate incremental refresh, we need a way to insert new orders. Ask CoCo to create a stored procedure:

Create a stored procedure tasty_bytes_db.raw.generate_demo_orders(num_rows INTEGER)
using SQL language that generates synthetic orders. It should:
- Capture row counts in order_header and order_detail before insertion
- Sample num_rows random existing orders from order_header into a temporary table
- Assign new unique order IDs to avoid conflicts with existing data
- Set timestamps to the current date while preserving the original time-of-day
- Apply random price variation of plus or minus 20% using UNIFORM
- Insert the new orders into order_header
- For each new order, copy the corresponding order_detail records from the
  original order, generating new order_detail_ids and applying the same
  price randomization
- Capture row counts after insertion
- Clean up the temporary table
- Return a summary string showing how many orders and line items were inserted

Review before executing: This is a complex stored procedure — verify it creates a temp table with new_order_id mappings and joins back to order_detail to preserve referential integrity.

Test Incremental Refresh

Now let's see incremental refresh in action. Give CoCo these prompts one at a time:

Prompt 1 — Establish baseline:

How many rows are currently in tasty_bytes_db.raw.order_header and
tasty_bytes_db.raw.order_detail?

Prompt 2 — Generate new data:

Call tasty_bytes_db.raw.generate_demo_orders(1000)

Prompt 3 — Trigger refresh (top tier only):

Refresh daily_business_metrics and product_performance_metrics in
tasty_bytes_db.analytics.

What to observe: You only issued refresh commands for the Tier 3 tables — but Snowflake automatically cascaded through Tier 2 and Tier 1 first. This is the DOWNSTREAM dependency model at work.

Prompt 4 — Verify incremental behavior:

Query tasty_bytes_db.information_schema.dynamic_table_refresh_history() for each
of the 5 dynamic tables in tasty_bytes_db.analytics. Show the most recent
refresh_action, state, refresh_trigger, and duration in seconds for each.

Key insight: The refresh_action column should show INCREMENTAL — Snowflake processed only the 1,000 new orders through the entire pipeline, not the full billion-row dataset. This is what makes Dynamic Tables efficient at scale.

Discussion: Notice that product_performance_metrics shows a FULL refresh. Why? This table aggregates across all products — when a new order arrives for an existing product (e.g., "The King Combo"), the aggregate row for that product must be updated. Snowflake rewrites the affected group rather than appending a new row. Does a full rewrite make sense here? Consider: with 1,000 new rows out of 1 billion, the rewrite only touches the affected product groups — it's still far more efficient than processing all source data.

Prompt 5 — View updated results:

Query tasty_bytes_db.analytics.daily_business_metrics ordered by order_date
descending, limit 5. Then query tasty_bytes_db.analytics.product_performance_metrics
ordered by total_revenue descending, limit 10.

Monitor Pipeline

Ask CoCo for a full monitoring summary:

Show a monitoring summary for all dynamic tables in tasty_bytes_db.analytics.
For each table, show its name, target_lag, scheduling state, and — from
tasty_bytes_db.information_schema.dynamic_table_refresh_history() — the most
recent refresh_action, state, and duration in seconds. Use a window function
to get only the latest refresh per table.

Key columns to watch:

refresh_action: INCREMENTAL (efficient) vs FULL (first run or when incremental isn't possible)
state: SUCCEEDED or FAILED
duration: How long the refresh took

Create Semantic View and Agent

Let's make our pipeline data queryable in natural language by creating a semantic view and AI agent:

Create a semantic view called tasty_bytes_semantic_model in TASTY_BYTES_DB.ANALYTICS
over all 5 dynamic tables in the analytics schema: orders_enriched,
order_items_enriched, order_fact, daily_business_metrics, and
product_performance_metrics. Then create a Cortex Agent called tasty_bytes_agent
in TASTY_BYTES_DB.ANALYTICS that uses this semantic view as its Cortex Analyst
tool. Set the display name to "Tasty Bytes Analytics Agent".

Test Your Agent

Once created, navigate to AI & ML → Snowflake CoWork in Snowsight, select your agent, and try:

"What are the top 10 products by revenue?"
"Show me daily revenue trends for the last 30 days"
"Which truck brands are most profitable?"
"What's the average order value by day of week?"

The agent translates natural language into SQL queries against your Dynamic Tables and returns results with visualizations.

Clean up

Ask CoCo to tear down everything created during this quickstart:

Drop database tasty_bytes_db and warehouse tasty_bytes_wh using lab_role.
Then switch to ACCOUNTADMIN and drop lab_role.

Conclusion and Resources

Congratulations! You've built a complete autonomous data pipeline without writing a single line of SQL manually. By combining Snowflake CoCo with Dynamic Tables, you:

Described a multi-tier pipeline in plain English and had it materialized automatically
Generated and executed complex stored procedures from a single prompt
Verified incremental refresh behavior across a billion-row dataset
Created a natural language AI agent backed by your pipeline data

What You Learned

Using Snowflake CoCo to generate and execute SQL from natural language
Building multi-tier Dynamic Table pipelines declaratively
Configuring TARGET_LAG and DOWNSTREAM refresh strategies
Verifying incremental refresh behavior
Monitoring Dynamic Table operations
Creating semantic views and AI agents with Snowflake CoCo

Related Resources

Updated 2026-05-27

This content is provided as is, and is not maintained on an ongoing basis. It may be out of date with current Snowflake instances