Build an End-to-End Application Using CoCo on Snowflake
Overview
In this hands-on lab, you'll build a complete AI-powered retail analytics platform entirely within Snowflake — no external infrastructure required. Using Snowflake CoCo as your AI-assisted development environment, you'll work through the full data lifecycle: stream real-time orders via Snowpipe Streaming, MERGE them into production tables with Gen2 Warehouses, transform them through a 3-tier Dynamic Tables pipeline, and serve them with Interactive Tables for low-latency point lookups.
You'll build analytical models with dbt, monitor data quality with Data Metric Functions, and create custom CoCo skills for reusable workflows. Tie it all together with Snowflake CoWork — a conversational AI interface where a Cortex Agent orchestrates Cortex Analyst and Agentic Search to answer "what happened" and "why" from both structured and unstructured data. Finally, evaluate your agent with ground-truth datasets, implement row-level security, and expose your agent as a managed MCP server for external AI clients.
What You'll Learn
- Accelerate development with Snowflake CoCo (AI-assisted SQL, deployment, and data exploration)
- Stream real-time data with Snowpipe Streaming and transform with Dynamic Tables
- Serve low-latency queries with Interactive Tables and Gen2 Warehouses
- Build analytical models with dbt
- Monitor data quality automatically with Data Metric Functions
- Create and query managed Iceberg V3 tables (deletion vectors, row lineage) (optional)
- Create custom CoCo skills for reusable team workflows
- Build a Cortex Agent with Cortex Analyst (semantic view + verified queries) and Agentic Search (multi-index Cortex Search)
- Evaluate agent quality with ground-truth datasets and LLM judges
- Expose agents as managed MCP servers for external AI clients
- Implement transparent row-level security with Row Access Policies
What You'll Build
A production-grade AI-powered retail analytics platform on Snowflake — from raw data to conversational AI insights, entirely within a single platform. You'll create dynamic transformation pipelines, interactive low-latency tables, dbt analytical models, a Cortex Agent that answers questions across structured and unstructured data, row-level security that works transparently through AI, and an MCP server that exposes your agent to external clients.
Prerequisites
- Access to a Snowflake account
- Python 3.8+ installed locally
- Git installed locally
- Basic familiarity with SQL and command-line tools
Setup
Install Snowflake CLI
The Snowflake CLI (snow) lets you run SQL, deploy apps, and manage Snowflake objects from your terminal.
macOS (using Homebrew):
If you don't have Homebrew installed, first install it by opening Terminal and running:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Then install Snowflake CLI:
brew install snowflake-cli
Windows:
pip install snowflake-cli
Linux:
pip install snowflake-cli
Verify the installation:
snow --version
You should see output like Snowflake CLI version: 3.x.x.
Install Snowflake CoCo
Snowflake CoCo is an AI-powered coding assistant that runs in your terminal. It helps you write SQL, build pipelines, deploy apps, and explore your data using natural language prompts.
macOS (using Homebrew):
brew install cortex
Windows / Linux:
pip install cortex
Verify the installation:
cortex --version
Configure Snowflake Connection
Before you can run any commands against Snowflake, you need to configure a connection. This tells the CLI which Snowflake account to connect to and how to authenticate.
Run the interactive connection wizard:
snow connection add
You'll be prompted for the following values (enter them one at a time):
| Prompt | What to enter | Example |
|---|---|---|
| Connection name | A short name for this connection | hol |
| Account identifier | Your Snowflake account URL (without .snowflakecomputing.com) | myorg-myaccount |
| User | Your Snowflake username | jsmith |
| Password | Your Snowflake password | (hidden) |
| Role | ACCOUNTADMIN | ACCOUNTADMIN |
| Warehouse | Leave blank (will be created by setup) | (leave empty) |
| Database | Leave blank (will be created by setup) | (leave empty) |
Tip: Your account identifier is the part before
.snowflakecomputing.comin your Snowflake URL. For example, if you log in athttps://myorg-myaccount.snowflakecomputing.com, your account identifier ismyorg-myaccount.
Test that your connection works:
snow connection test -c hol
You should see Status: OK.
Clone the Lab Repository
git clone https://github.com/Snowflake-Labs/automated-intelligence-dev-day-2026-hol.git cd automated-intelligence-dev-day-2026-hol
Run Infrastructure Setup
Launch Snowflake CoCo and verify your connection:
cortex
What to expect: CoCo will start an interactive session in your terminal. You'll see your active connection, role, and warehouse displayed. You can type natural language prompts and CoCo will translate them into SQL or actions.
Then run the core infrastructure script (this takes ~10-15 minutes):
snow sql -f setup.sql -c hol
This creates the database, schemas, warehouses, tables, Dynamic Tables pipeline, Interactive Tables, Cortex Search Services, Semantic View, seed data (10M orders, 25M order items, 2M customers), and Row Access Policy.
Gen2 Warehouse: Optima Indexing
Gen2 Warehouses introduce Optima Indexing — an automatic indexing layer that prunes partitions at query time without explicit clustering keys. Gen2 warehouses learn from your query patterns and build internal indices that accelerate point lookups and filtered scans.
Demonstrate Optima Indexing in action:
Prompt CoCo:
"Run a point lookup for customer_id 5000 on the Gen2 warehouse"
Open the query profile in Snowsight to see partition pruning — only a fraction of partitions scanned despite no explicit clustering key. This is Gen2's Optima Indexing in action.
Dynamic Tables Pipeline
Dynamic Tables are declarative data pipelines — you define the target state as a SQL query and Snowflake handles incremental refresh automatically. The TARGET_LAG parameter controls freshness: set a time-based lag (e.g., 1 minute) for Tier 1 tables, or use DOWNSTREAM for tables that refresh only when their upstream sources change.
Prompt CoCo:
"Show me the Dynamic Tables pipeline status — names, target lag, last refresh time, and row counts for each tier"
CoCo displays the 3-tier pipeline:
- Tier 1 (1-min lag):
enriched_orders(10M rows),enriched_order_items(25M rows) - Tier 2 (DOWNSTREAM):
fact_orders(25M rows) - Tier 3 (DOWNSTREAM):
daily_business_metrics(118 rows),product_performance_metrics(4 rows)
Note: Row counts shown are for the default
data_scale = '10M'. If you chose'50M'insetup.sql, expect ~50M enriched_orders, ~161M order_items, and 365 daily metrics rows.
Explore Results
Ask CoCo:
"Show me a sample of the daily business metrics — top 5 days by revenue"
Expected: Top-5 days are in September 2025 (back-to-season peak), each with ~$183M revenue and ~117K orders.
Interactive Tables
Interactive Tables are purpose-built for low-latency, high-concurrency point lookups. They maintain pre-computed results with clustering optimized for equality predicates, delivering sub-second response times for dashboard filters and application queries that would otherwise require full table scans.
Point Lookups
Run queries in Snowsight to observe sub-second latency:
USE WAREHOUSE hol_interactive_wh; ALTER SESSION SET USE_CACHED_RESULT = FALSE; -- Point lookup by customer ID SELECT * FROM dash_automated_intelligence_db.interactive.customer_order_analytics WHERE customer_id = 1; -- Point lookup by order ID -- Run: SELECT order_id FROM dash_automated_intelligence_db.interactive.order_lookup LIMIT 5; to get UUIDs to use SELECT * FROM dash_automated_intelligence_db.interactive.order_lookup WHERE order_id = '<any-order-uuid>';
Concurrency Load Test
Prompt CoCo:
"Run the interactive tables load test at interactive/load_test.py"
This fires 200 concurrent sessions (1000 queries total) against both Interactive and Standard warehouses, then compares P50/P90/P99 latencies. You should see notably lower latency and higher throughput on the Interactive warehouse.
Run the load test a second time to observe the effect of warm caches. Results may vary depending on account, region, and data scale.
Data Quality
Data Metric Functions (DMFs) let you attach automated quality checks directly to table columns. Snowflake runs them on a schedule and stores results in SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS. Built-in DMFs include NULL_COUNT, DUPLICATE_COUNT, UNIQUE_COUNT, and FRESHNESS — or you can write custom ones.
The setup script injected ~200 NULL values into orders.total_amount and order_items.quantity, plus ~150 NULLs into order_items.product_name. DMFs detect the first two — but there's a gap.
Discover the Gap
Prompt CoCo:
"Check the data quality monitoring results and show me which columns have NULL violations"
CoCo shows that TOTAL_AMOUNT (200 NULLs) and QUANTITY (200 NULLs) have violations — but product_name NULLs are going undetected.
"Are there any NULL values in order_items.product_name? Is that column being monitored?"
CoCo finds ~150 NULLs and reveals the DMF is mis-attached to product_category instead of product_name.
Fix the Coverage
"Fix the DMF — remove the NULL check from product_category and add it to product_name instead"
ALTER TABLE order_items DROP DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (product_category); ALTER TABLE order_items ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (product_name);
This demonstrates the real-world workflow: monitor, discover gaps, fix coverage.
dbt Analytics
dbt-snowflake transforms raw data into analytics-ready models using SQL SELECT statements. The dbt-snowflake adapter integrates natively with Snowflake, supporting incremental models, snapshots, and built-in testing. In this HOL, dbt creates staging views and mart tables for customer lifetime value, segmentation, product performance, and support analytics.
Prerequisites:
dbt-coreanddbt-snowflakemust be installed (pip install dbt-snowflake). CoCo will handle this for you if not already installed.
Prompt CoCo:
"Install dbt dependencies and build all models in the dbt-analytics project"
CoCo runs dbt deps then dbt build to create all staging views and mart tables (9+ models). CoCo automatically injects your active Snowflake connection into the dbt profile — no manual configuration needed.
Expected output: 71 tests pass, 1 warning (the
source_not_null_raw_orders_total_amounttest detects the 200 NULLs we injected for the Data Quality exercise — this is working as designed).
Explore Results
"Show me the customer lifetime value segments — how many customers are in each value tier?"
CoCo Custom Skill
CoCo Custom Skills let you package repeatable workflows into named commands that any team member can invoke. A skill is a Markdown file (.cortex/skills/<name>/SKILL.md) that defines triggers, parameters, and step-by-step instructions CoCo follows when the skill is activated.
Create a reusable skill that automates table profiling:
Prompt CoCo:
"Create a custom CoCo skill called 'profile-table' that takes a table name, counts rows, checks for NULL columns, shows distinct value counts, and flags potential data quality issues"
CoCo creates .cortex/skills/profile-table/SKILL.md with the skill definition, triggers, and step-by-step instructions.
Note: After creating the skill, restart CoCo (type
/quitthencortex) for the new skill to become active.
Test It
"$profile-table DASH_AUTOMATED_INTELLIGENCE_DB.RAW.ORDERS"
This demonstrates how teams package repeatable workflows as shareable CoCo skills.
Snowflake CoWork
Cortex Agents are multi-tool AI orchestrators that route questions to the right data source. They combine Cortex Analyst (text-to-SQL over Semantic Views) with Cortex Search (vector + keyword search over unstructured data) to answer both "what happened" and "why" from a single conversational interface.
Create the Agent
Prompt CoCo:
"Run snowflake-cowork/create_agent.sql to create the Business Insights Agent"
Test Agent Routing
Open the Snowflake CoWork interface in Snowsight: navigate to AI & ML → CoWork (or search "CoWork" in the global search bar). Select the BUSINESS_INSIGHTS_AGENT agent. Then try each question to demonstrate different tool routing:
| Question | Tools Used |
|---|---|
| "Show me monthly revenue trend from June to September 2025" | Cortex Analyst (text-to-SQL) |
| "Which month had the lowest revenue, and what do customer reviews say about that period?" | Cortex Analyst + Agentic Search |
| "Find reviews mentioning wrong size with a rating below 3" | Agentic Search (filtered) |
| "Why are customers returning ski boots?" | Agentic Search (reviews + tickets) |
| "What is our total revenue and customer count by state?" | Cortex Analyst (text-to-SQL) |
| "What are the top complaint themes in support tickets?" | Agentic Search (filter + AI_AGG) |
| "How many reviews mention sizing issues, and which products are most affected?" | Agentic Search (search + breakdown) |
This is the capstone moment — the agent routes across structured data (text-to-SQL) and unstructured data (Cortex Search) to answer "what happened" and "why."
Security and Governance
Row Access Policies enforce row-level security declaratively — you define a boolean expression that determines which rows are visible to which roles, and Snowflake applies it transparently to every query (including those generated by AI agents). No application code changes required.
The Row Access Policy and WEST_COAST_MANAGER role were created by setup.sql. Demonstrate how row-level security transparently filters data based on role:
As ACCOUNTADMIN (full access):
USE ROLE ACCOUNTADMIN; USE WAREHOUSE HOL_WH; SELECT c.state, SUM(o.total_amount) AS total_revenue, COUNT(DISTINCT c.customer_id) AS customer_count FROM dash_automated_intelligence_db.raw.orders o JOIN dash_automated_intelligence_db.raw.customers c ON o.customer_id = c.customer_id GROUP BY c.state ORDER BY total_revenue DESC;
Result: all 10 states visible.
As WEST_COAST_MANAGER (restricted):
USE ROLE WEST_COAST_MANAGER; USE WAREHOUSE HOL_WH; SELECT c.state, SUM(o.total_amount) AS total_revenue, COUNT(DISTINCT c.customer_id) AS customer_count FROM dash_automated_intelligence_db.raw.orders o JOIN dash_automated_intelligence_db.raw.customers c ON o.customer_id = c.customer_id GROUP BY c.state ORDER BY total_revenue DESC;
Result: only CA, OR, WA appear — the Row Access Policy transparently filters data.
Key insight: Same query, same tables — different results based on who's asking. Row-level security enforces data boundaries without changing application logic.
Verify Through CoWork
Switch to the WEST_COAST_MANAGER role in Snowsight, then open CoWork and ask the agent:
"What is our total revenue and customer count by state?"
The agent returns results for only CA, OR, and WA — the Row Access Policy filters data transparently, even through AI-generated SQL.
Streamlit Dashboard
Streamlit in Snowflake lets you build and deploy interactive data applications directly within your Snowflake account — no external infrastructure needed. Apps run securely inside Snowflake, with native access to your data and governed by the same role-based access control.
Prerequisites: The dbt models must be built first (previous section). The dashboard queries
DBT_ANALYTICSandDBT_STAGINGtables.
Prompt CoCo:
"Deploy the Streamlit dashboard to Snowflake"
CoCo runs snow streamlit deploy from the streamlit-dashboard/ directory. Once deployed, open the app URL in Snowsight to explore:
- Summary — Revenue KPIs, order trends, customer counts
- Customer & Product Analytics — Lifetime value segments, product performance
- Pipeline Health — Dynamic Tables refresh status, data freshness monitoring
Agent Evaluation
Agent Evaluation lets you measure agent quality using ground-truth datasets and LLM-as-judge scoring. You define input queries and expected answers, then Snowflake runs the agent against each question and scores responses on Answer Correctness (does the response match ground truth?) and Logical Consistency (are the reasoning steps internally coherent?).
The evaluation dataset (7 questions + ground truth) was created by setup.sql. Run the evaluation in Snowsight:
Run via Snowsight UI
- Switch to the ACCOUNTADMIN role in Snowsight (top-left role selector)
- Navigate to AI and ML > Agents > BUSINESS_INSIGHTS_AGENT > Evaluations tab
- Click New evaluation run, name it (e.g.
hol-eval-run-1), click Next - Select Create new dataset from table
- Under Source table, set Database and schema to
DASH_AUTOMATED_INTELLIGENCE_DB.SEMANTIC, then selectAGENT_EVALUATION_DATA - Under New dataset location, keep
DASH_AUTOMATED_INTELLIGENCE_DB.SEMANTIC - Set Dataset name:
hol_eval_dataset - Click Next
- Under Define metrics, confirm Input query =
INPUT_QUERY - Toggle on Answer Correctness, set Expected answer =
GROUND_TRUTH - Toggle on Logical Consistency
- Click Create — evaluation starts automatically (~3 min)
Interpret Results
- Answer Correctness — Did the agent's response match ground truth? Scored 0-1 per question.
- Logical Consistency — Were planning steps, tool calls, and response internally consistent? (Reference-free.)
- Per-question drill-down — Select any row to see the full thread: planning, tool invocations, response generation.
Improve Scores (Stretch)
If questions score low on logical consistency:
- Click a low-scoring row and view Thread details
- Look for vague reasoning about tool selection in the Planning step
- Update the agent's instructions to be more explicit
- Recreate the agent and re-run the evaluation
MCP Server
Snowflake MCP Servers expose your Cortex Agents, Semantic Views, and Search Services as tools discoverable via the open Model Context Protocol (MCP). Any MCP-compatible client (CoCo CLI, Claude Desktop, custom apps) can connect to the server endpoint and invoke tools programmatically — turning your Snowflake AI stack into a reusable service layer.
Expose the Business Insights Agent as a managed MCP server:
Prompt CoCo:
"Create a Snowflake-managed MCP server that exposes our Business Insights Agent, semantic view, and customer feedback search as tools"
CoCo creates the MCP server:
Note: CoCo generates tool names (like
revenue-analytics,customer-feedback-search) based on your prompt. Your names may differ — what matters is that thetypeandidentifierpoint to the correct objects.
CREATE MCP SERVER business_insights_mcp FROM SPECIFICATION $$ tools: - name: "business-insights-agent" type: "CORTEX_AGENT_RUN" identifier: "DASH_AUTOMATED_INTELLIGENCE_DB.SEMANTIC.BUSINESS_INSIGHTS_AGENT" description: "AI agent that answers business questions using structured data and customer feedback" title: "Business Insights Agent" - name: "revenue-analytics" type: "CORTEX_ANALYST_MESSAGE" identifier: "DASH_AUTOMATED_INTELLIGENCE_DB.SEMANTIC.BUSINESS_ANALYTICS_SEMANTIC" description: "Text-to-SQL for revenue, orders, customers, and product metrics" title: "Revenue Analytics" - name: "customer-feedback-search" type: "CORTEX_SEARCH_SERVICE_QUERY" identifier: "DASH_AUTOMATED_INTELLIGENCE_DB.RAW.CUSTOMER_FEEDBACK_SEARCH" description: "Search across product reviews and support tickets" title: "Customer Feedback Search" $$;
Connect from CoCo
cortex mcp add business-insights https://<account_url>/api/v2/databases/DASH_AUTOMATED_INTELLIGENCE_DB/schemas/SEMANTIC/mcp-servers/BUSINESS-INSIGHTS-MCP --type http
Now any MCP-compatible client (CoCo, Claude Desktop, custom apps) can discover and call these tools via the standard MCP protocol.
Optional: Iceberg V3 Features
Note: This section is optional. It demonstrates Iceberg V3 capabilities (deletion vectors, default values) using CoCo-generated SQL. No other sections depend on it.
Note: CoCo may take a few attempts to generate correct SQL for Iceberg V3 features (these are newer APIs). If you see "error executing SQL," let CoCo retry — it will self-correct and the end result will work.
Create a Managed Iceberg Table
Prompt CoCo:
"Create a managed Iceberg table from RAW.ORDERS with clustering by year and month, then query it to show partition pruning"
CoCo creates the table with CATALOG='SNOWFLAKE' (no external volume needed) and demonstrates partition pruning on filtered queries.
Explore V3: Deletion Vectors
"Create an Iceberg V3 table from RAW.ORDERS (ICEBERG_VERSION=3) with merge-on-read enabled, insert 1000 rows, then update 10 of them to demonstrate deletion vectors"
CoCo creates a V3 table with ENABLE_ICEBERG_MERGE_ON_READ = TRUE, inserts data, then runs an UPDATE that uses deletion vectors instead of full file rewrites.
Explore V3: Default Values
"Add a new column 'priority' with default value 'STANDARD' to the V3 table and show that existing rows get the default without a backfill"
This demonstrates V3 schema evolution without rewriting data files.
Optional: Streaming Ingestion
Note: This section is optional. The
setup.sqlscript already loads all 50M orders directly. This section demonstrates how you would stream data in production using the Snowpipe Streaming Python SDK.
Generate RSA Key Pair
Generate keys for Snowpipe Streaming authentication:
# Generate private key (unencrypted PEM) openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt # Generate public key openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub # Upload public key to your Snowflake user (replace <your-username>) snow sql -q "ALTER USER <your-username> SET RSA_PUBLIC_KEY='$(grep -v -- '-----' rsa_key.pub | tr -d '\n')'" -c hol # Verify snow sql -q "DESC USER <your-username>" -c hol | grep RSA_PUBLIC_KEY_FP
Stream Data
cd snowpipe-streaming-python pip install -r requirements.txt # Copy and configure profile cp profile.json.template profile.json
Edit profile.json and set your account, user, private_key (contents of rsa_key.p8), and role.
# Stream 10,000 orders python src/automated_intelligence_streaming.py 10000
Verify Data Landed
SELECT COUNT(*) FROM dash_automated_intelligence_db.staging.orders_staging; SELECT COUNT(*) FROM dash_automated_intelligence_db.staging.order_items_staging;
You should see 10,000 orders and ~50,000 order items in staging.
Merge into Production
Use CoCo to merge the streamed data:
"Switch to the Gen2 warehouse, check how many rows are in staging, then merge them into RAW and show me the results"
Conclusion And Resources
Congratulations! You've successfully built a complete AI-powered retail analytics platform on Snowflake — from streaming ingestion through conversational AI and MCP server exposure.
What You Learned
- How to stream real-time data with Snowpipe Streaming and merge with Gen2 Warehouses
- How to build incremental transformation pipelines with Dynamic Tables
- How to serve sub-second point lookups with Interactive Tables
- How to create and query Iceberg V3 tables with deletion vectors and default values
- How to monitor and fix data quality gaps with Data Metric Functions
- How to build dbt analytical models on Snowflake
- How to create custom CoCo skills for repeatable workflows
- How to build a Cortex Agent that routes across structured (Analyst) and unstructured (Search) data
- How to evaluate agent quality with ground-truth datasets
- How to implement transparent row-level security through AI agents
- How to expose AI capabilities as managed MCP servers
Related Resources
Documentation:
This content is provided as is, and is not maintained on an ongoing basis. It may be out of date with current Snowflake instances