Self-Improving Agents with CoCo

Josh Reini, Elliott Botwick

Overview

Building AI agents is just the beginning — understanding how well they perform and systematically improving them is what separates prototypes from production systems. In this guide, you'll build a marketing analytics agent, deploy it to production, stress-test it with hard queries, then use Snowflake CoCo to mine failures from logs, evaluate with Agent GPA, and optimize the agent's instructions.

By the end, you'll have an agent with measurably better performance — and a repeatable workflow for continuous improvement.

Step	What You'll Do
Setup	Deploy a production agent with 5 tools
Stress Test	Run hard queries in Snowflake CoWork to generate failure traces
Install Snowflake CoCo	Install the CLI while traces propagate (~10 min)
Evaluate	Mine logs, curate an eval dataset, run Agent GPA baseline
Optimize	Analyze failures, generate improved instructions, validate with a second eval

Architecture

┌──────────────────────────────────────────────────────┐
│              MARKETING CAMPAIGNS AGENT               │
│                                                      │
│  Tool 1: query_performance_metrics (Cortex Analyst)  │
│  Tool 2: search_campaign_content   (Cortex Search)   │
│  Tool 3: generate_campaign_report  (Stored Proc)     │
│  Tool 4: web_search                (Web Search)      │
│  Tool 5: data_to_chart             (Visualization)   │
└──────────────────────────────────────────────────────┘
        │                                     ▲
        ▼                                     │
┌───────────────┐    ┌──────────────┐   ┌─────────────┐
│  Evaluate     │───▶│  Analyze     │──▶│  Optimize   │
│  (Agent GPA)  │    │  (failures)  │   │  (AI-driven)│
└───────────────┘    └──────────────┘   └─────────────┘

What You'll Learn

How to build a Cortex Agent with multiple tool types (Cortex Analyst, Cortex Search, stored procedures, web search, data-to-chart)
How to use Snowflake CoWork to interact with your agent and generate observability traces
How to use Snowflake CoCo to mine agent logs and curate evaluation datasets
How to run Agent GPA evaluations with built-in metrics
How to analyze failure patterns and generate improved orchestration instructions
How to validate improvements by comparing evaluation scores across agent iterations

What You'll Build

A complete agent optimization workflow:

A marketing campaigns agent with 5 tools
An evaluation dataset curated from real agent interaction logs
An optimized agent with improved orchestration instructions
Before/after evaluation results demonstrating measurable improvement

What You'll Need

A Snowflake account with ACCOUNTADMIN access
Cross-region inference enabled (required for evaluation LLM judge models)
About 5 minutes for the setup script to complete

Prerequisites

Basic familiarity with Snowflake SQL and Cortex Agents

Run Setup

Download the setup.sql file from the repository.

Open a Snowflake worksheet in Snowsight and run the entire setup.sql file. This creates:

Database SELF_IMPROVING_AGENT_DB with schema AGENTS
4 data tables (25 campaigns, ~1578 performance records, content, feedback)
Semantic view, Cortex Search service, report generation procedure
The production agent MARKETING_CAMPAIGNS_AGENT

Verify setup succeeded — the final statement should print a success banner.

Stress-Test the Agent in Snowflake CoWork

Open the agent in Snowflake CoWork:

In Snowsight select AI & ML > Agents
Select MARKETING_CAMPAIGNS_AGENT
Query your newly created agent in the Agent Admin View, or click Preview in Snowflake CoWork for the full front-end experience.
Alternatively, go to ai.snowflake.com.

Your goal is to generate a mix of successful and failing traces by asking progressively harder questions. Copy and paste the queries below one at a time.

Note: For a quicker, less interactive route, run agent_requests.sql instead. While the script executes (about 3-5 minutes), open the AI & ML > Agents > MARKETING_CAMPAIGNS_AGENT > Monitoring tab to inspect traces of all the calls made to your agent.

Simple queries

What is the total spend across all campaigns?

Generate a report for our holiday gift guide

What content format generated the most revenue per dollar spent?

Compare the A/B test performance for our email vs social media campaigns

Multi-tool queries

Which campaign had the highest ROI and what did customers say about it? Generate a report for that campaign too.

Which audience segment responded best to our promotions and what was their average spend?

Complex synthesis queries

Which campaigns had the best A/B test lift but the worst customer sentiment?

Show me campaigns where customer sentiment was negative but ROI was still positive — what made them work financially?

Note: It may take a few minutes for agent interaction traces to appear in the observability logs. If you just finished running the queries above, now is a good time to install Snowflake CoCo (next step) while the traces propagate.

Install Snowflake CoCo

Snowflake CoCo is an AI-powered CLI that you'll use to mine agent logs, run evaluations, analyze failures, and generate improved agent instructions.

Linux / macOS / WSL:

curl -LsS https://ai.snowflake.com/static/cc-scripts/install.sh | sh

Windows (PowerShell):

irm https://ai.snowflake.com/static/cc-scripts/install.ps1 | iex

After installation, run cortex to launch the setup wizard — it will guide you through connecting to your Snowflake account.

For detailed setup instructions, see the Snowflake CoCo CLI docs.

Curate an Eval Dataset from Logs

Open Snowflake CoCo and enter /bypass to enable bypass mode, then enter the following prompt:

Use the cortex agent dataset-curation skill to pull all available production traces for
SELF_IMPROVING_AGENT_DB.AGENTS.MARKETING_CAMPAIGNS_AGENT and curate an evaluation dataset.
Ground truth should always include sections for key figures, curated suggestions and sources referenced
and be specific enough to accurately evaluate agent quality. Expected tool invocations are not needed.
Store the evalset in SELF_IMPROVING_AGENT_DB.AGENTS and register it as a new evaluation dataset.

Snowflake CoCo will:

Query the observability traces from the previous step
Help you select and annotate queries with ground truth
Create an eval table and register it via SYSTEM$CREATE_EVALUATION_DATASET
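
To make the ground-truth requirement concrete, here is a minimal Python sketch of what one curated record might look like and how its ground truth could be checked for the three sections the prompt asks for. The record layout, the field names (`input_query`, `ground_truth`), the exact section headings, and the placeholder values are all illustrative assumptions, not the schema Snowflake CoCo actually produces.

```python
from dataclasses import dataclass

# Sections the curation prompt asks every ground truth to contain.
# The exact heading strings are an assumption made for this sketch.
REQUIRED_SECTIONS = ("Key Figures", "Curated Suggestions", "Sources Referenced")


@dataclass
class EvalRecord:
    """One hypothetical row of the evaluation table."""
    input_query: str
    ground_truth: str


def missing_sections(record: EvalRecord) -> list[str]:
    """Return the required section headings absent from the ground truth."""
    text = record.ground_truth.lower()
    return [s for s in REQUIRED_SECTIONS if s.lower() not in text]


# Placeholder values only; real ground truth comes from your own traces.
complete = EvalRecord(
    input_query="What is the total spend across all campaigns?",
    ground_truth=(
        "Key Figures: <total spend>\n"
        "Curated Suggestions: <follow-up analysis>\n"
        "Sources Referenced: <tool or table used>"
    ),
)
incomplete = EvalRecord(
    input_query="Generate a report for our holiday gift guide",
    ground_truth="Key Figures: <report summary>",
)

print(missing_sections(complete))    # []
print(missing_sections(incomplete))  # ['Curated Suggestions', 'Sources Referenced']
```

A check like this is only a structural sanity test. Whether the ground truth is specific enough to judge agent quality still needs a human review during annotation.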

Run Baseline Evaluation

Run Agent GPA on your curated dataset. Enter this prompt in Snowflake CoCo:

Run an evaluation for SELF_IMPROVING_AGENT_DB.AGENTS.MARKETING_CAMPAIGNS_AGENT against the registered dataset.
Compute Answer Correctness, Logical Consistency, Execution Efficiency, Plan Quality and Plan Adherence as metrics.
All metrics should use a 0-1 scale where 1 is optimal behavior. Once the eval completes, show me the evaluation results.
Break down scores by metric and identify which queries scored lowest. What are the common failure patterns?

Common failure patterns you'll see:

Wrong tool selection for multi-tool queries
Redundant tool calls
Incomplete summaries missing key data
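
Snowflake CoCo performs this breakdown for you. The Python sketch below only illustrates what "lowest-scoring queries" and per-metric averages mean on a 0-1 scale. The query identifiers, the `scores` layout, and every number in it are invented for illustration and are not real evaluation output.

```python
from statistics import mean

METRICS = (
    "answer_correctness",
    "logical_consistency",
    "execution_efficiency",
    "plan_quality",
    "plan_adherence",
)


def row(*values: float) -> dict[str, float]:
    """Pair one score per metric, in METRICS order."""
    return dict(zip(METRICS, values))


# Hypothetical per-query scores (1 is optimal). Invented for this sketch.
scores = {
    "total_spend": row(1.0, 1.0, 1.0, 1.0, 1.0),
    "roi_plus_feedback_report": row(0.5, 0.75, 0.25, 0.5, 0.5),
    "ab_lift_vs_sentiment": row(0.25, 0.5, 0.5, 0.25, 0.5),
}


def lowest_scoring(scores: dict, n: int = 2) -> list[str]:
    """Return the n query ids with the lowest mean score, worst first."""
    return sorted(scores, key=lambda q: mean(scores[q].values()))[:n]


def metric_averages(scores: dict) -> dict[str, float]:
    """Average each metric across all queries."""
    return {m: mean(s[m] for s in scores.values()) for m in METRICS}


print(lowest_scoring(scores))
for metric, avg in metric_averages(scores).items():
    print(f"{metric:<22}{avg:.2f}")
```

In this made-up data the multi-tool and synthesis queries rank lowest, which matches the pattern the stress test is designed to surface.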

Improve the Agent

Enter this prompt in Snowflake CoCo:

Based on the failure analysis, generate improved orchestration and response instructions for
SELF_IMPROVING_AGENT_DB.AGENTS.MARKETING_CAMPAIGNS_AGENT that fix the identified issues.
The instructions should tell the agent what format to respond in, when to use multiple tools and in what order,
and encourage efficient tool calling. Only make updates to the instructions - do not make any updates to
the tool configuration or other areas of the agent spec. Apply the changes.

Snowflake CoCo will:

Draft improved orchestration and response instructions with explicit tool routing rules and response guidelines
Apply via ALTER AGENT ... SET SPECIFICATION = ...

What changes: Only the instructions.orchestration and instructions.response fields. Tools, tool_resources, and the orchestration model stay the same, so improved instructions are the only lever.
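
The "instructions only" constraint can be sketched in Python as a function that copies a spec and replaces just the two instruction fields, plus a check that nothing else changed. The spec below is a hypothetical, heavily abbreviated stand-in: the tool names come from the architecture diagram, but the overall layout and the placeholder strings are assumptions, and the real specification is the one created by setup.sql.

```python
import copy


def update_instructions(spec: dict, orchestration: str, response: str) -> dict:
    """Return a copy of spec with only the two instruction fields replaced."""
    new_spec = copy.deepcopy(spec)  # never mutate the baseline spec
    instructions = new_spec.setdefault("instructions", {})
    instructions["orchestration"] = orchestration
    instructions["response"] = response
    return new_spec


def changed_top_level_keys(old: dict, new: dict) -> list[str]:
    """List the top-level keys whose values differ between two specs."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


# Hypothetical abbreviated spec; placeholders stand in for real content.
baseline_spec = {
    "instructions": {
        "orchestration": "<baseline orchestration instructions>",
        "response": "<baseline response instructions>",
    },
    "tools": [
        "query_performance_metrics",
        "search_campaign_content",
        "generate_campaign_report",
        "web_search",
        "data_to_chart",
    ],
    "tool_resources": {},
}

optimized_spec = update_instructions(
    baseline_spec,
    orchestration="<improved routing rules>",
    response="<improved response guidelines>",
)

print(changed_top_level_keys(baseline_spec, optimized_spec))  # ['instructions']
```

Running a diff like this before applying a change is one way to confirm that any score movement in the next evaluation can be attributed to the instructions alone.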

Validate Agent Improvement

Enter this prompt in Snowflake CoCo:

Run the evaluation of SELF_IMPROVING_AGENT_DB.AGENTS.MARKETING_CAMPAIGNS_AGENT against the same dataset again.
Compare the results against the baseline — show me a side-by-side comparison of scores by metric
and highlight what improved.

What to look for:

Overall score improvement: The optimized agent should score higher across the evaluated metrics
No regressions: The optimized agent should still handle simple queries just as well as the baseline
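
The two checks above can be sketched in Python as a per-metric delta with a regression flag. All scores here are invented for illustration, and the example deliberately includes one regression so the check has something to catch. Real numbers come from your two evaluation runs.

```python
METRICS = (
    "answer_correctness",
    "logical_consistency",
    "execution_efficiency",
    "plan_quality",
    "plan_adherence",
)

# Hypothetical aggregate scores on the 0-1 scale. Invented for this sketch.
baseline = dict(zip(METRICS, (0.5, 0.75, 0.5, 0.5, 0.75)))
optimized = dict(zip(METRICS, (0.75, 0.75, 0.75, 0.625, 0.5)))


def compare(baseline: dict, optimized: dict, tolerance: float = 0.0):
    """Return per-metric deltas and the metrics that dropped by more than tolerance."""
    deltas = {m: optimized[m] - baseline[m] for m in baseline}
    regressions = [m for m, d in deltas.items() if d < -tolerance]
    return deltas, regressions


deltas, regressions = compare(baseline, optimized)

# Side-by-side table of baseline, optimized, and signed delta per metric.
print(f"{'metric':<22}{'baseline':>10}{'optimized':>11}{'delta':>8}")
for m in METRICS:
    print(f"{m:<22}{baseline[m]:>10.3f}{optimized[m]:>11.3f}{deltas[m]:>+8.3f}")
print("regressions:", regressions or "none")
```

The same comparison could be applied per query rather than per metric, which is the more direct way to confirm that the simple queries did not get worse.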

Conclusion and Resources

Congratulations! You've built a self-improving AI agent workflow — deploying a production agent, stress-testing it, mining failures from logs, evaluating with Agent GPA, and validating that improved instructions lead to measurably better performance.

What You Learned

How to build a multi-tool Cortex Agent with Cortex Analyst, Cortex Search, stored procedures, web search, and data-to-chart capabilities
How to generate observability traces by stress-testing your agent in Snowflake CoWork
How to use Snowflake CoCo to mine agent logs and curate evaluation datasets
How to run Agent GPA evaluations with built-in metrics
How to analyze failure patterns and generate improved orchestration instructions
How to validate improvements by comparing baseline vs optimized evaluation scores
That better instructions — not more tools — can be the key lever for agent improvement

Key Concepts

Concept	Description
Agent GPA	Evaluation framework with built-in metrics such as answer correctness, logical consistency, execution efficiency, plan quality, and plan adherence
Orchestration Instructions	Natural language instructions telling the agent how to route queries and coordinate tools — the key lever for improvement
Eval Dataset	Frozen snapshot of queries + ground truth used to score agent performance
Snowflake CoCo	AI-powered CLI that mines agent logs, runs evaluations, identifies failures, and generates improved agent instructions

Cleanup

USE ROLE ACCOUNTADMIN;
DROP DATABASE IF EXISTS SELF_IMPROVING_AGENT_DB;
DROP ROLE IF EXISTS SELF_IMPROVING_AGENT_ROLE;

Updated 2026-05-30

This content is provided as is, and is not maintained on an ongoing basis. It may be out of date with current Snowflake instances.