Mining Query History to Build Better Semantic Models for AI

As we explored in You Need a Semantic Model, giving an AI Analyst raw database schemas and a handful of sample queries works great for a demo, but it inevitably falls apart in production. Without an explicit, governed dictionary (that is, a semantic model), AI agents are forced to guess how to join tables or calculate complex metrics such as "Net Retention Rate," leading to metric drift and boardroom confusion.

Building such a semantic model manually used to take weeks of interviews and YAML authoring. But as we recently shared in the launch of Semantic View Autopilot, that is no longer the case. You can now generate these models using patterns derived from your organization's existing query logic.

Today, we want to look under the hood at exactly how we automate this process to make your AI data assistants smarter: how we mine your organization's SQL query history, and what you can do to get the best suggestions from it.

The engine: Turning query history into semantic definitions

The smartest AI analysts don't just guess what your data means; they learn it from the humans who query it every day. But while millions of queries may run on your account each day, only a fraction are relevant to any given analytical case or semantic model. To harness data efficiently at Snowflake scale, we had to build a completely new system for processing query history. The system scans and recalls relevant historical queries, flags the queries and patterns with real analytical intent, and extracts their business logic, organizing it into the following semantic model concepts:

  • Verified Queries: Golden examples of how specific business questions map to the correct SQL, usually a few per domain, covering the most common analytical intents. For example, "What was gross churn ARR by region last quarter?" → A reviewed SQL query over accounts, subscriptions and churn_events.
  • Metrics and Facts: Standardized calculations for complex concepts, extracted wherever multiple analysts converge on the same formula. For example, Net Expansion MRR → SUM(expansion_mrr) - SUM(contraction_mrr) - SUM(churned_mrr).
  • Filters: Common boundary conditions used across multiple queries. For example, Active Subscriptions → subscription_status = 'active' AND canceled_at IS NULL.
  • Relationships: The implicit join paths that human analysts naturally use, validating how different tables connect without duplicating counts.
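Once accepted, these concepts live in a semantic model definition. As a rough, hypothetical sketch of how the four concepts above might be expressed (field names are illustrative, not the exact Snowflake semantic view syntax):

```yaml
# Hypothetical sketch; field names are illustrative, not the real spec.
verified_queries:
  - question: "What was gross churn ARR by region last quarter?"
    sql: "SELECT ...  -- reviewed SQL over accounts, subscriptions, churn_events"
metrics:
  - name: net_expansion_mrr
    expr: SUM(expansion_mrr) - SUM(contraction_mrr) - SUM(churned_mrr)
filters:
  - name: active_subscriptions
    expr: subscription_status = 'active' AND canceled_at IS NULL
relationships:
  - name: subscriptions_to_accounts
    join: subscriptions.account_id = accounts.id
```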

Now we will take an in-depth look at how we achieve that.

Step 1: Indexing user queries at scale

Scanning all historical SQL queries every time someone edits a semantic model would be hopeless at Snowflake scale. Instead, we built an inverted index over the query history, keyed by the tables each query touches. To build the index, we ingest SQL queries from the sources most likely to contain analytical workloads, such as Snowsight worksheets, dashboards, Tableau, Power BI and others. The index stores billions of executed queries from the last 90 days across all Snowflake deployments. Given the tables in your semantic model, the index narrows the search from hundreds of millions of raw executions down to the hundreds or thousands of queries that are actually relevant, all within seconds. After the initial retrieval step, the candidate set is further filtered to queries that stay within the tables and columns represented in the semantic model.
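The core retrieval structure can be sketched as a simple inverted index. This toy version (the class and method names are ours, and table extraction is assumed to happen upstream in a SQL parser) shows the table-keyed lookup idea:

```python
from collections import defaultdict

class QueryIndex:
    """Toy inverted index: table name -> ids of queries that touch it."""

    def __init__(self):
        self.by_table = defaultdict(set)
        self.queries = {}

    def add(self, query_id, sql, tables):
        # `tables` is assumed to come from a SQL parser upstream.
        self.queries[query_id] = sql
        for t in tables:
            self.by_table[t.lower()].add(query_id)

    def candidates(self, model_tables):
        # Union of queries touching ANY modeled table; later stages
        # filter to queries that stay strictly within the model.
        hits = set()
        for t in model_tables:
            hits |= self.by_table[t.lower()]
        return [self.queries[q] for q in sorted(hits)]

idx = QueryIndex()
idx.add(1, "SELECT ... FROM accounts", ["accounts"])
idx.add(2, "SELECT ... FROM invoices", ["invoices"])
print(len(idx.candidates(["accounts", "subscriptions"])))  # 1
```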

For example, consider a semantic model built only on two tables and selected columns: accounts(id, name, segment, region) and subscriptions(id, account_id, expansion_mrr, contraction_mrr, churned_mrr, subscription_status, canceled_at).

In the retrieval stage, queries like the following won't be extracted, because they use tables or columns not included in the semantic model:

SELECT a.segment, SUM(c.arr_impact)
FROM churn_events c
JOIN accounts a ON c.account_id = a.id

(This one uses the table churn_events, which is not part of the semantic model)

SELECT a.region, COUNT(*)
FROM subscriptions s
JOIN accounts a ON s.account_id = a.id
WHERE s.renewal_date >= DATEADD(month, -1, CURRENT_DATE)

(This query uses only modeled tables, but it depends on the column renewal_date, which is not part of the semantic model)

Only queries touching tables and columns explicitly included in the model will be kept. For instance:

SELECT a.segment, SUM(s.expansion_mrr)
FROM subscriptions s
JOIN accounts a ON s.account_id = a.id
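The post-retrieval scope check can be sketched as a set-containment test. In this hypothetical version, `MODEL` mirrors the two-table example above, and table and column references are assumed to come from a SQL parser:

```python
# Keep a query only if every table and column it references is in the model.
MODEL = {
    "accounts": {"id", "name", "segment", "region"},
    "subscriptions": {"id", "account_id", "expansion_mrr", "contraction_mrr",
                      "churned_mrr", "subscription_status", "canceled_at"},
}

def in_model_scope(tables, columns):
    """tables: set of table names; columns: set of (table, column) pairs,
    both assumed to be produced by a SQL parser upstream."""
    if not tables <= MODEL.keys():
        return False
    return all(col in MODEL.get(tbl, set()) for tbl, col in columns)

# churn_events is not modeled -> rejected
print(in_model_scope({"churn_events", "accounts"}, {("accounts", "segment")}))  # False
# renewal_date is not a modeled column -> rejected
print(in_model_scope({"subscriptions", "accounts"},
                     {("subscriptions", "renewal_date")}))  # False
# fully within the model -> kept
print(in_model_scope({"subscriptions", "accounts"},
                     {("accounts", "segment"), ("subscriptions", "expansion_mrr")}))  # True
```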

Step 2: Separating the analytical signal from the noise

Despite focusing on query sources that are most likely to contain queries with analytical intent, not every query is worth learning from. A SELECT * preview, a one-off debugging query or an ad hoc row count tells us nothing about how your organization defines its business concepts. Depending on the type of suggestion (i.e., Verified Query, Filter, Metric, Fact or Relationship), we apply a slightly different analysis.

For Verified Query suggestions, we first normalize the queries so that runs differing only in literal values are counted together, then we score them for analytical depth, looking for aggregations, joins, window functions and complex predicates. What remains is the part of the query history that encodes durable business reasoning rather than incidental usage.
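A minimal sketch of the two operations described here, literal normalization and analytical-depth scoring; the regexes and the feature list are illustrative stand-ins for AST-based analysis:

```python
import re

def normalize(sql: str) -> str:
    """Collapse literals so queries differing only in constant values
    group together. Real normalization works on a parsed AST; this
    regex version is a rough sketch."""
    s = sql.strip().lower()
    s = re.sub(r"'[^']*'", "'?'", s)        # string literals
    s = re.sub(r"\b\d+(\.\d+)?\b", "?", s)  # numeric literals
    return re.sub(r"\s+", " ", s)

def depth_score(sql: str) -> int:
    """Crude analytical-depth signal: reward aggregations, joins,
    grouping and window functions."""
    s = sql.lower()
    features = ["sum(", "avg(", "count(", "join ", "group by", "over ("]
    return sum(f in s for f in features)

q1 = "SELECT region, SUM(amount) FROM t WHERE yr = 2023 GROUP BY 1"
q2 = "select region, sum(amount) from t where yr = 2024 group by 1"
print(normalize(q1) == normalize(q2))                             # True: same shape
print(depth_score(q1) > depth_score("SELECT * FROM t LIMIT 10"))  # True
```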

For example, a query like this is weak:

SELECT * FROM subscriptions LIMIT 10;

(It touches a modeled table, but it does not encode reusable business logic)

A query like this is much stronger:

SELECT
  a.segment,
  SUM(s.expansion_mrr) - SUM(s.contraction_mrr) - SUM(s.churned_mrr) AS net_expansion_mrr
FROM subscriptions s
JOIN accounts a ON s.account_id = a.id
WHERE s.snapshot_date = DATE_TRUNC('month', CURRENT_DATE)
GROUP BY 1;

(It contains a meaningful metric, a business dimension and a clear join path between modeled entities. It will score high in our initial selection.)

For Metrics, Facts and Filters, the system goes one level deeper. Instead of treating each query as a single unit, it extracts the reusable expressions inside each query, like aggregations, derived fields and business conditions, and then normalizes them so equivalent logic can be grouped together even when written with different formatting or with interchangeable elements in a different order.

For example, given this raw SQL:

SELECT
  a.region,
  SUM(invoice_amount) AS total_invoice_amount,
  AVG(DATEDIFF(day, invoice_date, paid_date)) AS avg_days_to_payment
FROM invoices i
JOIN accounts a ON i.account_id = a.id
WHERE payment_status = 'PAID'
  AND invoice_date >= DATEADD(month, -6, CURRENT_DATE)
GROUP BY 1;

Here is what gets extracted:

  • potential metrics: SUM(invoice_amount), AVG(DATEDIFF(day, invoice_date, paid_date))
  • potential facts: DATEDIFF(day, invoice_date, paid_date)
  • potential filters: payment_status = 'PAID'
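A rough sketch of that extraction step. Real extraction walks the parsed query tree; this regex version (the patterns are ours, and only flat cases are handled) reproduces the example above:

```python
import re

def extract_candidates(sql: str):
    """Pull out candidate metrics (aggregate calls) and filters
    (simple equality predicates). A real extractor uses the AST;
    this regex sketch only handles flat cases."""
    metrics = re.findall(
        r"\b(?:SUM|AVG|COUNT|MIN|MAX)\s*\([^()]*(?:\([^()]*\)[^()]*)*\)",
        sql, flags=re.IGNORECASE)
    filters = re.findall(r"\b\w+\s*=\s*'[^']*'", sql)
    return metrics, filters

sql = """SELECT a.region,
       SUM(invoice_amount) AS total_invoice_amount,
       AVG(DATEDIFF(day, invoice_date, paid_date)) AS avg_days_to_payment
FROM invoices i JOIN accounts a ON i.account_id = a.id
WHERE payment_status = 'PAID'"""
metrics, filters = extract_candidates(sql)
print(metrics)  # ['SUM(invoice_amount)', 'AVG(DATEDIFF(day, invoice_date, paid_date))']
print(filters)  # ["payment_status = 'PAID'"]
```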

For Relationships, rather than extracting them as standalone expressions, the system looks at how tables are joined across many queries, then combines those usage patterns with structural signals from the data model.

Step 3: Selecting and validating candidates

Having a pile of candidate queries and expressions isn't the same as having a good set of suggestions. The system still has to pick the ones worth surfacing.

For Verified Queries, the system selects SQL queries taking into account their frequency, analytical complexity and similarity to other queries and to those already present in the semantic model. The goal is to suggest a set that covers a broad range of new business intents, rather than focusing on the single most popular topic or duplicating what is already covered by queries added to the model.

For Metrics, Facts and Filters, the system ranks candidates based on a combination of how many queries use a given expression and the analytical value of that expression. A predicate that appears in 500 queries wins over one that appears in five, but a trivial filter like id IS NOT NULL won't dominate on frequency alone.
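One way to sketch that trade-off is a score where frequency is log-scaled so it cannot compensate for low analytical value; the weighting here is illustrative, not the production formula:

```python
import math

def rank_score(frequency: int, analytical_value: float) -> float:
    """Combine usage frequency with intrinsic analytical value.
    Log-scaling keeps a very frequent but trivial expression from
    dominating on frequency alone."""
    return math.log1p(frequency) * analytical_value

# A meaningful predicate seen 500 times beats the same one seen 5 times...
print(rank_score(500, 1.0) > rank_score(5, 1.0))    # True
# ...but a trivial `id IS NOT NULL` (low value) loses to a rarer,
# richer business predicate.
print(rank_score(500, 0.05) < rank_score(20, 1.0))  # True
```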

For Relationships, the system combines how often each join path appears in the history with table-level signals like primary keys and column cardinality, promoting only the relationships that are both commonly used in practice and structurally valid.

Once candidates are selected, the system validates each one against the semantic model. Expressions are checked to ensure they parse correctly, reference only columns that exist in the model and don't duplicate something that's already defined.

Step 4: Translating SQL into business language

The final step is turning selected SQL fragments or queries into suggestions that a human can actually review. Raw SQL is not enough. A recurring expression like SUM(expansion_mrr) - SUM(contraction_mrr) - SUM(churned_mrr) should not be surfaced to a reviewer as just another fragment of SQL. It should be proposed with a name like NET_EXPANSION_MRR and a description like, "the net change in monthly recurring revenue from existing customers, excluding new business." That translation from SQL logic into business language is what makes the suggestion reviewable and ultimately useful in a semantic model.

For every extracted element, the system uses LLMs to generate natural-language artifacts, such as a human-readable name and description (Filters, Metrics, Facts) or a question (Verified Queries), grounded in the semantic model's context and the patterns of usage that led to the suggestion in the first place.

By the end of this pipeline, hundreds of millions of raw query executions have been distilled into a focused, reviewable set of suggestions: metrics with names and business descriptions, filters with context, facts with clear definitions, relationships grounded in both usage and structure, and verified queries paired with natural-language questions. In many cases, what previously took weeks is reduced to a curation task. The system proposes, you review, and the semantic model takes shape.

How we evaluate and tune the pipeline

A pipeline like this is only useful if it produces suggestions that humans are actually willing to accept. To measure that, we use an LLM-as-a-judge framework to score generated suggestions along dimensions such as whether the question is meaningful, whether the SQL is correct and whether the SQL is a good expression of the business intent. These criteria were tuned and calibrated against real, internal Snowflake data, with ground truth reviewed by Snowflake data scientists, so that the scores correlate with whether a suggestion is actually accepted by human reviewers. The goal is to surface candidates that are likely to survive review and become part of a governed semantic model.

We also use this evaluation framework for hill climbing and ablation. The suggestion pipeline has many moving parts: prompt versions, analytical-depth thresholds, deduplication logic, selection heuristics, LLM-based validity filtering and ranking strategies. Rather than blindly changing these, we run controlled experiments to measure the impact of each change. Some of that tuning happens offline through judge-based comparisons and ablation studies, and some happens online through A/B-style experimentation on configuration variants. This lets us improve the pipeline incrementally while keeping it aligned with real reviewer behavior.

Why do these suggestions matter?

Up to this point, we have been talking about how suggestions are produced. The next question is what happens once they are reviewed and accepted. At that point, they become part of the semantic model the AI Analyst uses to answer future questions.

Verified Queries, once they are added to the model, are indexed and retrieved at runtime as high-confidence examples of how a business question maps to SQL. When a user asks a question, the system first checks for an exact match against the existing Verified Queries. If it finds one, it can skip the expensive SQL generation path entirely and return the verified SQL directly. If there is no exact match, the system retrieves semantically similar Verified Queries from an embedding index and uses them as grounded few-shot examples during SQL generation.
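The exact-match-then-retrieve flow can be sketched like this; a Jaccard similarity over tokens stands in for the real embedding index, and the verified questions are hypothetical:

```python
def tokens(text):
    return set(text.lower().split())

def jaccard(a, b):
    # Token-overlap similarity as a stand-in for embedding distance.
    return len(a & b) / len(a | b) if a | b else 0.0

VERIFIED = {
    "what was gross churn arr by region last quarter?": "SELECT ...  -- reviewed SQL",
    "what is net expansion mrr by segment?": "SELECT ...  -- reviewed SQL",
}

def answer(question):
    q = question.lower().strip()
    if q in VERIFIED:                       # exact match: skip generation entirely
        return ("verified", VERIFIED[q])
    # Otherwise retrieve the most similar VQ as a grounded few-shot example.
    best = max(VERIFIED, key=lambda k: jaccard(tokens(k), tokens(q)))
    return ("generate_with_example", best)

print(answer("what was gross churn arr by region last quarter?")[0])  # verified
print(answer("show net expansion MRR for enterprise segment")[0])     # generate_with_example
```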

Metrics, Facts and Filters become part of the semantic vocabulary that the model reasons over. Once added to the semantic model, they are serialized into the prompt as suggested calculations and reusable business concepts. Instead of regenerating or guessing logic for something like Net Expansion MRR, Customer Health Score or Recently Engaged Users on the fly, the system can rely on a reviewed definition already present in the model. This is especially valuable for concepts that are obvious to humans in a company but not obvious from the raw schema alone.

Relationships give the AI Analyst an explicit graph of how entities are allowed to connect. Once they are part of the semantic model, they are serialized as supported joins and used not only in initial SQL generation but also in correction and validation flows.

The important point is that these suggestions are not just metadata. Once reviewed and accepted, they change the runtime behavior of the system. Verified Queries shape retrieval and few-shot grounding. Metrics, Facts and Filters shape the business vocabulary available to the model. Relationships shape the legal join space. That is how query-history mining turns into better AI answers.

Maximizing the quality of query history-based suggestions

Now that you understand how the engine works and why these suggestions matter, let's look at a few practical steps you can take to give your semantic model the best possible input. Because all suggestions are derived from your query history, keep these best practices in mind to help the system generate the highest-quality Metrics, Facts, Filters and Verified Queries (VQs):

  • Query inside Snowsight and analytical tools: While we draw from multiple historical sources, queries executed directly within Snowsight, or from Tableau or Power BI, currently carry the most weight.
  • Align the semantic model with query history: To process a query into a suggestion, the query must exclusively use tables and columns already defined in your semantic model. To increase the number of suggestions, consider adding tables and columns that are frequently used alongside those already included in your model.
  • Leverage broad roles: Suggestions are bound by the data access of the user for whom the suggestions are generated. Users with the GOVERNANCE_VIEWER role have visibility across the entire account, generating comprehensive suggestions that cover more aspects of the semantic model.
  • Define semantic models on frequently used tables or views: We cannot generate suggestions if there is no query history related to the tables and views within the semantic model. If you want to use query history-based suggestions, avoid creating brand-new views or tables specifically for the semantic model, as they will have no historical usage.

Curating the output: Reviewing and refining

Once the system proposes candidates, the critical step is human review. When reviewing and accepting suggestions, pay attention to:

  • The intent and phrasing of the questions or descriptions: Does the suggested question for Verified Query or description of Filter, Metric or Fact match the intent of the SQL expression? Are they precise and phrased the way your organization actually talks about this data? This is the moment to catch any discrepancies between AI understanding and your business definition.
  • Missed context: The system may not be able to capture unwritten tribal knowledge. Use the review phase to add details that are potentially important for an AI to know. For example, explicitly note if a column contains city names, highlight specific event tags or add synonyms if users commonly refer to a metric by an acronym or misspell it in conversational queries.
  • Universal vs. ad hoc logic: Ensure the extracted Filters, Metrics and Facts represent a true business rule rather than a one-off, ad hoc constraint. If a suggested filter is hyperspecific, you may want to reject it to keep your model focused on universal definitions.

Continuous improvement: Evolving at the speed of your business

A static semantic model is a recipe for eventual metric drift. Businesses are living organisms: product lines shift, new subscription tiers are introduced and regional boundaries change. Your AI Analyst needs to understand these changes the moment they happen.

Because our suggestion engine is tethered directly to your active query history, your semantic model evolves alongside your business. As analysts and business users begin querying new data patterns, the system can detect these shifts as new query patterns emerge. It's worth checking new suggestions for your existing model to see if new concepts and Verified Queries can be added.
