How a Semantic View Should Be Built

Semantic views are structured definitions that describe a database's tables, columns, relationships and business logic so that AI systems can answer natural language questions accurately. They're the layer between raw data and useful answers.

Building one is straightforward in theory. In practice, most semantic views fail for the same few reasons: definitions that don't match how the business actually operates, incomplete coverage of the concepts people ask about and silent drift as the underlying data evolves. These aren't model problems. They're input problems.

Over the course of building Semantic View Autopilot (SVA), we arrived at a set of principles for what makes a semantic view good — and designed the product around them.

Definitions should come from organizational behavior, not individual knowledge

The most common way to build a semantic view is to assign someone — usually a data engineer or analytics lead — to define every metric, filter and relationship by hand. The problem is that this person is making hundreds of judgment calls based on their own understanding, and the result reflects a single perspective.

Business definitions are inherently collective. "Revenue" means what the organization has agreed it means through years of dashboarding, reporting and ad hoc analysis. That agreement is already encoded — in Tableau calculated fields, in recurring query patterns, in dbt model logic. It just isn't written down in one place.

How we approach this in SVA: The system ingests from existing data artifacts rather than starting from a blank file. Tableau workbooks, dbt projects, Power BI models, manual SQL and sources from Open Semantic Interchange partners all serve as input. Each is treated as evidence of how the business defines its concepts. The semantic view is assembled from this evidence rather than authored from scratch.

Coverage should reflect real usage, not assumed importance

A hand-authored semantic view tends to cover what the author thinks is important. But the questions people actually ask don't always align with what someone anticipated. Gaps show up in production — a user asks about "net new customers" and the semantic view has no definition for it, even though analysts have been calculating it in SQL for months.

Good coverage means the semantic view includes the concepts that people actually use, weighted by how broadly and consistently those concepts appear across the organization.

How we approach this in SVA: Query history is one of the primary inputs. SVA clusters queries by the tables, columns, filters and aggregations they use, and looks for convergence. When multiple analysts across different teams independently calculate something the same way, that concept is a candidate for inclusion. This catches definitions that matter in practice but wouldn't have occurred to any single author.
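The clustering-for-convergence step can be sketched roughly as follows. This is a simplified illustration, not SVA's actual implementation: the query record shape, the field names and the `min_authors` threshold are all hypothetical, and real query history would be parsed from SQL rather than handed over pre-structured.

```python
from collections import defaultdict

def query_fingerprint(query):
    # Reduce a parsed query to the structural features that define a
    # concept: which tables, columns, filters and aggregations it touches.
    return (
        frozenset(query["tables"]),
        frozenset(query["columns"]),
        frozenset(query["filters"]),
        frozenset(query["aggregations"]),
    )

def convergent_concepts(history, min_authors=3):
    # Group queries by fingerprint and keep only the patterns that multiple
    # analysts arrived at independently -- raw frequency alone doesn't qualify.
    clusters = defaultdict(set)
    for q in history:
        clusters[query_fingerprint(q)].add(q["author"])
    return {fp: authors for fp, authors in clusters.items()
            if len(authors) >= min_authors}

# Three analysts independently compute completed-order revenue the same way;
# a fourth runs a one-off row count that should not become a concept.
history = [
    {"author": a, "tables": ["orders"], "columns": ["amount"],
     "filters": ["status = 'complete'"], "aggregations": ["SUM(amount)"]}
    for a in ("ana", "ben", "chen")
]
history.append({"author": "dana", "tables": ["orders"], "columns": ["id"],
                "filters": [], "aggregations": ["COUNT(*)"]})
```

In this toy example only the completed-order pattern survives, because three different people converged on it; the row count, however often it ran, has a single author.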

Noisy signals require consensus, not just frequency

Query history is rich but messy. Any individual query might be exploratory, wrong or testing something. The most frequently run query on a table might be a health check, not a business question. Frequency alone is a poor proxy for correctness.

What matters is agreement across independent sources. When the same join pattern, filter condition or aggregation appears across different analysts, different time periods and different tools, that convergence is a meaningful signal. It represents how the organization has collectively settled on a definition, even if nobody formalized it.

How we approach this in SVA: The system uses clustering to identify convergent patterns rather than simply counting occurrences. It also weights signals by source reliability — a curated Tableau workbook ranks higher than a one-off ad hoc query. When multiple source types agree (for example, query history confirms the join pattern that a Tableau workbook uses), confidence increases. When they disagree, the conflict is flagged.
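Source-weighted consensus can be illustrated with a minimal sketch. The source types, weights and scoring are assumptions made up for this example; they are not SVA's actual values or API.

```python
# Hypothetical reliability weights by source type; illustrative only.
SOURCE_WEIGHTS = {
    "tableau_workbook": 1.0,   # curated, reviewed artifact
    "dbt_model": 0.75,         # version-controlled transformation logic
    "query_history": 0.5,      # convergent but unreviewed usage
    "adhoc_query": 0.25,       # one-off exploration
}

def consensus(signals):
    # signals: (source_type, agrees) pairs for one candidate definition.
    # Agreement across independent source types raises the score; any
    # disagreement flags the candidate as a conflict to surface for review.
    agree = {src for src, ok in signals if ok}
    disagree = {src for src, ok in signals if not ok}
    score = sum(SOURCE_WEIGHTS.get(src, 0.1) for src in agree)
    return {"score": score, "conflict": bool(disagree)}
```

Under these toy weights, a join pattern confirmed by both a Tableau workbook and query history scores 1.5 with no conflict, while a dbt model that contradicts query history gets flagged rather than silently resolved.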

Confidence should be explicit, not hidden

Not every definition in a semantic view is equally well-supported. Some relationships are obvious — a primary key join confirmed by thousands of queries. Others are ambiguous — two columns that might be related but only appear together in a handful of exploratory queries.

A good semantic view makes this distinction visible. Treating high-confidence and low-confidence elements identically means either the low-confidence items are silently wrong, or the high-confidence items are unnecessarily held up for review.

How we approach this in SVA: The system splits its output by confidence level. High-confidence elements — where the evidence is unambiguous across multiple signals — go directly into the semantic view. Ambiguous elements go into a suggestions panel, each accompanied by the evidence that produced them: which queries support the suggestion, what the cardinality analysis showed, where sources disagreed. Reviewers evaluate proposals with data attached, not bare yes/no prompts.
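In outline, the split might look like the sketch below. The threshold, field names and example candidates are illustrative assumptions, not SVA internals.

```python
CONFIDENCE_THRESHOLD = 0.8  # illustrative cutoff, not SVA's actual value

def route(candidates):
    # High-confidence, conflict-free elements go straight into the view;
    # everything else becomes a suggestion that carries its own evidence.
    view, suggestions = [], []
    for el in candidates:
        if el["confidence"] >= CONFIDENCE_THRESHOLD and not el.get("conflict"):
            view.append(el)
        else:
            suggestions.append(el)
    return view, suggestions

# Hypothetical candidates: one obvious primary-key join, one ambiguous pair.
candidates = [
    {"name": "orders->customers join", "confidence": 0.97,
     "evidence": ["primary-key join confirmed across query history"]},
    {"name": "region column relationship", "confidence": 0.35,
     "evidence": ["appears together only in a few exploratory queries"]},
]
```

The key design point is that a suggestion never arrives bare: the evidence that produced it travels with it, so a reviewer decides with data rather than a yes/no prompt.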

This means the human work is reviewing and curating rather than authoring. A subject matter expert who recognizes correct definitions when they see them — but wouldn't write the YAML themselves — can still produce a high-quality semantic view.

A semantic view should encode understanding, not just examples

Verified queries — trusted SQL paired with natural language questions — are the highest-value component of a semantic view. They teach the system exactly how to answer specific questions. But if the system only learns to match those exact questions, the value plateaus quickly. Every new question type requires a new verified query.

The better outcome is for the knowledge contained in verified queries to transfer into the semantic view's structure — as metric definitions, filters and instructions — so the system can handle variations it hasn't seen before.

How we approach this in SVA: The system runs an optimization step that takes each verified query, temporarily removes it, and tests whether the model can still answer correctly. Where it can't, the system identifies what concept is missing and adds it to the semantic view as a reusable definition. A verified query about "active customers last month" produces a general definition of "active customer" that works for any time range, region or segment — not just the original question.
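The hold-one-out loop can be sketched as follows. Here `answers_correctly` is a hypothetical callback standing in for the real model evaluation, and the stub evaluator is a deliberately crude stand-in for illustration only.

```python
def holdout_gaps(semantic_view, verified_queries, answers_correctly):
    # For each verified query: hide it, then test whether the question can
    # still be answered from the semantic view plus the remaining queries.
    # Failures point at concepts that should become reusable definitions.
    gaps = []
    for vq in verified_queries:
        remaining = [q for q in verified_queries if q is not vq]
        if not answers_correctly(semantic_view, remaining, vq["question"]):
            gaps.append(vq)
    return gaps

# Stub evaluator: the "model" can answer only questions whose concept is
# already defined in the view (a stand-in for a real evaluation call).
def stub_eval(view, remaining, question):
    return any(concept in question for concept in view["concepts"])

view = {"concepts": ["active customer"]}
queries = [
    {"question": "How many active customer accounts last month?"},
    {"question": "What was net new customer count in Q2?"},
]
```

Running `holdout_gaps(view, queries, stub_eval)` surfaces the second question as a gap: "net new customer" exists only as a memorized example, so it is the concept the optimization step would promote into a general definition.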

Semantic views should stay current without manual maintenance

Business logic changes. New product tiers launch, team structures shift, reporting definitions get updated. A semantic view that was accurate six months ago can quietly drift out of alignment if nobody is maintaining it.

Good semantic views evolve with the organization. Changes in query patterns, new BI artifacts and shifts in usage should surface as proposed updates rather than accumulating as silent gaps.

How we approach this in SVA: The system monitors usage patterns and query history on an ongoing basis. When new patterns emerge — a new filter value appearing consistently across queries, a join path that's become more common — SVA proposes updates to the semantic view through the same suggestions mechanism used during creation. The same confidence split applies: Clear changes get proposed for direct application; ambiguous ones get surfaced for review.
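A toy version of one such drift signal, a filter value that recent queries use consistently but the view does not yet know about, might look like this. The record shape, threshold and example values are all hypothetical.

```python
from collections import Counter

def emerging_filters(recent_queries, known_values, min_share=0.1):
    # Flag filter values that show up consistently in recent query history
    # but aren't yet encoded in the semantic view -- candidates for a
    # proposed update rather than a silent gap.
    counts = Counter(v for q in recent_queries for v in q["filter_values"])
    total = len(recent_queries)
    return sorted(v for v, n in counts.items()
                  if v not in known_values and n / total >= min_share)

# Hypothetical recent history: a new product tier starts appearing in 30%
# of queries while the view only knows about the standard tier.
recent = ([{"filter_values": ["tier = 'enterprise_plus'"]}] * 3
          + [{"filter_values": ["tier = 'standard'"]}] * 7)
known = {"tier = 'standard'"}
```

A candidate flagged this way would then flow through the same consensus and confidence machinery as during creation, rather than being applied automatically.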


These principles aren't specific to Snowflake or to SVA. Any approach to building a semantic view benefits from grounding definitions in real usage, being explicit about confidence and designing for generalization over memorization. SVA is our implementation of these ideas — a system that treats semantic view creation as evidence synthesis rather than manual authoring.
