
Arctic Agentic RAG Episode 1: Agentic Query Clarification for Grounded and Speedy Responses

Welcome to the first deep-dive episode of our Arctic Agentic RAG series!

In our overview blog post, we explored the limitations of traditional retrieval-augmented generation (RAG) systems in enterprise environments. These systems struggle with ambiguous queries, multimodal data and complex reasoning — making it difficult for businesses to extract the right information reliably. This is where Arctic Agentic RAG comes in, transforming enterprise AI with intelligent, adaptive and verifiable retrieval.

Now, in this first deep-dive episode, we turn to one of the most fundamental challenges: handling ambiguous queries.

The first challenge: Addressing ambiguous queries

In traditional search, ambiguous queries are typically addressed by presenting users with a diverse set of related links, allowing them to explore different perspectives and manually extract the most relevant information. While this approach offers flexibility, it shifts the responsibility onto the user and lacks the precision required for enterprise AI applications, where speed, trust and accuracy are paramount.

Unlike traditional search, which relies on users to refine their own queries, standard RAG pipelines sometimes attempt to generate direct answers. These answers may be incorrect, incomplete or misleading if the query lacks sufficient specificity. Other times, the pipelines offer clarification questions, but those questions are often irrelevant or unanswerable given the user's repository, leading to further confusion and inefficiency. This is especially problematic in enterprise settings.

Arctic Agentic RAG takes a different approach: It clarifies the query first, so that any clarification questions generated are both relevant and answerable within the user's repository. Both the questions and their answers are grounded, meaning that responses can be verified against retrieved passages. Besides enhancing accuracy, our approach also generates responses efficiently and cost effectively.

In this episode, we dive deep into how Arctic Agentic RAG tackles query ambiguity with grounded and speedy responses. We’ll also provide an overview of the Arctic Agentic RAG open source framework, which enables researchers and developers to explore and implement these techniques themselves.

Why ambiguous queries matter in enterprise RAG

RAG is designed to complement LLMs by retrieving evolving, domain-specific enterprise information from a corpus, information that may be absent from an LLM’s static training data. For RAG to be effective, it must both cover diverse user intents and ground its responses in the retrieved passages.

Current methods for clarification tend to prioritize diversity, often leading to interpretations that RAG cannot effectively answer.

Pitfalls of existing methods  

Most state-of-the-art approaches follow a "diversify then verify" (DtV) strategy. An LLM first generates multiple possible meanings for a query, then retrieves documents for all interpretations and finally prunes irrelevant results.

For example, the query "What is HP?" could refer to Hewlett-Packard, horsepower or Harry Potter. A general LLM might suggest all three, but in an enterprise-specific corpus, only one may be relevant. Despite this, DtV retrieves documents for all interpretations, adding noise and increasing computational cost.

Figure 1a illustrates this limitation: Verification happens too late, after retrieval has already been influenced by irrelevant interpretations. This inefficiency makes enterprise retrieval less precise and more resource intensive.
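To make the ordering concrete, here is a minimal Python sketch of the DtV pattern. The helpers llm_list_interpretations(), retrieve() and is_supported() are hypothetical stand-ins for an LLM call, a top-k retriever and a relevance check; they are not part of any real API.

```python
# Minimal sketch of "diversify then verify" (DtV). The helpers
# llm_list_interpretations(), retrieve() and is_supported() are hypothetical
# stand-ins for an LLM call, a top-k retriever and a relevance check.

def diversify_then_verify(query: str, k: int = 5) -> list[dict]:
    # 1) Diversify first: the LLM proposes every plausible interpretation,
    #    e.g., "What is HP?" -> Hewlett-Packard, horsepower, Harry Potter.
    interpretations = llm_list_interpretations(query)

    # 2) Retrieve for all of them: one retriever call per interpretation,
    #    even those the enterprise corpus cannot support.
    candidates = [
        {"interpretation": q, "passages": retrieve(q, top_k=k)}
        for q in interpretations
    ]

    # 3) Verify last: irrelevant interpretations are pruned only after the
    #    retrieval cost has already been paid, which is the inefficiency
    #    Figure 1a highlights.
    return [c for c in candidates if is_supported(c["interpretation"], c["passages"])]
```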

Figure 1: Comparison of (a) DtV and (b) Snowflake’s VerDICT workflows for handling ambiguous questions in RAG. VerDICT avoids generating ungrounded interpretations and thus does not attempt to answer those.

How Arctic Agentic RAG addresses query ambiguity: Verified DIversification with ConsolidaTion (VerDICT)

Figure 1b shows Snowflake’s improved workflow, Verified DIversification with ConsolidaTion (VerDICT), which integrates verification directly into the diversification step. Rather than generating all possible interpretations up front, our approach first relaxes the user query to retrieve passages covering diverse interpretations, represented as relevance feedback in the figure. We then extract grounded interpretations from the retrieved passages and ensure that each can be answered from those passages, using answerability feedback. These two feedback types are elaborated below.

  • Retriever: relevance feedback: Unlike DtV, which diversifies into all possible interpretations, our approach first checks which interpretations are supported by the retrieved passages. A single retriever call with a relaxed query identifies top-k search results covering diverse interpretations, and interpretations are extracted from those results. This avoids ungrounded interpretations such as Harry Potter in Figure 1a.

  • Generator: answerability feedback: Even if a passage is relevant to the interpretation grounded in it, it may not answer the query. Retrieval alone is thus insufficient feedback, so we introduce generator feedback to ensure that an answer can be generated before retaining an interpretation. Figure 1b illustrates this with the relevant but unanswerable passage p2: It is relevant to the Hewlett-Packard interpretation, describing its products, but it cannot answer what HP is. To filter such cases, we prompt the generator LLM with the question and its grounded passage, verifying whether a valid question-answer pair can be formed. If not, the interpretation is discarded.

A consolidation phase using clustering then follows, enhancing robustness against noise in the retriever and generator feedback: Question-answer pairs obtained from verification are clustered to keep those consistently supported by relevant passages, while outliers stemming from noisy passages are filtered out.
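Putting the two feedback types and the consolidation step together, here is a minimal sketch of the workflow. The helpers relax(), retrieve(), extract_interpretation(), try_answer() and cluster_qa() are hypothetical stand-ins, not the actual Arctic Agentic RAG API.

```python
# Minimal sketch of the VerDICT workflow. The helpers relax(), retrieve(),
# extract_interpretation(), try_answer() and cluster_qa() are hypothetical
# stand-ins, not the actual Arctic Agentic RAG API.

def verdict(query: str, k: int = 5, min_support: int = 2) -> list[dict]:
    # 1) Relevance feedback: one retriever call with a relaxed query surfaces
    #    top-k passages spanning diverse interpretations.
    passages = retrieve(relax(query), top_k=k)

    qa_pairs = []
    for passage in passages:
        # 2) Extract interpretations from the passages themselves, so
        #    ungrounded readings (e.g., Harry Potter when the corpus only
        #    covers Hewlett-Packard) are never generated in the first place.
        interpretation = extract_interpretation(query, passage)
        if interpretation is None:
            continue

        # 3) Answerability feedback: keep an interpretation only if the
        #    generator LLM can form a valid question-answer pair from the
        #    grounding passage (filtering cases like passage p2 in Figure 1b).
        answer = try_answer(interpretation, passage)
        if answer is not None:
            qa_pairs.append({"question": interpretation, "answer": answer})

    # 4) Consolidation: cluster the question-answer pairs (cluster_qa returns
    #    groups of near-duplicate pairs) and keep a representative from each
    #    cluster with enough support, filtering outliers from noisy passages.
    clusters = cluster_qa(qa_pairs)
    return [cluster[0] for cluster in clusters if len(cluster) >= min_support]
```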

Performance results

VerDICT is fast and resource efficient

By verifying interpretations up front, VerDICT eliminates noise and reduces unnecessary computation. Traditional DtV methods call the retriever once for each interpretation, whereas VerDICT needs only a single retriever call. For example, with the three interpretations Hewlett-Packard, horsepower and Harry Potter, the number of interpretations is |Q| = 3, and DtV retrieves top-k results for each of them. VerDICT avoids this fan-out entirely, cutting retrieval and processing costs, as Table 1 shows.

            Retriever        LLM
            # of calls       # of calls × input length
DtV         O(|Q|)           O(|Q|) × O(k)
VerDICT     O(1)             O(k) × O(1)

Table 1: Comparison of the number of calls made to the retriever and the LLM generator per question for DtV and VerDICT. |Q| is the number of interpretations, and k is the retrieval size.
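To make the asymptotics concrete, here is a small, purely illustrative cost model; the values |Q| = 3 and k = 5 are assumptions for illustration, not measurements from our evaluation.

```python
# Purely illustrative cost model for Table 1 (assumed numbers, not a benchmark).
Q = 3  # number of interpretations: Hewlett-Packard, horsepower, Harry Potter
k = 5  # passages returned per retrieval

# DtV: one retriever call per interpretation, and one LLM call per
# interpretation whose prompt carries all k retrieved passages.
dtv_retriever_calls = Q            # 3 retriever calls
dtv_llm_passage_reads = Q * k      # 3 LLM calls x 5 passages each = 15

# VerDICT: a single relaxed retrieval, then one short LLM verification
# call per retrieved passage (one passage per prompt).
verdict_retriever_calls = 1        # 1 retriever call
verdict_llm_passage_reads = k * 1  # 5 LLM calls x 1 passage each = 5

print(f"DtV:     {dtv_retriever_calls} retrievals, {dtv_llm_passage_reads} passage-reads")
print(f"VerDICT: {verdict_retriever_calls} retrieval,  {verdict_llm_passage_reads} passage-reads")
```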

VerDICT generates correct and grounded interpretations

Efficiency alone isn’t enough; accuracy is critical. In our evaluations (see Figure 2), 93% of VerDICT-generated interpretations led to correct and grounded answers, compared to just 56% with DtV. Even human-generated interpretations scored only 65%, showing that VerDICT is both accurate and reliable.

Figure 2: The ratio of grounded interpretations from DtV (orange) and VerDICT (blue), with Llama 3.3 70B (left) and GPT-4o (right) as backbone LLMs.

Summing up, these results demonstrate that VerDICT enhances accuracy, minimizes wasted resources and improves the user experience. For a deeper technical and empirical analysis, check out our paper “Agentic Verification for Ambiguous Query Disambiguation,” available on arXiv.

Applications in Snowflake’s Cortex Agents API

Arctic Agentic RAG is integrated into the Snowflake Cortex Agents API, providing Snowflake customers with a more intelligent, efficient and precise retrieval experience. This integration enhances enterprise search, knowledge management and automated analytics workflows. When a user submits a vague or ambiguous query, the Cortex Agents API, instead of providing a single answer, uses VerDICT to follow up with related queries. This is anchored on:

  • Accurate retrieval: Clarifies queries dynamically for precise, context-aware responses through Snowflake Cortex Search, which has a proven record of search quality relative to competitors.

  • Optimized efficiency: Reduces computational overhead, speeding up analytics workflows.

  • Enterprise-grade applications: Supports customer support, compliance and R&D with diverse, domain-specific, verifiable insights.
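For a sense of what this looks like from the client side, below is a hypothetical sketch of a REST call. The endpoint path, payload fields and response shape are assumptions for illustration only; consult the Cortex Agents API documentation for the actual request schema and authentication.

```python
import requests

# Hypothetical sketch only: the endpoint path, payload fields and response
# shape below are assumptions for illustration; see the Cortex Agents API
# documentation for the actual schema and authentication.
ACCOUNT_URL = "https://<account>.snowflakecomputing.com"  # placeholder
HEADERS = {"Authorization": "Bearer <token>"}             # placeholder

response = requests.post(
    f"{ACCOUNT_URL}/api/v2/cortex/agent:run",  # assumed endpoint path
    headers=HEADERS,
    json={
        "messages": [{"role": "user", "content": "What is HP?"}],  # ambiguous
        "tools": [{"type": "cortex_search"}],  # assumed retrieval-tool wiring
    },
)

# For an ambiguous query, the agent is expected to come back with grounded
# clarification questions (as in Figure 3) rather than a single guessed answer.
print(response.json())
```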

This feature enhances various enterprise scenarios, such as:

  • Customer support automation: Clarifies vague queries such as "issue with my account" into actionable support topics.

  • Financial and legal compliance: Directs compliance officers to precise policy sections for regulations such as GDPR.

  • Internal knowledge management: Helps employees find specific HR and IT policies from vague search terms such as “work from home.”

  • Ecommerce analytics: Refines broad queries into segmented insights on sales, trends and customer behavior.

  • Healthcare and pharma research: Guides medical professionals to precise drug interactions and treatment protocols.

Figure 3 illustrates how our work helps users clarify ambiguous queries by generating relevant follow-up questions and providing grounded answers.

Figure 3: Query clarification in Snowflake’s Cortex Agents API setup with tool access to a series of synthetically generated insurance documents retrieved via Cortex Search services. The system refines vague queries by suggesting specific related questions, improving retrieval accuracy and user experience.

Open sourcing Arctic Agentic RAG

Beyond Snowflake’s own offerings, we provide an open source Arctic Agentic RAG framework for researchers and practitioners. Unlike other agentic RAG frameworks that aim to be feature-complete, Arctic Agentic RAG prioritizes lightweight, efficient components for fast development and research exploration. The key components include:

  • LLM backend: Supports cloud providers such as Snowflake Cortex Completion and Azure OpenAI, as well as local inference via vLLM.

  • Template format: Standardizes input-output structures for defining agentic factories.

  • Agentic factory: Defines agent functionality, including input parsing, retrieval and response generation.

These modular components allow researchers to easily build and customize functional agents while maintaining efficiency. 
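To illustrate how these components are meant to compose, here is a hypothetical sketch. The class names (Template, CortexLLM, ClarificationAgent) and method signatures are illustrative stand-ins rather than the framework's actual API; see the repository for real usage.

```python
# Hypothetical composition of the three components; the names below are
# illustrative stand-ins, not the framework's actual classes.
from dataclasses import dataclass

@dataclass
class Template:
    """Template format: a standardized input-output contract for an agent."""
    input_fields: tuple[str, ...]   # e.g., ("query", "passages")
    output_fields: tuple[str, ...]  # e.g., ("clarifications", "answers")
    prompt: str                     # prompt skeleton the LLM backend fills in

class CortexLLM:
    """LLM backend: one interchangeable seam for Cortex, Azure OpenAI or vLLM."""
    def complete(self, prompt: str) -> str:
        ...  # call the configured provider

class ClarificationAgent:
    """Agentic factory output: parses input, retrieves and generates."""
    def __init__(self, llm: CortexLLM, template: Template, retriever):
        self.llm, self.template, self.retriever = llm, template, retriever

    def run(self, query: str) -> str:
        passages = self.retriever(query)  # retrieval step
        prompt = self.template.prompt.format(query=query, passages=passages)
        return self.llm.complete(prompt)  # response generation
```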

For this initial release, we provide all components related to VerDICT for handling ambiguous queries, including retrieval, answering and clustering modules for disambiguation. Additionally, we offer easy-to-use examples for building a simple RAG framework using Cortex Search and Completion functionalities, streamlining retrieval and large model deployment.

As we continue developing Arctic Agentic RAG, we plan to open source more features from our RAG innovations, helping the community reproduce research results and accelerate the adoption of advanced RAG techniques. By leveraging our framework, researchers and practitioners can rapidly prototype and iterate on novel ideas without the overhead of a full-fledged implementation. This fosters innovation and promotes shared advancements to benefit the broader AI community.

Explore Arctic Agentic RAG to learn more, and start experimenting with it today!

 


This concludes Episode 1 of our Arctic Agentic RAG series. Stay tuned for Episode 2, where we tackle the next major challenge: handling multimodal enterprise data — bringing images, tables, structured databases and text together in a seamless, intelligent retrieval process. Interested in this discussion? Visit the AI Research & Development Community forum on Snowflake.

Snowflake contributors: Zhewei Yao, Danmei Xu, Moutasem Akkad, Yuxiong He

Our great collaborators: We would like to extend our gratitude to our academic collaborators — Seung-won Hwang and Youngwon Lee from Seoul National University, and Feng Yan and Ruofan Wu from the University of Houston — for their valuable contributions.
