Build and Evaluate RAG with LangChain and Snowflake

Josh Reini

Overview

Retrieval-Augmented Generation (RAG) has become a cornerstone technique for enhancing Large Language Models (LLMs) with domain-specific knowledge. In this guide, you'll learn how to build a RAG application using the langchain-snowflake package with SnowflakeCortexSearchRetriever and ChatSnowflake. You'll then evaluate your RAG's performance using TruLens and Snowflake's AI Observability features.

By combining Snowflake's data platform with LangChain's composable framework, you'll create a RAG system that answers questions based on your own data.

What You'll Learn

How to set up a Snowflake environment for RAG applications
How to create a retriever using Snowflake Cortex Search
How to build a complete RAG chain with LangChain and Snowflake
How to evaluate RAG performance using TruLens
How to analyze evaluation results in Snowflake

What You'll Build

A complete RAG application that can answer questions about sales conversations by retrieving relevant information from a knowledge base and generating contextually appropriate responses. You'll also implement evaluation metrics to measure the quality of your RAG system.

What You'll Need

A Snowflake account with Cortex features enabled
Valid Snowflake credentials
Python 3.8+

Setup

To follow along with this quickstart, download the build-and-evaluate-rag-with-langchain-and-snowflake.ipynb notebook from GitHub.

Environment Configuration

Before we can build our RAG application, we need to prepare our Snowflake environment with data and search capabilities. This involves:

Loading sample data: We'll use sales conversation transcripts as our knowledge base
Creating a Cortex Search Service: This will index our data and enable semantic search

Run the SQL commands in the setup.sql file in your Snowflake environment to:

Create the necessary database, schema, and tables
Load sample sales conversation data
Set up the Cortex Search Service with appropriate indexing configuration
Enable cross-region inference for access to Claude 4

Installing Required Packages

Notebooks come pre-installed with common Python libraries for data science and machine learning. For this guide, we'll need to install additional packages specific to our RAG implementation:

%pip install --quiet -U langchain-core langchain-snowflake trulens-core trulens-providers-cortex trulens-connectors-snowflake trulens-apps-langchain

IMPORTANT:

Make sure your Snowflake account has Cortex features enabled

You'll need appropriate permissions to create databases, schemas, and search services

Creating a Snowflake Session

Setting Up Environment Variables

To connect to Snowflake, we'll supply our credentials through environment variables. The values below are placeholders: in practice, set these variables outside the notebook (for example, in your shell or a secrets manager) so that sensitive information stays out of your code.

import os

os.environ["SNOWFLAKE_ACCOUNT"] = "your_account_identifier"
os.environ["SNOWFLAKE_USER"] = "your_username"
os.environ["SNOWFLAKE_PASSWORD"] = "your_password"
os.environ["SNOWFLAKE_WAREHOUSE"] = "SALES_INTELLIGENCE_WH"
os.environ["SNOWFLAKE_DATABASE"] = "SALES_INTELLIGENCE"
os.environ["SNOWFLAKE_SCHEMA"] = "DATA"
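Before creating a session, it can help to confirm that every setting is actually present. The following is a minimal sketch, assuming the six variables set above are the ones you want to require; the helper name missing_snowflake_vars is hypothetical and not part of langchain-snowflake.

```python
import os

# Variable names taken from the configuration shown above.
REQUIRED_VARS = [
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
]


def missing_snowflake_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_snowflake_vars()
if missing:
    print("Missing Snowflake settings:", ", ".join(missing))
```

Running this check first turns a missing setting into an explicit message that names the variable, rather than a connection error later on.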

Creating a Snowflake Session

Now we'll create a session that will be used by both the retriever and LLM components:

from langchain_snowflake import create_session_from_env

session = create_session_from_env()

This session will handle authentication and connection management for all Snowflake operations in our RAG application.

Building the Retriever

Creating a Cortex Search Retriever

The retriever is responsible for finding relevant documents from our knowledge base. We'll use SnowflakeCortexSearchRetriever which leverages Snowflake's Cortex Search capabilities:

from langchain_snowflake import SnowflakeCortexSearchRetriever

# Replace with your actual Cortex Search service
CORTEX_SEARCH_SERVICE = "SALES_INTELLIGENCE.DATA.SALES_CONVERSATION_SEARCH"

retriever = SnowflakeCortexSearchRetriever(
    session=session,
    service_name=CORTEX_SEARCH_SERVICE,
    k=3,  # Number of documents to retrieve
    auto_format_for_rag=True,  # Automatic document formatting
    content_field="TRANSCRIPT_TEXT",  # Extract content from this metadata field
    join_separator="\n\n",  # Separator used for multiple documents
    fallback_to_page_content=True  # Fall back to page_content if metadata field is empty
)
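To make the formatting parameters concrete, here is a plain-Python sketch of the behavior the comments above describe: pull text from a metadata field, fall back to page_content when that field is empty, and join the results with a separator. This is an illustration only, not the package's implementation; the Doc class and format_for_rag function are hypothetical stand-ins, and the sample strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str = ""
    metadata: dict = field(default_factory=dict)


def format_for_rag(docs, content_field="TRANSCRIPT_TEXT",
                   join_separator="\n\n", fallback_to_page_content=True):
    """Join document text into one context string (illustrative only)."""
    parts = []
    for doc in docs:
        # Prefer the configured metadata field.
        text = doc.metadata.get(content_field) or ""
        # Optionally fall back to page_content when the field is empty.
        if not text and fallback_to_page_content:
            text = doc.page_content
        if text:
            parts.append(text)
    return join_separator.join(parts)


# Placeholder documents, not real data from the sample dataset.
docs = [
    Doc(metadata={"TRANSCRIPT_TEXT": "Text from the metadata field."}),
    Doc(page_content="Text from page_content."),
]
print(format_for_rag(docs))
```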

Testing the Retriever

Let's test our retriever to make sure it's working correctly:

import textwrap

# invoke() is the standard Runnable entry point; get_relevant_documents() is deprecated
docs = retriever.invoke("What happened in our last sales conversation with DataDriven?")

for doc in docs:
    wrapped_text = textwrap.fill(doc.page_content, width=80)
    print(f"{wrapped_text}\n{'-' * 120}")

The retriever should return the most relevant documents from our sales conversations that match the query about DataDriven.

Building the RAG Chain

Creating the LLM Component

Now we'll set up the language model component using Snowflake's Cortex LLM capabilities:

from langchain_snowflake import ChatSnowflake

# Initialize chat model
llm = ChatSnowflake(
    session=session, 
    model="claude-4-sonnet", 
    temperature=0.1, 
    max_tokens=1000
)

Creating the RAG Prompt Template

We'll create a prompt template that instructs the model to answer questions based on the retrieved context:

from langchain_core.prompts import ChatPromptTemplate

# Create RAG prompt template
rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based on the following context from Snowflake Cortex Search:

Context:
{context}

Question: {question}

Provide a comprehensive answer based on the retrieved context. If the context doesn't contain enough information, say so clearly.
""")

Building the Complete RAG Chain

Now we'll chain everything together using LangChain's LCEL (LangChain Expression Language):

from langchain_core.runnables import RunnablePassthrough

# Build RAG chain
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()} | rag_prompt | llm
)
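Conceptually, the chain above does three things: the dict sends the same input to the retriever and to the passthrough, the prompt fills its placeholders from the resulting keys, and the model receives the rendered prompt. The sketch below mirrors that data flow in plain Python. Every function in it is a hypothetical stand-in rather than a LangChain or Snowflake call.

```python
def fake_retriever(question):
    # Stand-in: a real retriever would query Cortex Search here.
    return f"[context retrieved for: {question}]"


def fake_prompt(inputs):
    # Stand-in for the prompt template: fills in context and question.
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"


def fake_llm(prompt_text):
    # Stand-in for the chat model: echoes the prompt it received.
    return f"Answer based on prompt:\n{prompt_text}"


def run_chain(question):
    # Step 1: fan the input out, like {"context": retriever, "question": RunnablePassthrough()}.
    inputs = {"context": fake_retriever(question), "question": question}
    # Step 2: render the prompt from the collected inputs.
    prompt_text = fake_prompt(inputs)
    # Step 3: hand the rendered prompt to the model.
    return fake_llm(prompt_text)


print(run_chain("What is the status of the deal?"))
```

The `|` operator in LCEL composes these same steps, so each stage's output becomes the next stage's input.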

Testing the RAG Chain

Let's test our RAG chain with a sample question:

response = rag_chain.invoke("What happened in our last sales conversation with DataDriven?")
print(response.content)

The RAG chain should retrieve relevant context about DataDriven from our knowledge base and generate a comprehensive answer based on that context.

Evaluating RAG Performance

Setting Up TruLens with Snowflake

To evaluate our RAG application, we'll use TruLens with Snowflake integration:

from trulens.apps.langchain import TruChain
from trulens.connectors.snowflake import SnowflakeConnector

tru_snowflake_connector = SnowflakeConnector(snowpark_session=session)

app_name = "sales_assistance_rag"
app_version = "cortex_search"

tru_rag = TruChain(
    rag_chain,
    app_name=app_name,
    app_version=app_version,
    connector=tru_snowflake_connector,
    main_method_name="invoke"
)

Creating an Evaluation Run

We'll create a run configuration and dataset for our evaluation:

import pandas as pd
from trulens.core.run import Run, RunConfig
from datetime import datetime

# Create a dataset of test queries
queries = [
    "What happened in our last sales conversation with DataDriven?",
    "What is the status of the deal with DataDriven?",
    "What is the status of the deal with HealthTech?"
]

queries_df = pd.DataFrame(queries, columns=["query"])

# Create a run configuration
run_name = f"experiment_1_{datetime.now().strftime('%Y%m%d%H%M%S')}"

run_config = RunConfig(
    run_name=run_name,
    dataset_name="sales_queries",
    source_type="DATAFRAME",
    dataset_spec={
        "input": "query",
    },
)

# Create and start the run
run: Run = tru_rag.add_run(run_config=run_config)
run.start(input_df=queries_df)

Computing Evaluation Metrics

We'll compute several key metrics to evaluate our RAG system:

import time

# Wait for all invocations to complete
while run.get_status() != "INVOCATION_COMPLETED":
    time.sleep(3)

# Compute metrics
run.compute_metrics([
    "answer_relevance",
    "context_relevance",
    "groundedness",
])

These metrics will help us understand:

How well the answer addresses the user query (answer_relevance)
How relevant the retrieved context is to the query (context_relevance)
How well the answer is supported by the retrieved context (groundedness)
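The wait loop above polls indefinitely, so a run that never reaches the expected status would block the notebook. A minimal sketch of a bounded alternative follows. The wait_for_status helper is hypothetical and not part of TruLens; it only assumes a zero-argument callable that returns the current status.

```python
import time


def wait_for_status(get_status, target, timeout_s=600.0, interval_s=3.0,
                    sleep=time.sleep):
    """Poll get_status() until it returns target or timeout_s elapses.

    Hypothetical helper for illustration; raises TimeoutError on expiry.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Status was {status!r} after {timeout_s} seconds; "
                f"expected {target!r}"
            )
        sleep(interval_s)


# Possible usage with the run created earlier (requires a live run):
# wait_for_status(run.get_status, "INVOCATION_COMPLETED")
```

The timeout and interval defaults here are arbitrary starting points; adjust them to the size of your dataset.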

Viewing Evaluation Results

You can view the evaluation results in Snowflake's AI Observability UI:

Navigate to the Snowflake AI Observability page
Filter by your app_name, app_version, and run_name
Inspect individual invocations and their metrics
Identify areas for improvement in your RAG system

Conclusion And Resources

Congratulations! You've successfully built and evaluated a complete RAG application using LangChain and Snowflake. You've learned how to create a retriever using Snowflake Cortex Search, build a RAG chain with LangChain, and evaluate its performance using TruLens and Snowflake's AI Observability features.

This foundation can be extended in several ways, such as experimenting with different LLMs, adjusting retrieval parameters, adding conversation memory, or building a user interface with Streamlit.

What You Learned

Created a Snowflake environment for RAG applications
Built a retriever using Snowflake Cortex Search
Constructed a complete RAG chain with LangChain
Evaluated RAG performance using TruLens
Analyzed evaluation results in Snowflake

Happy building with LangChain and Snowflake!

Updated 2025-12-20

This content is provided as is and is not maintained on an ongoing basis. It may be out of date with current Snowflake instances.