Build a Multi-Tenant AI Chat App with Snowflake Cortex REST API
Overview
In this guide, you will build a multi-tenant AI chat application that streams responses in real time using Snowflake's Cortex REST API. The system uses Key Pair JWT authentication and Snowflake-native Model RBAC to control which AI models each tenant can access, all enforced by SQL GRANT statements rather than application code.
Each tenant is fully isolated at the Snowflake infrastructure level:
- Separate Snowflake user: dedicated service user per tenant (e.g., COCO_USER_ALPHA)
- Separate Snowflake role: dedicated role with its own grants (e.g., COCO_TENANT_ALPHA)
- Separate RSA key pair: each tenant authenticates with their own private key
- Separate model grants: Snowflake RBAC controls which AI models each tenant can call
- Separate rate limits: per-tenant request throttling at the gateway
- Separate API keys: gateway-level authentication per tenant
By the end, you'll have a working FastAPI gateway and Streamlit chat interface where two tenants (Alpha and Beta) each see only the models they're authorized to use, with unauthorized attempts blocked by Snowflake itself.

Prerequisites
- Basic familiarity with Python, REST APIs, and terminal commands
What You'll Learn
- Calling Snowflake Cortex REST API with SSE streaming for real-time AI chat
- Key Pair JWT authentication for multi-tenant access (no static passwords)
- Snowflake Model RBAC to control model access per tenant via SQL GRANT/REVOKE
- Building a streaming chat UI with Streamlit that handles multi-turn conversations
What You'll Need
- A Snowflake account with ACCOUNTADMIN access
- Python 3.11+ installed
- OpenSSL installed (for RSA key generation)
- A code editor (e.g., VS Code)
What You'll Build
- A FastAPI gateway with per-tenant auth, rate limiting, and SSE streaming
- A JWT token factory that generates short-lived tokens from RSA key pairs
- A Streamlit chat interface with tenant switching, model selection, and streaming responses
Environment Setup
Clone and Run Setup
Clone the repository and run the setup script, which creates a virtual environment, installs dependencies, and generates RSA key pairs for both tenants:

    git clone https://github.com/Snowflake-Labs/sfquickstarts.git
    cd sfquickstarts/site/sfguides/src/build-multi-tenant-ai-chat-application-with-cortex-rest-api/assets
    bash setup.sh

Project Structure
The assets/ folder contains the complete project:
| Path | Description |
|---|---|
| app/core/jwt_helper.py | JWT token factory: generates short-lived tokens from RSA key pairs |
| app/core/config.py | Settings: reads .env via pydantic-settings |
| app/core/tenants.py | Tenant registry: maps API keys to Snowflake users/roles |
| app/api/v1/dependencies.py | Auth guard + per-tenant rate limiting |
| app/api/v1/routes.py | API endpoints: streaming chat route |
| app/models/schemas.py | Request/response Pydantic models |
| app/services/cortex_client.py | Cortex REST API client: SSE streaming |
| app/main.py | FastAPI application entry point |
| streamlit_app/streamlit_app.py | Streamlit entry point |
| streamlit_app/app_pages/chat.py | AI Chat page with SSE parsing |
| 01_rbac_setup.sql | Snowflake roles, users, and key registration |
| 02_enable_model_rbac.sql | Model RBAC grants per tenant |
| 03_bonus_table_agent_rbac.sql | Optional: table + agent RBAC for Cortex Agents |
Configure Environment Variables
Edit the .env file (created from .env.example by setup.sh):

    SNOWFLAKE_ACCOUNT=YOUR_ACCOUNT_IDENTIFIER
    ALPHA_SNOWFLAKE_USER=COCO_USER_ALPHA
    ALPHA_PRIVATE_KEY_PATH=keys/alpha_rsa_key.p8
    BETA_SNOWFLAKE_USER=COCO_USER_BETA
    BETA_PRIVATE_KEY_PATH=keys/beta_rsa_key.p8
    COCO_PORT=8000
    LOG_LEVEL=INFO

Replace YOUR_ACCOUNT_IDENTIFIER with your Snowflake account identifier (e.g., MYORG-MYACCOUNT).
Accounts with underscores: If your account identifier contains underscores (e.g., MYORG-MY_ACCOUNT), the auto-generated Cortex URL will fail due to SSL hostname validation. Add this line using your account locator instead:

    CORTEX_BASE_URL_OVERRIDE=https://YOUR_ACCOUNT_LOCATOR.snowflakecomputing.com/api/v2/cortex

Find your account locator by running SELECT CURRENT_ACCOUNT(); in Snowsight.
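As a rough illustration of the override behavior described above, the settings code could choose the base URL along these lines. This is only a sketch: the function name `cortex_base_url` and the early underscore check are assumptions made for illustration, not the project's actual code. Only the URL shape and the CORTEX_BASE_URL_OVERRIDE variable come from this guide.

```python
from typing import Optional


def cortex_base_url(account: str, override: Optional[str] = None) -> str:
    """Return the Cortex base URL, preferring an explicit override."""
    if override:
        # An override (for example, one built from the account locator) wins.
        return override.rstrip("/")
    if "_" in account:
        # Hypothetical guard: fail early instead of at SSL hostname validation.
        raise ValueError("account identifier contains '_'; set CORTEX_BASE_URL_OVERRIDE")
    return f"https://{account}.snowflakecomputing.com/api/v2/cortex"
```

Failing at startup with a clear message is a design choice for this sketch; the real gateway may simply attempt the request and report the SSL error.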
Snowflake RBAC Setup
Navigate to Projects → Workspaces in Snowsight and create a new SQL worksheet. Run the following scripts in order.
Step 1: Create Roles and Users
Run assets/01_rbac_setup.sql to create tenant roles, service users, and register RSA public keys:

    USE ROLE ACCOUNTADMIN;

    -- Create tenant roles
    CREATE ROLE IF NOT EXISTS COCO_TENANT_ALPHA;
    CREATE ROLE IF NOT EXISTS COCO_TENANT_BETA;

    -- Grant Cortex access to both roles
    GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE COCO_TENANT_ALPHA;
    GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE COCO_TENANT_BETA;

    -- Create service users (TYPE=SERVICE means no interactive login)
    CREATE USER IF NOT EXISTS COCO_USER_ALPHA TYPE = SERVICE DEFAULT_ROLE = COCO_TENANT_ALPHA;
    CREATE USER IF NOT EXISTS COCO_USER_BETA TYPE = SERVICE DEFAULT_ROLE = COCO_TENANT_BETA;

    -- Assign roles
    GRANT ROLE COCO_TENANT_ALPHA TO USER COCO_USER_ALPHA;
    GRANT ROLE COCO_TENANT_BETA TO USER COCO_USER_BETA;

    -- Register RSA public keys (replace with your actual keys)
    ALTER USER COCO_USER_ALPHA SET RSA_PUBLIC_KEY='MIIBIjANBgkqhki...your-alpha-key...';
    ALTER USER COCO_USER_BETA SET RSA_PUBLIC_KEY='MIIBIjANBgkqhki...your-beta-key...';

Tip: Extract the key content without the BEGIN/END lines:
awk 'NR>1 && !/END/' keys/alpha_rsa_key.pub | tr -d '\n'
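If awk is not available, the same extraction can be done in Python. This is a minimal sketch; the function name `public_key_body` is hypothetical and not part of the project.

```python
def public_key_body(pem_text: str) -> str:
    """Return the base64 body of a PEM public key as a single line.

    Drops the -----BEGIN ...----- and -----END ...----- marker lines and
    joins the remaining lines, like the awk + tr pipeline above.
    """
    lines = [line.strip() for line in pem_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith("-----"))
```

For example, `public_key_body(open("keys/alpha_rsa_key.pub").read())` yields the string to paste into the ALTER USER statement.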
Step 2: Enable Model RBAC
Run assets/02_enable_model_rbac.sql to enforce strict model access:

    USE ROLE ACCOUNTADMIN;

    CALL SNOWFLAKE.MODELS.CORTEX_BASE_MODELS_REFRESH();
    ALTER ACCOUNT SET CORTEX_MODELS_ALLOWLIST = 'None';

    -- Alpha models
    GRANT APPLICATION ROLE SNOWFLAKE."CORTEX-MODEL-ROLE-CLAUDE-4-SONNET" TO ROLE COCO_TENANT_ALPHA;
    GRANT APPLICATION ROLE SNOWFLAKE."CORTEX-MODEL-ROLE-MISTRAL-LARGE2" TO ROLE COCO_TENANT_ALPHA;

    -- Beta models
    GRANT APPLICATION ROLE SNOWFLAKE."CORTEX-MODEL-ROLE-OPENAI-GPT-4.1" TO ROLE COCO_TENANT_BETA;
    GRANT APPLICATION ROLE SNOWFLAKE."CORTEX-MODEL-ROLE-LLAMA3.1-70B" TO ROLE COCO_TENANT_BETA;
    GRANT APPLICATION ROLE SNOWFLAKE."CORTEX-MODEL-ROLE-DEEPSEEK-R1" TO ROLE COCO_TENANT_BETA;

Model access is now controlled entirely by SQL. To add or remove a model for any tenant, run a single GRANT or REVOKE — no code changes or redeployment needed.
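If you manage many tenants, you may want to generate these statements rather than type them. The helper below is a sketch only: `model_role` and `grant_sql` are hypothetical names, and the rule that the application role is the upper-cased model name prefixed with CORTEX-MODEL-ROLE- is inferred from the five grants in the script above, so verify it against your account before relying on it.

```python
def model_role(model: str) -> str:
    """Map a model name to its application role (assumed naming rule)."""
    return f'SNOWFLAKE."CORTEX-MODEL-ROLE-{model.upper()}"'


def grant_sql(model: str, tenant_role: str, revoke: bool = False) -> str:
    """Build a GRANT (or REVOKE) statement for one model and one tenant role."""
    verb, preposition = ("REVOKE", "FROM") if revoke else ("GRANT", "TO")
    return f"{verb} APPLICATION ROLE {model_role(model)} {preposition} ROLE {tenant_role};"


if __name__ == "__main__":
    print(grant_sql("claude-4-sonnet", "COCO_TENANT_ALPHA"))
    print(grant_sql("deepseek-r1", "COCO_TENANT_BETA", revoke=True))
```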
How the Gateway Works
The gateway is a FastAPI application that sits between the Streamlit UI and Snowflake Cortex. Here's how the key components work together.
JWT Token Factory (app/core/jwt_helper.py)
Each tenant authenticates with Snowflake using a short-lived JWT generated from their RSA private key. The factory:
- Reads the tenant's RSA private key from disk (once, then cached)
- Computes the public key fingerprint (SHA256 hash)
- Builds a JWT payload (issuer, subject, issued-at, expiry)
- Signs it with the private key using RS256
- Caches the token and auto-refreshes when < 5 minutes remain

    payload = {
        "iss": f"{self.qualified_username}.{self._public_key_fp}",
        "sub": self.qualified_username,
        "iat": iat,
        "exp": exp,
    }
    token = jwt.encode(payload, self._private_key, algorithm="RS256")

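To make the payload and refresh rule concrete, here is a standard-library sketch of the two pieces that do not need the private key: building the claims and deciding when a cached token is stale. It makes several assumptions: the qualified username is the upper-cased ACCOUNT.USER pair, the fingerprint string already carries its "SHA256:" prefix, and the token lifetime is one hour. The function names are hypothetical; signing is left to the `jwt.encode` call shown above.

```python
TOKEN_LIFETIME_S = 3600   # assumed lifetime for this sketch
REFRESH_MARGIN_S = 300    # refresh when fewer than 5 minutes remain


def build_claims(account: str, user: str, public_key_fp: str, now: int) -> dict:
    """Build the JWT payload for one tenant.

    public_key_fp is assumed to look like "SHA256:<base64 digest>".
    """
    qualified_username = f"{account.upper()}.{user.upper()}"
    return {
        "iss": f"{qualified_username}.{public_key_fp}",
        "sub": qualified_username,
        "iat": now,
        "exp": now + TOKEN_LIFETIME_S,
    }


def needs_refresh(exp: int, now: int) -> bool:
    """Return True when the cached token is within the refresh margin."""
    return exp - now < REFRESH_MARGIN_S
```

A caller would check `needs_refresh` before each request and only sign a new token when it returns True, which keeps signing off the hot path.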
Tenant Registry (app/core/tenants.py)
Maps API keys to Snowflake users and roles. Each tenant has:
| Field | Alpha | Beta |
|---|---|---|
| Snowflake User | COCO_USER_ALPHA | COCO_USER_BETA |
| Snowflake Role | COCO_TENANT_ALPHA | COCO_TENANT_BETA |
| Default Model | claude-4-sonnet | openai-gpt-4.1 |
| Rate Limit | 60 req/min | 30 req/min |
| API Key | sk-alpha-secret-key-001 | sk-beta-secret-key-001 |
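One way to represent this registry in code is a dictionary keyed by API key. The sketch below is illustrative rather than the project's actual `tenants.py`: the `Tenant` field names and the `tenant-beta` identifier are assumptions (only `tenant-alpha` appears in this guide's sample output), while the values mirror the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    snowflake_user: str
    snowflake_role: str
    default_model: str
    rate_limit_per_min: int


# Keyed by gateway API key. Demo keys only; load real keys from a secret store.
TENANTS = {
    "sk-alpha-secret-key-001": Tenant(
        "tenant-alpha", "COCO_USER_ALPHA", "COCO_TENANT_ALPHA", "claude-4-sonnet", 60
    ),
    "sk-beta-secret-key-001": Tenant(
        "tenant-beta", "COCO_USER_BETA", "COCO_TENANT_BETA", "openai-gpt-4.1", 30
    ),
}


def lookup_tenant(api_key: str) -> Optional[Tenant]:
    """Return the tenant for an API key, or None if the key is unknown."""
    return TENANTS.get(api_key)
```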
Cortex REST API Client (app/services/cortex_client.py)
Calls the Snowflake Cortex OpenAI-compatible endpoint (/v1/chat/completions) and re-emits the response as structured SSE events:
| Event | Purpose | When |
|---|---|---|
| meta | Request metadata (id, model, tenant) | First event |
| delta | Content chunk (one per token) | During streaming |
| done | Final stats (latency, token usage) | Last event |
| error | Error details (status, message) | On failure |

    url = f"{self.base_url}/v1/chat/completions"
    body = {
        "model": model,
        "messages": messages,
        "max_completion_tokens": request.max_tokens,
        "temperature": request.temperature,
        "stream": True,
    }

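The re-emitting step can be sketched with a small generator that wraps upstream content chunks in the event types from the table above. This is not the project's client: `sse_event` and `emit_events` are hypothetical names, the upstream HTTP call is replaced by a plain iterable of strings, and token usage is omitted from the done event for brevity.

```python
import json
import time
from typing import Iterable, Iterator


def sse_event(event: str, data: dict) -> str:
    """Serialize one SSE frame: event line, data line, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def emit_events(request_id: str, model: str, tenant_id: str,
                chunks: Iterable[str]) -> Iterator[str]:
    """Wrap upstream content chunks in meta, delta, and done events."""
    started = time.monotonic()
    yield sse_event("meta", {"id": request_id, "model": model, "tenant_id": tenant_id})
    try:
        for chunk in chunks:
            yield sse_event("delta", {"content": chunk})
    except Exception as exc:
        # Surface an upstream failure as an error event and stop the stream.
        yield sse_event("error", {"message": str(exc)})
        return
    latency_ms = int((time.monotonic() - started) * 1000)
    yield sse_event("done", {"id": request_id, "latency_ms": latency_ms})
```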
Auth Guard (app/api/v1/dependencies.py)
A FastAPI dependency that runs before every route:
- Authenticate: validates the X-API-Key header against the tenant registry
- Rate limit: enforces a per-tenant sliding window (60-second window)
- Inject tenant: returns the Tenant object to the route handler
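The rate-limiting step can be illustrated with a small in-memory sliding-window limiter. This is a sketch under assumptions, not the gateway's actual dependency: the class name `SlidingWindowLimiter` is hypothetical, state lives in process memory (so it would not be shared across multiple gateway workers), and the `now` parameter exists only to make the behavior easy to test.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per tenant within the last `window_s` seconds."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self._hits = defaultdict(deque)  # tenant_id -> timestamps of accepted requests

    def allow(self, tenant_id: str, limit: int, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[tenant_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= limit:
            return False
        hits.append(now)
        return True
```

In a FastAPI dependency, a False result would typically be turned into an HTTP 429 response before the route handler runs.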
How the Streamlit Chat Works
The Streamlit app (streamlit_app/app_pages/chat.py) provides a ChatGPT-like interface that connects to the FastAPI gateway.
Flow
- User picks a tenant (User Alpha or User Beta) in the sidebar — this selects the API key and available models
- User types a message in the chat input
- The full conversation history is sent to the gateway via POST /v1/chat/stream
- The gateway generates a per-tenant JWT and forwards the request to Cortex
- Cortex streams tokens back as SSE events
- Each delta event appends text to the chat bubble in real time
- The ▌ cursor shows the response is still generating
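
    full_response = ""
    current_event = None
    with httpx.stream("POST", f"{GATEWAY_URL}/v1/chat/stream", json=payload,
                      headers=headers, timeout=60.0) as response:
        for line in response.iter_lines():
            if line.startswith("event:"):
                # Remember which event type the next data line belongs to.
                current_event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
                if current_event == "delta":
                    full_response += data.get("content", "")
                    placeholder.markdown(full_response + "▌")
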
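The parsing logic can be separated from the network call so it is easy to test. The sketch below works on any iterable of text lines, such as the lines yielded by the HTTP response. The function names `parse_sse` and `collect_reply` are hypothetical, and raising `RuntimeError` on an error event is one possible design rather than what the Streamlit page necessarily does.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event, data) pairs from SSE lines carrying JSON payloads."""
    current_event = "message"  # SSE default when no event line precedes the data
    for line in lines:
        if line.startswith("event:"):
            current_event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            yield current_event, json.loads(line[len("data:"):].strip())
        elif not line:
            current_event = "message"  # a blank line ends the frame


def collect_reply(lines: Iterable[str]) -> str:
    """Concatenate delta content; raise if the gateway reports an error."""
    parts = []
    for event, data in parse_sse(lines):
        if event == "delta":
            parts.append(data.get("content", ""))
        elif event == "error":
            raise RuntimeError(data.get("message", "stream failed"))
    return "".join(parts)
```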
Model access is Snowflake-native: The models list in the UI is a hint. Real enforcement happens via Model RBAC — if a model is listed but not granted to the tenant's role, Snowflake returns 403.
Run and Test
Start the Gateway

    source venv/bin/activate
    python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

Verify it's running:

    curl http://localhost:8000/healthz
    # {"status":"ok"}

Start the Streamlit App
In a separate terminal:

    source venv/bin/activate
    cd streamlit_app
    streamlit run streamlit_app.py --server.port 8501

Open http://localhost:8501 in your browser.
Test Streaming Chat via curl

    curl -N -X POST http://localhost:8000/v1/chat/stream \
      -H "Content-Type: application/json" \
      -H "X-API-Key: sk-alpha-secret-key-001" \
      -d '{"message": "What is Snowflake Cortex?", "model": "claude-4-sonnet"}'

You'll see SSE events streaming in real time:

    event: meta
    data: {"id": "coco_req_abc123", "model": "claude-4-sonnet", "tenant_id": "tenant-alpha"}

    event: delta
    data: {"content": "Snowflake"}

    event: delta
    data: {"content": " Cortex"}

    ...

    event: done
    data: {"id": "coco_req_abc123", "latency_ms": 2340, "usage": {"prompt_tokens": 12, "completion_tokens": 89}}

Test RBAC Denial
Beta tries to use Claude (not granted):

    curl -N -X POST http://localhost:8000/v1/chat/stream \
      -H "Content-Type: application/json" \
      -H "X-API-Key: sk-beta-secret-key-001" \
      -d '{"message": "Hello", "model": "claude-4-sonnet"}'

Snowflake rejects the request because the model is not granted to Beta's role, and the gateway relays the failure as an SSE error event.
Bonus: Extending to Cortex Agents
This guide focuses on the Cortex REST API for LLM chat. If you want to extend the same multi-tenant gateway to invoke Cortex Agents (which combine text-to-SQL, semantic search, and tool orchestration), the same RBAC pattern applies — grant each tenant role access to the agent and its underlying data.
A reference SQL script is included at assets/03_bonus_table_agent_rbac.sql showing the required grants:
| Grant Type | What It Enables |
|---|---|
| USAGE ON WAREHOUSE | Running queries |
| USAGE ON DATABASE/SCHEMA | Accessing the data layer |
| SELECT ON TABLE | Reading specific tables |
| USAGE ON CORTEX AGENT | Invoking Cortex Agents |
| USAGE ON CORTEX SEARCH SERVICE | Semantic search |
Tip: To restrict an agent to only certain tenants, omit the GRANT for every role that should not have access. Snowflake will return a permission error to those tenants, with no application code changes needed.
Conclusion And Resources
Congratulations! You've built a multi-tenant AI chat application with:
- Real-time SSE streaming: token-by-token responses from Snowflake Cortex REST API
- Key Pair JWT auth: short-lived, auto-refreshing tokens per tenant with no static passwords
- Snowflake Model RBAC: model access controlled by SQL GRANT/REVOKE, enforced by Snowflake itself
- Multi-turn conversation: full conversation history sent to Cortex for contextual replies
- Per-tenant rate limiting: sliding window enforcement per API key
What You Learned
- Calling Snowflake Cortex REST API with SSE streaming for real-time AI chat
- Implementing Key Pair JWT authentication for multi-tenant security
- Using Snowflake Model RBAC for governance without application-level authorization code
- Building a streaming chat UI with Streamlit that parses SSE events
This content is provided as is, and is not maintained on an ongoing basis. It may be out of date with current Snowflake instances.