Summit Builder Keynote Debut

Broadcast live on June 23

Snowflake for Developers/Guides/Serving Low-Latency Data with Hybrid Tables
Quickstart

Serving Low-Latency Data with Hybrid Tables

Adam Timm

Serving Low-Latency Data with Hybrid Tables

Overview

Note on Production Workloads: Production serving workloads should use bound variables (parameterized queries), private key authentication, and connection pooling. See Hybrid Tables Best Practices and Connecting Applications to Hybrid Tables.

Many data teams maintain expensive reverse ETL pipelines that extract data from Snowflake, transform it, and push it to external systems (Redis, DynamoDB, S3 JSON files) for low-latency serving. Hybrid Tables can eliminate this intermediate layer entirely by serving data directly from Snowflake at double-digit millisecond latency.

This quickstart covers two common serving scenarios:

  1. Email/Marketing Personalization — precompute recommendations in a standard table, bulk-load into a Hybrid Table, serve to a downstream application or service at high concurrency
  2. API Backend — use a Hybrid Table as the backing store for a REST API serving entitlements, session state, or configuration data via primary key lookups

Both scenarios use the same core pattern: compute in columnar, serve from row store.

Why Not Just Use a Standard Table?

Standard Snowflake tables are optimized for analytical workloads. They provide excellent throughput for large scans but cannot deliver consistent double-digit millisecond latency for point lookups because:

  • Result cache helps only for identical repeated queries (not per-user lookups)
  • No row-level index means every point lookup scans micro-partitions
  • Latency variability is high (varies widely depending on cache state and partition layout)

Hybrid Tables provide deterministic low latency for point lookups via the primary key index and secondary indexes stored in the row store.

What You Will Learn

  • How to design a Hybrid Table specifically for serving (schema, PK, indexes)
  • The CTAS+SWAP pattern for atomic bulk refresh without downtime
  • Why reads spike after a CTAS and how to mitigate the warm-up window
  • How to size warehouses for high-concurrency serving workloads
  • Two complete worked scenarios with DDL, data generation, and query patterns

Prerequisites

  • A Snowflake paid account in an AWS or Azure commercial region
  • Familiarity with Hybrid Tables and secondary indexes
  • Understanding of bound variables and plan cache (see Write Optimization guide)

Setup

USE ROLE ACCOUNTADMIN;

CREATE OR REPLACE ROLE HT_SERVE_QS_ROLE;
GRANT ROLE HT_SERVE_QS_ROLE TO ROLE ACCOUNTADMIN;

CREATE OR REPLACE WAREHOUSE HT_SERVE_QS_WH
  WAREHOUSE_SIZE = XSMALL
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD';
GRANT OWNERSHIP ON WAREHOUSE HT_SERVE_QS_WH TO ROLE HT_SERVE_QS_ROLE;

CREATE OR REPLACE DATABASE HT_SERVE_QS_DB;
GRANT OWNERSHIP ON DATABASE HT_SERVE_QS_DB TO ROLE HT_SERVE_QS_ROLE;

USE ROLE HT_SERVE_QS_ROLE;
CREATE OR REPLACE SCHEMA HT_SERVE_QS_DB.DATA;
USE WAREHOUSE HT_SERVE_QS_WH;
USE DATABASE HT_SERVE_QS_DB;
USE SCHEMA DATA;

Note: The warehouse uses MAX_CLUSTER_COUNT = 3 and SCALING_POLICY = 'STANDARD'. For serving workloads with high concurrency, multi-cluster warehouses scale horizontally to handle burst traffic without queuing.

Step 1: The Reverse ETL Problem

A typical reverse ETL pipeline for serving personalized data looks like this:

[Snowflake Standard Table]
    → ETL/Transform (Python, dbt, Airflow)
    → Export (COPY INTO S3 / API push)
    → External Cache (Redis, DynamoDB, S3 JSON)
    → Application reads from cache

This architecture introduces:

  • Staleness — data is only as fresh as the last export cycle (often minutes to hours)
  • Operational complexity — multiple systems to monitor, debug, and maintain
  • Cost — compute for export, storage for cache, egress fees, cache infrastructure
  • Brittleness — any failure in the chain breaks the serving path

The Hybrid Table serving pattern collapses this to:

[Snowflake Standard Table]
    → Task (CTAS+SWAP into Hybrid Table)
    → Application reads directly from Snowflake

One system. One copy. Deterministic freshness. No external cache.

Scenario 1: Email/Marketing Personalization

The Use Case

A personalization engine scores users against content several times per day, producing millions of ranked recommendations. A downstream application or service needs to look up the top N recommendations for each user on demand, at high concurrency during peak traffic.

Design the Serving Table

The serving table is designed for one access pattern: lookup by user_id, return their ranked recommendations.

CREATE OR REPLACE HYBRID TABLE user_recommendations (
    user_id         NUMBER       NOT NULL,
    content_id      NUMBER       NOT NULL,
    rank            NUMBER       NOT NULL,
    score           FLOAT        NOT NULL,
    content_title   VARCHAR(500),
    content_url     VARCHAR(2000),
    computed_at     TIMESTAMP_NTZ NOT NULL,
    PRIMARY KEY (user_id, content_id)
);

The composite primary key (user_id, content_id) enables:

  • Fast seek to all recommendations for a specific user
  • Deduplication (same user+content pair cannot appear twice)
  • Ordered retrieval within a user (PK is sorted by user_id first)

Create the Source Table (Simulates Scoring Pipeline Output)

CREATE OR REPLACE TABLE recommendations_source AS
SELECT
    user_id,
    content_id,
    rank,
    score,
    content_title,
    content_url,
    CURRENT_TIMESTAMP()::TIMESTAMP_NTZ AS computed_at
FROM (
    SELECT
        u.user_id,
        c.content_id,
        ROW_NUMBER() OVER (PARTITION BY u.user_id ORDER BY RANDOM()) AS rank,
        ROUND(UNIFORM(0.01, 1.00, RANDOM()), 4) AS score,
        'Article ' || c.content_id::VARCHAR AS content_title,
        'https://example.com/content/' || c.content_id::VARCHAR AS content_url
    FROM
        (SELECT SEQ4() + 1 AS user_id FROM TABLE(GENERATOR(ROWCOUNT => 10000))) u
    CROSS JOIN
        (SELECT SEQ4() + 1 AS content_id FROM TABLE(GENERATOR(ROWCOUNT => 20))) c
)
WHERE rank <= 10;

SELECT COUNT(*) FROM recommendations_source;
-- Expected: 100000 (10,000 users x 10 recommendations each)

Initial Load via CTAS

CREATE OR REPLACE HYBRID TABLE user_recommendations (
    user_id         NUMBER       NOT NULL,
    content_id      NUMBER       NOT NULL,
    rank            NUMBER       NOT NULL,
    score           FLOAT        NOT NULL,
    content_title   VARCHAR(500),
    content_url     VARCHAR(2000),
    computed_at     TIMESTAMP_NTZ NOT NULL,
    PRIMARY KEY (user_id, content_id)
)
AS SELECT * FROM recommendations_source;

Serve: Lookup Recommendations for a User

SET TARGET_USER = (SELECT user_id FROM user_recommendations LIMIT 1);

SELECT content_id, rank, score, content_title, content_url
FROM user_recommendations
WHERE user_id = $TARGET_USER
ORDER BY rank;

Query Profile: TableScan, ROW_BASED, ~10 rows scanned (the user's recommendations). This is the query your email platform executes 20,000+ times per second during a send.

Refresh: The CTAS+SWAP Pattern

When the scoring pipeline produces new recommendations, atomically replace the serving table:

-- Step 1: Build the new version (CTAS into a temporary name)
CREATE OR REPLACE HYBRID TABLE user_recommendations_new (
    user_id         NUMBER       NOT NULL,
    content_id      NUMBER       NOT NULL,
    rank            NUMBER       NOT NULL,
    score           FLOAT        NOT NULL,
    content_title   VARCHAR(500),
    content_url     VARCHAR(2000),
    computed_at     TIMESTAMP_NTZ NOT NULL,
    PRIMARY KEY (user_id, content_id)
)
AS SELECT * FROM recommendations_source;

-- Step 2: Atomic swap (applications see no interruption)
ALTER TABLE user_recommendations SWAP WITH user_recommendations_new;

-- Step 3: Drop the old version (now in the _new name)
DROP TABLE user_recommendations_new;

Why SWAP instead of RENAME? ALTER TABLE ... SWAP WITH exchanges the contents of two tables atomically in a single metadata operation. Applications querying user_recommendations see the new data immediately after the swap with zero downtime. RENAME requires two statements (rename old, rename new) which creates a brief window where the table name does not exist.

Refresh Task (Scheduled)

CREATE OR REPLACE TASK refresh_user_recommendations
  WAREHOUSE = HT_SERVE_QS_WH
  SCHEDULE = 'USING CRON 0 */4 * * * UTC'
AS
BEGIN
  CREATE OR REPLACE HYBRID TABLE user_recommendations_new (
    user_id NUMBER NOT NULL, content_id NUMBER NOT NULL,
    rank NUMBER NOT NULL, score FLOAT NOT NULL,
    content_title VARCHAR(500), content_url VARCHAR(2000),
    computed_at TIMESTAMP_NTZ NOT NULL,
    PRIMARY KEY (user_id, content_id)
  )
  AS SELECT * FROM recommendations_source;

  ALTER TABLE user_recommendations SWAP WITH user_recommendations_new;
  DROP TABLE user_recommendations_new;
END;

Scenario 2: API Backend (Entitlements/Session State)

The Use Case

A REST API serves authorization decisions: given a user_id, return their entitlements (which portfolios, products, or resources they can access). The API needs double-digit millisecond responses at thousands of concurrent requests from multiple microservices.

Design the Serving Table

CREATE OR REPLACE HYBRID TABLE user_entitlements (
    user_id         VARCHAR(100)  NOT NULL,
    resource_type   VARCHAR(50)   NOT NULL,
    resource_id     VARCHAR(200)  NOT NULL,
    access_level    VARCHAR(20)   NOT NULL,
    granted_at      TIMESTAMP_NTZ NOT NULL,
    expires_at      TIMESTAMP_NTZ,
    PRIMARY KEY (user_id, resource_type, resource_id)
);

The composite PK enables:

  • Lookup all entitlements for a user: WHERE user_id = ?
  • Lookup specific resource access: WHERE user_id = ? AND resource_type = ? AND resource_id = ?
  • Both use the PK prefix seek (no secondary index needed)

Load Sample Data

INSERT INTO user_entitlements
SELECT
    'user_' || u.id::VARCHAR AS user_id,
    ARRAY_CONSTRUCT('PORTFOLIO','REPORT','DASHBOARD','API_ENDPOINT')
        [UNIFORM(0,3,RANDOM())]::VARCHAR AS resource_type,
    'resource_' || UNIFORM(1, 500, RANDOM())::VARCHAR AS resource_id,
    ARRAY_CONSTRUCT('READ','WRITE','ADMIN')[UNIFORM(0,2,RANDOM())]::VARCHAR AS access_level,
    DATEADD(DAY, -UNIFORM(1,365,RANDOM()), CURRENT_TIMESTAMP())::TIMESTAMP_NTZ AS granted_at,
    DATEADD(DAY, UNIFORM(30,365,RANDOM()), CURRENT_TIMESTAMP())::TIMESTAMP_NTZ AS expires_at
FROM (SELECT SEQ4() + 1 AS id FROM TABLE(GENERATOR(ROWCOUNT => 50000))) u;

API Query Patterns

Check if user has access to a specific resource:

SET API_USER = 'user_42';
SET API_RESOURCE_TYPE = 'PORTFOLIO';
SET API_RESOURCE_ID = 'resource_100';

SELECT access_level
FROM user_entitlements
WHERE user_id = $API_USER
  AND resource_type = $API_RESOURCE_TYPE
  AND resource_id = $API_RESOURCE_ID
  AND (expires_at IS NULL OR expires_at > CURRENT_TIMESTAMP()::TIMESTAMP_NTZ);

Query Profile: TableScan, ROW_BASED, 1 row scanned. Low double-digit millisecond execution time.

Get all active entitlements for a user:

SELECT resource_type, resource_id, access_level, expires_at
FROM user_entitlements
WHERE user_id = $API_USER
  AND (expires_at IS NULL OR expires_at > CURRENT_TIMESTAMP()::TIMESTAMP_NTZ)
ORDER BY resource_type;

Query Profile: TableScan, ROW_BASED, ~10-50 rows scanned (that user's entitlements).

Incremental Updates (No CTAS+SWAP Needed)

Unlike the email scenario (bulk refresh), entitlement changes are incremental. Use direct DML:

-- Grant new access
INSERT INTO user_entitlements VALUES (
    'user_42', 'DASHBOARD', 'resource_999', 'READ',
    CURRENT_TIMESTAMP()::TIMESTAMP_NTZ,
    DATEADD(DAY, 90, CURRENT_TIMESTAMP())::TIMESTAMP_NTZ
);

-- Revoke access (delete)
DELETE FROM user_entitlements
WHERE user_id = 'user_42' AND resource_type = 'DASHBOARD' AND resource_id = 'resource_999';

-- Bulk sync from source system (MERGE)
MERGE INTO user_entitlements AS tgt
USING entitlements_source AS src
ON tgt.user_id = src.user_id
  AND tgt.resource_type = src.resource_type
  AND tgt.resource_id = src.resource_id
WHEN MATCHED THEN UPDATE SET
  tgt.access_level = src.access_level,
  tgt.expires_at = src.expires_at
WHEN NOT MATCHED THEN INSERT VALUES (
  src.user_id, src.resource_type, src.resource_id,
  src.access_level, src.granted_at, src.expires_at
);

Step 2: Managing the Compaction Window

After a CTAS+SWAP, the new table must warm up — plan cache, warehouse data cache, and row store layout all need to reach steady state. During this window (typically 1-5 minutes for tables under 1M rows), read latency may be elevated.

Why Latency Spikes After a Refresh

After a CTAS+SWAP, the new table has no cached query plans or warm data in the warehouse. The first queries against the new table incur plan compilation and data loading overhead. This warm-up period is transparent to the application but can temporarily increase read latency until steady state is reached.

Mitigation Strategies

1. Schedule refreshes during low-traffic windows:

-- Refresh at 2 AM when traffic is low
CREATE OR REPLACE TASK refresh_user_recommendations
  WAREHOUSE = HT_SERVE_QS_WH
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
BEGIN
  CREATE OR REPLACE HYBRID TABLE user_recommendations_new (...)
  AS SELECT * FROM recommendations_source;
  ALTER TABLE user_recommendations SWAP WITH user_recommendations_new;
  DROP TABLE user_recommendations_new;
END;

2. Pre-warm after swap:

Run a representative query immediately after the swap to trigger warehouse cache loading:

-- Add to the refresh task after the SWAP
SELECT COUNT(*) FROM user_recommendations WHERE user_id = 1;

3. Size the serving warehouse to absorb the warm-up spike:

A multi-cluster warehouse absorbs the temporary latency increase by routing new requests to idle clusters while the new table reaches steady state.

Step 3: Warehouse Sizing for High-Concurrency Serving

Hybrid Table serving workloads have different requirements than analytical workloads:

  • Concurrency over throughput — thousands of small queries, not a few large ones
  • Latency over cost — the goal is consistent double-digit millisecond latency, not maximum rows-per-credit
  • Horizontal scaling — more clusters, not bigger clusters

Recommended Configuration

-- For serving workloads: small size, multiple clusters, standard scaling
ALTER WAREHOUSE HT_SERVE_QS_WH SET
  WAREHOUSE_SIZE = 'XSMALL'
  MAX_CLUSTER_COUNT = 5
  MIN_CLUSTER_COUNT = 1
  SCALING_POLICY = 'STANDARD';
ParameterRecommendationWhy
WAREHOUSE_SIZEXSMALL or SMALLPoint lookups need minimal compute; start with XS and benchmark
MAX_CLUSTER_COUNT3-10 (based on peak concurrency)Horizontal scaling for burst traffic
MIN_CLUSTER_COUNT1-2Keep 1-2 clusters warm to avoid cold-start latency
SCALING_POLICYSTANDARDScales up aggressively; scales down conservatively (fewer cold restarts)
AUTO_SUSPEND300-600Long enough to avoid suspend/resume during traffic dips

Why XSMALL?

Each HT point lookup consumes minimal compute. An XS cluster is typically sufficient for many operational serving workloads. Benchmark your specific query patterns to determine throughput per cluster, then scale horizontally by adding more clusters (not larger size). Larger warehouse sizes add compute power you do not need for point lookups and increase cost without reducing latency.

When to Go Bigger

Increase warehouse size only if your serving queries involve:

  • Joining the HT with another table (e.g., enrichment at query time)
  • Scanning more than ~100 rows per query
  • Running post-read transformations (JSON parsing, UDFs)

Serving Design Checklist

Before deploying a Hybrid Table as a serving layer, validate these requirements:

CheckEmail/MarketingAPI Backend
Access patternLookup by user_id (PK prefix)Lookup by composite key
Refresh frequencyPeriodic (4-24 hours)Incremental (real-time DML)
Refresh methodCTAS+SWAP (atomic bulk replace)Direct INSERT/UPDATE/DELETE/MERGE
Concurrency target10K-50K QPS (campaign blasts)1K-10K QPS (steady API traffic)
WarehouseXS, MAX_CLUSTER_COUNT=5-10XS, MAX_CLUSTER_COUNT=3-5
DriverREST API or SDK with poolingJDBC/Python with HikariCP/SQLAlchemy pool
Bound variablesRequired (plan cache)Required (plan cache)
Warm-up concernYes (schedule refresh in low-traffic window)No (incremental writes)

Cleanup

ALTER TASK IF EXISTS refresh_user_recommendations SUSPEND;
USE ROLE ACCOUNTADMIN;
DROP DATABASE IF EXISTS HT_SERVE_QS_DB;
DROP WAREHOUSE IF EXISTS HT_SERVE_QS_WH;
DROP ROLE IF EXISTS HT_SERVE_QS_ROLE;

Conclusion and Resources

You can now:

  • Replace reverse ETL pipelines with Hybrid Tables as a serving layer
  • Design serving tables with composite primary keys for efficient point lookups
  • Use CTAS+SWAP for atomic bulk refresh with zero application downtime
  • Manage warm-up windows by scheduling refreshes during low-traffic periods
  • Size multi-cluster warehouses for high-concurrency serving (XS + horizontal scaling)

When to Use This Pattern vs External Cache

Use Hybrid Table ServingKeep External Cache (Redis/DynamoDB)
Latency target: double-digit msLatency target: <1ms
Data already in SnowflakeData originates outside Snowflake
Want to eliminate pipeline complexityCache invalidation is simple for your use case
Refresh frequency: minutes to hoursRefresh frequency: sub-second
Concurrency: up to ~50K QPSConcurrency: >100K QPS sustained
Data consistency matters (single source of truth)Eventual consistency acceptable

Related Resources

FAQ

Q: How does this compare to Snowflake's result cache?

Hybrid Tables do not use the result cache. Every query against a Hybrid Table executes fresh — there is no cached result to return regardless of query text. This is by design: HT data changes frequently via DML, so cached results would be stale.

Q: What if my serving table has 100 million rows?

Hybrid Tables handle this well for point lookups. The PK index scales logarithmically — a PK lookup on 100M rows is nearly the same latency as on 1M rows. CTAS+SWAP bulk refresh will take longer (10-30 minutes on 100M rows), so schedule appropriately and consider the compaction window.

Q: Can I serve from multiple Hybrid Tables in a single API call?

Yes — JOIN two Hybrid Tables in a single query. If both sides use PK or FK joins, the query plan uses nested loop joins with row-based access on both sides. For best results, ensure the JOIN key matches a PK or has a foreign key relationship defined.

Q: What about connection limits?

Each Snowflake warehouse cluster handles connections independently. With MAX_CLUSTER_COUNT=5 on an XS, you have 5 independent clusters handling requests in parallel. Connection pooling at the application layer (HikariCP, SQLAlchemy) prevents exhausting connections per cluster.

Q: Can I use ALTER TABLE SWAP WITH for tables with different schemas?

No. Both tables must have identical column definitions (names, types, constraints). The SWAP operation exchanges only the data; the schema must already match.

Q: What happens to in-flight queries during a SWAP?

SWAP is a metadata operation. Queries that started before the SWAP read the old data. Queries that start after the SWAP read the new data. There is no interruption or error for either set.

Updated 2026-06-17

This content is provided as is, and is not maintained on an ongoing basis. It may be out of date with current Snowflake instances