Serving Low-Latency Data with Hybrid Tables
Serving Low-Latency Data with Hybrid Tables
Overview
Note on Production Workloads: Production serving workloads should use bound variables (parameterized queries), private key authentication, and connection pooling. See Hybrid Tables Best Practices and Connecting Applications to Hybrid Tables.
Many data teams maintain expensive reverse ETL pipelines that extract data from Snowflake, transform it, and push it to external systems (Redis, DynamoDB, S3 JSON files) for low-latency serving. Hybrid Tables can eliminate this intermediate layer entirely by serving data directly from Snowflake at double-digit millisecond latency.
This quickstart covers two common serving scenarios:
- Email/Marketing Personalization — precompute recommendations in a standard table, bulk-load into a Hybrid Table, serve to a downstream application or service at high concurrency
- API Backend — use a Hybrid Table as the backing store for a REST API serving entitlements, session state, or configuration data via primary key lookups
Both scenarios use the same core pattern: compute in columnar, serve from row store.
Why Not Just Use a Standard Table?
Standard Snowflake tables are optimized for analytical workloads. They provide excellent throughput for large scans but cannot deliver consistent double-digit millisecond latency for point lookups because:
- Result cache helps only for identical repeated queries (not per-user lookups)
- No row-level index means every point lookup scans micro-partitions
- Latency variability is high (varies widely depending on cache state and partition layout)
Hybrid Tables provide deterministic low latency for point lookups via the primary key index and secondary indexes stored in the row store.
What You Will Learn
- How to design a Hybrid Table specifically for serving (schema, PK, indexes)
- The CTAS+SWAP pattern for atomic bulk refresh without downtime
- Why reads spike after a CTAS and how to mitigate the warm-up window
- How to size warehouses for high-concurrency serving workloads
- Two complete worked scenarios with DDL, data generation, and query patterns
Prerequisites
- A Snowflake paid account in an AWS or Azure commercial region
- Familiarity with Hybrid Tables and secondary indexes
- Understanding of bound variables and plan cache (see Write Optimization guide)
Setup
USE ROLE ACCOUNTADMIN; CREATE OR REPLACE ROLE HT_SERVE_QS_ROLE; GRANT ROLE HT_SERVE_QS_ROLE TO ROLE ACCOUNTADMIN; CREATE OR REPLACE WAREHOUSE HT_SERVE_QS_WH WAREHOUSE_SIZE = XSMALL AUTO_SUSPEND = 300 AUTO_RESUME = TRUE MAX_CLUSTER_COUNT = 3 SCALING_POLICY = 'STANDARD'; GRANT OWNERSHIP ON WAREHOUSE HT_SERVE_QS_WH TO ROLE HT_SERVE_QS_ROLE; CREATE OR REPLACE DATABASE HT_SERVE_QS_DB; GRANT OWNERSHIP ON DATABASE HT_SERVE_QS_DB TO ROLE HT_SERVE_QS_ROLE; USE ROLE HT_SERVE_QS_ROLE; CREATE OR REPLACE SCHEMA HT_SERVE_QS_DB.DATA; USE WAREHOUSE HT_SERVE_QS_WH; USE DATABASE HT_SERVE_QS_DB; USE SCHEMA DATA;
Note: The warehouse uses
MAX_CLUSTER_COUNT = 3andSCALING_POLICY = 'STANDARD'. For serving workloads with high concurrency, multi-cluster warehouses scale horizontally to handle burst traffic without queuing.
Step 1: The Reverse ETL Problem
A typical reverse ETL pipeline for serving personalized data looks like this:
[Snowflake Standard Table] → ETL/Transform (Python, dbt, Airflow) → Export (COPY INTO S3 / API push) → External Cache (Redis, DynamoDB, S3 JSON) → Application reads from cache
This architecture introduces:
- Staleness — data is only as fresh as the last export cycle (often minutes to hours)
- Operational complexity — multiple systems to monitor, debug, and maintain
- Cost — compute for export, storage for cache, egress fees, cache infrastructure
- Brittleness — any failure in the chain breaks the serving path
The Hybrid Table serving pattern collapses this to:
[Snowflake Standard Table] → Task (CTAS+SWAP into Hybrid Table) → Application reads directly from Snowflake
One system. One copy. Deterministic freshness. No external cache.
Scenario 1: Email/Marketing Personalization
The Use Case
A personalization engine scores users against content several times per day, producing millions of ranked recommendations. A downstream application or service needs to look up the top N recommendations for each user on demand, at high concurrency during peak traffic.
Design the Serving Table
The serving table is designed for one access pattern: lookup by user_id, return their ranked recommendations.
CREATE OR REPLACE HYBRID TABLE user_recommendations ( user_id NUMBER NOT NULL, content_id NUMBER NOT NULL, rank NUMBER NOT NULL, score FLOAT NOT NULL, content_title VARCHAR(500), content_url VARCHAR(2000), computed_at TIMESTAMP_NTZ NOT NULL, PRIMARY KEY (user_id, content_id) );
The composite primary key (user_id, content_id) enables:
- Fast seek to all recommendations for a specific user
- Deduplication (same user+content pair cannot appear twice)
- Ordered retrieval within a user (PK is sorted by user_id first)
Create the Source Table (Simulates Scoring Pipeline Output)
CREATE OR REPLACE TABLE recommendations_source AS SELECT user_id, content_id, rank, score, content_title, content_url, CURRENT_TIMESTAMP()::TIMESTAMP_NTZ AS computed_at FROM ( SELECT u.user_id, c.content_id, ROW_NUMBER() OVER (PARTITION BY u.user_id ORDER BY RANDOM()) AS rank, ROUND(UNIFORM(0.01, 1.00, RANDOM()), 4) AS score, 'Article ' || c.content_id::VARCHAR AS content_title, 'https://example.com/content/' || c.content_id::VARCHAR AS content_url FROM (SELECT SEQ4() + 1 AS user_id FROM TABLE(GENERATOR(ROWCOUNT => 10000))) u CROSS JOIN (SELECT SEQ4() + 1 AS content_id FROM TABLE(GENERATOR(ROWCOUNT => 20))) c ) WHERE rank <= 10; SELECT COUNT(*) FROM recommendations_source; -- Expected: 100000 (10,000 users x 10 recommendations each)
Initial Load via CTAS
CREATE OR REPLACE HYBRID TABLE user_recommendations ( user_id NUMBER NOT NULL, content_id NUMBER NOT NULL, rank NUMBER NOT NULL, score FLOAT NOT NULL, content_title VARCHAR(500), content_url VARCHAR(2000), computed_at TIMESTAMP_NTZ NOT NULL, PRIMARY KEY (user_id, content_id) ) AS SELECT * FROM recommendations_source;
Serve: Lookup Recommendations for a User
SET TARGET_USER = (SELECT user_id FROM user_recommendations LIMIT 1); SELECT content_id, rank, score, content_title, content_url FROM user_recommendations WHERE user_id = $TARGET_USER ORDER BY rank;
Query Profile: TableScan, ROW_BASED, ~10 rows scanned (the user's recommendations). This is the query your email platform executes 20,000+ times per second during a send.
Refresh: The CTAS+SWAP Pattern
When the scoring pipeline produces new recommendations, atomically replace the serving table:
-- Step 1: Build the new version (CTAS into a temporary name) CREATE OR REPLACE HYBRID TABLE user_recommendations_new ( user_id NUMBER NOT NULL, content_id NUMBER NOT NULL, rank NUMBER NOT NULL, score FLOAT NOT NULL, content_title VARCHAR(500), content_url VARCHAR(2000), computed_at TIMESTAMP_NTZ NOT NULL, PRIMARY KEY (user_id, content_id) ) AS SELECT * FROM recommendations_source; -- Step 2: Atomic swap (applications see no interruption) ALTER TABLE user_recommendations SWAP WITH user_recommendations_new; -- Step 3: Drop the old version (now in the _new name) DROP TABLE user_recommendations_new;
Why SWAP instead of RENAME?
ALTER TABLE ... SWAP WITHexchanges the contents of two tables atomically in a single metadata operation. Applications queryinguser_recommendationssee the new data immediately after the swap with zero downtime.RENAMErequires two statements (rename old, rename new) which creates a brief window where the table name does not exist.
Refresh Task (Scheduled)
CREATE OR REPLACE TASK refresh_user_recommendations WAREHOUSE = HT_SERVE_QS_WH SCHEDULE = 'USING CRON 0 */4 * * * UTC' AS BEGIN CREATE OR REPLACE HYBRID TABLE user_recommendations_new ( user_id NUMBER NOT NULL, content_id NUMBER NOT NULL, rank NUMBER NOT NULL, score FLOAT NOT NULL, content_title VARCHAR(500), content_url VARCHAR(2000), computed_at TIMESTAMP_NTZ NOT NULL, PRIMARY KEY (user_id, content_id) ) AS SELECT * FROM recommendations_source; ALTER TABLE user_recommendations SWAP WITH user_recommendations_new; DROP TABLE user_recommendations_new; END;
Scenario 2: API Backend (Entitlements/Session State)
The Use Case
A REST API serves authorization decisions: given a user_id, return their entitlements (which portfolios, products, or resources they can access). The API needs double-digit millisecond responses at thousands of concurrent requests from multiple microservices.
Design the Serving Table
CREATE OR REPLACE HYBRID TABLE user_entitlements ( user_id VARCHAR(100) NOT NULL, resource_type VARCHAR(50) NOT NULL, resource_id VARCHAR(200) NOT NULL, access_level VARCHAR(20) NOT NULL, granted_at TIMESTAMP_NTZ NOT NULL, expires_at TIMESTAMP_NTZ, PRIMARY KEY (user_id, resource_type, resource_id) );
The composite PK enables:
- Lookup all entitlements for a user:
WHERE user_id = ? - Lookup specific resource access:
WHERE user_id = ? AND resource_type = ? AND resource_id = ? - Both use the PK prefix seek (no secondary index needed)
Load Sample Data
INSERT INTO user_entitlements SELECT 'user_' || u.id::VARCHAR AS user_id, ARRAY_CONSTRUCT('PORTFOLIO','REPORT','DASHBOARD','API_ENDPOINT') [UNIFORM(0,3,RANDOM())]::VARCHAR AS resource_type, 'resource_' || UNIFORM(1, 500, RANDOM())::VARCHAR AS resource_id, ARRAY_CONSTRUCT('READ','WRITE','ADMIN')[UNIFORM(0,2,RANDOM())]::VARCHAR AS access_level, DATEADD(DAY, -UNIFORM(1,365,RANDOM()), CURRENT_TIMESTAMP())::TIMESTAMP_NTZ AS granted_at, DATEADD(DAY, UNIFORM(30,365,RANDOM()), CURRENT_TIMESTAMP())::TIMESTAMP_NTZ AS expires_at FROM (SELECT SEQ4() + 1 AS id FROM TABLE(GENERATOR(ROWCOUNT => 50000))) u;
API Query Patterns
Check if user has access to a specific resource:
SET API_USER = 'user_42'; SET API_RESOURCE_TYPE = 'PORTFOLIO'; SET API_RESOURCE_ID = 'resource_100'; SELECT access_level FROM user_entitlements WHERE user_id = $API_USER AND resource_type = $API_RESOURCE_TYPE AND resource_id = $API_RESOURCE_ID AND (expires_at IS NULL OR expires_at > CURRENT_TIMESTAMP()::TIMESTAMP_NTZ);
Query Profile: TableScan, ROW_BASED, 1 row scanned. Low double-digit millisecond execution time.
Get all active entitlements for a user:
SELECT resource_type, resource_id, access_level, expires_at FROM user_entitlements WHERE user_id = $API_USER AND (expires_at IS NULL OR expires_at > CURRENT_TIMESTAMP()::TIMESTAMP_NTZ) ORDER BY resource_type;
Query Profile: TableScan, ROW_BASED, ~10-50 rows scanned (that user's entitlements).
Incremental Updates (No CTAS+SWAP Needed)
Unlike the email scenario (bulk refresh), entitlement changes are incremental. Use direct DML:
-- Grant new access INSERT INTO user_entitlements VALUES ( 'user_42', 'DASHBOARD', 'resource_999', 'READ', CURRENT_TIMESTAMP()::TIMESTAMP_NTZ, DATEADD(DAY, 90, CURRENT_TIMESTAMP())::TIMESTAMP_NTZ ); -- Revoke access (delete) DELETE FROM user_entitlements WHERE user_id = 'user_42' AND resource_type = 'DASHBOARD' AND resource_id = 'resource_999'; -- Bulk sync from source system (MERGE) MERGE INTO user_entitlements AS tgt USING entitlements_source AS src ON tgt.user_id = src.user_id AND tgt.resource_type = src.resource_type AND tgt.resource_id = src.resource_id WHEN MATCHED THEN UPDATE SET tgt.access_level = src.access_level, tgt.expires_at = src.expires_at WHEN NOT MATCHED THEN INSERT VALUES ( src.user_id, src.resource_type, src.resource_id, src.access_level, src.granted_at, src.expires_at );
Step 2: Managing the Compaction Window
After a CTAS+SWAP, the new table must warm up — plan cache, warehouse data cache, and row store layout all need to reach steady state. During this window (typically 1-5 minutes for tables under 1M rows), read latency may be elevated.
Why Latency Spikes After a Refresh
After a CTAS+SWAP, the new table has no cached query plans or warm data in the warehouse. The first queries against the new table incur plan compilation and data loading overhead. This warm-up period is transparent to the application but can temporarily increase read latency until steady state is reached.
Mitigation Strategies
1. Schedule refreshes during low-traffic windows:
-- Refresh at 2 AM when traffic is low CREATE OR REPLACE TASK refresh_user_recommendations WAREHOUSE = HT_SERVE_QS_WH SCHEDULE = 'USING CRON 0 2 * * * UTC' AS BEGIN CREATE OR REPLACE HYBRID TABLE user_recommendations_new (...) AS SELECT * FROM recommendations_source; ALTER TABLE user_recommendations SWAP WITH user_recommendations_new; DROP TABLE user_recommendations_new; END;
2. Pre-warm after swap:
Run a representative query immediately after the swap to trigger warehouse cache loading:
-- Add to the refresh task after the SWAP SELECT COUNT(*) FROM user_recommendations WHERE user_id = 1;
3. Size the serving warehouse to absorb the warm-up spike:
A multi-cluster warehouse absorbs the temporary latency increase by routing new requests to idle clusters while the new table reaches steady state.
Step 3: Warehouse Sizing for High-Concurrency Serving
Hybrid Table serving workloads have different requirements than analytical workloads:
- Concurrency over throughput — thousands of small queries, not a few large ones
- Latency over cost — the goal is consistent double-digit millisecond latency, not maximum rows-per-credit
- Horizontal scaling — more clusters, not bigger clusters
Recommended Configuration
-- For serving workloads: small size, multiple clusters, standard scaling ALTER WAREHOUSE HT_SERVE_QS_WH SET WAREHOUSE_SIZE = 'XSMALL' MAX_CLUSTER_COUNT = 5 MIN_CLUSTER_COUNT = 1 SCALING_POLICY = 'STANDARD';
| Parameter | Recommendation | Why |
|---|---|---|
| WAREHOUSE_SIZE | XSMALL or SMALL | Point lookups need minimal compute; start with XS and benchmark |
| MAX_CLUSTER_COUNT | 3-10 (based on peak concurrency) | Horizontal scaling for burst traffic |
| MIN_CLUSTER_COUNT | 1-2 | Keep 1-2 clusters warm to avoid cold-start latency |
| SCALING_POLICY | STANDARD | Scales up aggressively; scales down conservatively (fewer cold restarts) |
| AUTO_SUSPEND | 300-600 | Long enough to avoid suspend/resume during traffic dips |
Why XSMALL?
Each HT point lookup consumes minimal compute. An XS cluster is typically sufficient for many operational serving workloads. Benchmark your specific query patterns to determine throughput per cluster, then scale horizontally by adding more clusters (not larger size). Larger warehouse sizes add compute power you do not need for point lookups and increase cost without reducing latency.
When to Go Bigger
Increase warehouse size only if your serving queries involve:
- Joining the HT with another table (e.g., enrichment at query time)
- Scanning more than ~100 rows per query
- Running post-read transformations (JSON parsing, UDFs)
Serving Design Checklist
Before deploying a Hybrid Table as a serving layer, validate these requirements:
| Check | Email/Marketing | API Backend |
|---|---|---|
| Access pattern | Lookup by user_id (PK prefix) | Lookup by composite key |
| Refresh frequency | Periodic (4-24 hours) | Incremental (real-time DML) |
| Refresh method | CTAS+SWAP (atomic bulk replace) | Direct INSERT/UPDATE/DELETE/MERGE |
| Concurrency target | 10K-50K QPS (campaign blasts) | 1K-10K QPS (steady API traffic) |
| Warehouse | XS, MAX_CLUSTER_COUNT=5-10 | XS, MAX_CLUSTER_COUNT=3-5 |
| Driver | REST API or SDK with pooling | JDBC/Python with HikariCP/SQLAlchemy pool |
| Bound variables | Required (plan cache) | Required (plan cache) |
| Warm-up concern | Yes (schedule refresh in low-traffic window) | No (incremental writes) |
Cleanup
ALTER TASK IF EXISTS refresh_user_recommendations SUSPEND; USE ROLE ACCOUNTADMIN; DROP DATABASE IF EXISTS HT_SERVE_QS_DB; DROP WAREHOUSE IF EXISTS HT_SERVE_QS_WH; DROP ROLE IF EXISTS HT_SERVE_QS_ROLE;
Conclusion and Resources
You can now:
- Replace reverse ETL pipelines with Hybrid Tables as a serving layer
- Design serving tables with composite primary keys for efficient point lookups
- Use CTAS+SWAP for atomic bulk refresh with zero application downtime
- Manage warm-up windows by scheduling refreshes during low-traffic periods
- Size multi-cluster warehouses for high-concurrency serving (XS + horizontal scaling)
When to Use This Pattern vs External Cache
| Use Hybrid Table Serving | Keep External Cache (Redis/DynamoDB) |
|---|---|
| Latency target: double-digit ms | Latency target: <1ms |
| Data already in Snowflake | Data originates outside Snowflake |
| Want to eliminate pipeline complexity | Cache invalidation is simple for your use case |
| Refresh frequency: minutes to hours | Refresh frequency: sub-second |
| Concurrency: up to ~50K QPS | Concurrency: >100K QPS sustained |
| Data consistency matters (single source of truth) | Eventual consistency acceptable |
Related Resources
FAQ
Q: How does this compare to Snowflake's result cache?
Hybrid Tables do not use the result cache. Every query against a Hybrid Table executes fresh — there is no cached result to return regardless of query text. This is by design: HT data changes frequently via DML, so cached results would be stale.
Q: What if my serving table has 100 million rows?
Hybrid Tables handle this well for point lookups. The PK index scales logarithmically — a PK lookup on 100M rows is nearly the same latency as on 1M rows. CTAS+SWAP bulk refresh will take longer (10-30 minutes on 100M rows), so schedule appropriately and consider the compaction window.
Q: Can I serve from multiple Hybrid Tables in a single API call?
Yes — JOIN two Hybrid Tables in a single query. If both sides use PK or FK joins, the query plan uses nested loop joins with row-based access on both sides. For best results, ensure the JOIN key matches a PK or has a foreign key relationship defined.
Q: What about connection limits?
Each Snowflake warehouse cluster handles connections independently. With MAX_CLUSTER_COUNT=5 on an XS, you have 5 independent clusters handling requests in parallel. Connection pooling at the application layer (HikariCP, SQLAlchemy) prevents exhausting connections per cluster.
Q: Can I use ALTER TABLE SWAP WITH for tables with different schemas?
No. Both tables must have identical column definitions (names, types, constraints). The SWAP operation exchanges only the data; the schema must already match.
Q: What happens to in-flight queries during a SWAP?
SWAP is a metadata operation. Queries that started before the SWAP read the old data. Queries that start after the SWAP read the new data. There is no interruption or error for either set.
This content is provided as is, and is not maintained on an ongoing basis. It may be out of date with current Snowflake instances