CHECK Constraints, Error Tables and Cortex Code: Stop Cleaning Up Bad Data

Bad data is expensive. Not because it's hard to find, but because you find it too late.
An order lands with status = 'actve' instead of 'active'. A JSON event arrives with an amount: "banana". A NULL slips into a nonnullable dimension key. By the time your validation catches it, the bad rows are already in your tables, your dashboards are wrong, and someone's asking why revenue dropped 40% overnight.
What if Snowflake helped you ensure that only good quality data is ingested into your tables?
By popular demand — enforced CHECK constraints are generally available
With the introduction of enforced CHECK constraints, Snowflake now offers a native, powerful way to maintain data quality. The feature delivers two main benefits:
- Help with migration: It offers a drop-in replacement for existing check-constraint data quality enforcement found in systems like Oracle and SQL Server, which is particularly valuable for migration teams.
- Simplify transformation logic: It allows teams to retire validation logic scattered across dbt tests, stored procedures and application-layer checks, consolidating that complexity directly into the table definition.
The problem
Without enforced constraints, data teams rely on post-load validation, using tools like dbt tests, Monte Carlo or custom SQL checks that run after data has landed. The pattern is always the same:
- A pipeline loads data into orders, including bad rows
- A validation step runs after the load and finds 47 rows with status = 'actve'
- An alert fires. An engineer investigates.
- They write a manual fix: UPDATE orders SET status = 'active' WHERE status = 'actve'
- They trace the source, fix the upstream pipeline, redeploy
- Total time: hours to days. Bad data was queryable the entire time.
Whether it's a dbt accepted_values test, a Great Expectations suite, or a hand-rolled SQL check, the fundamental problem is the same: post-load validation catches problems after the fact.
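As a concrete sketch of that pattern, a hand-rolled post-load check might look like this (the allowed status list is assumed from the orders example later in this post):
-- Post-load check: runs only after the bad rows are already queryable
SELECT COUNT(*) AS bad_rows
FROM orders
WHERE status NOT IN ('active', 'pending', 'shipped', 'delivered', 'returned');
-- If bad_rows > 0, an alert fires and the manual UPDATE / redeploy cycle begins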
CHECK constraints are now GA in Snowflake, and they're enforced. Same syntax you know from Oracle, SQL Server and PostgreSQL. If you're migrating, your CHECK constraints come with you, and the painful workarounds can be retired.
Get started in 60 seconds
A minute is more than enough to understand the power and the beauty of CHECK Constraints + Error Tables. Here's a simple example:
-- 1. Create a table with CHECK constraints and error logging
CREATE TABLE my_table (
id INT NOT NULL,
status VARCHAR(20) CHECK (status IN ('active', 'inactive')),
amount NUMBER(10,2) CHECK (amount > 0)
) ERROR_LOGGING = TRUE;
-- 2. Insert data - good rows land, bad rows captured
INSERT INTO my_table VALUES
(1, 'active', 99.99), -- succeeds
(2, 'invalid', 50.00), -- captured: bad status
(3, 'active', -10.00); -- captured: negative amount
-- 3. See what was rejected
SELECT * FROM ERROR_TABLE(my_table);
Or enable on an existing table; instant, metadata-only, no table scan:
ALTER TABLE existing_table
ADD CONSTRAINT chk_positive CHECK (amount > 0) ENABLE NOVALIDATE;
ALTER TABLE existing_table SET ERROR_LOGGING = TRUE;
ENABLE NOVALIDATE enforces the constraint on all future writes without scanning existing data. Only the base table owner (or roles granted SELECT ERROR TABLE) can query the error table; access to rejected row data is restricted to authorized roles.
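Because NOVALIDATE skips existing rows, any violations that predate the constraint remain in the table until you go looking for them. A minimal audit sketch, reusing existing_table and the amount > 0 rule from above:
-- One-off audit: rows loaded before the constraint that would violate it today
SELECT COUNT(*) AS preexisting_violations
FROM existing_table
WHERE NOT (amount > 0);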
Then ask Cortex Code: "Analyze the error table for my_table" and let the skill take it from there.
Does this still matter in the AI era?
In the age of generative AI, the value of robust, natively enforced database features has become even clearer. One challenge starting to emerge is that AI agents can easily generate a significant amount of code to provide workarounds for natively missing features. For instance, if a database doesn't enforce a simple CHECK constraint like amount > 0, an AI agent might generate complex pre-load or post-load validation logic in Python, dbt or a stored procedure to handle it. More code, regardless of who or what generated it, inevitably means more maintenance costs, whether that cost is measured in human time spent debugging or in AI tokens consumed for reasoning and maintenance tasks.
Therefore, having native support for features that increase efficiency, like CHECK constraints and Error Tables, helps both humans and AI alike. For the AI, native enforcement allows it to write more consistent, streamlined and understandable SQL code, reducing the need for verbose and brittle workaround solutions. For the human data engineer, it becomes easier to understand what the AI did and to fully own the resulting data pipeline without problems, because the core logic is declared once and enforced by the database, not buried in layers of generated procedural code.
The result is a win-win for both humans and AI: code that is generated by AI, yet is cleaner, more reliable, cheaper to maintain and inherently easier to understand. Our support for CHECK constraints is one of the many steps we are taking to foster more fruitful collaboration between human developers and AI.
Enforce at write time, now in Snowflake
CHECK constraints move validation to the moment data is written. Define the rules on the table; Snowflake enforces them on every INSERT, UPDATE and MERGE.
CREATE OR REPLACE TABLE orders (
order_id INT NOT NULL,
customer_id INT NOT NULL,
status VARCHAR(20) CHECK (status IN ('active', 'pending', 'shipped', 'delivered', 'returned')),
amount NUMBER(10,2) CHECK (amount > 0),
order_date DATE,
ship_date DATE,
event_data VARIANT,
-- Cross-column logic
CONSTRAINT chk_ship_after_order CHECK (ship_date >= order_date),
-- Required JSON fields
CONSTRAINT chk_evt_type CHECK (event_data:type::STRING IS NOT NULL),
CONSTRAINT chk_evt_source CHECK (event_data:source::STRING IS NOT NULL),
-- VARIANT type enforcement
CONSTRAINT chk_evt_meta_obj CHECK (event_data:metadata IS NULL OR IS_OBJECT(event_data:metadata)),
CONSTRAINT chk_evt_tags_arr CHECK (event_data:tags IS NULL OR IS_ARRAY(event_data:tags)),
-- Malformed value rejection
CONSTRAINT chk_evt_amount CHECK (event_data:amount IS NULL
OR TRY_CAST(event_data:amount::STRING AS NUMBER) IS NOT NULL)
);
Now try inserting bad data:
-- This succeeds: all constraints satisfied
INSERT INTO orders SELECT
1, 100, 'pending', 49.99, '2026-01-15'::DATE, NULL,
PARSE_JSON('{"type":"web","source":"checkout","metadata":{"browser":"chrome"},"amount":49.99}');
-- This fails: status = 'actve' is not in the allowed list
INSERT INTO orders SELECT
2, 101, 'actve', 29.99, '2026-01-16'::DATE, NULL,
PARSE_JSON('{"type":"web","source":"cart"}');
-- Error: CHECK constraint ... which requires that status IN
-- ('active','pending', 'shipped', 'delivered', 'returned'), was violated
In this example, the typo does not make it into the table, reducing the need for additional tools and products to catch it; the database said no.
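Updates are enforced the same way; a quick sketch against the row inserted above:
-- This fails: ship_date would fall before order_date (2026-01-15), violating chk_ship_after_order
UPDATE orders SET ship_date = '2026-01-01' WHERE order_id = 1;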
What you can enforce
CHECK constraints support any deterministic scalar expression. Some examples beyond the basics:
| Use Case | Example |
|---|---|
| Allowed values | status IN ('pending', 'shipped', 'delivered', 'returned') |
| Range enforcement | amount > 0 |
| Cross-column logic | ship_date >= order_date |
| Required JSON fields | event_data:type::STRING IS NOT NULL |
| Nested VARIANT paths | event_data:address:city::STRING IS NOT NULL |
| VARIANT type enforcement | IS_OBJECT(event_data:metadata), IS_ARRAY(event_data:tags) |
| Malformed value rejection | TRY_CAST(event_data:amount::STRING AS NUMBER) IS NOT NULL |
| Array bounds | ARRAY_SIZE(tags) > 0 AND ARRAY_SIZE(tags) <= 10 |
| Pattern matching | email LIKE '%@%.%' |
The VARIANT validation is where Snowflake goes beyond the standard. Oracle, PostgreSQL and SQL Server can enforce CHECK constraints on scalar columns, but they have traditionally offered more limited support for enforcing structural rules on semi-structured JSON fields. If your data has a VARIANT column, you can now enforce that specific keys exist, have the right types and contain valid values, all at the database level.
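For instance, the nested-path rule from the table above could look like this sketch, using a hypothetical shipments table whose event_data payload carries an address object:
-- Hypothetical table: require a city inside a nested address object
CREATE OR REPLACE TABLE shipments (
shipment_id INT NOT NULL,
event_data VARIANT,
CONSTRAINT chk_addr_city CHECK (event_data:address:city::STRING IS NOT NULL)
);
-- Rejected: the address object exists but has no city key
INSERT INTO shipments SELECT 1, PARSE_JSON('{"address":{"country":"US"}}');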
But what about pipelines that can't stop?
Hard rejection is great for manual workflows. But data pipelines loading millions of rows via INSERT, UPDATE or MERGE can't afford to fail on one bad record. You need the good rows to land while capturing the bad ones for review.
Note: CHECK constraints are currently enforced on INSERT, UPDATE, MERGE and CTAS. COPY INTO and Snowpipe support is on the roadmap.
That's where Error Tables come in. One property on the table:
ALTER TABLE orders SET ERROR_LOGGING = TRUE;
Now bad rows don't kill the statement; they get routed to a dedicated error table:
-- 5 rows: 3 valid, 2 with constraint violations
INSERT INTO orders SELECT
10, 200, 'shipped', 99.99, '2026-02-01', '2026-02-03',
PARSE_JSON('{"type":"api","source":"mobile","amount":99.99}')
UNION ALL SELECT
11, 201, 'INVALID', 50.00, '2026-02-01', NULL,
PARSE_JSON('{"type":"web","source":"checkout"}')
UNION ALL SELECT
12, 202, 'pending', -5.00, '2026-02-02', NULL,
PARSE_JSON('{"type":"api","source":"pos"}')
UNION ALL SELECT
13, 203, 'delivered', 149.99, '2026-02-03', '2026-02-05',
PARSE_JSON('{"type":"web","source":"checkout","metadata":{"device":"iphone"}}')
UNION ALL SELECT
14, 204, 'returned', 25.00, '2026-02-04', '2026-02-06',
PARSE_JSON('{"type":"api","source":"returns"}');
-- Result: 3 rows inserted (ids 10, 13, 14)
-- 2 rows captured in error table (ids 11, 12)
Query the error table to see exactly what failed and why:
SELECT
error_data:ORDER_ID::INT AS order_id,
error_data:STATUS::STRING AS bad_status,
error_data:AMOUNT::NUMBER AS bad_amount,
error_code,
error_metadata:error_message::STRING AS reason
FROM ERROR_TABLE(orders)
WHERE query_id = last_query_id()
ORDER BY order_id;
| ORDER_ID | BAD_STATUS | BAD_AMOUNT | ERROR_CODE | REASON |
|---|---|---|---|---|
| 11 | INVALID | 50.00 | 100320 | CHECK constraint ... status IN (...) violated |
| 12 | pending | -5.00 | 100320 | CHECK constraint ... amount > 0 ... violated |
Every rejected row is preserved with:
- The full row payload; all column values
- The error code
- The constraint name and expression; in the error message
- Timestamp and query ID; for tracing back to the pipeline run
The good rows landed. The pipeline succeeded. The bad rows are waiting for review.
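Even a plain SQL triage pass can tell you which rule is failing most often. A minimal sketch, assuming the same error-table columns used in the query above plus its timestamp column:
-- Group recent rejections by reason to see which constraint fails most
SELECT
error_code,
error_metadata:error_message::STRING AS reason,
COUNT(*) AS rejected_rows,
MIN(timestamp) AS first_seen,
MAX(timestamp) AS last_seen
FROM ERROR_TABLE(orders)
WHERE timestamp > DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY error_code, reason
ORDER BY rejected_rows DESC;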
Diagnose and fix with Cortex Code CLI
Querying error tables manually works, but the error-tables-ops skill in the Cortex Code CLI turns it into a conversation:
"Analyze my error table for orders"
Cortex Code runs the diagnostic queries and presents a breakdown:
Error Breakdown for ORDERS (last 7 days):
ERROR TYPE | COLUMN/CONSTRAINT | COUNT | FIRST SEEN | LAST SEEN
-----------------------------|-----------------------------|-------|------------|----------
CHECK constraint violation | status IN (...) | 23 | 2026-02-01 | 2026-02-04
CHECK constraint violation | amount > 0 | 8 | 2026-02-02 | 2026-02-04
CHECK constraint violation | event_data:type IS NOT NULL | 3 | 2026-02-03 | 2026-02-03
NOT NULL violation | CUSTOMER_ID | 1 | 2026-02-04 | 2026-02-04
Recommendations:
23 status violations - check upstream for typos ('INVALID', 'Active', 'Pending')
8 negative amounts - likely a sign-convention bug in the POS integration
3 missing event types - the mobile SDK may be sending incomplete payloads
"Fix the rejected rows"
Cortex Code extracts the error data, asks how you want to handle each error type, and generates corrective SQL:
-- Auto-generated fix for status violations (mapping common typos)
INSERT INTO orders (order_id, customer_id, status, amount, order_date, ship_date, event_data)
SELECT
error_data:ORDER_ID::INT,
error_data:CUSTOMER_ID::INT,
CASE error_data:STATUS::STRING
WHEN 'INVALID' THEN 'pending' -- user chose: map to 'pending'
WHEN 'shippd' THEN 'shipped' -- user chose: fix typo
WHEN 'Pending' THEN 'pending' -- user chose: lowercase
ELSE 'pending' -- fallback
END,
error_data:AMOUNT::NUMBER,
error_data:ORDER_DATE::DATE,
error_data:SHIP_DATE::DATE,
error_data:EVENT_DATA
FROM ERROR_TABLE(orders)
WHERE error_code = 100320
AND error_metadata:error_message::STRING LIKE '%status IN%';
"Set up monitoring for error spikes"
Cortex Code generates alert DDL; ready to review and deploy:
CREATE OR REPLACE ALERT orders_error_spike
WAREHOUSE = my_wh
SCHEDULE = '60 MINUTE'
IF (EXISTS (
SELECT 1
FROM ERROR_TABLE(orders)
WHERE TIMESTAMP > DATEADD('minute', -60, CURRENT_TIMESTAMP())
HAVING COUNT(*) > 100
))
THEN
CALL SYSTEM$SEND_SNOWFLAKE_NOTIFICATION(
SNOWFLAKE.NOTIFICATION.TEXT_PLAIN(
'100+ errors detected on ORDERS in the last 60 minutes.'
),
SNOWFLAKE.NOTIFICATION.EMAIL_INTEGRATION_CONFIG(
'error_table_email', '[email protected]', 'Error Spike - ORDERS'
)
);
ALTER ALERT orders_error_spike RESUME;
Here's how these three features work together:
| Layer | What It Does | When It Acts |
|---|---|---|
| CHECK Constraints | Helps prevent invalid data from entering | At write time (INSERT/UPDATE/MERGE) |
| Error Tables | Captures rejected rows with full diagnostics | At write time (automatic) |
| Cortex Code error-tables-ops skill | Analyzes errors, generates fixes, sets up monitoring | On demand |
| Post-load validation | Verifies business logic, catches what constraints can't express | After load (safety net) |
CHECK constraints handle the rules that can be expressed as scalar expressions. Post-load tools like dbt tests, Great Expectations or Soda still have a role for things constraints can't do; referential integrity checks, aggregate validations ("revenue should never drop more than 20% day-over-day"), cross-table consistency. The difference: Post-load validation becomes a verification layer, not the primary defense.
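For instance, the day-over-day revenue rule above would stay in a post-load check along these lines (a hypothetical rollup over the orders table):
-- Verification query: flag any day whose revenue fell more than 20% versus the prior day
WITH daily_revenue AS (
SELECT order_date, SUM(amount) AS revenue
FROM orders
GROUP BY order_date
)
SELECT order_date, revenue, LAG(revenue) OVER (ORDER BY order_date) AS prev_revenue
FROM daily_revenue
QUALIFY revenue < 0.8 * prev_revenue;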
How to think about performance
When evaluating the performance impact of CHECK constraints and Error Tables, keep the following mental model in mind:
CHECK Constraints
The runtime overhead is minimal to none for simple scalar evaluations (e.g., numeric checks or string length validations). However, complex expressions that involve semi-structured data manipulation (like JSON parsing) or heavy regex can have a performance impact.
Error Logging
When your source data is perfectly clean (0% errors), the overhead of error logging is negligible. When errors exist in the source data, think of the total execution time as a multitable insert. The engine automatically routes valid rows to the target table and invalid rows to the error table, meaning performance scales with the volume of new partitions being created.
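To make that mental model concrete, a rough conceptual analogue (not what the engine literally executes) is a conditional multi-table insert, sketched here with a hypothetical staged_orders source and a manually managed orders_rejects table:
-- Conceptual analogue only: rows are routed by condition, much like error logging routes them
INSERT ALL
WHEN status IN ('active', 'pending', 'shipped', 'delivered', 'returned') AND amount > 0 THEN
INTO orders (order_id, customer_id, status, amount, order_date, ship_date, event_data)
VALUES (order_id, customer_id, status, amount, order_date, ship_date, event_data)
ELSE
INTO orders_rejects (order_id, customer_id, status, amount, order_date, ship_date, event_data)
VALUES (order_id, customer_id, status, amount, order_date, ship_date, event_data)
SELECT order_id, customer_id, status, amount, order_date, ship_date, event_data
FROM staged_orders;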
Resources:
- CHECK Constraints documentation
- DML Error Logging documentation
- Cortex Code: error-tables-ops is a bundled skill to assess, analyze, fix, monitor and report on error tables. Read more about Cortex Code Skills here.
Forward-looking statement
This content contains forward-looking statements, including about our future product offerings, which are not commitments to deliver any product offerings. Actual results and offerings may differ and are subject to known and unknown risks and uncertainties. See our latest 10-Q for more information.



