June 10, 2026 / 10 min read / Product and Technology

Data Engineering in the AI Era: New Snowflake Tools Built for Smart Pipelines

AI has made it easier than ever to build. But easy to build is not the same as built to last. If your systems are brittle, AI will only make them worse. That's why you need a platform built to make the most of AI.

At Snowflake Summit 2026, we announced new capabilities that put our customers at the forefront of data engineering. We've added AI directly into workflows and made it easier to build data pipelines from start to finish. These new features are designed for every type of data engineer, and they work where your data lives: in Snowflake, in open and interoperable lakehouses, or both. Whether you write SQL or Python or build ML models, everything you need to construct pipelines is in one place. With Snowflake, you get elastic compute that scales, seamless connectivity to data wherever it lives, and enterprise-grade governance for secure, trusted data with consistent business context.

Faster time-to-production with AI

Figure 1: Snowflake CoCo outperforms generic coding agents for data engineering tasks.¹

With new agentic workflows, AI operates directly within your local environment to build end-to-end solutions. For real data engineering work, Snowflake CoCo sets the bar among coding agents. In benchmarks against Claude Code running on Opus 4.7, for instance, CoCo used 51% fewer tokens and took 8% fewer steps to get the job done.²

CoCo brings context-aware assistance and purpose-built skills for Snowflake data engineering features. It operates within your security perimeter and, crucially, understands your enterprise data context. With access to the latest models, such as Claude Opus 4.8, Claude Sonnet 4.6 and GPT 5.5, data engineers can use CoCo in Snowsight, through the CoCo CLI or through a new desktop app (public preview). Use prebuilt or custom skills to migrate Spark pipelines, deploy Python code, automate dbt workflows, optimize performance and more, all from a single prompt.

Autonomous pipelines you can trust

Every organization wants AI-ready data delivered continuously, at low latency, from an ever-growing set of sources. The old way, with handcrafted orchestration scripts, brittle incremental logic and manual deployments, is hard to scale. Declarative workflows let you define what you want — and Snowflake handles how it gets done.

At Wolt (part of DoorDash), we standardized on Apache Iceberg to give us the flexibility to run each workload on the right engine. We use Snowflake Dynamic Iceberg Tables to enrich, prepare and automatically refresh data in our data lake: we define a single query with a target freshness, and Snowflake manages the incremental updates and orchestration. With Dynamic Tables on Apache Iceberg, we have launched pipelines faster, cut maintenance time and reduced the overhead of our incremental pipelines.

Raimund Kämmerer

Staff Data Engineer, Wolt

Faster, more flexible Dynamic Tables

Dynamic Tables removes hours of manual effort by automating data refreshes based on a defined query and a target freshness, and it offers leading performance and low latency for incremental pipelines. At Summit, Sergey Labetsik, a senior data engineer at Wind Creek Hospitality, demonstrated how his team delivers food vouchers to guests within a minute of their becoming eligible. By migrating a dbt batch job that ran on a 30-minute schedule to a Dynamic Tables pipeline, they cut end-to-end latency to under a minute.
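The "defined query plus target freshness" model can be illustrated with a toy example. The Python sketch below is not how Snowflake implements Dynamic Tables; it only shows the idea of declaring a query (here, a per-key SUM) with a freshness target and then refreshing incrementally. The class name `ToyDynamicTable`, the `target_lag_s` parameter and the append-only source list are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDynamicTable:
    """Hypothetical model of a declared query (per-key SUM) with a target freshness."""

    target_lag_s: float                                    # how stale the result may get, in seconds
    totals: dict[str, int] = field(default_factory=dict)   # materialized result: key -> running sum
    last_refresh_s: float = 0.0                            # time of the previous refresh
    offset: int = 0                                        # number of source rows already folded in

    def maybe_refresh(self, source: list[tuple[str, int]], now_s: float) -> bool:
        """Refresh only if the result is at least target_lag_s old; return True if it refreshed."""
        if now_s - self.last_refresh_s < self.target_lag_s:
            return False
        # Incremental step: process only rows appended since the last refresh
        # (assumes an append-only source; updates and deletes are out of scope).
        for key, amount in source[self.offset:]:
            self.totals[key] = self.totals.get(key, 0) + amount
        self.offset = len(source)
        self.last_refresh_s = now_s
        return True


orders = [("guest_a", 10), ("guest_b", 5)]
dt = ToyDynamicTable(target_lag_s=60)
first = dt.maybe_refresh(orders, now_s=60)     # True: initial refresh
orders.append(("guest_a", 7))
skipped = dt.maybe_refresh(orders, now_s=90)   # False: result is only 30 s old
second = dt.maybe_refresh(orders, now_s=120)   # True: folds in just the new row
print(dt.totals)  # {'guest_a': 17, 'guest_b': 5}
```

Because each refresh in this toy touches only rows appended since the previous one, its cost tracks the volume of new data rather than the size of the whole source.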

Figure 2: Benchmarks showing up to 2.8x faster refresh performance on Dynamic Tables.

Snowflake announced a series of updates to native declarative workflows to make them more performant, interoperable and expressive, including:

Faster Dynamic Tables refresh performance (generally available): Accelerate workloads by up to 2.8x in a number of areas, including aggregate functions, QUALIFY/RANK (SCD Type 1), cluster-by operations and joins, all measured on Gen2 warehouses.
Custom incrementalization (public preview): Optimize performance for complex transformations by writing your own refresh logic with MERGE or INSERT statements, while retaining the benefits of Dynamic Tables such as automatic scheduling, dependency tracking and replication.
Adaptive refresh (public preview): Automatically determine the most efficient refresh method for each cycle, with no tuning required. Snowflake chooses between incremental refreshes and full reinitializations to optimize for cost and prevent failures on complex queries.
Dynamic Table materialization in dbt (adapter version 1.11.5): Optimize incremental processing by changing a model's materialization type in dbt. Dynamic Table models compose with other dbt models in the pipeline.
DCM Projects (public preview): Manage infrastructure declaratively, with a way to version, test and deploy diverse transformation pipelines on Snowflake.
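To make custom incrementalization and adaptive refresh more concrete, here is a toy Python sketch. `merge_changes` mimics the semantics of a MERGE statement on SCD Type 1 data (update matching keys, insert new ones, delete tombstones), and `refresh` picks between that incremental path and a full reinitialization. The change-ratio threshold is a hypothetical heuristic chosen for illustration; how Snowflake's adaptive refresh actually decides is not described here.

```python
from typing import Optional

Row = dict
Change = tuple[str, Optional[Row]]  # (primary key, new row); None marks a delete


def merge_changes(target: dict[str, Row], changes: list[Change]) -> None:
    """Apply changes with MERGE-like, SCD Type 1 semantics (latest value wins)."""
    for key, row in changes:
        if row is None:
            target.pop(key, None)   # like WHEN MATCHED ... THEN DELETE
        else:
            target[key] = row       # like WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT


def refresh(target: dict[str, Row], source: dict[str, Row],
            changes: list[Change], full_threshold: float = 0.5) -> str:
    """Choose incremental MERGE or full reinitialization (hypothetical heuristic)."""
    changed_fraction = len(changes) / max(len(source), 1)
    if changed_fraction >= full_threshold:
        # Many changes: assume rebuilding from the source is cheaper than merging.
        target.clear()
        target.update(source)
        return "full"
    merge_changes(target, changes)
    return "incremental"


target = {"1": {"tier": "gold"}, "2": {"tier": "silver"}, "3": {"tier": "bronze"}, "4": {"tier": "gold"}}
source = {"1": {"tier": "gold"}, "2": {"tier": "gold"}, "3": {"tier": "bronze"}, "4": {"tier": "gold"}}
mode = refresh(target, source, [("2", {"tier": "gold"})])  # 1 of 4 rows changed -> "incremental"

new_source = {"9": {"tier": "silver"}}
bulk = [(k, None) for k in ("1", "2", "3", "4")] + [("9", {"tier": "silver"})]
mode_bulk = refresh(target, new_source, bulk)  # 5 changes vs 1 row -> "full"
```

Both paths leave the target equal to the source; the choice only affects how much work each refresh does.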

Bring dbt into Snowflake natively

With dbt Projects on Snowflake, you can use familiar Snowflake features to create, edit, test, run and manage your dbt Core projects. Deploying a dbt Project object gives you built-in observability and CI/CD integration, and it removes the infrastructure overhead of running dbt yourself.

As early adopters of dbt Projects, we worked hand in hand with Snowflake to help shape the roadmap around how our teams actually build and operate. That allowed a lean team to move faster, while creating a more modular, governed and scalable foundation for analytics and for enabling AI across the group.

António Costa

Director of Data Engineering, Aviv Group

With the updates announced at Summit, more customers are standardizing on dbt Projects. Instead of managing dbt Core themselves, they gain access to dbt Fusion and richer observability:

dbt Fusion (generally available) is now included as a version option in dbt Projects on Snowflake. Through our partnership with dbt Labs, any dbt Project can use Fusion, which is designed to reduce compilation times for many complex builds.
Enhanced dbt DAG with column-level lineage (generally available) uses Snowflake Horizon Catalog to bring schema-level information directly into a directed acyclic graph (DAG) across Workspaces, object details and Query History. Each time you execute a dbt Project object, you get a unified lineage view of the data pipeline.
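Column-level lineage is, at its core, a graph walk. The sketch below stores a hypothetical lineage graph as a Python dict (each column mapped to the columns it is derived from) and answers the usual impact question: which upstream columns feed a given column? The column names are invented, and this is not Horizon Catalog's data model or API.

```python
from collections import deque

# Hypothetical lineage graph: each column maps to the columns it is derived from.
LINEAGE: dict[str, list[str]] = {
    "marts.revenue.total_usd": ["staging.orders.amount", "staging.fx.rate"],
    "staging.orders.amount": ["raw.orders.amount_cents"],
    "staging.fx.rate": ["raw.fx_rates.rate"],
}


def upstream_columns(column: str, lineage: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk returning every column that `column` transitively depends on."""
    seen: set[str] = set()
    queue = deque(lineage.get(column, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:          # shared ancestors are visited only once
            continue
        seen.add(parent)
        queue.extend(lineage.get(parent, []))
    return seen


def source_columns(column: str, lineage: dict[str, list[str]]) -> set[str]:
    """Upstream columns with no parents of their own, i.e. the raw inputs."""
    return {c for c in upstream_columns(column, lineage) if not lineage.get(c)}


print(sorted(source_columns("marts.revenue.total_usd", LINEAGE)))
# ['raw.fx_rates.rate', 'raw.orders.amount_cents']
```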

Programmatic pipelines that scale

Pfizer lowered its total cost of ownership (TCO) by 57% while processing data 4x faster with Snowpark.

Not every transformation fits a declarative model. Data engineers and data scientists who build programmatically with Python, Java, Scala and Apache Spark™ run jobs such as complex file parsing, batch-scale ML inference and multistep Python workflows. They often find that deploying to production takes longer than writing the code itself. Snowpark and Snowpark Connect for Apache Spark™ are designed to close that distance between prototype and production.

Build and orchestrate Notebooks and ML Jobs

Getting from a notebook to a production pipeline has always been harder than it should be. The new Pipeline Builder (private preview) changes that, letting teams visually connect Notebooks and ML Jobs into a full end-to-end pipeline without writing orchestration code from scratch. Scheduling, infrastructure and object creation are handled automatically, so data scientists and engineers can spend less time on setup and more time on the actual work. The result is faster iteration, fewer handoffs and ML pipelines that are easy to monitor and reproduce in Snowflake.
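Orchestrating Notebooks and ML Jobs comes down to running steps in dependency order. As a rough illustration, and not a description of Pipeline Builder's internals, the sketch below uses Python's standard `graphlib` module to derive both a serial run order and waves of steps that could run in parallel. The step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "ingest_orders_notebook": set(),
    "ingest_events_notebook": set(),
    "features_notebook": {"ingest_orders_notebook", "ingest_events_notebook"},
    "train_ml_job": {"features_notebook"},
    "score_ml_job": {"train_ml_job", "features_notebook"},
    "report_notebook": {"score_ml_job"},
}


def run_order(steps: dict[str, set[str]]) -> list[str]:
    """One serial order that respects every dependency (raises graphlib.CycleError on a cycle)."""
    return list(TopologicalSorter(steps).static_order())


def parallel_waves(steps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; steps in the same wave have no dependencies on each other."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose dependencies have finished
        waves.append(ready)
        sorter.done(*ready)                 # mark the wave complete, unlocking the next one
    return waves


for i, wave in enumerate(parallel_waves(PIPELINE), start=1):
    print(f"wave {i}: {wave}")
```

Here the two ingestion notebooks land in the first wave together, while the training, scoring and reporting steps each wait for their predecessors.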

Build large-scale pipelines with Snowpark

Snowpark brings the development experience for Python, Java and Scala natively into Snowflake. Data engineers and data scientists can write and iterate in Notebooks, build transformations with the familiar DataFrame API, package and deploy logic as stored procedures and user-defined functions (UDFs), and schedule it all with Tasks. From the first line of code to production pipelines, Snowpark gives developers a complete, end-to-end workflow: their code runs directly where the data lives, with governance built in and no external infrastructure to manage.

We've expanded Snowpark across three key areas: developer productivity, external connectivity, and ML and unstructured data workloads. Enhanced capabilities include:

Data integration APIs: Pull data from external databases programmatically. DB-API (generally available) supports Python drivers for Oracle, SQL Server, Postgres and MySQL; JDBC-API (public preview) adds server-side parallel reads from any JDBC source.
Unstructured data processing (generally available): Read, parse and enrich files (images, PDFs, audio) at warehouse scale using session.read.file() paired with AI functions like ai.extract(), ai.parse_document() and ai.transcribe().
Artifact Repository (public preview soon): Source Python packages from customer-hosted repositories (Nexus, JFrog) for UDFs, stored procedures and Notebooks — with Private Link support.
Scalable ML batch inference (private preview): Load models once with @udf_init_once and share them across workers for lower memory usage and faster performance on standard warehouses.
Code Bundles for Python and Java deployment (public preview coming soon): Pair seamlessly with DCM Projects to package Snowpark and Snowpark Connect code for reliable, automated deployment alongside the infrastructure it depends on. Together, they give data engineering teams the deployment confidence that software teams have had for years.
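The batch inference item above describes loading a model once and sharing it across workers. The sketch below shows that general pattern in plain Python with threads and a lock-guarded lazy load; it does not use or reproduce `@udf_init_once`, whose exact behavior isn't specified here. The linear "model" and the `load_count` counter are stand-ins for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_model = None
_model_lock = threading.Lock()
load_count = 0  # instrumentation for this example only


def _load_model():
    """Stand-in for an expensive model load (hypothetical: a fixed linear scorer)."""
    global load_count
    load_count += 1
    return lambda features: 2.0 * features["x"] + 1.0


def get_model():
    """Load the model at most once, even if many workers ask for it at the same time."""
    global _model
    if _model is None:                 # fast path once the model is loaded
        with _model_lock:
            if _model is None:         # double-checked: only one thread performs the load
                _model = _load_model()
    return _model


def score_batch(batch):
    model = get_model()                # every worker reuses the same in-memory model
    return [model(row) for row in batch]


batches = [[{"x": i} for i in range(start, start + 3)] for start in range(0, 12, 3)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = [y for scores in pool.map(score_batch, batches) for y in scores]

print(load_count, results[:3])  # 1 [1.0, 3.0, 5.0]
```

Four workers score twelve rows, yet the load runs once; memory holds a single copy of the model instead of one per worker.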

With Snowflake, teams move from local Python or Apache Spark code to production-ready workflows with 5.1x faster performance and 42% lower costs on average.³

Modernize Apache Spark pipelines with Snowpark Connect

Upgrading a data platform shouldn't mean rebuilding everything from scratch. Snowpark Connect gives teams a practical on-ramp, bringing existing Spark-based pipelines onto Snowflake's modern, managed infrastructure without a full rewrite. Engineers can move off aging, expensive Spark clusters and onto a platform built for today's data scale, with native governance, elastic compute and seamless access to Snowflake's full ecosystem. This is modernization that meets teams where they are and eliminates the operational overhead of the past.

Since the launch of Snowpark Connect last year, Snowflake has been hard at work on a number of updates, including:

Spark Scala and Java clients for Scala 2.12/2.13 and Java 11/17, with the snowpark-submit CLI for production deployment with zero code changes
Bronze layer file processing with permissive mode, complex data types, schema evolution and parallel reads for large compressed files
Unified observability to help users discover, diagnose and set alerts on Spark jobs, with full details (status, duration, resources, queries, logs) from Jupyter, Airflow or external sources
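Permissive-mode parsing and schema evolution are what keep a bronze layer ingesting when input is messy. The Python sketch below, modeled loosely on Spark's PERMISSIVE read mode, parses JSON Lines, keeps malformed lines in a `_corrupt_record` column instead of failing the batch, and widens the schema as new fields appear. It is an illustration under those assumptions, not Snowpark Connect's implementation.

```python
import json

CORRUPT_COLUMN = "_corrupt_record"  # column name borrowed from Spark's default; an assumption here


def read_json_lines_permissive(lines: list[str]) -> tuple[list[str], list[dict]]:
    """Parse JSON Lines without failing the batch on bad input.

    Malformed lines are preserved in CORRUPT_COLUMN rather than raising, and the
    schema is the union of all fields seen (simple additive schema evolution).
    """
    schema: list[str] = []
    records: list[dict] = []
    for line in lines:
        try:
            rec = json.loads(line)
            if not isinstance(rec, dict):
                raise ValueError("record is not a JSON object")
        except ValueError:            # json.JSONDecodeError is a subclass of ValueError
            rec = {CORRUPT_COLUMN: line}
        for key in rec:
            if key not in schema:
                schema.append(key)    # new fields widen the schema; nothing is dropped
        records.append(rec)
    # Align every record to the final schema, filling missing fields with None.
    return schema, [{col: rec.get(col) for col in schema} for rec in records]


raw = [
    '{"id": 1, "amount": 9.5}',
    '{"id": 2, "amount": 3.0, "currency": "EUR"}',   # new field appears mid-stream
    '{"id": 3, "amount": ',                          # truncated record
]
schema, rows = read_json_lines_permissive(raw)
print(schema)  # ['id', 'amount', 'currency', '_corrupt_record']
```

The truncated line survives as raw text for later inspection, and earlier rows simply show `None` for the `currency` field that arrived later.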

Integrate semantic context into your pipeline

For the last decade, business definitions lived outside the pipeline. Metrics were defined in BI tools, features were defined in ML stores, and every team had its own version of the truth. Semantic views are changing that: data engineers can now add meaning directly in the pipeline. With the Snowflake Semantic View dbt Package, we're bringing this into dbt workflows. Teams define their semantic layer directly in dbt model files using standard DDL syntax, and CoCo can assist in authoring those definitions. Running dbt build materializes or updates the semantic view in Snowflake, keeping it in step with the rest of the pipeline. Horizon Context takes it further, automatically making those definitions available to every AI agent, BI tool and application that touches your data.
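The core idea of a semantic layer is that a business definition lives in one place and every consumer resolves it from there. The sketch below models that idea in plain Python; it is not semantic view DDL, and the `net_revenue` metric and its definition are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Metric:
    """One governed business definition (hypothetical structure)."""
    name: str
    description: str
    compute: Callable[[list[dict]], float]


# Hypothetical single source of truth, defined alongside the pipeline code.
SEMANTIC_LAYER: dict[str, Metric] = {
    "net_revenue": Metric(
        name="net_revenue",
        description="Order amount minus refunds, excluding test orders",
        compute=lambda rows: sum(r["amount"] - r["refund"] for r in rows if not r["is_test"]),
    ),
}


def evaluate(metric_name: str, rows: list[dict]) -> float:
    """Every consumer (dashboard, agent, app) resolves the metric through the same definition."""
    return SEMANTIC_LAYER[metric_name].compute(rows)


ORDERS = [
    {"amount": 100.0, "refund": 10.0, "is_test": False},
    {"amount": 50.0, "refund": 0.0, "is_test": True},
    {"amount": 20.0, "refund": 5.0, "is_test": False},
]
dashboard_value = evaluate("net_revenue", ORDERS)
agent_value = evaluate("net_revenue", ORDERS)
print(dashboard_value, agent_value)  # 105.0 105.0
```

Because the dashboard and the agent call the same definition, they cannot drift apart; changing the rule in one place changes it everywhere. The semantic-view workflow aims at the same property, with the definition living in dbt model files rather than Python.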

New era of data engineering

We've known for years that you can't hire your way out of a systemic problem. It turns out the same is true of AI. When data engineers use AI to ship solutions on fragile, legacy platforms, tech debt isn't eliminated; it accelerates. The result is pipelines that break, infrastructure that's painful to maintain and data products that can't keep pace with the business. In this new AI era, the speed of creation risks outrunning the quality of the foundation beneath it.

Snowflake provides both agentic coding experiences purpose-built for data engineering and the governed platform that AI workloads demand. Whether you're adopting an open lakehouse architecture, migrating Spark workloads, building ML inference pipelines at scale or standing up a brand-new data platform, Snowflake gives every data engineering persona the tools to move faster, ship with confidence and spend less time fighting infrastructure. The agentic era of data engineering has arrived.

To get started, download the free ebook, "Build Pipelines for AI: An Essential Guide to Smarter Data Engineering," and read more about the exciting releases and announcements from Snowflake Summit 2026.

1. Based on ADE-bench results compared with Claude Code.
2. Note: Efficiency score based on internal testing using ADE-bench, a framework created by dbt for evaluating AI agents on real-world analytics and data engineering tasks.
3. Based on customer production use cases and proof-of-concept exercises comparing the speed and cost for Snowpark between November 2022 and May 2026. Actual speed and cost improvements depend on specific customer environments and workload patterns.

Learn more about the authors

Abhishek Kashyap

Director of Product Management

Jena Donlin

Product Marketing Lead
