JUN 02, 2026 / 8 min read / Product and Technology

Simplify the Entire Data Development Lifecycle

By the time most enterprise data reaches the systems that are meant to act on it, it's often stale. That lag can be the difference between an AI agent giving a useful answer and making a costly mistake. Agentic AI can only deliver intelligent decisions when it has continuous access to fresh information.

The demand on data engineering teams has shifted toward real-time pipelines and event-driven architectures as more organizations push agentic AI into production, and teams must connect and govern more sources even as those sources constantly change. But what teams are being asked to build has outpaced what their data platforms can currently support.

At Summit 2026, Snowflake is strengthening the platform to help data engineering teams succeed in the AI era. This includes notable releases like a native Apache Kafka-compatible streaming service and AI-powered capabilities that reduce data movement and migration costs.

These improvements reduce the time data engineers spend on infrastructure management and manual orchestration, so they can spend more time on the outcomes that AI makes possible. Snowflake CoCo serves as the common thread, turning complex setup into a guided conversation.

Stream data at the speed AI demands

Agents observe, decide, act, learn and feed that learning into the next decision. Each decision loop should make the next response more accurate, more personal and more actionable. That cycle runs continuously, which means the data feeding it has to flow continuously too. Organizations running Kafka already have the streaming backbone this cycle demands. The problem is that operating it alongside a separate analytics platform means paying for, governing and staffing two separate systems, while the data still arrives late to the place where decisions actually happen.

Datastream (private preview soon) is Snowflake's native, Apache Kafka-compatible streaming service, designed to collapse that operational overhead into a single governed platform. Data lands continuously as native Snowflake or open Apache Iceberg™ tables, queryable in seconds. Topics are secured with Snowflake role-based access control (RBAC), and tables inherit the full power of Horizon Catalog, including classification, lineage and masking policies. Data is governed the moment it arrives. Describe the streaming pipeline you need, and CoCo handles Datastream's authentication and onboards teams in minutes, with no deep Kafka expertise required.

Datastream is purpose-built for organizations that want to replace their Kafka infrastructure with a native Snowflake service. Snowpipe Streaming High-Performance Architecture, by contrast, is a direct ingestion API for teams streaming data from their own applications, including from existing Kafka clusters via the Kafka Connector. Today, Cboe Global Markets, a financial exchange operator, processes 190 billion rows of market data daily and queries it in under 30 seconds to give traders and analysts real-time visibility into market activity. At Summit, enhancements to Snowpipe Streaming include:

Kafka Connector 4.0 (generally available) offers server-side ingestion up to 10 GB/s per table and reduces client-side resources by up to 30%¹, so teams scale throughput without driving up cost.
Error logging (generally available) captures failed rows in a SQL-queryable table with full context, so teams catch data quality issues before agents act on bad inputs.
Multi-language SDK support (generally available) lets teams stream from their familiar stack, including Java, Python, Node.js and a REST interface.
Elastic Channels (private preview) enables thousands of clients to concurrently stream gigabytes per second to a table through a shared, auto-scaling endpoint, reducing the development time to build and scale streaming pipelines.
Durable Acknowledgments (private preview) removes the window of potential data loss between ingestion and commit, so mission-critical pipelines do not feed agents incomplete data.
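To make the error-logging idea concrete, here is a minimal Python sketch of the general pattern: rows that fail validation are set aside with their context instead of being dropped or halting the stream. It is an illustration only and not Snowpipe Streaming's API. The validation rules, the `ErrorRecord` fields and the `ingest` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Any

Row = dict[str, Any]


@dataclass
class ErrorRecord:
    """Context kept for each rejected row (hypothetical fields)."""
    offset: int   # position of the row within the incoming batch
    row: Row      # original payload, preserved for inspection or replay
    reason: str   # why validation failed


def validate(row: Row) -> None:
    """Hypothetical schema check: raise ValueError on bad input."""
    symbol = row.get("symbol")
    price = row.get("price")
    if not isinstance(symbol, str) or not symbol:
        raise ValueError("symbol must be a non-empty string")
    if not isinstance(price, (int, float)) or price <= 0:
        raise ValueError("price must be a positive number")


def ingest(batch: list[Row]) -> tuple[list[Row], list[ErrorRecord]]:
    """Split a batch into accepted rows and error records with context."""
    accepted: list[Row] = []
    errors: list[ErrorRecord] = []
    for offset, row in enumerate(batch):
        try:
            validate(row)
        except ValueError as exc:
            errors.append(ErrorRecord(offset, row, str(exc)))
        else:
            accepted.append(row)
    return accepted, errors


# Hypothetical batch: one valid row and two invalid ones.
batch = [
    {"symbol": "ABC", "price": 10.5},
    {"symbol": "", "price": 3},
    {"symbol": "XYZ", "price": -1},
]
accepted, errors = ingest(batch)
```

Keeping the original payload and a reason next to each rejected row is what makes the failures queryable later, rather than silently lost.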

Pipelines that manage themselves

Getting data into Snowflake in real time is only half the job. The other half is turning that raw stream into something analysts, models and agents can actually consume. That transformation layer needs to run continuously, handle its own retries and refresh logic, and stay reliable without a dedicated engineer constantly watching over it. Teams move faster when pipelines manage themselves.

At Summit, Sergey Labetsik of Wind Creek Hospitality demonstrated how migrating a dbt batch job — previously running on a 30-minute schedule — to a Dynamic Tables pipeline cut end-to-end latency to under a minute, delivering food vouchers to guests the moment they earned them.

And this declarative path has gotten faster and more flexible. Performance enhancements (generally available) offer up to 2.8x faster refresh for common Dynamic Table workloads². Custom incrementalization (public preview) lets engineers use MERGE or INSERT statements for transformations that cannot be expressed declaratively, while retaining full Dynamic Tables automation.
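The idea behind an incremental, MERGE-style refresh can be sketched in a few lines of Python: rather than rebuilding the whole result, only changed rows are applied, updating keys that already exist and inserting those that do not. This is a conceptual sketch and not how Dynamic Tables are implemented. The `merge_changes` function and the `guest_id` and `points` fields are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def merge_changes(
    target: dict[Any, Row], changes: list[Row], key: str = "guest_id"
) -> tuple[int, int]:
    """Apply changed rows to `target` in place and return (inserted, updated)."""
    inserted = updated = 0
    for row in changes:
        k = row[key]
        if k in target:
            # Analogous to WHEN MATCHED THEN UPDATE
            target[k].update(row)
            updated += 1
        else:
            # Analogous to WHEN NOT MATCHED THEN INSERT
            target[k] = dict(row)
            inserted += 1
    return inserted, updated


# Hypothetical current state plus a small batch of new changes.
balances = {1: {"guest_id": 1, "points": 120}}
result = merge_changes(
    balances,
    [{"guest_id": 1, "points": 150}, {"guest_id": 2, "points": 40}],
)
```

The work done is proportional to the number of changed rows, not the size of the target, which is why an incremental refresh can run far more often than a full batch rebuild.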

DCM Projects (public preview) give teams a single, controlled workflow to declaratively define infrastructure, preview and deploy changes across environments and keep a full audit trail of every deployment. dbt Projects on Snowflake, now even faster with Fusion support (generally available), extends that same philosophy to dbt users.

CoCo skills for Snowpipe Streaming, Dynamic Tables and dbt Projects accelerate setup and troubleshooting across these workflows, letting engineers focus on pipeline logic instead of boilerplate.

Access enterprise data with business semantics, without moving it

Some of the highest-value data in any organization never needs to move at all. It already lives in key enterprise platforms such as SAP, Salesforce and Workday, with business meaning, relationships and semantic models baked in. Replicating that data into another system strips away this context, which teams must then rebuild by hand. For executives trying to get AI initiatives into production, that reconstruction cost is often the single largest blocker.

Rather than replicating data, Zero-Copy Integrations surface the source system's intelligence directly in Snowflake: governed, query-ready and with the semantic richness that AI workloads need to perform reliably. Models and agents operate on data that retains its original business context rather than stripped-down table replicas.

SAP is now generally available through SAP BDC Connect for Snowflake, delivering bidirectional, zero-copy integration. Data engineers access SAP ERP data for AI, analytics and data engineering without complex ETL, while enriched insights flow back into SAP to trigger automated actions. Salesforce Data 360, a pioneer in native zero-copy integration with Snowflake, delivers an enhanced connector experience that enables customers to share data bidirectionally with zero pipeline maintenance. Workday enters private preview, surfacing people and finance data as externally managed Iceberg tables with incremental change capture at the storage layer.

Across all three, the architecture is consistent: Data resides in the source system, surfaces in Snowflake through Catalog-Linked Databases and inherits the full Horizon governance perimeter. End-to-end lineage, access policies and audit trails apply from the moment data becomes visible. What's more, CoCo skills handle lifecycle management so teams configure and maintain connections through natural-language prompts, making enterprise data integration accessible to any Snowflake user.

Connect what remains with Snowflake Openflow

Zero-copy works for enterprise platforms that have invested in native integration paths. But plenty of critical data still lives in on-premises online transaction processing (OLTP) databases, SaaS applications and legacy systems that were never designed to share.

Since launching last year, Openflow, Snowflake's managed data integration service, powered by Apache NiFi, has seen growing customer adoption as teams consolidate fragmented connector stacks into a single platform. That momentum is driving a significant scope expansion at Summit.

Snowflake's managed deployment is now generally available on Google Cloud Platform, joining AWS and Azure. The Data Connectivity Proxy (generally available soon on AWS) extends Openflow into private networks, connecting sources that previously required custom engineering to reach. Openflow supports structured and unstructured data, batch and streaming, and remains open and extensible. Teams build custom connectors and run them on Snowflake's managed platform without sacrificing control.

A guided setup wizard in Snowsight walks through connector installation step by step with built-in source-connectivity validation, making it easy to go from setup to ingesting data in minutes. When connectors surface errors, AI-assisted troubleshooting, powered by CoCo and embedded directly in the Connector Monitoring Dashboard, analyzes logs and delivers targeted remediation steps across the growing Openflow library, including newly added high-demand connectors like Veeva, BigQuery and MongoDB (all in public preview). These connectors use AI-assisted customizability to accelerate deployment and provide deeper visibility into specialized industry data.

Build and deploy at scale with Snowpark

Not every transformation fits a declarative model. For data engineers and data scientists who build programmatically with Python, Java, Scala and Apache Spark™, transformations involve complex file parsing, ML inference at batch scale and multi-step Python workflows. The challenge is that production deployment can take longer than writing the code itself. Snowpark closes that distance between prototype and production.

Key releases at Summit include:

Optimized ML batch inference (public preview) for faster, more efficient scoring at scale.
Expanded Snowpark Data Integration APIs with JDBC support (public preview) to reduce the work needed to bring external data into Snowflake.
File transform for Apache Spark (public preview soon) for large and complex file ETL.
Snowpark Directory Import (generally available) for simpler multi-file Python project deployment.
A visual DAG pipeline builder for orchestrating Notebooks and ML Jobs (private preview).
Code Bundles for deploying Python and Java code in production (public preview soon).

CoCo skills for Snowpark Python and Apache Spark further reduce the friction of deploying and migrating these programmatic pipelines, helping teams move from local Python or Apache Spark code to production-ready workflows with 5.1x faster performance and 42% lower costs than managed Spark services³.
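As a rough worked example of what those two figures would mean for a single job, the short Python sketch below applies them to a baseline. The 60-minute runtime and $100 cost are hypothetical inputs, and the reported numbers summarize customer outcomes, so actual results for a given workload may differ.

```python
def projected(
    baseline_minutes: float,
    baseline_cost: float,
    speedup: float = 5.1,          # "5.1x faster" from the reported figure
    cost_reduction: float = 0.42,  # "42% lower costs" from the reported figure
) -> tuple[float, float]:
    """Return (runtime in minutes, cost) after applying the reported ratios."""
    if speedup <= 0 or not 0 <= cost_reduction < 1:
        raise ValueError("speedup must be > 0 and cost_reduction in [0, 1)")
    return baseline_minutes / speedup, baseline_cost * (1 - cost_reduction)


# Hypothetical baseline: a 60-minute job that costs $100 per run.
minutes, cost = projected(60, 100)
print(f"{minutes:.1f} minutes, ${cost:.2f}")  # prints: 11.8 minutes, $58.00
```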

Set your target: Snowflake AIM handles the rest

Nothing slows a data team down quite like the weight of what it has inherited. Legacy ETL tools, aging SQL dialects, Oracle schemas that predate the current engineering team — migration projects have a well-earned reputation for running long, going over budget and introducing risk to workloads that were running fine, until they weren't. Many organizations end up maintaining the old stack in parallel with the new one, doubling both cost and management effort for months, if not years.

Snowflake AIM (AI-powered Migration), now generally available, is a unified migration, modernization and virtualization platform that combines IP from SnowConvert AI, the Snowpark Migration Accelerator and Datometry. A Snowflake AIM migration agent, available through Snowflake CoCo, walks teams through the end-to-end journey: It paints a clear, dependency-aware picture of what needs to move, in what order and with what levels of risk, before anyone touches production. Processes that once took weeks or months now happen in a fraction of the time.
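A dependency-aware migration order can be illustrated with a topological sort: each object is moved only after everything it depends on. The sketch below uses Python's standard-library `graphlib`. The object names and the `migration_order` helper are hypothetical, and this is not a description of how Snowflake AIM works internally.

```python
from graphlib import TopologicalSorter

# Hypothetical legacy objects, each mapped to the objects it depends on.
dependencies = {
    "raw_orders": set(),
    "raw_customers": set(),
    "stg_orders": {"raw_orders"},
    "customer_orders_view": {"stg_orders", "raw_customers"},
    "revenue_report_proc": {"customer_orders_view"},
}


def migration_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every object follows its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


order = migration_order(dependencies)
```

Several valid orders usually exist. What matters is the guarantee that, for example, `stg_orders` never appears before `raw_orders`.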

The data engineer as outcome architect

The pattern across all announcements is the same: Reduce the time engineers spend keeping systems running so they can spend more time on results that matter. The tasks that once consumed engineering cycles, from connector maintenance to pipeline debugging, are becoming faster and simpler to handle with every release, and CoCo is the thread that ties them together.

In that environment, the data engineer's role only grows. The job becomes less about plumbing and more about architecting the data foundation that AI actually runs on. Snowflake remains committed to making the complex invisible so data teams can focus on what they now make possible.

Interested in Datastream? Express your interest
Demo: Using Cortex Code with Snowflake Openflow
Demo: Build High-Performance AI Pipelines with Real-Time Streaming and CoCo
Demo: Deploy Python Pipelines with CoCo in Under 2 Minutes
Demo: Migrate PySpark Code to Snowflake with CoCo
Snowflake Connect: Data Engineering | Building Transformation Pipelines for AI-Ready Data

¹ Customers are reporting up to 30% lower client-side resource costs using the Snowpipe Streaming High-Performance Architecture. See more here: Scaling Streaming at Snowflake: Introducing Our Next-Gen Snowpipe Streaming Architecture
² Snowflake performance improvements based on an internal transformation workload measured as of May 4, 2025, using Standard Warehouse and May 4, 2026, using Gen2.
³ Based on customer production use cases and proof-of-concept exercises comparing the speed and cost for Snowpark versus managed Spark services between November 2022 and May 2025. All findings summarize actual customer outcomes with real data and do not represent fabricated datasets used for benchmarks.

Learn more about the authors

Maria Ho

Saptarshi (Sap) Mukherjee

Lauren Delgado
