JUN 02, 2026 / 13 min read / Product and Technology

Build the Interoperable Lakehouse: Agency Over Your Data

AI is testing every architecture decision. When teams can't act on data where it lives, they copy it. Pipelines sprawl, governance fragments, costs compound, and AI agents end up reasoning over stale, disconnected data instead of the governed, semantically rich data they need.

The open lakehouse promised to solve data fragmentation without forcing everyone onto a single platform. But for most organizations, the format arrived before governance and semantic fragmentation could be addressed. That changes today. Snowflake's Interoperable Lakehouse, built on Apache Iceberg™, Apache Polaris™ and Open Semantic Interchange (OSI), is generally available. It offers a new blueprint for connecting, accessing, governing and operating on a single governed copy of your data, wherever it lives and without lock-in. By giving control back to data owners, not vendors, it creates agency over your data, and in the process cuts architectural cost and grounds every AI initiative in a foundation you can actually trust.

Act on data in place

Agency over your data starts with a connected data foundation — one place to act on every data set, for any operation, without copying it. With this launch, Snowflake advances that foundation across every layer of access. Snowflake's support for Apache Iceberg v3 is generally available and production-ready, providing the broadest set of v3 capabilities on the market today that are deeply integrated throughout the platform to unlock greater interoperability. Snowflake Storage for Apache Iceberg™ tables makes managed Iceberg as easy as CREATE TABLE. Zero-Copy Integrations bring your systems-of-record into the foundation with semantics intact. Horizon Context connects the business definitions every team and AI agent runs on. More data. More context. One governed copy.

Apache Iceberg was originally designed for huge analytical datasets, but it had suboptimal support for workloads involving semi-structured data, small updates, geospatial analytics, and change-tracking pipelines. Apache Iceberg v3 closes that gap. As of today, Snowflake brings the broadest set of v3 capabilities to production, including VARIANT support for semi-structured data, row lineage for change tracking across engines, deletion vectors for performant row-level deletes, nanosecond timestamps for high-frequency telemetry and financial workloads, default values and geospatial types. More workloads now have a clean path to interoperability.
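To make the deletion-vector idea concrete, here is a minimal conceptual sketch in Python. It is not Snowflake's or Iceberg's implementation: the `DataFile` class and its methods are hypothetical names, and a plain `set` stands in for the compressed bitmap a real engine would use. The point it illustrates is that a row-level delete records positions to skip rather than rewriting the data file.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """A toy immutable data file: rows are addressed by position."""
    rows: list
    # Positions of rows marked as deleted. A real engine would use a
    # compressed bitmap; a plain set stands in for it here (assumption).
    deletion_vector: set = field(default_factory=set)

    def delete_where(self, predicate):
        """Mark matching rows as deleted without rewriting the file."""
        for pos, row in enumerate(self.rows):
            if predicate(row):
                self.deletion_vector.add(pos)

    def scan(self):
        """Return live rows by skipping deleted positions at read time."""
        return [row for pos, row in enumerate(self.rows)
                if pos not in self.deletion_vector]


events = DataFile(rows=[
    {"id": 1, "user": "a"},
    {"id": 2, "user": "b"},
    {"id": 3, "user": "a"},
])
events.delete_where(lambda row: row["user"] == "a")
live_rows = events.scan()
```

After the delete, the underlying `rows` list still holds all three records; only the reader's view changes, which is why small deletes stay cheap.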

A capable format, however, does not eliminate the operational tax of managing storage. Snowflake Storage for Apache Iceberg™ tables, generally available for AWS and Azure and private preview soon for Google Cloud, delivers a fully managed Iceberg experience: open from the start, governed through Horizon Catalog, readable and writable by any Iceberg-compatible engine. For teams managing their own storage on Azure, Azure DFS support is generally available, delivering full interoperability through native Azure Data Lake Storage Gen2 endpoints.

Figure 1: Introducing Snowflake Storage for Apache Iceberg™, now generally available.

Bringing existing data in shouldn't require migration or conversion. Parquet Direct, in private preview with general availability coming soon, makes existing Parquet files queryable with Iceberg-class performance. Google Cloud Lakehouse integration is generally available, creating Catalog Linked Databases for Google's cross-cloud lakehouse environment with automatic table discovery and cross-cloud read and write access. Just-in-time refresh for externally managed Iceberg, in private preview, detects stale metadata at query time and refreshes it automatically, doing away with the need to configure scheduled refreshes.
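The just-in-time refresh behavior described above can be pictured with a small Python sketch. This is an illustration of the general pattern only, not Snowflake's mechanism: `ExternalCatalog`, `LinkedTable` and the metadata file names are all hypothetical. The cached metadata pointer is compared with the catalog's current pointer at query time, and a refresh happens only when they differ.

```python
class ExternalCatalog:
    """Stand-in for an external Iceberg catalog that owns the table."""

    def __init__(self):
        self.current_metadata = "v1.metadata.json"  # hypothetical file name

    def commit(self, new_metadata):
        """Simulate another engine committing a new table version."""
        self.current_metadata = new_metadata


class LinkedTable:
    """Stand-in for a query engine's cached view of that table."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.cached_metadata = catalog.current_metadata
        self.refresh_count = 0

    def query(self):
        # Check freshness at query time instead of on a schedule.
        if self.cached_metadata != self.catalog.current_metadata:
            self.cached_metadata = self.catalog.current_metadata
            self.refresh_count += 1
        return self.cached_metadata


catalog = ExternalCatalog()
table = LinkedTable(catalog)
first = table.query()               # cache is fresh, no refresh needed
catalog.commit("v2.metadata.json")  # an external writer advances the table
second = table.query()              # staleness detected, refresh on demand
```

The design trade-off is that freshness is paid for only when a query actually needs it, so there is no refresh interval to tune.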

Enterprise platforms are where the most valuable enterprise data lives, and where the pipeline tax has always been heaviest. Zero-copy integration makes critical business data available in your Snowflake ecosystem in near real time without ETL pipelines or the need to rebuild semantic context. These integrations exist now for SAP (GA), Salesforce and Workday (private preview). New partnerships with AVEVA and IBM will extend this model further, covering operational technology and industrial data from AVEVA CONNECT and enterprise data platforms from IBM, and bringing business definitions and context together for more consistent, AI-ready data.

Having connected systems doesn't necessarily translate into connected meaning. Revenue, churn and customer counts still mean three different things in three different places until the definitions themselves live in one connected layer. Horizon Context is that layer. It links scattered business definitions across databases, data lakes and BI tools so that every team inside and outside of Snowflake, and every AI agent, reasons from the same definition of enterprise truth. It connects to external database, BI and data pipeline systems, including PostgreSQL, Microsoft SQL Server, Tableau, Microsoft Power BI and dbt, and enriches metadata with schemas, query logs, dashboard definitions and more (in private preview). Horizon Context enables this foundation through a set of integrated capabilities:

  • Out-of-the-box connectors: Connect to tools such as PostgreSQL, Microsoft SQL Server, Tableau, Microsoft Power BI and dbt that allow you to gather rich context — query logs, popularity, schemas and more — from many sources into one searchable catalog.
  • End-to-end column-level lineage: Lineage is key to understanding how data assets are related to one another. Horizon Context mines lineage information from Snowflake and external database query logs, BI systems and OpenLineage feeds, and stitches it all together to create a complete, end-to-end lineage graph.
  • Semantic Studio, in private preview, is an AI-assisted IDE within Snowflake Workspaces that lets teams define, test and publish shared business logic without SQL expertise, with Snowflake CoCo integration and Git sync for version control.
  • Semantic View Autopilot (generally available) analyzes existing query patterns to automatically generate and refine semantic views, helping ensure your context layer stays current as your data and usage evolve. CoCo now retrieves business context for search, SQL generation and complex analysis, generally available.
  • And through the Open Semantic Interchange (OSI) those definitions travel beyond Snowflake to the broader BI and AI ecosystem with 54 participating vendors and a published specification.

Asking a question of your data should just work. With a connected, interoperable foundation underneath, it does. Agentic Queries (generally available) lets your teams ask questions in natural language across Snowflake, data lakes and, in private preview, external relational systems. Horizon Context returns the governed answer almost instantly.

That's just the starting point. Shared data, including in open formats, should also be just as conversational. Auto-gen Agents for Data Shares and Listings, in public preview, instantly generate a Semantic View and Agent from any data listing or secure data share without manual development. Cortex Agent Sharing, in public preview, then deploys that agent across Snowflake accounts to internal teams, partners or the broader ecosystem via Marketplace. Together, these capabilities unlock new audiences and use cases for the same governed data sets through a conversational experience. Consumers can even combine shared data with their own first-party data for richer insights, all governed out of the box.

Universal governance

Acting on data in place only addresses half the problem. The bigger issue becomes obvious the moment you build for it: who governs your data, where and how. Multi-catalog environments fragment policies. Multi-engine access multiplies the challenges, eroding agency over your data with every workaround. What if you only needed to set access policies once in one universal catalog? We are excited to announce new capabilities in Horizon Catalog (based on Apache Polaris™) that help connect your entire Iceberg ecosystem. Now, you can govern not just Snowflake-managed Iceberg but every Iceberg table in your estate. Universal governance, set in Horizon, is honored on every engine compatible with the Iceberg REST Catalog (IRC), without lock-in.

It starts with delivering an interoperable foundation ready for production. Both read and write access from external engines to Snowflake-managed Iceberg tables are now generally available in Horizon Catalog, providing full bidirectional interoperability via vended credentials, the open security mechanism defined in the Iceberg REST protocol. Spark, Trino, PyIceberg and any compatible engine can read and write the same governed copy your Snowflake users do. One catalog, one set of policies, no trade-off between using your preferred engines and keeping governance policies in one place.
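As a rough sketch of what external engine access could look like from PyIceberg, the snippet below builds the properties for an Iceberg REST catalog client and requests vended credentials. The property keys follow PyIceberg's REST catalog configuration. Everything else is an assumption for illustration: the endpoint, warehouse value, token, catalog name and table name are placeholders, and the helper `build_rest_catalog_properties` is hypothetical. Consult the Snowflake documentation for the actual connection values.

```python
def build_rest_catalog_properties(uri, warehouse, token):
    """Build properties for an Iceberg REST catalog client.

    The keys follow PyIceberg's REST catalog configuration. The values
    are supplied by the caller; none of them are real endpoints.
    """
    return {
        "type": "rest",
        "uri": uri,
        "warehouse": warehouse,
        "token": token,
        # Ask the catalog to vend short-lived storage credentials
        # instead of configuring long-lived cloud keys in the engine.
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
    }


props = build_rest_catalog_properties(
    uri="https://<account-endpoint>/<irc-path>",  # hypothetical placeholder
    warehouse="<database_name>",                  # hypothetical placeholder
    token="<access-token>",                       # hypothetical placeholder
)

# With PyIceberg installed and real values filled in, usage would look like:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("horizon", **props)
#   table = catalog.load_table("my_schema.my_table")  # hypothetical table
#   rows = table.scan(limit=10).to_arrow()
```

Keeping the properties in a plain dictionary makes it easy to reuse the same settings across engines that accept Iceberg REST catalog configuration.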

When most enterprises have several catalogs, setting uniform governance controls is costly and complex. Implementing universal governance forces a choice between costly migrations and pushing the complexity and operational cost onto your data teams, who must duplicate governance, auditing and monitoring controls across every catalog. This forced choice erodes control over your data. Last year, based on the principle of acting on data in place, we launched Catalog-linked databases (generally available) to automatically discover and securely read and write to all your external Iceberg tables from Snowflake. This year, we extend that principle to include governing data in place, eliminating the need for forced migrations. Now, in private preview, you can also manage secure engine access to these external Iceberg tables using Horizon Iceberg REST Catalog APIs for both read and write operations, evolving Horizon Catalog into a universal governance layer for all Iceberg tables. You gain comprehensive governance capabilities, auditing and observability in one place for all operations from any engine.

Another common reason behind catalog fragmentation is that fine-grained access controls have been limited to the catalog associated with a single engine. This limitation increases the operating burden of managing a multi-engine environment for your data teams, raising the risk of a misconfigured policy causing a data leak. Now, support for the Iceberg REST Scan Plan API (in private preview) eliminates this restriction. With this capability, fine-grained access policies follow the data wherever it is queried, allowing row-access and dynamic data masking policies defined in Horizon Catalog for Snowflake-managed Iceberg tables to be enforced when accessed from external engines. Lastly, the new Snowflake Connector for Apache Spark (generally available) enforces these policies for teams already running on Spark, providing a production-ready solution today.
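The idea that policies "follow the data" can be sketched in a few lines of Python. This is a conceptual toy, not the Scan Plan API or Snowflake's enforcement path: `POLICIES`, `governed_scan`, `mask_email` and the sample roles are hypothetical. It shows a row-access filter and a column mask defined once and applied before any rows reach the requesting engine, whichever engine that is.

```python
def mask_email(value):
    """Hide the local part of an email address."""
    _, _, domain = value.partition("@")
    return "***@" + domain


# Hypothetical policies, defined once in the catalog layer.
POLICIES = {
    # Row access: keep only rows in regions the caller may see.
    "row_access": lambda row, ctx: row["region"] in ctx["allowed_regions"],
    # Masking: per-column functions applied to every returned row.
    "masking": {
        "email": lambda value, ctx: (
            value if ctx["role"] == "ADMIN" else mask_email(value)
        ),
    },
}


def governed_scan(rows, ctx, policies=POLICIES):
    """Apply the row filter, then column masks, before rows leave the catalog."""
    result = []
    for row in rows:
        if not policies["row_access"](row, ctx):
            continue
        masked = dict(row)  # copy so the stored data is never modified
        for column, mask in policies["masking"].items():
            masked[column] = mask(masked[column], ctx)
        result.append(masked)
    return result


rows = [
    {"id": 1, "region": "EU", "email": "ana@example.com"},
    {"id": 2, "region": "US", "email": "bob@example.com"},
]
analyst = {"role": "ANALYST", "allowed_regions": {"EU"}}
admin = {"role": "ADMIN", "allowed_regions": {"EU", "US"}}

analyst_view = governed_scan(rows, analyst)
admin_view = governed_scan(rows, admin)
```

Because enforcement happens in one place, adding another engine does not mean re-implementing the policies for it.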

We are extending the reach of Open Data Sharing, enabling customers to share federated catalogs using Catalog Linked Databases (generally available soon). We are also announcing that Open Data Sharing has been enhanced (public preview) so any IRC-compatible external engine can consume all data shares without needing a Snowflake account. Combined, these two capabilities let customers use any external engine to securely access any open table format that's accessible through Horizon.

Policies stay enforced because the connections themselves are secure. Private link to external catalogs and storage is generally available, keeping data off the public internet when Snowflake connects to external lakes.

This works because the standards underneath are open. Apache Polaris is now a Top-Level Project under the Apache Software Foundation, and Snowflake engineers contributed the Scan Planning API specification to the Apache Iceberg project. Universal governance becomes an ecosystem solution, not just a Snowflake feature.

Enterprise-ready by default

Acting on data in place and governing it universally is the architecture. Running it in production is your team's responsibility. Most lakehouse architectures hand that responsibility back to the architect: health checks to instrument, audit logs to reconcile across engines, resilience to bolt on. Today, that operational burden disappears. Comprehensive auditing in Access History, in private preview, logs every external engine operation directly within Snowflake's Access History, giving compliance and security teams a single, connected record of all table operations at the user level, regardless of the engine used or table accessed. Operational health monitoring for externally managed Iceberg tables in catalog-linked databases, in private preview, surfaces freshness and refresh issues before they reach production. And Managed Iceberg replication, generally available soon, makes the same open foundation resilient against outages by default. Enterprise-ready, without the integration project.

Compliance teams have always had to reconcile audit logs across engines. Comprehensive auditing in Access History, in private preview, ends that work by logging every external engine operation directly within Snowflake's Access History. Every access event lands in a single defensible record: who accessed what, where and when. Architects can answer the audit in one place.

Iceberg Health Insights in Snowsight, in private preview, gives platform teams a connected operational view of their externally managed Iceberg estate, including auto-refresh status, table discovery and freshness signals, without toggling between cloud consoles or building custom monitoring. When a catalog-linked database surfaces stale metadata or a refresh pipeline stalls, teams see it in one place and resolve it before downstream queries return outdated results. As this capability expands toward general availability, it will extend across the full Iceberg estate (Snowflake-managed and external alike), delivering the operational confidence that production lakehouse architectures demand.

Figure 2: Instantly monitor your external table’s health by diagnosing table linking and refresh issues in a single dashboard.
Figure 3: Troubleshoot your table refresh issues with a single click with Cortex Code or drill down into actionable error details in Snowflake’s Refresh Issues view.

Resilience belongs in the foundation, not in a separate project. Snowflake's Managed Iceberg replication and failover, coming soon to GA, extends account replication and failover to Snowflake-managed Iceberg tables, helping teams make their open data foundation more resilient to outages. Resilience gets even stronger with Optimized Refresh, a new replication feature for failover groups now in public preview. Built on Snowflake's next-generation log-based replication engine, optimized refresh tracks changes as they happen and applies only what needs to be updated. Preview customers experienced between 1.6x and 22x faster replication performance, helping teams reduce Recovery Point Objective (RPO) targets for mission-critical workloads while maintaining predictable costs based on the volume of data replicated.

With these capabilities built into the Snowflake platform, teams can fail over data, applications and pipelines with minimal operational friction and without rearchitecting their environments. That gives organizations the confidence to go all-in on Iceberg without sacrificing the operational resilience their critical workloads require.

Agency over your data

The open lakehouse promised that data would move less and work harder. But for most enterprises, openness ended at the table format. Governance fragmented, semantics siloed, and every production requirement still demanded a custom project. AI made this governance and semantic fragmentation impossible to ignore. Agents that reason over stale, disconnected data erode trust in the very systems your teams are building.

The Interoperable Lakehouse provides what the format alone could not: interoperability at every layer, from storage to governance to semantics, in a connected foundation where each reinforces the others. What does this mean in practice? Your engineers choose the right engine for each workload without duplicating data. Your governance team defines policy once, and it holds across Snowflake, Apache Spark, Trino and more. Your Iceberg estate is observable, auditable and resilient without a separate operations project. And your AI initiatives run on governed, semantically rich data from day one.

This is real agency over your data. Not a slogan — an operating principle. Design your architecture around what your business requires and AI demands, not what your vendor allows.

The interoperable foundation is here.

Build on it.

To start reclaiming agency over your data, visit Snowflake's Interoperable Lakehouse page and explore Snowflake's offerings. Learn more by downloading the free ebook, "Building the Interoperable Lakehouse: Data Strategies for AI Leaders," or watching the TDWI webinar, featuring Snowflake Director of Product Management James Roland-Jones. Then, get hands-on with this virtual lab, "Build a Multi-Engine Stack on Snowflake Storage for Iceberg and Horizon Catalog."


Forward-looking statements

This content contains forward-looking statements, including about our future product offerings, which are not commitments to deliver any product offerings. Actual results and offerings may differ and are subject to known and unknown risks and uncertainties. See our latest 10-Q for more information.
