Develop Agency over Your Data in the Age of AI

The promise of the open lakehouse envisions a single, governed data copy that’s accessible to any engine, but this idea has long been haunted by "proprietary gravity." And, while Apache Iceberg™ emerged as the community’s first answer to data interoperability, an open format alone is no longer enough.

In the age of AI, data silos and fragmented governance and semantics are taxes on innovation. When teams cannot act on data where it lives, they are forced to move it, leading to ballooning costs and "noisy" copies that are missing the rich semantic context that AI needs. AI initiatives are undermined before they even get started.

At Snowflake, we are building toward a future where full interoperability is a reality. By working with the community across data, governance and semantic interoperability, we are enabling our customers to overcome data silos and multilayer fragmentation once and for all.

The result is users with agency over their data: they decide how, and from where, to securely act on a single logical copy of their data for any operation, without compromising governance controls or semantic context.

Architecting for agency over data

Agency over data can't be accomplished by a single vendor, however, nor through data interoperability alone. It requires interoperability at every layer of the architecture. Delivering on this vision means grounding solutions in widely accepted, community-driven open initiatives that prioritize vendor-neutral interoperability.

Data interoperability

Getting to a place where users have agency over their data, regardless of engine, starts with a common table format. With its widespread native support across platforms and active community, Iceberg is that format. Most recently, the community reached a critical milestone: Iceberg v3. Iceberg v3 builds on existing capabilities to expand data interoperability to critical use cases, including semi-structured data, change data capture (CDC) and more.

Today, as we gather for Iceberg Summit in San Francisco, we are excited to announce that broader support for v3 capabilities will soon be generally available.

Iceberg v3 supported use cases

By supporting a broad set of v3 capabilities, more of our customers’ data becomes accessible from more engines than ever before. Customers can power the following use cases with Snowflake for Apache Iceberg tables, managed by Snowflake’s Horizon Catalog or any other catalog:

  • VARIANT data type: Allows for semi-structured data inside an Iceberg table with the potential to use shredding, giving structured performance with semi-structured flexibility.
  • Row lineage: Powers row-level CDC by tracking modifications even across multiple engines.
  • Deletion vectors: Provides a more performant iteration of row-level deletes that also alleviates significant maintenance difficulties associated with position delete files.
  • Nanosecond-precision time stamps: Supports greater precision for time stamps, common in high-frequency financial data, event telemetry and Internet of Things workloads.
  • Geospatial type: Natively stores and prunes on geometric information.
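To make the deletion-vector idea concrete, here is a simplified, hypothetical Python sketch. It is not Iceberg's actual implementation (real deletion vectors are roaring bitmaps stored alongside the data); it only illustrates the principle that a per-file marker of deleted row positions lets readers skip rows at scan time without rewriting the data file or tracking separate position delete files.

```python
class DeletionVector:
    """Toy stand-in for an Iceberg v3 deletion vector.

    Real deletion vectors are compressed bitmaps; this sketch just
    uses a Python set of deleted row positions for one data file.
    """

    def __init__(self):
        self.deleted = set()

    def delete(self, pos: int) -> None:
        self.deleted.add(pos)  # mark a row position deleted; data file untouched

    def is_live(self, pos: int) -> bool:
        return pos not in self.deleted


def scan(rows, dv: DeletionVector):
    """Yield only live rows, applying the deletion vector at read time."""
    return [row for pos, row in enumerate(rows) if dv.is_live(pos)]


rows = ["a", "b", "c", "d"]
dv = DeletionVector()
dv.delete(1)  # logically delete the row at position 1 ("b")
dv.delete(3)  # and the row at position 3 ("d")
print(scan(rows, dv))  # expected: ['a', 'c']
```

The design point is that deletes are metadata-only: the Parquet data file is never rewritten, and any engine that understands the vector sees a consistent view.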

Breaking transactional silos with pg_lake

Not every data set starts in an analytical lake. Much of a company's most valuable information lives inside transactional databases such as Postgres. Historically, the two worlds of transactional and analytical were silos. To get them to talk to each other, teams had to glue them together with data pipelines that moved data downstream.

To bridge this gap, Snowflake developed and open sourced pg_lake. This extension transforms Postgres from a standard database into a functional part of a data lakehouse. pg_lake gives databases two new capabilities:

  • It can query data in place — it lets Postgres read files like Parquet and CSV directly from your data lake, without a complex loading process.
  • It can natively manage Iceberg tables, using Postgres itself as the management layer.
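The first capability, querying data in place, can be illustrated with a small, hypothetical Python sketch (it is not pg_lake's actual mechanism, which runs inside Postgres): rows are streamed and filtered straight from a file where it lives, rather than being loaded into database storage first.

```python
import csv
import io

# Stand-in for a CSV file sitting in a data lake; an in-memory buffer
# keeps the sketch self-contained.
CSV_DATA = "id,region,amount\n1,emea,10\n2,amer,25\n3,emea,7\n"


def query_csv_in_place(fileobj, predicate):
    """Stream rows straight from the file, keeping only matches.

    No load step: the file is the source of truth, scanned on demand.
    """
    return [row for row in csv.DictReader(fileobj) if predicate(row)]


emea = query_csv_in_place(io.StringIO(CSV_DATA), lambda r: r["region"] == "emea")
print([r["id"] for r in emea])  # expected: ['1', '3']
```

In pg_lake the same idea applies to Parquet and CSV files in object storage, with Postgres doing the scanning.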

Now transactional and analytical data can share the same open language.


Governance portability: Apache Polaris™

Governance controls and secure access must follow the data. This is why, two years ago, we open sourced and donated an Iceberg catalog, now Apache Polaris, and have partnered with the community to help the open source catalog become a Top-Level Project under the Apache Software Foundation. Our aim is to deliver a future where Snowflake's fine-grained access controls, or those of any other platform, are enforced consistently and performantly across any engine, on any compute, without forcing customers to choose between security and the flexibility of an interoperable lakehouse.

Historically, authorization has been hard-coded into database engines, locking customers in at two levels: policy definition and policy execution. The issue isn't that customers distrust these engines to enforce rules; they do, and they always have. Rather, it is that fine-grained access control (FGAC) requires the compute engine itself to understand and execute those rules.

We are breaking this cycle with Apache Polaris. By developing standards for Policy Exchange, Governance Federation and Read Restriction APIs, we’re creating a standardized way to interchange policies and a trust mechanism to manage enforcement across platforms. By using Read Restriction APIs, one platform can share pre-evaluated access rules that a downstream engine can enforce directly. This ensures that governance truly travels with the data, removing the heavy “compute tax” of data materialization and allowing for consistent enforcement regardless of which engine is accessing the information.
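The "pre-evaluated access rules" idea can be sketched in a few lines of Python. Everything below is illustrative, not the actual Polaris API: the function names and restriction shape are hypothetical. The point is the division of labor, where the catalog resolves policy once and any downstream engine applies the resulting restrictions during its scan.

```python
def catalog_read_restrictions(principal: str) -> dict:
    """Catalog side: resolve policy into engine-agnostic restrictions.

    Hypothetical shape: a row filter plus a set of columns to mask.
    """
    if principal == "analyst":
        return {"row_filter": ("region", "=", "emea"),
                "masked_columns": {"email"}}
    return {"row_filter": None, "masked_columns": set()}


def engine_scan(rows, restrictions):
    """Engine side: enforce the pre-evaluated restrictions at scan time."""
    col, _, val = restrictions["row_filter"] or (None, None, None)
    out = []
    for row in rows:
        if col is not None and row.get(col) != val:
            continue  # row filtered out by the shared policy
        out.append({k: ("***" if k in restrictions["masked_columns"] else v)
                    for k, v in row.items()})
    return out


rows = [{"region": "emea", "email": "a@x.com"},
        {"region": "amer", "email": "b@x.com"}]
print(engine_scan(rows, catalog_read_restrictions("analyst")))
# expected: [{'region': 'emea', 'email': '***'}]
```

Because the engine receives rules it can enforce directly, no governed copy of the data has to be materialized server-side.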

The goal is simple. Fine-grained security and governance controls — whether on Snowflake’s Horizon or any other supported catalog — should be enforced consistently across any engine, without server-side materialization or performance penalties.

Semantic context: Grounding AI with OSI

AI agents waste tokens and "guess" meanings when business logic is locked in proprietary silos. To address this, we're building Open Semantic Interchange (OSI), a vendor-neutral specification for metrics, dimensions and relationships that makes semantic context as open and interoperable as Iceberg itself. The first OSI spec is live under an Apache 2 license, backed by a coalition of more than 35 industry leaders, including Salesforce, dbt Labs and Databricks, with a commitment to transition to foundation-led neutral governance.

Snowflake customers can start today with semantic views in the Horizon Catalog, giving Snowflake Cortex AI and agentic applications the governed "map of truth" they need to reason accurately, while building on the same foundational constructs that OSI is standardizing across the industry.
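A toy example shows why a shared semantic model matters. The field names below are hypothetical, not the actual OSI schema; the sketch only demonstrates the pattern of defining a metric and a dimension once, then evaluating them consistently, so every engine or agent computes the same number instead of guessing at business logic.

```python
# Illustrative semantic model in the spirit of OSI; not the real spec format.
SEMANTIC_MODEL = {
    "metrics": {"total_revenue": {"agg": "sum", "column": "amount"}},
    "dimensions": {"region": {"column": "region"}},
}


def evaluate(metric: str, dimension: str, rows):
    """Group rows by the dimension and apply the metric's aggregation."""
    m = SEMANTIC_MODEL["metrics"][metric]
    d = SEMANTIC_MODEL["dimensions"][dimension]
    out: dict = {}
    for row in rows:
        key = row[d["column"]]
        out[key] = out.get(key, 0) + row[m["column"]]  # agg == "sum"
    return out


rows = [{"region": "emea", "amount": 10},
        {"region": "amer", "amount": 25},
        {"region": "emea", "amount": 7}]
print(evaluate("total_revenue", "region", rows))
# expected: {'emea': 17, 'amer': 25}
```

With the definition held in one governed place, "total_revenue by region" means the same thing to a BI dashboard, a SQL engine and an AI agent.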

Becoming more open

Our commitment to unlocking agency over a user’s data represents a fundamental shift in our engineering culture. Snowflake is no longer just a consumer of open source; we are building with the community. We are proud of how this change has enabled us to work with the community to make agency over data a reality for all.

  • 9,000+ contributions: Over the last two years, our engineers have authored thousands of commits and pull requests to open source projects.
  • Operational transparency: We are building in the open and submitting proposals, such as collations in Iceberg, to gather public feedback and build consensus through the community.
  • Iceberg v4: We are already active on the next frontier, collaborating on core metadata redesigns, such as single-file commits and an adaptive metadata tree, to reduce latency for streaming workloads, along with Parquet manifests and indexing improvements.

The future belongs to everyone

For true open data interoperability to become a reality, we all must play our part; it is, after all, a collective responsibility. That means moving beyond "proprietary gravity," because that is what the age of AI demands.

No single vendor can solve data silos and fragmentation alone. It requires a diverse community of users, vendors and organizations working toward this common goal. Only then can we help data teams everywhere realize the promise of open source: the ability to have agency over their data.

If you are at the Iceberg Summit, come find the Snowflake engineers writing the PRs and reviewing the spec proposals. The work is public, the doors are open, and a future where users have agency over their data belongs to everyone.
