Core Platform

Extend Unified Governance Across the Iceberg Ecosystem with Horizon Catalog

Today we are making it easier to govern your data estate consistently by enforcing data protection policies, such as row access and column masking, on Apache Iceberg tables accessed through Apache Spark via the Snowflake Connector for Spark.

The challenge: Governing the fragmented data estate

Organizations today manage increasingly fragmented data estates, with sensitive information spread across cloud storage and open table formats. Apache Iceberg has become a popular choice for open, flexible data storage — but while the format is powerful, its open source implementations lack native support for enterprise-ready data protection, including fine-grained access controls such as row access policies and column masking. 

Within Snowflake, customers can enforce these protections natively, including for Spark workloads executed through Snowpark Connect, where Spark APIs run directly on the Snowflake engine. However, for customers accessing regulated or sensitive data through open source or third-party-managed Spark environments, this gap can mean increased regulatory risk, privacy exposure and operational inefficiency. They expect consistent enforcement of data protection policies regardless of how or where the data is accessed.

Open source: Leading the effort to close the interoperability gap

The open source community is working to solve this governance challenge. Efforts are underway to define interoperable governance primitives that can span engines, catalogs and execution environments, enabling consistent policy enforcement across an increasingly diverse data ecosystem. The Snowflake team is actively contributing to these open source efforts to shape the next phase of Iceberg interoperability in partnership with the broader community. Snowflake engineers Prashant Singh and Russell Spitzer drove the development of the Iceberg REST Scan Planning API, collaborating with Iceberg committers and contributors to lay a foundation for fine-grained data protection beyond Spark. They are also working with the rest of the Iceberg community to define table-level read restrictions to standardize policy enforcement across engines. Together, these efforts move the ecosystem closer to a truly interoperable governance model. While these capabilities require continued open source development, Snowflake is committed to leading this work and delivering secure, interoperable governance to customers as early as possible.

The reality: The high cost of inconsistent governance 

Even with this progress, most open source implementations still lack out-of-the-box support for enterprise-ready data protection, such as row access and column masking policies. As a result, customers working with Iceberg today must still close the governance gaps themselves. Without built-in controls, data owners and stewards must rely on alternative approaches, such as maintaining multiple filtered or masked copies of the same datasets for different users, or building complex, engine-specific views to simulate data protection requirements. These approaches not only add cost and operational overhead; they also undermine trust, both in the data and in how securely it is being accessed.

At Snowflake, we believe customers should not have to choose between open, interoperable access to their data and strong governance controls. Instead, they need a way to access Iceberg tables from Spark while maintaining consistent policy enforcement and optimizing compute resources across their broader data ecosystem.

The release: Extending Snowflake data protection policies to Apache Spark 

Today we’re making that possible. The Snowflake Connector for Spark already supports enforcement of data protection policies on Iceberg tables. This release builds on that foundation by introducing support for the Apache Iceberg REST Catalog (IRC) client within the connector, unlocking more flexible and efficient execution for Spark workloads. With this release, Spark reads Iceberg tables that have no data protection policies directly (and, in the future, will write to them), using its own compute through the Horizon Iceberg REST Catalog, while reads and writes to policy-protected tables are automatically routed through Snowflake to ensure consistent policy enforcement.

This routing happens transparently within the connector, requiring no changes to user queries. As a result, customers can access Iceberg tables with and without row access and column masking policies in a single query, using the most appropriate compute path for each table, regardless of whether Spark users authenticate with Snowflake-native credentials or External OAuth. This approach helps organizations maintain both strong governance controls and efficient resource utilization without sacrificing openness or interoperability.
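The per-table routing rule described above can be pictured with a small sketch. This is illustrative pseudologic only, not the connector's actual implementation; the `TableInfo` class, `choose_read_path` function and the path names are hypothetical stand-ins for internal behavior.

```python
from dataclasses import dataclass

@dataclass
class TableInfo:
    """Minimal stand-in for the catalog metadata the connector consults."""
    name: str
    has_protection_policy: bool  # row access or column masking policy attached

def choose_read_path(table: TableInfo) -> str:
    """Sketch of the routing rule: policy-protected tables go through
    Snowflake so policies are enforced; unprotected tables are read
    directly via the Iceberg REST Catalog using Spark's own compute."""
    if table.has_protection_policy:
        return "snowflake-engine"    # Snowflake enforces row/column policies
    return "iceberg-rest-direct"     # Spark reads the Iceberg data directly

# A single Spark query can touch both kinds of tables; each table
# independently gets the appropriate compute path.
tables = [
    TableInfo("sales_raw", has_protection_policy=False),
    TableInfo("customers_pii", has_protection_policy=True),
]
for t in tables:
    print(t.name, "->", choose_read_path(t))
```

Because the decision is made per table inside the connector, a join between `sales_raw` and `customers_pii` in one query would transparently use both paths, with no change to the user's SQL or DataFrame code.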

Achieving native and cross-engine governance in Iceberg requires community alignment on several foundational capabilities, such as interoperable policy definitions, secure policy context sharing and consistent enforcement semantics. These areas are still evolving in open source, and meaningful progress depends on close collaboration across vendors and the community.

Snowflake is committed to providing customers with interoperable governance capabilities. This release is the first concrete step in that direction. Through the Snowflake Connector for Spark, teams can apply these protections today, even as the broader ecosystem continues to evolve. 

The Snowflake Connector for Spark is open source, and we’re committed to working closely with customers and the open source community to incorporate feedback and continue improving these capabilities. The connector gives organizations a production-ready option to use Iceberg with Spark while taking full advantage of Snowflake’s row access and column masking policies. It closes a critical interoperability gap and positions customers to adopt future enhancements smoothly as the community ratifies new capabilities.

Get started

You can use the Snowflake Connector for Spark with your preferred Spark environment, whether it’s self-hosted open source Spark or any other third-party-managed Spark offering that supports the connector. To get started, download the latest connector package and follow the setup instructions in the documentation.
