Snowflake Storage for Apache Iceberg™ Tables: Snowflake Simple Interoperability

The promise of an "open lakehouse" has always been about choice — about giving every team the ability to use their preferred engine, whether that’s Snowflake or not.

But as organizations adopt Apache Iceberg™ as their interoperable data format, a new bottleneck has emerged. While the data format is open, the storage bucket often remains "self-managed." This introduces a hidden operational tax: Teams spend too much time setting up and managing cloud bucket policies and performing risky storage maintenance.

Today, we are excited to announce that Snowflake Storage for Apache Iceberg™ tables on AWS and Azure is publicly available. This release delivers the best of both worlds: the full interoperability of Apache Iceberg with the built-in resiliency, performance and zero-management experience of Snowflake storage.

Eliminating the self-managed storage burden

For years, Snowflake customers have enjoyed the simplicity of storing data in Snowflake. You don't worry about where the files live, how they are encrypted or how the metadata is tracked. It "just works."

However, as multiengine requirements grow, many architects feel forced into self-managed storage architectures to make their data accessible to external tools. This shift often comes with a steep learning curve. In a self-managed environment, the data engineer is responsible for the heavy lifting: configuring complex IAM roles, managing bucket-level encryption and ensuring that every external engine stays in sync with the latest table version.

Snowflake Storage for Apache Iceberg™ tables removes this friction. You can now host Iceberg tables directly on Snowflake managed infrastructure. To your admins, it looks and feels like any other data you store in Snowflake; to your external Spark or Trino clusters, it appears as a standard, high-performance Iceberg table. You can finally say yes to every data consumer without inheriting the plumbing nightmare of self-managed storage.

Built-in peace of mind: Data integrity as a service

Openness shouldn't mean fragility. One of the greatest risks of self-managed storage is the lack of a built-in safety net.

The cost of a single mistake

Consider a common scenario: A data engineer is tasked with cleaning up "old" data in a self-managed S3 bucket to save on storage costs. They accidentally misconfigure a cloud lifecycle policy or run a cleanup script that deletes a critical metadata folder or a set of manifest files that are still referenced by the current table version.

In a traditional self-managed Iceberg setup, this mistake is often catastrophic. Without an integrated recovery mechanism, the table becomes inconsistent. Engines will return errors or, worse, return incomplete query results. Recovering that state manually can take hours, if not days, of forensic work — if it is at all possible.

The Snowflake safety net

With Snowflake Storage for Apache Iceberg™ tables, we bring our enterprise-grade resiliency to the Iceberg ecosystem:

  • Fail-safe: We provide a seven-day managed recovery window. If metadata is accidentally corrupted or deleted, Snowflake can help restore metadata to a consistent state within the recovery window — a built-in data resilience mechanism that is absent in self-managed storage.

  • Cross-cloud replication: Business continuity features are built in. You can seamlessly replicate your Iceberg data across regions and clouds, delivering high availability even during provider-level outages.
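As an illustrative sketch of what that safety net means in practice, Snowflake-managed tables support familiar recovery operations such as Time Travel queries and UNDROP; the table name below is a placeholder, and exact retention behavior depends on your account settings:

-- Query the table as it existed one hour ago (Time Travel):
SELECT * FROM my_iceberg_table AT (OFFSET => -3600);

-- Recover from an accidental drop within the retention window:
DROP ICEBERG TABLE my_iceberg_table;
UNDROP ICEBERG TABLE my_iceberg_table;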

By managing the storage layer, Snowflake grants your interoperable data the same durability architecture as your most mission-critical internal tables stored in Snowflake.

Optimized interoperability across the entire stack

We believe that storing data is only half the battle; the other half is preparing data to be "ready to work" for every engine that touches it.

A common issue in the lakehouse is the "small file problem" — where frequent writes create thousands of tiny files that degrade query performance across all engines. Traditionally, solving this required manual VACUUM or REORG commands and constant monitoring.
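For context, in a self-managed lakehouse that maintenance is typically an explicit Spark job. The Iceberg rewrite_data_files procedure below (catalog, schema and table names are placeholders) is the kind of manual compaction step a managed service can absorb:

-- Manual compaction in a self-managed setup (Iceberg Spark SQL procedure):
CALL my_catalog.system.rewrite_data_files(table => 'analytics.events');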

Snowflake Storage addresses this through intelligent table optimization. This feature acts as a background "autopilot" for your storage, automatically handling tasks such as file compaction and clustering.

Additionally, all tables are optimized for best performance on Snowflake. But we didn't stop there. To drive better interoperability across the entire stack, we’ve provided knobs that allow data engineers to tune the storage layout for their specific needs. By adjusting file size settings and partitioning schemes, you can optimize the data Snowflake writes for the specific scan patterns of external engines, such as Spark or Trino.
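For example, the storage layout can be tuned with ordinary table parameters. The sketch below assumes a file-size parameter named TARGET_FILE_SIZE and an existing event_date column; treat both as illustrative and check the documentation for the exact names and accepted values:

-- Favor larger files for engines that prefer bigger sequential scans:
ALTER ICEBERG TABLE my_iceberg_table SET TARGET_FILE_SIZE = '128MB';

-- Cluster the layout around a common external scan pattern:
ALTER ICEBERG TABLE my_iceberg_table CLUSTER BY (event_date);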

The result is improved performance across workloads. Snowflake lays out the data optimally while respecting your configuration, reducing query latency and improving efficiency across your entire data ecosystem.

All of the interoperability, none of the complexity

Snowflake Storage for Apache Iceberg™ tables is for organizations that want to focus on data strategy, not storage maintenance. By letting Snowflake manage the plumbing, you gain a secure, optimized and resilient data foundation that is open to any engine you choose.

Getting started

Creating an Iceberg table on Snowflake Storage is as simple as creating a standard native table. To create your first Iceberg table using Snowflake-managed storage, simply run:

CREATE ICEBERG TABLE my_iceberg_table_internal (col1 int)
CATALOG = SNOWFLAKE
EXTERNAL_VOLUME = SNOWFLAKE_MANAGED;
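Once created, the table behaves like any other Snowflake table for reads and writes; a quick sanity check might look like:

INSERT INTO my_iceberg_table_internal (col1) VALUES (1), (2);
SELECT COUNT(*) FROM my_iceberg_table_internal;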

Ready to simplify your lakehouse architecture? Download the new ebook, “Building the Interoperable Lakehouse: Data Strategies for AI Leaders,” and check out our documentation to get started with Snowflake Storage for Apache Iceberg™ tables today.


Forward-looking statements

This content contains forward-looking statements, including about our future product offerings, which are not commitments to deliver any product offerings. Actual results and offerings may differ and are subject to known and unknown risks and uncertainties. See our latest 10-Q for more information.
