A few years ago, the conversation around Apache Iceberg™ was about whether open table formats could replace proprietary data warehouses. That conversation is over.
At Iceberg Summit 2026, over 600 attendees gathered in San Francisco for two days and over 70 sessions, and not a single talk tried to convince anyone to adopt Iceberg. Every session started from the assumption that you're already running it, and asked the harder question: what are we going to do with it next?
The community's answer was clear: push beyond the boundaries that success itself created. Iceberg was designed for large, slow-moving analytical tables, but the workloads now running on it are anything but: streaming pipelines committing every few seconds, ML feature tables with thousands of columns, and disaster recovery scenarios demanding table portability. The community isn't debating whether to fix the limitations these workloads expose. It's building V4.
We spent two days participating in the conference alongside the community, and came away with a clear picture of where the project's energy is concentrated. Two forces defined the event: an ambitious V4 specification that addresses the limitations head-on, and an ecosystem that has exploded in breadth to meet practitioners where they already work.
V4: Solving the problems that success created
The V4 proposals at this year's summit weren't academic exercises. They're direct responses to operational pain that practitioners are experiencing at scale, and collectively, they signal that Iceberg is actively evolving to support AI and streaming workloads as first-class citizens. The community's enthusiasm confirmed the demand.
- Rethinking metadata from the ground up: Iceberg's metadata tree was built for batch workloads, and its write amplification creates commit latencies that streaming can't tolerate. V4's adaptive metadata trees introduce one-file commits, which enable low-latency writes without sacrificing read performance on large tables.
- Making tables relocatable: Absolute paths solved real consistency problems early on, but they've become operational friction for replication, disaster recovery and cloud migration. V4's shift to relative paths makes tables inherently portable, eliminating entire categories of expensive metadata rewrites.
- First-class support for wide tables: ML feature engineering produces tables with thousands of columns, and today's layout forces full file rewrites for even small updates. Column families allow column groups to be stored and evolved independently, so new features can be backfilled without touching the rest of the table.
- Extensible column statistics: The current statistics model is being rebuilt for flexibility, opening the door to new index types, more efficient query planning and use cases like ANN search that simply don't fit the existing structure.
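To make the relocatable-tables point concrete, here is a toy sketch of why relative paths remove the need for metadata rewrites. It is not the V4 spec or any Iceberg API: `TableMetadata`, `resolve` and `relocate` are hypothetical names invented for illustration, and the bucket paths are made up. The only idea it shows is that when file entries are stored relative to a table root, moving the table changes one value instead of every entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMetadata:
    """Toy stand-in for table metadata (hypothetical, not the real Iceberg structure)."""
    location: str                 # table root, e.g. a bucket prefix
    data_files: tuple[str, ...]   # paths stored relative to `location`


def resolve(meta: TableMetadata) -> list[str]:
    """Turn stored relative paths into absolute ones at read time."""
    root = meta.location.rstrip("/")
    return [f"{root}/{rel}" for rel in meta.data_files]


def relocate(meta: TableMetadata, new_location: str) -> TableMetadata:
    """Moving the table only swaps the root; the file entries are reused as-is."""
    return TableMetadata(location=new_location, data_files=meta.data_files)


# Assumed example locations, purely illustrative.
original = TableMetadata(
    location="s3://bucket-a/warehouse/events",
    data_files=("data/part-0.parquet", "data/part-1.parquet"),
)
moved = relocate(original, "gs://bucket-b/dr/events")

for path in resolve(moved):
    print(path)
```

With absolute paths, the equivalent of `relocate` would have to rewrite every file entry, which is the expensive step for replication, disaster recovery and cloud migration.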
What makes this V4 cycle remarkable is who's building it. Engineers from Google, Apple, Snowflake, Databricks, Microsoft, Netflix and LinkedIn are in the same design discussions and reviewing the same PRs. The spec is being shaped by the people who operate Iceberg at its most demanding, and that's why the community trusts it.
The ecosystem has reached critical mass
A spec is only as valuable as the tools that implement it, and Iceberg's ecosystem has crossed a threshold: it has grown from a handful of engines adding read support into a fully realized stack spanning catalogs, languages, ingestion frameworks and operational tooling.
- The REST catalog has become the integration point: What started as a convenience layer has evolved into the connective tissue of the open lakehouse. JVM-based or not, any engine can interact with Iceberg tables through a common interface. Apache Polaris™ is maturing as an open source implementation, with a growing number of production deployments. The catalog is becoming the control plane for governance, security and multi-tenant access.
- Iceberg is no longer a JVM-only project: The Rust implementation powers DataFusion-Comet's native scan operator, bypassing Apache Spark™'s JVM overhead entirely. A C++ implementation is emerging for engines that need predictable memory and SIMD-optimized execution. PyIceberg has crossed 500,000 daily downloads on PyPI, with teams running it in production without spinning up Spark. These are production-grade implementations that broaden who can build on Iceberg and where it can run.
- Multiengine access is becoming routine: Spark handling ingestion while Snowflake, Trino, DuckDB or Apache Flink® serve queries was described as established architecture rather than aspirational design. The interoperability promise Iceberg made years ago is operational reality at organizations running it across cloud boundaries.
The net effect is that adopting Iceberg no longer requires a single monolithic technology choice. You pick the catalog that fits your governance model, the engine that fits your latency requirements and the language that fits your team, and the spec ensures they all compose.
Watch the sessions
Iceberg Summit exists because this community builds in the open, and that includes sharing the work. Whether you're evaluating V4 proposals for your roadmap, exploring how the REST catalog fits your architecture, or just looking for hard-won operational advice from teams running Iceberg at scale, the full library is accessible here.
A few places to start:
- "From Batch to Streaming and AI, Iceberg for Everyone by Everyone": See how Iceberg has evolved as a project and where V4 will take us.
- "Column Storage for the AI Era": Catch Julien Le Dem in a complementary session on evolving Apache Parquet™ and Apache Arrow™ for what's next.
- "Breaking the Mold: Re-thinking Iceberg Metadata Structure in V4": The deepest dive on adaptive metadata.
- "Reaching Warehouse-Class Performance": Learn about Apple's framework for Spark and Iceberg optimization.
- "Maintaining Iceberg at Scale": See Slack's lessons on centralized table maintenance.
- "Innovations in Apache Iceberg and the Data Ecosystem": A closing panel from community members on where the project is heading.
See you next year!





