The Apache Iceberg v3 Table Spec: Celebrating the Open Source Community’s Shared Success

The Apache Iceberg™ project exemplifies the spirit of open source and shows what’s possible when a community comes together with a common goal: to drive a technology forward. With a mission to bring reliability, performance and openness to large-scale analytics, the Iceberg project continues to evolve and offer many benefits thanks to the diverse voices and efforts of its contributors.
Iceberg’s most recent milestone, the ratification of the v3 table spec, is more than just a technical update. It’s the result of thoughtful design, rigorous discussion and collaboration across dozens of organizations and hundreds of individuals. The v3 table spec reflects a shared investment in the future of open data architectures and a commitment to keeping Iceberg truly vendor-neutral, flexible and community-driven.
This post will highlight the main features of the v3 table spec and shine a light on the collective work that brought this release to life.
Community-driven development
Over the past two years, Iceberg has emerged as a leading standard for open table formats, enabling both users and vendors to agree on a structure for their data and, thus, benefit from the interoperability that it unlocks. It is only with the contributions of the entire community that the Iceberg project can truly thrive and offer all of the benefits — openness, vendor neutrality and interoperability — that make it valuable.
With open source, anyone can suggest a new feature, build it themselves and work with other contributors to bring it into the project. The features that were incorporated into the latest Iceberg table spec are a result of numerous discussions with vendors and individuals alike, who described what they needed from Iceberg to continue using or adopting the technology. For instance, vendors with their own proprietary table formats could suggest adding certain features to Iceberg for consistency, but these would be incorporated only if the entire Iceberg community agreed that those features would be beneficial to the project. That’s open source.
Snowflake is proud to contribute to Apache Iceberg and play a part in shaping the v3 table spec. Our collaboration with the Iceberg community — and our commitment to natively supporting v3 in Snowflake — reflects a core belief: When vendors work together in the open, everyone benefits. It’s this shared investment that truly empowers organizations to unlock the full potential of their data and AI. We're excited to bring v3 support to our users and to continue partnering with the Iceberg community as we look ahead to v4 and the broader future of the open lakehouse.
Chris Child
Having a large number of users, contributors and vendors supporting Iceberg means that the suggested features and proposed improvements will offer diverse perspectives and the resulting implementations will be more robust — which brings us to the v3 table spec.
An overview of the v3 table spec
The v3 table spec is a major milestone for the technology, bringing a host of powerful new features and unlocking new use cases for users.
Default values
What it does: With default values, Iceberg users have the ability to handle nulls and missing values in their v3 tables.
How it works: Default values are possible with the addition of two new field configurations. By setting `write-default`, users can control how their writers handle missing values for fields; for flexibility, this can be changed at any time. On the other hand, `initial-default`, which is set once when a field is added, gives users a mechanism to replace missing values in pre-existing rows with a specified value.
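To make the semantics concrete, here is a minimal Python sketch of how the two defaults behave. It is illustrative only and does not use the Iceberg library API; the `field_config` dict and the helper functions are hypothetical.

```python
# Hypothetical sketch of default-value semantics (not the Iceberg API).
# initial-default fills in values for rows written before the field existed;
# write-default is applied by writers when a field is omitted at write time.

field_config = {
    "name": "region",
    "initial-default": "unknown",   # set once; backfills pre-existing rows on read
    "write-default": "us-east-1",   # applied to new rows that omit the field
}

def read_value(row, field):
    """Resolve a field's value for a row read from an older data file."""
    if field["name"] in row:
        return row[field["name"]]
    # Field did not exist when this row was written: use initial-default.
    return field["initial-default"]

def write_value(row, field):
    """Resolve a field's value when writing a new row."""
    if field["name"] in row:
        return row[field["name"]]
    # Writer omitted the field: fill in write-default.
    return field["write-default"]

old_row = {"id": 1}                        # written before 'region' was added
print(read_value(old_row, field_config))   # -> 'unknown'
new_row = {"id": 2}                        # written without 'region'
print(write_value(new_row, field_config))  # -> 'us-east-1'
```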
Who made it possible:
Shenoda Guirguis, original spec proposal
Limian (Raymond) Zhang, finalized spec
Implementation
Ryan Blue, Iceberg PMC Chair
Walaa Eldin Moustafa
Deletion vectors
What it does: Deletion vectors are the new default mechanism for handling position deletes in Iceberg. Users no longer have to make the trade-offs typically associated with configuring position deletes, such as choosing between reducing the number of small files (by enabling partition-level granularity) and more efficient reads (by enabling file-level granularity).
How it works: Once implemented, deletion vectors will take the place of position deletes. The design involves multiple deletion vectors being stored as roaring bitmaps in Puffin files, a performant file type already used across the Iceberg project, where they can be accessed efficiently via an index. Interestingly, “v2 Iceberg did have a notion of [deletion vectors], but those were used in-memory,” offers Anton Okolnychyi, Iceberg Project Management Committee (PMC) Member and Senior Staff Software Engineer at Databricks. “On disk you had Parquet files, in-memory had bitmaps. And once we got to designing v3, we wanted to see what could be done differently to avoid the overhead of the conversion.”
The community’s decision to use Puffin files over the existing Parquet implementation offers performance gains for users and is potentially a better fit for low-latency use cases. Ultimately, deletion vectors give users the best of both worlds: Position deletes apply at file-level granularity for more efficient reads, but they are physically stored in consolidated Puffin files to reduce the number of small files.
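As a rough illustration of the idea, the following Python sketch models a deletion vector as a bitmap of deleted row positions for a single data file. A plain set stands in for the roaring bitmap the spec actually uses, and the class is hypothetical rather than part of any Iceberg library.

```python
# Conceptual sketch of a deletion vector (not Iceberg's actual Puffin/roaring
# bitmap implementation): a per-data-file bitmap of deleted row positions.

class DeletionVector:
    """Marks row positions within a single data file as deleted."""
    def __init__(self):
        self.deleted = set()   # real implementations use a roaring bitmap

    def delete(self, pos: int):
        self.deleted.add(pos)

    def is_deleted(self, pos: int) -> bool:
        return pos in self.deleted

# One vector per data file, giving file-level granularity on reads; many
# vectors can be packed into a single Puffin file and located via an index,
# which is how v3 avoids a flood of small delete files.
dv = DeletionVector()
dv.delete(3)
dv.delete(7)

rows = ["a", "b", "c", "d", "e", "f", "g", "h"]
live = [r for pos, r in enumerate(rows) if not dv.is_deleted(pos)]
print(live)  # rows at positions 3 and 7 are filtered out on read
```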
Who made it possible:
Spec changes
Renjie Liu, Iceberg PMC Member, original proposal
Anton Okolnychyi, finalized changes
Implementation
Amogh Jahagirdar, Iceberg PMC Member
Eduard Tudenhoefner, Iceberg PMC Member
Geospatial data types
What it does: Iceberg now supports two new geospatial types, geometry and geography, aligning more closely with other projects and giving users the ability to unlock richer functionality around mapping and location data.
According to Jia Yu, Apache Sedona PMC Chair and Co-Founder of Wherobots, the final functionality is the result of extensive community research. The community reviewed a number of projects and technologies with geospatial support, such as “Sedona, Databricks, Snowflake, BigQuery, pandas” and more, which “all have [different] definitions of geospatial data… different types… the behavior of those types [is] really different.”
How it works: Beyond simply making geospatial types accessible within Iceberg, the spec change also addresses complex issues such as how to handle partitioning and filtering of geospatial fields, as well as what column-level metrics should look like for these types. Predicate pushdown and regular column-level metrics remain available for the geospatial types: each column’s metrics are bounding boxes whose corner points serve as the minimum and maximum values.
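A hedged sketch of how bounding-box metrics enable predicate pushdown follows; the field names and pruning logic are illustrative assumptions, not the Iceberg implementation.

```python
# Illustrative file pruning with bounding-box column metrics for a geometry
# column (the metrics layout and names here are hypothetical).

from typing import NamedTuple

class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def intersects(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes overlap."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)

# Column-level metrics per data file: min/max points describing a bounding box.
file_metrics = {
    "file_a.parquet": BBox(-122.5, 37.2, -121.8, 37.9),  # Bay Area
    "file_b.parquet": BBox(2.2, 48.8, 2.5, 48.95),       # Paris
}

query = BBox(-123.0, 37.0, -122.0, 38.0)  # a spatial "intersects" predicate
to_scan = [f for f, box in file_metrics.items() if intersects(box, query)]
print(to_scan)  # only file_a.parquet needs to be read
```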
Who made it possible:
Spec changes
Szehon Ho, Iceberg PMC Member
Gang Wu
Implementation
Kristin Cowalcijk
A special mention to the entire Wherobots team, which implemented geospatial support on its own fork of Iceberg before offering its expertise to the Iceberg community, providing leadership and implementing the feature for the Iceberg project.
Multi-argument transforms
What it does: Multi-argument transforms give users the ability to conduct transformations over multiple fields for the purposes of partitioning and sorting in Iceberg. Prior to the v3 table spec, only a single field could be transformed for these purposes.
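As a loose illustration, a multi-argument transform can be thought of as a function of several source fields at once. The hashing scheme below is hypothetical (Iceberg’s bucket transform specifies a particular 32-bit Murmur3 hash), so treat this purely as a sketch of the concept.

```python
# Sketch of a multi-argument bucket transform (hypothetical hashing scheme).

import hashlib

def multi_bucket(num_buckets: int, *values) -> int:
    """Hash several field values together and map them to a bucket."""
    h = hashlib.sha256()
    for v in values:
        h.update(repr(v).encode())
    return int.from_bytes(h.digest()[:4], "big") % num_buckets

# Prior to v3, a partition or sort transform could see only one source field;
# a multi-argument transform can combine several, e.g. bucketing (region, id).
print(multi_bucket(16, "us-east-1", 42))
print(multi_bucket(16, "eu-west-1", 42))
```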
Who made it possible:
叶先进, spec changes
Implementation
Fokko Driesprong, Iceberg PMC Member
JB Onofré, ASF Board Member
Row lineage
What it does: Row lineage makes it easier for users to trace how rows in an Iceberg table have changed over time, unlocking a number of use cases including improved change data capture (CDC) workflows, easier auditing and better materialized view maintenance. Ultimately, the addition of row lineage to Iceberg “means that Iceberg users will be able to accurately determine the history of any row in their tables,” says Russell Spitzer, Iceberg PMC Member and Principal Software Engineer at Snowflake. “Previously, we could only guess based on user-defined identity columns, but now it's built into the format itself!”
How it works: Every row in an Iceberg table includes two additional fields, `_row_id` and `_last_updated_sequence_number`. The Iceberg community was able to implement this in such a way that not every row has to explicitly store values in these fields. Instead, to save space, the column values are implied until materialized through a read query, and only then are the values propagated through the metadata layer (Metadata.json → Snapshot → Manifest → Datafile → Row).
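The inheritance mechanism can be sketched in a few lines of Python. This is a conceptual illustration under assumed metadata fields (`first_row_id`, `sequence_number`), not the library API.

```python
# Conceptual sketch of row-lineage inheritance (not the Iceberg library API).
# Rows don't physically store _row_id / _last_updated_sequence_number until
# needed; readers derive them from metadata higher up the hierarchy.

def materialize_lineage(row, position, data_file):
    """Fill in lineage fields for a row at read time."""
    row_id = row.get("_row_id")
    if row_id is None:
        # Inherit: each new data file is assigned a first-row-id, so a row's
        # id is that base plus its position within the file.
        row_id = data_file["first_row_id"] + position
    seq = row.get("_last_updated_sequence_number")
    if seq is None:
        # Inherit the sequence number of the commit that wrote the file.
        seq = data_file["sequence_number"]
    return {**row, "_row_id": row_id, "_last_updated_sequence_number": seq}

data_file = {"first_row_id": 1000, "sequence_number": 7}
row = {"id": 42, "name": "sensor-a"}  # lineage columns not stored on disk
print(materialize_lineage(row, 3, data_file))
# -> _row_id 1003, _last_updated_sequence_number 7
```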
Who made it possible:
Spec changes
Russell Spitzer
Nileema Shingte
Attila-Péter Tóth
Implementation
Russell Spitzer, core
Ryan Blue, core
Amogh Jahagirdar, Spark
Table encryption
What it does: The latest update around table encryption unlocks client-side encryption of Iceberg tables, giving users the ability to encrypt all of their data and metadata. Entire tables can be encrypted with a single key, or access can be controlled at the snapshot level.
How it works: To make client-side table encryption possible in Iceberg, users can associate individual table snapshots with encryption keys stored in a third-party key store. To begin accessing data within a specific snapshot, clients need access to that key store and the relevant encryption key in order to decrypt the snapshot’s manifest list. From there, manifest lists have a similar mechanism for clients to decrypt the manifest files, and, finally, manifest files carry a data file encryption key that clients use to access the data files.
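The resulting chain of key unwrapping can be sketched as follows. This toy Python example uses XOR purely as a stand-in for real authenticated encryption (e.g., AES-GCM), and every key name here is hypothetical.

```python
# Toy sketch of the key-unwrapping chain described above. XOR stands in for
# real authenticated encryption; all key names are made up for illustration.

def unwrap(key: bytes, wrapped: bytes) -> bytes:
    """Stand-in for decrypting one key with another."""
    return bytes(a ^ b for a, b in zip(key, wrapped))

wrap = unwrap  # XOR is its own inverse, so wrapping == unwrapping here

# Keys at each layer of the table's metadata tree (equal lengths for the toy).
snapshot_key      = b"snapshot-key-0001"   # fetched from the key store
manifest_list_key = b"mlist-key-0000001"
manifest_key      = b"manifest-key-0001"
data_file_key     = b"datafile-key-0001"

# Each layer stores the next layer's key, wrapped with its own key.
wrapped_mlist    = wrap(snapshot_key, manifest_list_key)
wrapped_manifest = wrap(manifest_list_key, manifest_key)
wrapped_data     = wrap(manifest_key, data_file_key)

# A client holding snapshot_key can walk the chain down to the data files.
k1 = unwrap(snapshot_key, wrapped_mlist)   # decrypt the manifest list
k2 = unwrap(k1, wrapped_manifest)          # decrypt the manifest files
k3 = unwrap(k2, wrapped_data)              # decrypt the data files
assert k3 == data_file_key
```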
Who made it possible:
Spec changes and implementation
Gidon Gershinsky
Russell Spitzer
Ryan Blue
Variant data type
What it does: Variant types allow users to handle less regular, semi-structured data sets where certain fields are intermittently used. Take, for example, sensor data: All sensors may report a location and timestamp, but some sensors report temperature, others report humidity and so on. As Snowflake’s Senior Software Engineer Aihua Xu, one of the contributors to the variant type, puts it: “Adding [variant] to the Iceberg v3 spec was about meeting the realities of today’s data. Native [variant] support enables Iceberg to efficiently represent and process this kind of data, unlocking performance and flexibility without compromising on structure.”
How it works: The variant data type enables users to store variable fields and types in Apache Iceberg™ tables: the field names and their types are extracted into metadata and value fields and then stored as binary, so reading the data involves deserialization. To make this more efficient, an additional feature called “shredding” allows consistent fields, such as the location and timestamp in the previous example, and their types to be extracted and stored natively in Parquet; the remaining fields are stored in the metadata and value fields as described above.
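A hedged sketch of the shredding idea follows. It is illustrative only: the real spec defines a compact binary metadata-and-value encoding, whereas this example serializes the residual fields as JSON.

```python
# Illustrative sketch of variant storage with "shredding" (not the spec's
# actual binary encoding).

import json

SHREDDED = {"location", "timestamp"}   # fields consistently present

def shred(record: dict):
    """Split a record into typed columns and a residual variant blob."""
    columns = {k: record[k] for k in SHREDDED if k in record}
    rest = {k: v for k, v in record.items() if k not in SHREDDED}
    # Irregular fields are serialized to binary and read via deserialization.
    return columns, json.dumps(rest).encode()

readings = [
    {"location": "NYC", "timestamp": 1700000000, "temperature": 21.5},
    {"location": "SFO", "timestamp": 1700000060, "humidity": 0.62},
]
for r in readings:
    cols, blob = shred(r)
    print(cols, blob)
# Shredded columns get Parquet's native encodings and column statistics;
# only the irregular remainder pays the cost of variant decoding.
```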
Who made it possible:
Spec changes
Tyler Akidau, original proposal
Aihua Xu, finalized changes
Implementation
Aihua Xu, core implementation
Ryan Blue, core
David Cashman, shredding
Gene Peng, encoding and influential work on variant data type in Apache Spark
Neil Chao, Apache Arrow
Open standards are the foundation of an interoperable lakehouse ecosystem, built through collaborative community effort for the benefit of all. The ratification of the Apache Iceberg v3 specification introduces powerful new data types and enhanced schema evolution, marking a significant leap toward a more efficient, diverse and connected data future. At Dremio, we are proud to contribute to this milestone and to champion the adoption of Apache Iceberg, delivering what matters most to our customers: high performance and direct access to their data at any scale.
James Rowland-Jones
Where we are and what’s to come
None of the aforementioned features and functionality would have been possible without the hard work and continued collaboration across the entire Iceberg community. The work that everyone has done for this latest spec is a true testament to the open source spirit, illustrating how individuals, vendors and users can come together to push an entire technology forward.
With that, we’d like to extend a heartfelt thank you to those involved in contributing to Iceberg and making the v3 table spec possible — both those mentioned above and everyone else.
As it currently stands, the diversity of the Iceberg project, across individuals and companies, speaks volumes about the health of the community and the ecosystem. Its continued success will rely on broad adoption of the v3 table spec and widespread integration with existing and new technologies. Thankfully, that is happening — with more technologies, companies and vendors supporting Iceberg every day.
Iceberg v3 marks a significant milestone for the Iceberg community and the open data ecosystem. At Microsoft, we believe that open standards and strong community collaboration are essential to building unified analytics and governance across the entire data estate. We’re excited to integrate v3 into Microsoft OneLake and continue partnering with the community on v4 and future specifications. These efforts are foundational to our vision for open, AI-ready data infrastructure, and lie at the heart of the OneLake design in Microsoft Fabric.
Dipti Borkar
For those who are eager to get started with the latest additions, know that the implementations are currently in progress, with most of the changes expected to be released as part of version 1.10, which is coming soon.
In the meantime, the Apache Iceberg dev mailing list is the best place to stay up to date on the latest advancements and discussions around the Iceberg project. To hear more on developments across the community and industry use cases, check out the breakout session recordings from this year’s Iceberg Summit.