Date: 2025-06-01

Having a large number of users, contributors and vendors supporting Iceberg means that the suggested features and proposed improvements will offer diverse perspectives and the resulting implementations will be more robust, which brings us to the v3 table spec.

An overview of the v3 table spec

The v3 table spec is a major milestone for the technology, bringing a number of incredible new features and unlocking countless use cases for users.

Default values

What it does: With default values, Iceberg users have the ability to handle nulls and missing values in their v3 tables.

How it works: Default values are made possible by the addition of two new table configurations. By setting write-default, users can control how their writers handle fields with missing values; for flexibility, this can be changed at any time. On the other hand, initial-default, which is set once for a table, gives users a mechanism to replace existing nulls with a specified value.
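
The interplay between the two settings can be illustrated with a minimal Python sketch. It is a simulation, not the Iceberg library's API: the Field class and both helper functions are hypothetical, and it assumes the defaults are tracked per field, with initial-default applied on read to rows that have no stored value and write-default applied on write when the writer omits the field.

```python
from typing import Any, Dict, List


class Field:
    """Hypothetical, simplified field definition (not an Iceberg class)."""

    def __init__(self, name: str, initial_default: Any = None, write_default: Any = None):
        self.name = name
        self.initial_default = initial_default  # assumed fixed once set
        self.write_default = write_default      # assumed changeable at any time


def write_row(fields: List[Field], row: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in write-default for every field the writer did not supply."""
    return {f.name: row[f.name] if f.name in row else f.write_default for f in fields}


def read_row(fields: List[Field], stored: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in initial-default for fields that have no stored value."""
    return {f.name: stored[f.name] if f.name in stored else f.initial_default for f in fields}


fields = [Field("id"), Field("region", initial_default="unknown", write_default="us")]

old_row = read_row(fields, {"id": 1})   # row stored without a region value
new_row = write_row(fields, {"id": 2})  # writer omits region
print(old_row, new_row)
```

Keeping the two defaults separate means that changing write-default later does not alter what readers see for rows that were stored without a value.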

Who made it possible:

Shenoda Guirguis, original spec proposal

Limian (Raymond) Zhang, finalized spec

Implementation: Ryan Blue, Iceberg PMC Chair; Walaa Eldin Moustafa



Deletion vectors

What it does: Deletion vectors are the new default mechanism for handling position deletes in Iceberg. Users no longer have to make trade-offs typically associated with configuring position deletes, e.g. choosing between reducing the number of small files (by enabling partition-level granularity) and more efficient reads (by enabling file-level granularity).

How it works: Once implemented, deletion vectors will take the place of position deletes. The design involves multiple deletion vectors being stored as roaring bitmaps in Puffin files, a performant file type already used across the Iceberg project, where they can be accessed efficiently via an index. Interestingly, “v2 Iceberg did have a notion of [deletion vectors], but those were used in-memory,” offers Anton Okolnychyi, Iceberg Project Management Committee (PMC) Member and Senior Staff Software Engineer at Databricks. “On disk you had Parquet files, in-memory had bitmaps. And once we got to designing v3, we wanted to see what could be done differently to avoid the overhead of the conversion.”

The community’s decision to use Puffin files over the existing Parquet implementation offers performance gains for users and may be better suited to low-latency use cases. Ultimately, deletion vectors give users the best of both worlds: Position deletes apply at a file-level granularity for more efficient reads, but they are physically stored in consolidated Puffin files to reduce the number of small files.
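
To make the idea concrete, here is a minimal Python sketch of a per-file deletion vector. It is an illustration under stated assumptions: a plain Python integer stands in for the roaring bitmap, an in-memory dictionary stands in for the index into a Puffin file, and the DeletionVector class, the scan function and the file name are hypothetical.

```python
from typing import Dict, List


class DeletionVector:
    """Deleted row positions for one data file, kept as bits of a Python int.

    A real implementation would use a roaring bitmap; the int is a stand-in.
    """

    def __init__(self) -> None:
        self.bits = 0

    def delete(self, pos: int) -> None:
        """Mark the row at position `pos` as deleted."""
        self.bits |= 1 << pos

    def is_deleted(self, pos: int) -> bool:
        return (self.bits >> pos) & 1 == 1


def scan(rows: List[str], dv: DeletionVector) -> List[str]:
    """Return the rows of a data file whose positions are not marked deleted."""
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]


# Hypothetical index: one vector per data file, all held in one consolidated store.
vectors: Dict[str, DeletionVector] = {"data-file-a.parquet": DeletionVector()}
vectors["data-file-a.parquet"].delete(1)
vectors["data-file-a.parquet"].delete(3)

live_rows = scan(["a", "b", "c", "d", "e"], vectors["data-file-a.parquet"])
print(live_rows)
```

Each vector applies to exactly one data file, which keeps reads targeted, while many vectors can live together in one store, which keeps the file count down.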

Who made it possible:

Spec changes: Renjie Liu, Iceberg PMC Member, original proposal; Anton Okolnychyi, finalized changes

Implementation: Amogh Jahagirdar, Iceberg PMC Member; Eduard Tudenhoefner, Iceberg PMC Member



Geospatial data types

What it does: Iceberg now supports two new geospatial types, geometry and geography, better aligning with other projects and giving users the ability to unlock better functionality around mapping and location data.

According to Jia Yu, Apache Sedona PMC Chair and Co-Founder of Wherobots, the final functionality is a result of a ton of community research. They reviewed a number of projects and technologies with geospatial support, such as “Sedona, Databricks, Snowflake, BigQuery, pandas” and more, which “all have a different definitions of geospatial data… different types… the behavior of those types are really different.”

How it works: Beyond simply making geospatial types accessible within Iceberg, the spec change also addresses complex issues such as how to handle partitioning and filtering of geospatial fields as well as what column-level metrics should look like for these types. Predicate pushdown and regular column-level metrics are still available for the geospatial types: bounding boxes, described by geospatial points, serve as the maximums and minimums.
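
The way a bounding box can act as a column-level minimum and maximum is sketched below in Python. This is a simplified illustration, not Iceberg's implementation: the BBox class, bbox_of and prune are hypothetical names, coordinates are treated as planar, and edge cases such as boxes that cross the antimeridian are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box: (min_x, min_y) is the lower bound, (max_x, max_y) the upper."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "BBox") -> bool:
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)


def bbox_of(points: List[Tuple[float, float]]) -> BBox:
    """Compute the per-file metric from the points stored in that file."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return BBox(min(xs), min(ys), max(xs), max(ys))


def prune(files: Dict[str, BBox], query: BBox) -> List[str]:
    """Keep only the files whose bounding box could contain matches for the query."""
    return [name for name, box in files.items() if box.intersects(query)]


files = {
    "file-a": bbox_of([(0.0, 0.0), (2.0, 2.0)]),
    "file-b": bbox_of([(10.0, 10.0), (12.0, 11.0)]),
}
candidates = prune(files, BBox(1.0, 1.0, 3.0, 3.0))
print(candidates)
```

A file whose box does not intersect the query region cannot contain a match, so it can be skipped without being opened.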

Who made it possible:

Spec changes: Szehon Ho, Iceberg PMC Member; Gang Wu

Kristin Cowalcijk, implementation

A special mention to the entire Wherobots team, which implemented geospatial support on its own fork of Iceberg before offering its expertise to the Iceberg community, providing leadership and implementing the feature for the Iceberg project.

Multi-argument transforms

What it does: Multi-argument transforms give users the ability to conduct transformations over multiple fields for the purposes of partitioning and sorting in Iceberg. Prior to the v3 table spec, only a single field could be transformed for these purposes.

Who made it possible:

叶先进, spec changes

Implementation: Fokko Driesprong, Iceberg PMC Member; JB Onofré, ASF Board Member



Row lineage

What it does: Row lineage makes it easier for users to trace how rows in an Iceberg table have changed over time, unlocking a number of use cases including improved change data capture (CDC) workflows, easier auditing and better materialized view maintenance. Ultimately, the addition of row lineage to Iceberg “means that Iceberg users will be able to accurately determine the history of any row in their tables,” says Russell Spitzer, Iceberg PMC Member and Principal Software Engineer at Snowflake. “Previously, we could only guess based on user-defined identity columns, but now it's built into the format itself!”

How it works: Every row in an Iceberg table includes two additional fields, _row_id and _last_updated_sequence_number. The Iceberg community was able to implement this in such a way that not every row has to explicitly store values in these fields. Instead, to save space, the column values are implied until materialized through a read query, and only then are the values propagated through the metadata layer (Metadata.json → Snapshot → Manifest → Data file → Row).
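
A minimal Python sketch of how implied values might be resolved at read time follows. It is a simplification under assumptions: first_row_id and file_sequence_number are illustrative names for values a data file is assumed to inherit from the metadata layers above it, and a row with no stored lineage value is assumed to take first_row_id plus its position in the file.

```python
from typing import Any, Dict, List


def resolve_lineage(rows: List[Dict[str, Any]], first_row_id: int,
                    file_sequence_number: int) -> List[Dict[str, Any]]:
    """Materialize lineage fields for rows that do not store them explicitly."""
    resolved = []
    for pos, row in enumerate(rows):
        row_id = row.get("_row_id")
        seq = row.get("_last_updated_sequence_number")
        resolved.append({
            **row,
            # Implied value: the file's first row ID plus the row's position.
            "_row_id": first_row_id + pos if row_id is None else row_id,
            # Implied value: the sequence number inherited from the data file.
            "_last_updated_sequence_number": file_sequence_number if seq is None else seq,
        })
    return resolved


rows = [
    {"v": "a"},                                                      # nothing stored
    {"v": "b", "_row_id": 7, "_last_updated_sequence_number": 3},   # explicit values
]
resolved = resolve_lineage(rows, first_row_id=100, file_sequence_number=5)
print(resolved)
```

Rows that already carry explicit values keep them, which is what lets an ID survive when a row is rewritten into a new file.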

Who made it possible:

Spec changes: Russell Spitzer; Nileema Shingte; Attila-Péter Tóth

Implementation: Russell Spitzer, core; Ryan Blue, core; Amogh Jahagirdar, Spark



Table encryption

What it does: The latest update around table encryption unlocks client-side encryption of Iceberg tables, giving users the ability to encrypt all of their data and metadata. Entire tables can be encrypted with a single key, or access can be controlled at the snapshot level.

How it works: To make client-side table encryption possible in Iceberg, users can associate individual table snapshots with encryption keys stored in a third-party key store. To begin accessing data within a specific snapshot, clients need access to that key store and the encryption key in order to decrypt and access the snapshot’s manifest list. From there, manifest lists have a similar mechanism for clients to decrypt the manifest files, and, finally, manifest files have a data file encryption key for clients to access the data files.
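
The chain of keys described above can be sketched in Python. Everything here is illustrative: toy_cipher is a deliberately insecure stand-in for a real cipher, the dictionary stands in for the third-party key store, and the key names and the layout of each layer are assumptions rather than Iceberg's actual format.

```python
import hashlib
from typing import Dict


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher, NOT secure. Applying it twice with the same key restores the input."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Stand-in for the third-party key store (hypothetical key ID and secret).
key_store: Dict[str, bytes] = {"snapshot-key-1": b"master-secret"}

data_key = b"data-file-key"
manifest_key = b"manifest-key"
manifest_list_key = b"manifest-list-key"

# Each layer is encrypted and carries the key needed for the layer below it.
data_file = toy_cipher(data_key, b"row data")
manifest = toy_cipher(manifest_key, data_key)
manifest_list = toy_cipher(manifest_list_key, manifest_key)
snapshot = {
    "key_id": "snapshot-key-1",
    "wrapped_key": toy_cipher(key_store["snapshot-key-1"], manifest_list_key),
}


def read_data(snapshot: dict, manifest_list: bytes, manifest: bytes,
              data_file: bytes, key_store: Dict[str, bytes]) -> bytes:
    """Walk the chain: key store -> manifest list -> manifest -> data file."""
    master = key_store[snapshot["key_id"]]
    ml_key = toy_cipher(master, snapshot["wrapped_key"])
    m_key = toy_cipher(ml_key, manifest_list)
    d_key = toy_cipher(m_key, manifest)
    return toy_cipher(d_key, data_file)


plaintext = read_data(snapshot, manifest_list, manifest, data_file, key_store)
print(plaintext)
```

Because every lower-level key is reachable only through the level above it, a client without access to the key store cannot begin the chain.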

Who made it possible:

Spec changes and implementation: Gidon Gershinsky; Russell Spitzer; Ryan Blue



Variant data type

What it does: Variant types allow users to handle less regular, semi-structured data sets where certain fields are intermittently used. Take, for example, sensor data: All sensors may report a location and timestamp, but some sensors report temperature, others report humidity and so on. As Snowflake’s Senior Software Engineer Aihua Xu, one of the contributors to the variant type, puts it: “Adding [variant] to the Iceberg v3 spec was about meeting the realities of today’s data. Native [variant] support enables Iceberg to efficiently represent and process this kind of data, unlocking performance and flexibility without compromising on structure.”

How it works: The variant data type enables users to store variable types and fields in Apache Iceberg™ tables where the field names and their types are extracted into metadata and value fields and then stored as binary. Reading the data involves deserialization. To be more efficient, an additional feature called “shredding” allows for the consistent fields, such as the location and timestamp in the previous example, and their types to be extracted and stored in Parquet; the remaining fields are stored in metadata and value fields as described above.
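
Using the sensor example, shredding can be sketched in Python as follows. This is a simplified illustration, not the actual encoding: the shred and read_field functions are hypothetical, Python lists stand in for typed Parquet columns, and JSON bytes stand in for the binary metadata and value fields.

```python
import json
from typing import Any, Dict, List, Tuple


def shred(records: List[Dict[str, Any]],
          shredded_fields: List[str]) -> Tuple[Dict[str, List[Any]], List[bytes]]:
    """Split records into columns for the consistent fields and a binary residual for the rest."""
    columns: Dict[str, List[Any]] = {name: [] for name in shredded_fields}
    residual: List[bytes] = []
    for record in records:
        for name in shredded_fields:
            columns[name].append(record.get(name))
        rest = {k: v for k, v in record.items() if k not in shredded_fields}
        # JSON bytes are a stand-in for the real binary variant encoding.
        residual.append(json.dumps(rest, sort_keys=True).encode("utf-8"))
    return columns, residual


def read_field(columns: Dict[str, List[Any]], residual: List[bytes],
               row: int, name: str) -> Any:
    """Read from a shredded column when possible; otherwise deserialize the residual."""
    if name in columns:
        return columns[name][row]
    return json.loads(residual[row]).get(name)


records = [
    {"location": "A", "timestamp": 1, "temperature": 21.5},
    {"location": "B", "timestamp": 2, "humidity": 0.4},
]
columns, residual = shred(records, ["location", "timestamp"])
print(columns["location"], read_field(columns, residual, 0, "temperature"))
```

Reads of location or timestamp never touch the residual, so the deserialization cost is paid only for the fields that vary from sensor to sensor.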

Who made it possible: