Skip to content

feat(transaction): RowDelta merge-on-read deletion vectors (carries forward #2203)#2678

Open
raghav-reglobe wants to merge 15 commits into
apache:mainfrom
raghav-reglobe:rowdelta-dv-write-on-main
Open

feat(transaction): RowDelta merge-on-read deletion vectors (carries forward #2203)#2678
raghav-reglobe wants to merge 15 commits into
apache:mainfrom
raghav-reglobe:rowdelta-dv-write-on-main

Conversation

@raghav-reglobe

Copy link
Copy Markdown

Summary

Completes RowDelta with the merge-on-read (MoR) path — V3 Puffin deletion vectors committed via a content=Deletes manifest — on top of @wirybeaver's copy-on-write RowDelta foundation from #2203 (rebased onto current main, authorship preserved). Together they give RowDelta both COW (rewrite data files) and MoR (deletion vectors) — the basis for MERGE INTO / UPDATE / DELETE on V3 tables.

#2203 has been stale and conflicting with main; per my note there, this carries it forward. The rebase conflicts were trivial (additive mod / use keep-both) plus one moved API (Snapshot::load_manifest_listTable::manifest_list_reader).

MoR additions (on top of #2203)

  • DeleteVector::{to_puffin_blob, from_puffin_blob, write_to_puffin_file} — serialize/deserialize the deletion-vector-v1 blob (magic 0xD1D33964, 4-byte BE length + portable 64-bit roaring bitmap + 4-byte BE CRC32; snapshot-id / sequence-number = -1; fields = []) and produce the PositionDeletes DataFile.
  • RowDeltaAction::add_delete_files MoR commit — writes a content=Deletes manifest with Operation::Delete (replaces the previous FeatureUnsupported stub).
  • delete_vector made pub (with rustdoc).

Validation

  • Byte-identical to Apache Iceberg-Javato_puffin_blob output equals the Java-produced golden DV fixtures from apache/iceberg core test resources, for 3 vectors: empty, {1, 3, 5, 7, 9}, and small+large spanning two roaring containers (crates/iceberg/testdata/puffin/*-position-index.bin).
  • Cross-implementation parity with apache/iceberg-go (puffin/dv_golden_test.go, TestSerializeDV*).
  • Real data, cross-engine — a deletion vector written by this code was committed to a real V3 table via a REST catalog on S3 and read back correctly by an independent engine (Apache Doris): 10 → 7 rows, exact positions deleted.
  • 14 delete_vector + 7 transaction::row_delta tests pass; cargo fmt + cargo clippy clean.

Credit & follow-ups

@raghav-reglobe raghav-reglobe force-pushed the rowdelta-dv-write-on-main branch from 5486140 to c9f97a6 Compare June 20, 2026 01:53
raghav-reglobe added a commit to raghav-reglobe/iceberg-rust that referenced this pull request Jun 20, 2026
The caching delete-file loader now recognizes a deletion vector (PositionDeletes task whose file_format is Puffin), reads the deletion-vector-v1 blob via PuffinReader, and parses it with DeleteVector::from_puffin_blob into the delete filter — completing the V3 DV read path (the DelVecs -> upsert_delete_vector output side was already present). With this, iceberg-rust correctly excludes rows marked by a V3 deletion vector on read; previously such rows were returned. Read complement to the DV-write in apache#2678.

Part 3 of DV-applied scan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Signed-off-by: Raghvendra Singh <raghav@cashify.in>
wirybeaver and others added 14 commits June 20, 2026 14:20
…fications

This commit implements the core transaction infrastructure for MERGE INTO,
UPDATE, and DELETE operations in Apache Iceberg-Rust. Based on the official
Iceberg Java implementation (RowDelta API).

**New file: `crates/iceberg/src/transaction/row_delta.rs`**
- RowDeltaAction: Transaction action supporting both data file additions
  and deletions in a single snapshot
- add_data_files(): Add new data files (inserts/rewrites in COW mode)
- remove_data_files(): Mark data files as deleted (COW mode)
- add_delete_files(): Reserved for future Merge-on-Read (MOR) support
- validate_from_snapshot(): Conflict detection for concurrent modifications
- RowDeltaOperation: Implements SnapshotProduceOperation trait
  - Determines operation type (Append/Delete/Overwrite) based on changes
  - Generates DELETED manifest entries for removed files
  - Carries forward existing manifests for unchanged data

**Modified: `crates/iceberg/src/transaction/mod.rs`**
- Add row_delta() method to Transaction API
- Export row_delta module

**Modified: `crates/iceberg/src/transaction/snapshot.rs`**
- Add write_delete_manifest() to write DELETED manifest entries
- Update manifest_file() to process delete entries from SnapshotProduceOperation
- Update validation to allow delete-only operations

Comprehensive unit tests with ~85% coverage:
- test_row_delta_add_only: Pure append operation
- test_row_delta_remove_only: Delete-only operation
- test_row_delta_add_and_remove: COW update (remove old, add new)
- test_row_delta_with_snapshot_properties: Custom snapshot properties
- test_row_delta_validate_from_snapshot: Snapshot validation logic
- test_row_delta_empty_action: Empty operation error handling
- test_row_delta_incompatible_partition_value: Partition validation

All existing tests pass (1135 passed; 0 failed).

Copy-on-Write (COW) Strategy:
- For row-level modifications: read target files, apply changes,
  write new files, mark old files deleted
- For inserts: write new data files
- Merge-on-Read (MOR) with delete files is reserved for future optimization

References:
- Java implementation: org.apache.iceberg.RowDelta, BaseRowDelta
- Based on implementation plan for MERGE INTO support
- Fix test_row_delta_validate_from_snapshot to assert ErrorKind::DataInvalid
  directly rather than matching against error message strings
- Correct operation() doc comment: remove inaccurate "Only adds delete files → Delete"
  bullet; add explicit note that Operation::Delete is deferred until MoR is wired up
- Add comment explaining why removed_data_files are not validated (already-committed
  files, matches Java MergingSnapshotProducer behavior)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
unwrap_err() requires T: Debug on the Ok type (ActionCommit), which is
not derived. Use a match instead to extract and assert the error kind.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…eview

Blocking issues fixed:
- existing_manifest() now rewrites manifests that contain deleted files instead
  of dropping them entirely. Surviving files get EXISTING entries (original
  sequence numbers preserved), removed files get DELETED entries (snapshot_id
  updated to current, sequence numbers preserved). Matches Java
  ManifestFilterManager.filterManifestWithDeletedFiles behavior.
- DELETED manifest entries now carry original sequence numbers (copied from the
  loaded manifest entry), fixing the spec violation where 0 was used as a
  placeholder.
- Snapshot summary now tracks removed files via
  SnapshotSummaryCollector.remove_file(), populating deleted-data-files,
  deleted-records, and removed-files-size metrics.
- add_delete_files() now returns ErrorKind::FeatureUnsupported immediately on
  commit instead of silently dropping the files.

Design improvements:
- Renamed write_delete_manifest → write_manifest_with_deleted_entries to
  distinguish data manifests with DELETED-status entries from Iceberg delete
  manifests (content=Deletes, used for MoR delete files).
- SnapshotProduceOperation::existing_manifest now takes &mut SnapshotProducer
  so implementations can call new_manifest_writer() for rewrites.
- Added removed_data_files() default method to the trait for summary tracking.
- Removed added_delete_files from RowDeltaOperation (only needed for the
  fail-fast check in RowDeltaAction::commit).
- Trimmed struct/method doc comments to match codebase convention.

Tests:
- Added test_row_delta_cow_manifest_rewrite: FastAppend 2 files, RowDelta
  remove one + add one, then verify DELETED/EXISTING/ADDED entries and
  sequence numbers in the resulting manifests.
- Added test_row_delta_add_delete_files_errors for the fail-fast path.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Implements DeleteVector::to_puffin_blob / from_puffin_blob for the Iceberg deletion-vector-v1 Puffin blob format: 4-byte BE length, magic 0xD1D33964, portable 64-bit RoaringTreemap, 4-byte BE CRC32. Adds the crc32fast dependency.

Tests: self serialize<->deserialize round-trip; pyiceberg-portable layout validation (Spark/Iceberg-Java byte compatibility); full Puffin-file round-trip via PuffinWriter/PuffinReader.

Serializer design referenced from risingwavelabs/iceberg-rust apache#113. Toward finishing RowDelta merge-on-read DV-write (PR apache#2203).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…es manifest)

SnapshotProducer now carries added_delete_files and writes a separate content=Deletes manifest (build_v2_deletes / build_v3_deletes). RowDeltaAction.add_delete_files commits MoR delete files (position/equality, incl. V3 deletion vectors) instead of returning FeatureUnsupported; operation() returns Delete (only deletes) or Overwrite (with data files) per Java BaseRowDelta semantics.

Replaces the stub-error test with test_row_delta_add_delete_files_mor (commits a position-delete file; asserts Operation::Delete + a PositionDeletes manifest entry). All 45 transaction tests pass (fast_append / overwrite / CoW unregressed).

Toward finishing RowDelta merge-on-read DV-write (PR apache#2203).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
Adds PuffinWriter::close_with_metadata() + PuffinWriteResult (file size + per-blob offsets/lengths). DeleteVector::write_to_puffin_file() writes a deletion-vector-v1 Puffin file and returns the V3 DataFile{content=PositionDeletes, referenced_data_file, content_offset, content_size_in_bytes} ready to feed RowDeltaAction::add_delete_files.

Test test_dv_write_to_puffin_file (DV -> Puffin file -> DataFile, read back). Completes the end-to-end Rust MoR DV-write path: DeleteVector -> Puffin file -> DataFile -> RowDelta content=Deletes manifest commit. cf. RW deletion_vector_writer.rs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…ation example

Exposes 'pub mod delete_vector' (+ rustdoc on the public surface to satisfy #![deny(missing_docs)]) so downstream crates can use DeleteVector + write_to_puffin_file.

Adds crates/catalog/rest/examples/pulse_dv_realdata.rs: connects to a REST catalog (Polaris), loads a real V3 table, writes a deletion vector via write_to_puffin_file, commits via RowDelta (content=Deletes manifest) — for cross-engine (Doris/Spark) validation before opening PR apache#2203. Compiles clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…mple

RestCatalog requires explicit StorageFactory injection (S3 storage moved to the iceberg-storage-opendal crate, apache#2207). Wire OpenDalResolvingStorageFactory into the pulse_dv_realdata example and dev-dep the crate.

Verified end-to-end against a real V3 Iceberg table on S3 (Polaris REST catalog): the Rust harness wrote a deletion-vector-v1 and committed via RowDelta; Doris (independent engine) read COUNT(*) 10 -> 7 with positions 0,1,2 (ids 1,2,3) deleted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…ayload

Adds testdata/puffin/deletion-vector-v1-payload.bin (Java-produced deletion-vector-v1 payload for positions {1,3,5,7,9}, from apache/iceberg test resources via apache/iceberg-go's fixture) and a test asserting DeleteVector::to_puffin_blob produces byte-identical output. Proves our roaring serialization + length/magic/CRC framing match the Iceberg-Java reference exactly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…rg-go)

Mirrors apache/iceberg-go TestSerializeDVEmpty + TestSerializeDVLargePositions: empty DV round-trips; positions straddling the 2^31 (Java-signed) and 2^32 (roaring bucket) boundaries round-trip. 12 delete_vector tests now pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…mall+large)

From apache/iceberg core test resources: empty-position-index.bin + small-and-large-values-position-index.bin. Now 3 Java byte-identical DV-payload vectors (empty; alternating 1/3/5/7/9; small+large across two 16-bit roaring containers). 14 delete_vector tests pass. all-container-types fixture intentionally NOT byte-pinned: Java run-optimizes containers, roaring-rs does not — spec-valid but different bytes (round-trip still covered).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
Apache iceberg-rust CI gates on rustfmt + clippy; both now clean on the changed crates (iceberg, iceberg-catalog-rest).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
raghav-reglobe added a commit to raghav-reglobe/iceberg-rust that referenced this pull request Jun 20, 2026
…eleteFile

Groundwork for V3 deletion-vector-applied scan: the delete-file scan task now carries the manifest entry's file format (Puffin marks a DV) and referenced_data_file, so the delete-file loader can distinguish a Puffin DV from a Parquet positional/equality delete and know which data file it applies to. Populated in From<&DeleteFileContext>; test constructors default to Parquet/None.

Part 1 of DV-applied scan (complements the DV-write in apache#2678). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Signed-off-by: Raghvendra Singh <raghav@cashify.in>
raghav-reglobe added a commit to raghav-reglobe/iceberg-rust that referenced this pull request Jun 20, 2026
The caching delete-file loader now recognizes a deletion vector (PositionDeletes task whose file_format is Puffin), reads the deletion-vector-v1 blob via PuffinReader, and parses it with DeleteVector::from_puffin_blob into the delete filter — completing the V3 DV read path (the DelVecs -> upsert_delete_vector output side was already present). With this, iceberg-rust correctly excludes rows marked by a V3 deletion vector on read; previously such rows were returned. Read complement to the DV-write in apache#2678.

Part 3 of DV-applied scan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Signed-off-by: Raghvendra Singh <raghav@cashify.in>
@raghav-reglobe raghav-reglobe force-pushed the rowdelta-dv-write-on-main branch from c9f97a6 to a8bc543 Compare June 20, 2026 10:58
raghav-reglobe added a commit to raghav-reglobe/iceberg-rust that referenced this pull request Jun 20, 2026
…eleteFile

Groundwork for V3 deletion-vector-applied scan: the delete-file scan task now carries the manifest entry's file format (Puffin marks a DV) and referenced_data_file, so the delete-file loader can distinguish a Puffin DV from a Parquet positional/equality delete and know which data file it applies to. Populated in From<&DeleteFileContext>; test constructors default to Parquet/None.

Part 1 of DV-applied scan (complements the DV-write in apache#2678). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Signed-off-by: Raghvendra Singh <raghav@cashify.in>
raghav-reglobe added a commit to raghav-reglobe/iceberg-rust that referenced this pull request Jun 20, 2026
The caching delete-file loader now recognizes a deletion vector (PositionDeletes task whose file_format is Puffin), reads the deletion-vector-v1 blob via PuffinReader, and parses it with DeleteVector::from_puffin_blob into the delete filter — completing the V3 DV read path (the DelVecs -> upsert_delete_vector output side was already present). With this, iceberg-rust correctly excludes rows marked by a V3 deletion vector on read; previously such rows were returned. Read complement to the DV-write in apache#2678.

Part 3 of DV-applied scan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Signed-off-by: Raghvendra Singh <raghav@cashify.in>
- adapt pulse_dv_realdata example to manifest_list_reader (load_manifest_list
  was removed from Snapshot in current main)
- allowlist 'mor' (merge-on-read) in .typos.toml
- regenerate crates/iceberg/public-api.txt and DEPENDENCIES.rust.tsv files

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@raghav-reglobe raghav-reglobe force-pushed the rowdelta-dv-write-on-main branch from 0ff7ccd to 207aca3 Compare June 20, 2026 12:00
raghav-reglobe added a commit to raghav-reglobe/iceberg-rust that referenced this pull request Jun 20, 2026
…eleteFile

Groundwork for V3 deletion-vector-applied scan: the delete-file scan task now carries the manifest entry's file format (Puffin marks a DV) and referenced_data_file, so the delete-file loader can distinguish a Puffin DV from a Parquet positional/equality delete and know which data file it applies to. Populated in From<&DeleteFileContext>; test constructors default to Parquet/None.

Part 1 of DV-applied scan (complements the DV-write in apache#2678). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Signed-off-by: Raghvendra Singh <raghav@cashify.in>
raghav-reglobe added a commit to raghav-reglobe/iceberg-rust that referenced this pull request Jun 20, 2026
The caching delete-file loader now recognizes a deletion vector (PositionDeletes task whose file_format is Puffin), reads the deletion-vector-v1 blob via PuffinReader, and parses it with DeleteVector::from_puffin_blob into the delete filter — completing the V3 DV read path (the DelVecs -> upsert_delete_vector output side was already present). With this, iceberg-rust correctly excludes rows marked by a V3 deletion vector on read; previously such rows were returned. Read complement to the DV-write in apache#2678.

Part 3 of DV-applied scan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Signed-off-by: Raghvendra Singh <raghav@cashify.in>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants