feat(scan): apply V3 deletion vectors on read (stacked on #2678)#2681
Open
raghav-reglobe wants to merge 22 commits into
Open
feat(scan): apply V3 deletion vectors on read (stacked on #2678)#2681raghav-reglobe wants to merge 22 commits into
raghav-reglobe wants to merge 22 commits into
Conversation
…fications This commit implements the core transaction infrastructure for MERGE INTO, UPDATE, and DELETE operations in Apache Iceberg-Rust. Based on the official Iceberg Java implementation (RowDelta API). **New file: `crates/iceberg/src/transaction/row_delta.rs`** - RowDeltaAction: Transaction action supporting both data file additions and deletions in a single snapshot - add_data_files(): Add new data files (inserts/rewrites in COW mode) - remove_data_files(): Mark data files as deleted (COW mode) - add_delete_files(): Reserved for future Merge-on-Read (MOR) support - validate_from_snapshot(): Conflict detection for concurrent modifications - RowDeltaOperation: Implements SnapshotProduceOperation trait - Determines operation type (Append/Delete/Overwrite) based on changes - Generates DELETED manifest entries for removed files - Carries forward existing manifests for unchanged data **Modified: `crates/iceberg/src/transaction/mod.rs`** - Add row_delta() method to Transaction API - Export row_delta module **Modified: `crates/iceberg/src/transaction/snapshot.rs`** - Add write_delete_manifest() to write DELETED manifest entries - Update manifest_file() to process delete entries from SnapshotProduceOperation - Update validation to allow delete-only operations Comprehensive unit tests with ~85% coverage: - test_row_delta_add_only: Pure append operation - test_row_delta_remove_only: Delete-only operation - test_row_delta_add_and_remove: COW update (remove old, add new) - test_row_delta_with_snapshot_properties: Custom snapshot properties - test_row_delta_validate_from_snapshot: Snapshot validation logic - test_row_delta_empty_action: Empty operation error handling - test_row_delta_incompatible_partition_value: Partition validation All existing tests pass (1135 passed; 0 failed). Copy-on-Write (COW) Strategy: - For row-level modifications: read target files, apply changes, write new files, mark old files deleted - For inserts: write new data files - Merge-on-Read (MOR) with delete files is reserved for future optimization References: - Java implementation: org.apache.iceberg.RowDelta, BaseRowDelta - Based on implementation plan for MERGE INTO support
- Fix test_row_delta_validate_from_snapshot to assert ErrorKind::DataInvalid directly rather than matching against error message strings - Correct operation() doc comment: remove inaccurate "Only adds delete files → Delete" bullet; add explicit note that Operation::Delete is deferred until MoR is wired up - Add comment explaining why removed_data_files are not validated (already-committed files, matches Java MergingSnapshotProducer behavior) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
unwrap_err() requires T: Debug on the Ok type (ActionCommit), which is not derived. Use a match instead to extract and assert the error kind. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…eview Blocking issues fixed: - existing_manifest() now rewrites manifests that contain deleted files instead of dropping them entirely. Surviving files get EXISTING entries (original sequence numbers preserved), removed files get DELETED entries (snapshot_id updated to current, sequence numbers preserved). Matches Java ManifestFilterManager.filterManifestWithDeletedFiles behavior. - DELETED manifest entries now carry original sequence numbers (copied from the loaded manifest entry), fixing the spec violation where 0 was used as a placeholder. - Snapshot summary now tracks removed files via SnapshotSummaryCollector.remove_file(), populating deleted-data-files, deleted-records, and removed-files-size metrics. - add_delete_files() now returns ErrorKind::FeatureUnsupported immediately on commit instead of silently dropping the files. Design improvements: - Renamed write_delete_manifest → write_manifest_with_deleted_entries to distinguish data manifests with DELETED-status entries from Iceberg delete manifests (content=Deletes, used for MoR delete files). - SnapshotProduceOperation::existing_manifest now takes &mut SnapshotProducer so implementations can call new_manifest_writer() for rewrites. - Added removed_data_files() default method to the trait for summary tracking. - Removed added_delete_files from RowDeltaOperation (only needed for the fail-fast check in RowDeltaAction::commit). - Trimmed struct/method doc comments to match codebase convention. Tests: - Added test_row_delta_cow_manifest_rewrite: FastAppend 2 files, RowDelta remove one + add one, then verify DELETED/EXISTING/ADDED entries and sequence numbers in the resulting manifests. - Added test_row_delta_add_delete_files_errors for the fail-fast path. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Implements DeleteVector::to_puffin_blob / from_puffin_blob for the Iceberg deletion-vector-v1 Puffin blob format: 4-byte BE length, magic 0xD1D33964, portable 64-bit RoaringTreemap, 4-byte BE CRC32. Adds the crc32fast dependency. Tests: self serialize<->deserialize round-trip; pyiceberg-portable layout validation (Spark/Iceberg-Java byte compatibility); full Puffin-file round-trip via PuffinWriter/PuffinReader. Serializer design referenced from risingwavelabs/iceberg-rust apache#113. Toward finishing RowDelta merge-on-read DV-write (PR apache#2203). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…es manifest) SnapshotProducer now carries added_delete_files and writes a separate content=Deletes manifest (build_v2_deletes / build_v3_deletes). RowDeltaAction.add_delete_files commits MoR delete files (position/equality, incl. V3 deletion vectors) instead of returning FeatureUnsupported; operation() returns Delete (only deletes) or Overwrite (with data files) per Java BaseRowDelta semantics. Replaces the stub-error test with test_row_delta_add_delete_files_mor (commits a position-delete file; asserts Operation::Delete + a PositionDeletes manifest entry). All 45 transaction tests pass (fast_append / overwrite / CoW unregressed). Toward finishing RowDelta merge-on-read DV-write (PR apache#2203). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Raghvendra Singh <raghav@cashify.in>
Adds PuffinWriter::close_with_metadata() + PuffinWriteResult (file size + per-blob offsets/lengths). DeleteVector::write_to_puffin_file() writes a deletion-vector-v1 Puffin file and returns the V3 DataFile{content=PositionDeletes, referenced_data_file, content_offset, content_size_in_bytes} ready to feed RowDeltaAction::add_delete_files.
Test test_dv_write_to_puffin_file (DV -> Puffin file -> DataFile, read back). Completes the end-to-end Rust MoR DV-write path: DeleteVector -> Puffin file -> DataFile -> RowDelta content=Deletes manifest commit. cf. RW deletion_vector_writer.rs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…ation example Exposes 'pub mod delete_vector' (+ rustdoc on the public surface to satisfy #![deny(missing_docs)]) so downstream crates can use DeleteVector + write_to_puffin_file. Adds crates/catalog/rest/examples/pulse_dv_realdata.rs: connects to a REST catalog (Polaris), loads a real V3 table, writes a deletion vector via write_to_puffin_file, commits via RowDelta (content=Deletes manifest) — for cross-engine (Doris/Spark) validation before opening PR apache#2203. Compiles clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…mple RestCatalog requires explicit StorageFactory injection (S3 storage moved to the iceberg-storage-opendal crate, apache#2207). Wire OpenDalResolvingStorageFactory into the pulse_dv_realdata example and dev-dep the crate. Verified end-to-end against a real V3 Iceberg table on S3 (Polaris REST catalog): the Rust harness wrote a deletion-vector-v1 and committed via RowDelta; Doris (independent engine) read COUNT(*) 10 -> 7 with positions 0,1,2 (ids 1,2,3) deleted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…ayload
Adds testdata/puffin/deletion-vector-v1-payload.bin (Java-produced deletion-vector-v1 payload for positions {1,3,5,7,9}, from apache/iceberg test resources via apache/iceberg-go's fixture) and a test asserting DeleteVector::to_puffin_blob produces byte-identical output. Proves our roaring serialization + length/magic/CRC framing match the Iceberg-Java reference exactly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…rg-go) Mirrors apache/iceberg-go TestSerializeDVEmpty + TestSerializeDVLargePositions: empty DV round-trips; positions straddling the 2^31 (Java-signed) and 2^32 (roaring bucket) boundaries round-trip. 12 delete_vector tests now pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…mall+large) From apache/iceberg core test resources: empty-position-index.bin + small-and-large-values-position-index.bin. Now 3 Java byte-identical DV-payload vectors (empty; alternating 1/3/5/7/9; small+large across two 16-bit roaring containers). 14 delete_vector tests pass. all-container-types fixture intentionally NOT byte-pinned: Java run-optimizes containers, roaring-rs does not — spec-valid but different bytes (round-trip still covered). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Raghvendra Singh <raghav@cashify.in>
Apache iceberg-rust CI gates on rustfmt + clippy; both now clean on the changed crates (iceberg, iceberg-catalog-rest). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Raghvendra Singh <raghav@cashify.in>
- adapt pulse_dv_realdata example to manifest_list_reader (load_manifest_list was removed from Snapshot in current main) - allowlist 'mor' (merge-on-read) in .typos.toml - regenerate crates/iceberg/public-api.txt and DEPENDENCIES.rust.tsv files Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eleteFile Groundwork for V3 deletion-vector-applied scan: the delete-file scan task now carries the manifest entry's file format (Puffin marks a DV) and referenced_data_file, so the delete-file loader can distinguish a Puffin DV from a Parquet positional/equality delete and know which data file it applies to. Populated in From<&DeleteFileContext>; test constructors default to Parquet/None. Part 1 of DV-applied scan (complements the DV-write in apache#2678). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Raghvendra Singh <raghav@cashify.in>
DeleteFileIndex now recognizes V3 deletion vectors (PositionDeletes content + Puffin format + referenced_data_file) and indexes them by the data file path they apply to, separate from partition-scoped Parquet positional deletes. get_deletes_for_data_file returns the matching DV (sequence-number rule) and, per spec, has it supersede positional-delete files for that data file. Mirrors apache/iceberg-go buildDVIndex + matchDVToData. Part 2 of DV-applied scan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Raghvendra Singh <raghav@cashify.in>
The caching delete-file loader now recognizes a deletion vector (PositionDeletes task whose file_format is Puffin), reads the deletion-vector-v1 blob via PuffinReader, and parses it with DeleteVector::from_puffin_blob into the delete filter — completing the V3 DV read path (the DelVecs -> upsert_delete_vector output side was already present). With this, iceberg-rust correctly excludes rows marked by a V3 deletion vector on read; previously such rows were returned. Read complement to the DV-write in apache#2678. Part 3 of DV-applied scan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Raghvendra Singh <raghav@cashify.in>
Adds a round-trip test (ported from apache/iceberg-go TestFilterByDeletionVector / TestReadAllDeletionVectors): write a deletion vector marking positions {1,3} to a Puffin file, load it through the caching delete-file loader, and assert those positions surface as the data file's delete vector. Also refreshes the load_deletes doc comment now that deletion-vector loading is implemented (was 'Not yet Implemented').
Part 4 of DV-applied scan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
…size The deletion vector read path located the DV blob by parsing a Puffin footer (PuffinReader::file_metadata). That only works when the DV is wrapped in a full Puffin container — but the Iceberg spec addresses a DV blob by the manifest entry's content_offset + content_size_in_bytes, and writers such as DuckDB store the deletion-vector-v1 blob standalone (no PFA1 framing). Reading those via the footer fails with 'Bad magic value ... should be PFA1'. Now FileScanTaskDeleteFile carries content_offset + content_size_in_bytes, and the loader reads exactly those bytes and parses them with the new DeleteVector::from_serialized_bytes (factored out of from_puffin_blob). This works whether the DV is standalone or inside a Puffin container. Validated cross-engine on real data: iceberg-rust scans tables with DuckDB-written DVs and returns the correct post-delete row counts (900K/1M, 9M/10M). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Raghvendra Singh <raghav@cashify.in>
The Iceberg spec requires a deletion-vector-v1 blob to be stored uncompressed ('Omit compression-codec; deletion-vector-v1 is not compressed'), so reading the bytes at content_offset/content_size and parsing them directly is correct for all compliant deletion vectors (the fast path). A non-conforming writer could, however, store a compressed deletion vector blob; in that case the bytes at content_offset are compressed and the direct parse fails. Fall back to reading the blob through the Puffin footer, which records the per-blob compression-codec and decompresses accordingly (matching the blob by content_offset). Container-less blob files (no footer) stay on the fast path.
The fast path is covered by the existing round-trip test and real-data validation; the compressed fallback is defensive (the spec forbids compressed DVs, so no fixture is produced by conforming writers).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Raghvendra Singh <raghav@cashify.in>
FileScanTaskDeleteFile gains file_format / referenced_data_file / content_offset / content_size_in_bytes (+ their with_ builder setters) and DeleteVector::from_serialized_bytes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Make iceberg-rust apply V3 deletion vectors when scanning. Today the scan
applies V2 Parquet positional/equality deletes but ignores V3 deletion vectors,
so rows marked deleted by a DV are incorrectly returned. This is the read
complement to the DV-write work in #2678.
End to end:
DeleteFileIndexrecognizes a DV (aPositionDeletesentry stored as Puffinwith a
referenced_data_file) and indexes it by the data file it applies to,superseding positional-delete files for that data file (per spec). Mirrors
apache/iceberg-go
buildDVIndex/matchDVToData.deletion-vector-v1blob located bythe manifest entry's
content_offset/content_size_in_bytes— the spec's"direct access" path — and parses it with
DeleteVector::from_serialized_bytes.standalone blob (e.g. by DuckDB), and falls back to the Puffin footer to honor a
compression codec if a (non-conforming) writer ever compresses a DV blob.
Stacked on #2678
This branch contains #2678's commits (the DV-write side) plus this PR's read
commits, both rebased on current
main. Please review #2678 first; this PR'sown changes are the commits from
feat(scan): carry file_format + referenced_data_fileonward (6 commits touching
delete_file_index.rs,arrow/caching_delete_file_loader.rs,scan/task.rs,delete_vector.rs). It can be reduced to just those once #2678 lands.Validation
test_load_deletes_applies_v3_deletion_vector), ported fromiceberg-go
TestFilterByDeletionVector/TestReadAllDeletionVectors.by DuckDB and returns the correct post-delete row counts (900K/1M and 9M/10M),
proving it reads a foreign-written spec DV — not just its own.
cargo test -p iceberg --lib: 1375 passed, 0 failed; clippy clean.