feat: clean up data files on failed writes #6320

Draft
wjones127 wants to merge 3 commits into lance-format:main from wjones127:feat/cleanup-data-files-on-failed-write

Conversation

@wjones127
Contributor

Previously, if a write operation failed mid-stream, the data files already
written to storage were left as orphans until GC reclaimed them (7+ days by
default).

This adds best-effort cleanup at two levels:

  1. do_write_fragments: on error in the write loop, deletes all data files
    it opened (both completed and in-progress). External-base files are
    skipped with a warning since their object store isn't available at
    cleanup time.

  2. update / merge_insert execute_impl: if apply_deletions fails
    after data files were successfully written, cleans up those data files.

Every path we clean up was generated by a writer opened in the same call and
is tracked only in local data structures, so there is no risk of deleting
pre-existing files.
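
As a rough illustration of the first cleanup level, here is a minimal sketch of the pattern, not the code in this PR. `MemStore`, `write_loop`, and `write_with_cleanup` are hypothetical names, and an in-memory set stands in for the object store. The point is that only paths recorded in the local `opened` list are ever deleted, and a failed delete is logged rather than masking the original error.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory stand-in for an object store (not Lance's API).
#[derive(Default)]
struct MemStore {
    files: BTreeSet<String>,
}

impl MemStore {
    fn put(&mut self, path: &str) {
        self.files.insert(path.to_string());
    }

    fn delete(&mut self, path: &str) -> Result<(), String> {
        if self.files.remove(path) {
            Ok(())
        } else {
            Err(format!("not found: {path}"))
        }
    }
}

/// Inner write loop: one file per batch. An `Err` batch simulates a
/// mid-stream failure.
fn write_loop(
    store: &mut MemStore,
    batches: &[Result<&str, &str>],
    opened: &mut Vec<String>,
) -> Result<(), String> {
    for batch in batches.iter().copied() {
        let name = batch.map_err(|e| e.to_string())?;
        let path = format!("data/{name}.lance");
        // Record the path before writing so a partially written file is tracked too.
        opened.push(path.clone());
        store.put(&path);
    }
    Ok(())
}

/// Runs the write loop and, on error, deletes every file this call opened.
fn write_with_cleanup(
    store: &mut MemStore,
    batches: &[Result<&str, &str>],
) -> Result<Vec<String>, String> {
    // Local-only tracking: pre-existing files can never end up in this list.
    let mut opened = Vec::new();
    match write_loop(store, batches, &mut opened) {
        Ok(()) => Ok(opened),
        Err(err) => {
            for path in &opened {
                // Best effort: warn on failure, but still return the original error.
                if let Err(e) = store.delete(path) {
                    eprintln!("warning: failed to clean up {path}: {e}");
                }
            }
            Err(err)
        }
    }
}
```

Calling `write_with_cleanup` with batches `[Ok("a"), Ok("b"), Err("stream failed")]` on a store that already holds `data/existing.lance` returns the stream error and leaves only the pre-existing file behind.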

Fixes #6124

Test plan

  • New test: single-file write failure cleans up partial data file
  • New test: multi-file write failure (file boundary crossed before error) cleans up all files
  • All 183 existing dataset::write tests pass

🤖 Generated with Claude Code

@github-actions github-actions bot added the enhancement New feature or request label Mar 27, 2026
@codecov

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 84.13284% with 43 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/write.rs | 91.07% | 9 Missing and 10 partials ⚠️ |
| rust/lance/src/dataset/optimize.rs | 77.77% | 4 Missing and 4 partials ⚠️ |
| rust/lance/src/dataset/write/merge_insert.rs | 27.27% | 8 Missing ⚠️ |
| rust/lance/src/dataset/write/update.rs | 27.27% | 8 Missing ⚠️ |

wjones127 and others added 3 commits March 31, 2026 10:18
Previously, if a write operation (insert, update, merge_insert) failed
mid-stream, the data files already written to storage were left as
orphans, requiring GC to reclaim them after 7+ days.

This PR adds best-effort cleanup at two levels:

1. `do_write_fragments`: cleans up all data files it opened (both
   completed and in-progress) if the write loop returns an error.
   External-base files are skipped with a warning since their object
   store is not available at cleanup time.

2. `update` / `merge_insert` execute_impl: cleans up newly written
   data files if the subsequent `apply_deletions` step fails.

Fixes lance-format#6124

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move fragment.files.push(data_file) before params.progress.complete()
so that if the progress callback fails, the completed file is still
tracked for cleanup.

Extract the duplicated in-progress file cleanup into
cleanup_partial_write().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
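
The ordering fix described in the commit message above can be sketched as follows. This is a simplified illustration under assumed names (`Progress`, `FailingProgress`, `finish_file`), not the actual Lance types: the completed file is pushed onto the tracked list first, so a failing progress callback cannot hide it from later cleanup.

```rust
/// Hypothetical progress hook; the real callback in Lance may look different.
trait Progress {
    fn complete(&mut self, path: &str) -> Result<(), String>;
}

/// A progress hook that always fails, to exercise the error path.
struct FailingProgress;

impl Progress for FailingProgress {
    fn complete(&mut self, path: &str) -> Result<(), String> {
        Err(format!("progress callback failed for {path}"))
    }
}

/// Marks a data file as finished.
fn finish_file(
    tracked: &mut Vec<String>,
    path: String,
    progress: &mut dyn Progress,
) -> Result<(), String> {
    // Track first: if the callback below errors, the caller's cleanup
    // still sees this file in `tracked`.
    tracked.push(path);
    let last = tracked.last().expect("just pushed");
    progress.complete(last)
}
```

With the two statements in the opposite order, an error from `complete` would return early and the finished file would never reach `tracked`, leaving it as an orphan.
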
…rror

The most common production failure (commit conflict) would leave all
compacted data files as orphans. This adds cleanup in two places:

1. commit_compaction(): if apply_commit fails (e.g. commit conflict),
   delete all data files from the completed compaction tasks.

2. rewrite_files(): if post-write steps fail (rechunking stable row ids,
   recalculating versions), delete the newly written fragments' data files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
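
A minimal sketch of the commit-failure case from the commit message above, assuming hypothetical names (`CompletedTask`, `commit_with_cleanup`) and a plain set of paths in place of the object store. If the commit step returns an error such as a conflict, every data file produced by the completed tasks is removed before the error is propagated.

```rust
use std::collections::BTreeSet;

/// Hypothetical record of one finished compaction task.
struct CompletedTask {
    new_files: Vec<String>,
}

/// Applies the commit; on failure, deletes the files the tasks wrote.
/// `apply_commit` stands in for the real commit step and returns the new
/// dataset version on success.
fn commit_with_cleanup<C>(
    store: &mut BTreeSet<String>,
    tasks: &[CompletedTask],
    apply_commit: C,
) -> Result<u64, String>
where
    C: FnOnce(&[CompletedTask]) -> Result<u64, String>,
{
    match apply_commit(tasks) {
        Ok(version) => Ok(version),
        Err(err) => {
            for path in tasks.iter().flat_map(|t| t.new_files.iter()) {
                // Best effort: a missing file is only worth a warning.
                if !store.remove(path) {
                    eprintln!("warning: {path} was already gone during cleanup");
                }
            }
            // The original commit error is what the caller sees.
            Err(err)
        }
    }
}
```

Only paths listed in `new_files` are touched, so files belonging to the committed dataset stay in place even when the commit is rejected.
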
@wjones127 wjones127 force-pushed the feat/cleanup-data-files-on-failed-write branch from 09814c8 to 545db8f Compare March 31, 2026 17:40
Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Automatic cleanup of data files on failed writes