feat: clean up data files on failed writes#6320
Draft
wjones127 wants to merge 3 commits intolance-format:mainfrom
Draft
feat: clean up data files on failed writes#6320wjones127 wants to merge 3 commits intolance-format:mainfrom
wjones127 wants to merge 3 commits intolance-format:mainfrom
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Previously, if a write operation (insert, update, merge_insert) failed mid-stream, the data files already written to storage were left as orphans, requiring GC to reclaim them after 7+ days. This PR adds best-effort cleanup at two levels: 1. `do_write_fragments`: cleans up all data files it opened (both completed and in-progress) if the write loop returns an error. External-base files are skipped with a warning since their object store is not available at cleanup time. 2. `update` / `merge_insert` execute_impl: cleans up newly written data files if the subsequent `apply_deletions` step fails. Fixes lance-format#6124 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move fragment.files.push(data_file) before params.progress.complete() so that if the progress callback fails, the completed file is still tracked for cleanup. Extract the duplicated in-progress file cleanup into cleanup_partial_write(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rror The most common production failure (commit conflict) would leave all compacted data files as orphans. This adds cleanup in two places: 1. commit_compaction(): if apply_commit fails (e.g. commit conflict), delete all data files from the completed compaction tasks. 2. rewrite_files(): if post-write steps fail (rechunking stable row ids, recalculating versions), delete the newly written fragments' data files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
09814c8 to
545db8f
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Previously, if a write operation failed mid-stream, the data files already
written to storage were left as orphans until GC reclaimed them (7+ days by
default).
This adds best-effort cleanup at two levels:
do_write_fragments: on error in the write loop, deletes all data filesit opened (both completed and in-progress). External-base files are
skipped with a warning since their object store isn't available at
cleanup time.
update/merge_insertexecute_impl: ifapply_deletionsfailsafter data files were successfully written, cleans up those data files.
Every path we clean up was generated by a writer we opened in the same call,
stored in local-only data structures, so there is no risk of deleting
pre-existing files.
Fixes #6124
Test plan
dataset::writetests pass🤖 Generated with Claude Code