Fix repartition from dropping data when spilling #20672
Open
xanderbailey wants to merge 1 commit into apache:main
xanderbailey commented Mar 3, 2026
Inline review comment (Contributor, Author) on the following lines:

```rust
.await
.expect("Reader timed out — should not hang");

assert!(
```

Without this fix we fail here.
The branch was force-pushed from 4925c63 to 09a8630.
Which issue does this PR close?
Rationale for this change
In non-preserve-order repartitioning mode, all input partition tasks share clones of the same `SpillPoolWriter` for each output partition. `SpillPoolWriter` used `#[derive(Clone)]`, but its `Drop` implementation unconditionally set `writer_dropped = true` and finalized the current spill file. As a result, when the first input task finished and its clone was dropped, the `SpillPoolReader` would see `writer_dropped = true` on an empty queue and return EOF, silently discarding every batch subsequently written by the still-running input tasks.

This bug requires three conditions to trigger:
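To make the failure mode concrete, here is a minimal, std-only Rust model of the pre-fix behavior. It is a sketch, not DataFusion code: `Shared`, `BuggyWriter`, `drain`, and `replay` are hypothetical names, batches are plain integers, and the real writer's spill-file handling is omitted. It replays one interleaving deterministically: two input tasks hold clones of one writer, the first finishes and drops its clone, and the reader drains the queue before the second task writes.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

// Hypothetical, heavily simplified stand-in for the state shared between
// spill-pool writers and the reader. NOT DataFusion's real types;
// batches are modeled as plain integers.
#[derive(Default)]
struct Shared {
    queue: VecDeque<u32>, // batches waiting to be read
    writer_dropped: bool, // EOF signal observed by the reader
}

// Mirrors the pre-fix pattern: a derived Clone plus a Drop that behaves
// as if this clone were the only writer.
#[derive(Clone)]
struct BuggyWriter {
    shared: Arc<Mutex<Shared>>,
}

impl BuggyWriter {
    fn push(&self, batch: u32) {
        self.shared.lock().unwrap().queue.push_back(batch);
    }
}

impl Drop for BuggyWriter {
    fn drop(&mut self) {
        // Unconditional: whichever clone is dropped first signals EOF.
        self.shared.lock().unwrap().writer_dropped = true;
    }
}

// Reads everything currently queued, then reports whether the reader
// would stop (EOF) rather than wait for more data.
fn drain(shared: &Mutex<Shared>) -> (Vec<u32>, bool) {
    let mut s = shared.lock().unwrap();
    let read: Vec<u32> = s.queue.drain(..).collect();
    let eof = s.writer_dropped;
    (read, eof)
}

// Deterministic replay with two input tasks sharing one writer.
// Returns (batches the reader saw, batches left unread in the queue).
fn replay() -> (Vec<u32>, Vec<u32>) {
    let shared = Arc::new(Mutex::new(Shared::default()));
    let writer1 = BuggyWriter { shared: Arc::clone(&shared) };
    let writer2 = writer1.clone();

    writer1.push(1);
    drop(writer1); // first input task finishes

    let (mut read, eof) = drain(&shared); // sees [1], then EOF
    writer2.push(2); // second task is still producing
    drop(writer2);

    if !eof {
        read.extend(drain(&shared).0);
    }
    let unread: Vec<u32> = shared.lock().unwrap().queue.iter().copied().collect();
    (read, unread)
}

fn main() {
    let (read, unread) = replay();
    println!("reader saw {:?}; never read: {:?}", read, unread);
}
```

Running it prints `reader saw [1]; never read: [2]`: batch 2 is written after the reader has already observed `writer_dropped = true` on an empty queue, so it is never delivered.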
What changes are included in this PR?
`datafusion/physical-plan/src/spill/spill_pool.rs`:

- Added `active_writer_count: usize` to `SpillPoolShared` to track the number of live writer clones.
- Replaced `#[derive(Clone)]` on `SpillPoolWriter` with a manual `Clone` impl that increments `active_writer_count` under the shared lock.
- Changed `Drop` to decrement `active_writer_count` and only finalize the current file / set `writer_dropped = true` when the count reaches zero (i.e. when the last clone is dropped). Non-last clones now return immediately from `Drop`.
- Added `test_clone_drop_does_not_signal_eof_prematurely`, which reproduces the exact failure: writer1 writes and drops, the reader drains the queue, then writer2 (still alive) writes. Without the fix the reader returns a premature EOF and the assertion fails; with the fix the reader waits and reads both batches.

Are these changes tested?
Yes. A new unit test (`test_clone_drop_does_not_signal_eof_prematurely`) directly reproduces the bug. It was verified to fail without the fix and to pass with it.

Are there any user-facing changes?
No.
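For illustration, the counting scheme described in the changes list above can be sketched in the same simplified style. This is an assumption-laden model, not the PR's code: `Shared`, `CountedWriter`, `drain`, and `replay` are hypothetical, a `std::sync::Mutex` stands in for whatever lock `SpillPoolShared` actually uses, and finalizing the spill file is reduced to a comment. Only the field names `active_writer_count` and `writer_dropped` come from the PR description. `replay` walks through the sequence the new test is described as using.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

// Hypothetical simplified model of the fix. Field names echo the PR,
// but these are not DataFusion's types; batches are plain integers.
struct Shared {
    queue: VecDeque<u32>,
    writer_dropped: bool,
    active_writer_count: usize,
}

struct CountedWriter {
    shared: Arc<Mutex<Shared>>,
}

impl CountedWriter {
    // Creates the first writer (count = 1) plus a handle for the reader.
    fn new() -> (Self, Arc<Mutex<Shared>>) {
        let shared = Arc::new(Mutex::new(Shared {
            queue: VecDeque::new(),
            writer_dropped: false,
            active_writer_count: 1,
        }));
        (CountedWriter { shared: Arc::clone(&shared) }, shared)
    }

    fn push(&self, batch: u32) {
        self.shared.lock().unwrap().queue.push_back(batch);
    }
}

// Manual Clone instead of #[derive(Clone)]: every clone is registered
// under the same lock the reader uses.
impl Clone for CountedWriter {
    fn clone(&self) -> Self {
        self.shared.lock().unwrap().active_writer_count += 1;
        CountedWriter { shared: Arc::clone(&self.shared) }
    }
}

impl Drop for CountedWriter {
    fn drop(&mut self) {
        let mut s = self.shared.lock().unwrap();
        s.active_writer_count -= 1;
        if s.active_writer_count > 0 {
            return; // other clones are still alive: not EOF yet
        }
        // Last clone: the real writer would also finalize its spill file here.
        s.writer_dropped = true;
    }
}

// Reads everything currently queued, then reports whether the reader
// would stop (EOF) rather than wait for more data.
fn drain(shared: &Mutex<Shared>) -> (Vec<u32>, bool) {
    let mut s = shared.lock().unwrap();
    let read: Vec<u32> = s.queue.drain(..).collect();
    let eof = s.writer_dropped;
    (read, eof)
}

// writer1 writes and drops, the reader drains, then writer2 writes.
// Returns (batches read, whether EOF was seen before writer2 wrote).
fn replay() -> (Vec<u32>, bool) {
    let (writer1, shared) = CountedWriter::new();
    let writer2 = writer1.clone(); // count = 2

    writer1.push(1);
    drop(writer1); // count = 1: no EOF signalled

    let (mut read, early_eof) = drain(&shared); // sees [1]; reader would wait
    writer2.push(2);
    drop(writer2); // count = 0: EOF signalled now

    if !early_eof {
        let (rest, _eof) = drain(&shared);
        read.extend(rest);
    }
    (read, early_eof)
}

fn main() {
    let (read, early_eof) = replay();
    println!("reader saw {:?}; premature EOF: {}", read, early_eof);
}
```

Here the reader sees `[1, 2]` with no premature EOF. Because the increment in `Clone` and the decrement in `Drop` happen under the same lock the reader takes, the reader can only observe `writer_dropped = true` once the last clone is gone.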