Skip to content

Fix Arrow Spill Underrun (#20159)#20684

Open
hareshkh wants to merge 1 commit intoapache:branch-52from
hareshkh:repartition_spill_buffering
Open

Fix Arrow Spill Underrun (#20159)#20684
hareshkh wants to merge 1 commit intoapache:branch-52from
hareshkh:repartition_spill_buffering

Conversation

@hareshkh
Copy link
Contributor

@hareshkh hareshkh commented Mar 3, 2026

Which issue does this PR close?

Rationale for this change

This adjusts the way that the spill channel works. Currently we have a spill writer & reader pairing which uses a mutex to coordindate when a file is ready to be read.

What happens is, that because we were using a spawn_buffered call, the read task would race ahead trying to read a file which is yet to be written out completely.

Alongside this, we need to flush each write to the file, as there is a chance that another thread may see stale data.

What changes are included in this PR?

Adds a flush on write, and converts the read task to not buffer reads.

Are these changes tested?

I haven't written a test, but I have been running the example in the attached issue. While it now fails with allocation errors, the original error goes away.

Are there any user-facing changes?

Nope

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

## Which issue does this PR close?

- Closes apache#19425

## Rationale for this change

This adjusts the way that the spill channel works. Currently we have a
spill writer & reader pairing which uses a mutex to coordindate when a
file is ready to be read.

What happens is, that because we were using a `spawn_buffered` call, the
read task would race ahead trying to read a file which is yet to be
written out completely.

Alongside this, we need to flush each write to the file, as there is a
chance that another thread may see stale data.

## What changes are included in this PR?

Adds a flush on write, and converts the read task to not buffer reads.

## Are these changes tested?

I haven't written a test, but I have been running the example in the
attached issue. While it now fails with allocation errors, the original
error goes away.

## Are there any user-facing changes?

Nope
@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Mar 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants