feat: implement spill callback for cross-task memory eviction [experimental] by andygrove · Pull Request #3869 · apache/datafusion-comet

andygrove · 2026-04-01T12:06:48Z

Which issue does this PR close?

Related to reducing Comet's off-heap memory requirements for large-scale workloads.

Rationale for this change

When running TPC-H at SF100+ scale, Comet requires significantly more off-heap memory than Spark alone. Benchmarking on TPC-H SF100 with local[4] showed the following peak RSS (MB):

Config	Q1 (agg)	Q5 (5-way join)	Q9 (6-way join)
Spark 4g offHeap	2700 MB	5167 MB	4580 MB
Comet 4g offHeap	679 MB	5534 MB	5911 MB
Comet 8g offHeap	665 MB	5440 MB	6359 MB

Q9 showed elastic memory growth (an increase of about 450 MB when offHeap was raised from 4g to 8g), which was traced to greedy buffering in the shuffle writer. The root cause is that Comet's NativeMemoryConsumer.spill() returned 0, so Spark could not reclaim memory across tasks.

What changes are included in this PR?

Spill callback implementation

SpillState (native/core/src/execution/memory_pools/spill.rs): Shared state with atomics and condvar for coordinating spill requests between Spark's memory manager thread and DataFusion's operator threads.
CometUnifiedMemoryPool: Checks SpillState.pressure() in try_grow() and returns ResourcesExhausted while spill pressure is active, which triggers DataFusion's Sort, Aggregate, and Shuffle operators to spill internally. Also tracks freed bytes in shrink().
CometTaskMemoryManager.spill(): Now forwards spill requests to native via JNI requestSpill() instead of returning 0. Waits up to 10 seconds for operators to react.
CometExecIterator: Wires the native plan handle to CometTaskMemoryManager after the plan is created, and clears it before releasePlan so that no spill callback can reach a destroyed plan.
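The coordination described above can be sketched as a small, self-contained Rust model. This is not the code in spill.rs: the names `request_spill`, `record_freed`, `Pool`, `GrowError`, and `run_demo` are hypothetical, the real pool is a DataFusion memory pool whose try_grow() reports a DataFusionError rather than this toy enum, and the JNI hop from CometTaskMemoryManager.spill() is replaced by a direct call. It shows the requesting thread raising pressure and waiting on a Condvar with a 10-second timeout, while an operator thread sees try_grow() fail and releases memory through shrink().

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical spill coordination state (illustrative names, not from spill.rs).
/// Assumes at most one outstanding spill request at a time.
#[derive(Default)]
pub struct SpillState {
    pressure: AtomicBool, // true while a spill request is outstanding
    freed: AtomicUsize,   // bytes released during the current request
    lock: Mutex<()>,      // pairs with `cvar`; `freed` is updated under it
    cvar: Condvar,        // wakes the requester when memory is released
}

impl SpillState {
    pub fn pressure(&self) -> bool {
        self.pressure.load(Ordering::Acquire)
    }

    /// Requester side: raise pressure, then wait until `target` bytes are
    /// freed or `timeout` elapses. Returns the bytes freed.
    pub fn request_spill(&self, target: usize, timeout: Duration) -> usize {
        let guard = self.lock.lock().unwrap();
        self.freed.store(0, Ordering::Release);
        self.pressure.store(true, Ordering::Release);
        let (_guard, _timed_out) = self
            .cvar
            .wait_timeout_while(guard, timeout, |_| self.freed.load(Ordering::Acquire) < target)
            .unwrap();
        self.pressure.store(false, Ordering::Release);
        self.freed.load(Ordering::Acquire)
    }

    /// Operator side (called from shrink): record released bytes, wake the requester.
    pub fn record_freed(&self, bytes: usize) {
        if self.pressure() {
            let _g = self.lock.lock().unwrap();
            self.freed.fetch_add(bytes, Ordering::AcqRel);
            self.cvar.notify_all();
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum GrowError {
    ResourcesExhausted,
}

/// Toy pool: refuses growth under spill pressure, reports shrinks to SpillState.
pub struct Pool {
    used: AtomicUsize,
    spill: Arc<SpillState>,
}

impl Pool {
    pub fn new(spill: Arc<SpillState>) -> Self {
        Pool { used: AtomicUsize::new(0), spill }
    }
    pub fn used(&self) -> usize {
        self.used.load(Ordering::Acquire)
    }
    pub fn try_grow(&self, bytes: usize) -> Result<(), GrowError> {
        if self.spill.pressure() {
            return Err(GrowError::ResourcesExhausted); // tells the operator to spill
        }
        self.used.fetch_add(bytes, Ordering::AcqRel);
        Ok(())
    }
    pub fn shrink(&self, bytes: usize) {
        self.used.fetch_sub(bytes, Ordering::AcqRel);
        self.spill.record_freed(bytes);
    }
}

/// One request/response round trip; returns (bytes freed, bytes still reserved).
pub fn run_demo() -> (usize, usize) {
    let spill = Arc::new(SpillState::default());
    let pool = Arc::new(Pool::new(Arc::clone(&spill)));
    pool.try_grow(1024).unwrap(); // the operator's buffered state

    let op_pool = Arc::clone(&pool);
    let operator = thread::spawn(move || loop {
        match op_pool.try_grow(1) {
            Ok(()) => {
                op_pool.shrink(1); // no pressure: small temporary allocation
                thread::sleep(Duration::from_millis(1));
            }
            Err(GrowError::ResourcesExhausted) => {
                op_pool.shrink(1024); // "spill" the buffered state
                return;
            }
        }
    });

    let freed = spill.request_spill(1024, Duration::from_secs(10));
    operator.join().unwrap();
    (freed, pool.used())
}

fn main() {
    let (freed, used) = run_demo();
    println!("freed {freed} bytes, {used} bytes still reserved");
}
```

Tracking `freed` under the same mutex the Condvar waits on avoids lost wake-ups; the timeout bounds how long the requesting thread can block if no operator reacts.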

Memory profiling tools

benchmarks/tpc/memory-profile.sh: Script that runs TPC-H queries individually under different configurations (Spark-only, Comet with varying offHeap sizes) in local mode, capturing peak RSS via /usr/bin/time -l.
docs/source/contributor-guide/memory-management.md: Analysis of Comet's memory management architecture, benchmark results, comparison with Gluten's approach, and documentation of the spill callback design.
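The peak-RSS capture step can be illustrated with a small Rust parser. memory-profile.sh is a shell script whose exact parsing is not reproduced here; the function `peak_rss_mb` and the sample text are hypothetical, and the sketch assumes the macOS output format of `/usr/bin/time -l`, where the value precedes the `maximum resident set size` label and is reported in bytes.

```rust
/// Hypothetical helper: extract peak RSS (in MiB) from the stderr of macOS
/// `/usr/bin/time -l`. Returns None if the RSS line is missing or malformed.
fn peak_rss_mb(time_stderr: &str) -> Option<u64> {
    time_stderr.lines().find_map(|line| {
        // e.g. "  5662310400  maximum resident set size" (value in bytes on macOS)
        let value = line.trim().strip_suffix("maximum resident set size")?;
        let bytes: u64 = value.trim().parse().ok()?;
        Some(bytes / (1024 * 1024))
    })
}

fn main() {
    // Made-up sample output, not a measured result.
    let sample = "        1.23 real         0.50 user         0.20 sys\n  5662310400  maximum resident set size\n";
    println!("{:?}", peak_rss_mb(sample)); // prints Some(5400)
}
```

Dividing by 1024 * 1024 yields MiB; the PR does not state whether the script reports MB or MiB.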

How are these changes tested?

Benchmarked with TPC-H SF100 comparing peak RSS before and after the change. The spill callback is exercised whenever Spark's memory manager calls spill() on Comet's NativeMemoryConsumer, which happens when multiple concurrent tasks compete for the shared off-heap pool. Existing test suites cover the execution paths (Sort, Aggregate, Shuffle) that react to ResourcesExhausted by spilling.

Add benchmarking script to measure peak RSS per TPC-H query under different Spark/Comet configurations and off-heap memory sizes. Includes analysis document investigating why Comet requires more off-heap memory than expected.


andygrove added 2 commits April 1, 2026 06:06

docs: move memory analysis into contributor guide

d8ea0ac

andygrove changed the title ~~feat: add memory profiling tools for off-heap analysis~~ feat: add memory profiling tools for off-heap analysis [experimental] Apr 1, 2026

andygrove added 7 commits April 1, 2026 06:54

docs: add benchmark results and fix memory-profile.sh time parsing

73eaf85

docs: add shuffle isolation results and PR 3845 notes

b54a0ce

feat: add SpillState for cross-thread spill coordination

db233e0

feat: wire SpillState into CometUnifiedMemoryPool

8d4abe2

feat: add JNI requestSpill function with SpillState in ExecutionContext

9faa80a

feat: implement spill() callback in CometTaskMemoryManager

f82556c

feat: wire native plan handle to CometTaskMemoryManager for spill routing

6a7fc43

andygrove changed the title ~~feat: add memory profiling tools for off-heap analysis [experimental]~~ feat: implement spill callback for cross-task memory eviction Apr 1, 2026

docs: update memory analysis with spill callback results

c73cf73

local[8] benchmark shows Comet memory usage now on par with Spark (8525 vs 8476 MB) and elastic growth eliminated.

andygrove changed the title ~~feat: implement spill callback for cross-task memory eviction~~ feat: implement spill callback for cross-task memory eviction [experimental] Apr 1, 2026

andygrove added 2 commits April 1, 2026 16:57

chore: apply cargo fmt and prettier formatting

d43e269

docs: remove standalone memory-management doc, add spill coordination to development guide

221655a

andygrove wants to merge 12 commits into apache:main from andygrove:memory-profiling-tools
