
feat(spill): introduce external sort buffer and writer memory manager #257

Merged: lxy-9602 merged 8 commits into alibaba:main from zjw1111:spill-pr2 on May 6, 2026

@zjw1111 zjw1111 commented Apr 28, 2026

Purpose

Linked issue: #149

Introduce the spill-to-disk capability for the merge-tree write path and a global writer memory manager that coordinates write-buffer memory across multiple BatchWriters.

  • ExternalSortBuffer: wraps an InMemorySortBuffer. When the in-memory budget is exceeded, the sorted data is flushed to an on-disk file through IOManager channels. On CreateReaders() the in-memory data and the spilled files are merged back via the existing MergedKeyValueRecordReader and SortMergeReader path.

  • WriterMemoryManager: tracks per-writer memory usage and, when the global budget is exceeded, picks the largest writer to flush so that total memory shrinks back under the limit. It is not thread-safe by design, because the write process in Paimon is single-threaded.

  • Refactor the CoreOptions parser to support:

    • ParseMemorySize
    • ParseTimeDuration
    • parsing std::optional<std::string>
  • New CoreOptions exposed to users:

    • write-buffer-spillable
    • write-buffer-spill.max-disk-size
    • local-sort.max-num-file-handles
    • spill-compression
    • spill-compression.zstd-level
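As a rough illustration of what a memory-size parser such as ParseMemorySize might do, here is a minimal sketch. It is not the PR's implementation: the function name `ParseMemorySizeSketch`, the accepted unit spellings (`b`, `kb`, `mb`, `gb`) and the use of `std::nullopt` for errors are all assumptions made for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Hypothetical helper, not the PR's ParseMemorySize: turns text such as
// "64mb" or " 1 GB " into a byte count. Unit spellings are assumptions.
std::optional<int64_t> ParseMemorySizeSketch(const std::string& text) {
    const int64_t kMax = std::numeric_limits<int64_t>::max();
    std::size_t pos = 0;
    while (pos < text.size() && std::isspace(static_cast<unsigned char>(text[pos]))) ++pos;

    // Parse the leading decimal number, rejecting overflow.
    const std::size_t digits_begin = pos;
    int64_t value = 0;
    while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
        const int digit = text[pos] - '0';
        if (value > (kMax - digit) / 10) return std::nullopt;
        value = value * 10 + digit;
        ++pos;
    }
    if (pos == digits_begin) return std::nullopt;  // no number at all

    // The remainder is the unit: lower-cased, whitespace ignored.
    std::string unit;
    for (; pos < text.size(); ++pos) {
        const unsigned char c = static_cast<unsigned char>(text[pos]);
        if (!std::isspace(c)) unit.push_back(static_cast<char>(std::tolower(c)));
    }

    int64_t multiplier = 0;
    if (unit.empty() || unit == "b") {
        multiplier = 1;
    } else if (unit == "kb") {
        multiplier = int64_t{1} << 10;
    } else if (unit == "mb") {
        multiplier = int64_t{1} << 20;
    } else if (unit == "gb") {
        multiplier = int64_t{1} << 30;
    } else {
        return std::nullopt;  // unknown unit
    }
    if (value > kMax / multiplier) return std::nullopt;  // product would overflow
    return value * multiplier;
}
```

Under these assumptions, `ParseMemorySizeSketch("64mb")` yields 67108864 and an unrecognised unit yields an empty optional. The real parser may accept a different unit vocabulary and report errors differently.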

Tests

  • core/mergetree/sort_buffer_test.cpp (new): covers ExternalSortBuffer spill + merge behaviour, including multiple spilled files, ordering guarantees and resource cleanup.
  • core/memory/writer_memory_manager_test.cpp (new): covers register/unregister, memory accounting and shrink-to-limit flushing.
  • Extended core/mergetree/compact/sort_merge_reader_test.cpp, core/core_options_test.cpp, common/utils/string_utils_test.cpp and testing/utils/data_generator_test.cpp to cover the new options and helpers used by the spill path.

API and Format

New public options are added to include/paimon/defs.h (Options::WRITE_BUFFER_SPILLABLE, WRITE_BUFFER_SPILL_MAX_DISK_SIZE, LOCAL_SORT_MAX_NUM_FILE_HANDLES, SPILL_COMPRESSION, SPILL_COMPRESSION_ZSTD_LEVEL). No storage format or protocol change.
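For orientation, the new option keys could be collected in a plain string map like the one below. The keys come from the description above; the values, the accepted value syntax (for example `10gb`) and the helper name `MakeSpillOptionsSketch` are illustrative assumptions, and how such a map is handed to a writer is not shown in this PR description.

```cpp
#include <map>
#include <string>

// Option keys are taken from the PR description.
// All values are illustrative assumptions, not documented defaults.
std::map<std::string, std::string> MakeSpillOptionsSketch() {
    return {
        {"write-buffer-spillable", "true"},            // assumed boolean syntax
        {"write-buffer-spill.max-disk-size", "10gb"},  // assumed memory-size syntax
        {"local-sort.max-num-file-handles", "128"},    // assumed integer
        {"spill-compression", "zstd"},                 // assumed codec name
        {"spill-compression.zstd-level", "1"},         // assumed integer level
    };
}
```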

Documentation

No new user-facing documentation is required beyond the option comments in include/paimon/defs.h.

Generative AI tooling

Generated-by: Aone Copilot (claude-4.7-opus) and GitHub Copilot (GPT-5.4)

… manager

Introduce ExternalSortBuffer that wraps an InMemorySortBuffer and spills
sorted runs to disk via IOManager channels when the in-memory budget is
reached. Spilled runs are merged back through the existing sort-merge
reader path on flush.
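The spill-then-merge idea described above can be sketched in a few dozen lines. This is a simplified stand-in, not the PR's ExternalSortBuffer: the class `SpillingSortBufferSketch` and its methods are hypothetical, records are plain `int` keys, the budget is a record count rather than bytes, `std::tmpfile()` replaces the IOManager channels, and a min-heap k-way merge replaces the MergedKeyValueRecordReader / SortMergeReader path.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Illustrative only: names are hypothetical and do not mirror the PR's API.
class SpillingSortBufferSketch {
 public:
    explicit SpillingSortBufferSketch(std::size_t max_in_memory)
        : max_in_memory_(std::max<std::size_t>(1, max_in_memory)) {}
    SpillingSortBufferSketch(const SpillingSortBufferSketch&) = delete;
    SpillingSortBufferSketch& operator=(const SpillingSortBufferSketch&) = delete;
    ~SpillingSortBufferSketch() {
        for (std::FILE* run : runs_) std::fclose(run);  // tmpfile() files are removed on close
    }

    // Buffers one key; if the budget is already full, spills a sorted run first.
    bool Write(int key) {
        if (buffer_.size() >= max_in_memory_ && !Spill()) return false;
        buffer_.push_back(key);
        return true;
    }

    std::size_t NumSpilledRuns() const { return runs_.size(); }

    // Appends every key to *out in ascending order by merging the spilled
    // runs and the in-memory buffer with a min-heap.
    void ReadSorted(std::vector<int>* out) {
        std::sort(buffer_.begin(), buffer_.end());
        const std::size_t kMemorySource = runs_.size();  // index reserved for the buffer
        using Entry = std::pair<int, std::size_t>;       // (key, source index)
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
        for (std::size_t i = 0; i < runs_.size(); ++i) {
            std::rewind(runs_[i]);
            int key = 0;
            if (std::fread(&key, sizeof(key), 1, runs_[i]) == 1) heap.emplace(key, i);
        }
        std::size_t mem_pos = 0;
        if (!buffer_.empty()) heap.emplace(buffer_[mem_pos++], kMemorySource);
        while (!heap.empty()) {
            const Entry top = heap.top();
            heap.pop();
            out->push_back(top.first);
            if (top.second == kMemorySource) {
                if (mem_pos < buffer_.size()) heap.emplace(buffer_[mem_pos++], kMemorySource);
            } else {
                int next = 0;
                if (std::fread(&next, sizeof(next), 1, runs_[top.second]) == 1) {
                    heap.emplace(next, top.second);
                }
            }
        }
    }

 private:
    // Sorts the buffer and writes it to a fresh temporary file as one run.
    bool Spill() {
        std::sort(buffer_.begin(), buffer_.end());
        std::FILE* run = std::tmpfile();
        if (run == nullptr) return false;
        if (std::fwrite(buffer_.data(), sizeof(int), buffer_.size(), run) != buffer_.size()) {
            std::fclose(run);
            return false;
        }
        runs_.push_back(run);
        buffer_.clear();
        return true;
    }

    std::size_t max_in_memory_;
    std::vector<int> buffer_;
    std::vector<std::FILE*> runs_;
};
```

With a budget of 3 records, writing ten keys produces three spilled runs plus one buffered key, and `ReadSorted` returns all ten in ascending order. The sketch ignores read errors, compression and the file-handle limit that the real options control.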

Also add WriterMemoryManager to coordinate write-buffer memory across
multiple BatchWriters by picking the largest writer to flush when the
global budget is exceeded.
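A minimal sketch of the "flush the largest writer until under budget" policy follows. It is not the PR's WriterMemoryManager: `WriterHandleSketch`, `WriterMemoryManagerSketch` and all method names are hypothetical, writers are modelled as a pair of callbacks, and, like the original, the sketch makes no attempt at thread safety.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a registered writer: it reports how many bytes
// it currently buffers and knows how to flush them.
struct WriterHandleSketch {
    std::function<int64_t()> memory_usage;
    std::function<void()> flush;
};

// Illustrative only; single-threaded by assumption, as in the PR description.
class WriterMemoryManagerSketch {
 public:
    explicit WriterMemoryManagerSketch(int64_t limit_bytes) : limit_bytes_(limit_bytes) {}

    void Register(const std::string& name, WriterHandleSketch writer) {
        writers_[name] = std::move(writer);
    }
    void Unregister(const std::string& name) { writers_.erase(name); }

    int64_t TotalMemory() const {
        int64_t total = 0;
        for (const auto& entry : writers_) total += entry.second.memory_usage();
        return total;
    }

    // Flushes the currently largest writer, repeatedly, until the total is
    // within the limit. Returns the number of flushes performed.
    int ShrinkToLimit() {
        int flushes = 0;
        // Bounded loop: at most one pass per registered writer, so a flush
        // that fails to release memory cannot cause an endless loop.
        for (std::size_t i = 0; i < writers_.size() && TotalMemory() > limit_bytes_; ++i) {
            WriterHandleSketch* largest = nullptr;
            int64_t largest_bytes = 0;
            for (auto& entry : writers_) {
                const int64_t bytes = entry.second.memory_usage();
                if (bytes > largest_bytes) {
                    largest_bytes = bytes;
                    largest = &entry.second;
                }
            }
            if (largest == nullptr) break;  // nothing buffered anywhere
            largest->flush();
            ++flushes;
        }
        return flushes;
    }

 private:
    int64_t limit_bytes_;
    std::unordered_map<std::string, WriterHandleSketch> writers_;
};
```

For example, with writers holding 50, 30 and 10 bytes and a 45-byte limit, one flush of the 50-byte writer brings the total to 40 and the loop stops. Flushing the largest writer first releases the most memory per flush, which keeps the number of small files produced by forced flushes low.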

New CoreOptions:
  - write-buffer-spillable
  - write-buffer-spill.max-disk-size
  - local-sort.max-num-file-handles
  - spill-compression
  - spill-compression.zstd-level (moved next to other spill options)

Tests:
  - sort_buffer_test: external sort buffer spill & merge behaviour
  - writer_memory_manager_test: memory accounting and shrink-to-limit
  - extend sort_merge_reader_test / core_options_test / data_generator_test

Generated-by: Aone Copilot (claude-4.7-opus)
Copilot AI review requested due to automatic review settings April 28, 2026 09:12
@zjw1111 zjw1111 changed the title from "feat(mergetree): support spill-to-disk write buffer and writer memory manager" to "feat(mergetree): introduce external sort buffer and writer memory manager" on Apr 28, 2026

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@zjw1111 zjw1111 requested a review from Copilot April 28, 2026 09:37
@zjw1111 zjw1111 changed the title from "feat(mergetree): introduce external sort buffer and writer memory manager" to "feat(spill): introduce external sort buffer and writer memory manager" on Apr 28, 2026

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

zjw1111 and others added 2 commits April 28, 2026 17:58
Co-authored-by: Copilot <copilot@github.com>
@zjw1111 zjw1111 requested a review from Copilot April 28, 2026 11:04

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.


Comment thread src/paimon/core/memory/writer_memory_manager.cpp
Comment thread src/paimon/core/memory/writer_memory_manager.cpp
Comment thread src/paimon/core/mergetree/external_sort_buffer.cpp
Comment thread include/paimon/defs.h Outdated
Comment thread src/paimon/core/mergetree/sort_buffer_test.cpp Outdated
Comment thread src/paimon/core/mergetree/compact/sort_merge_reader_with_min_heap.h Outdated
Comment thread src/paimon/core/mergetree/compact/sort_merge_reader_test.cpp
Comment thread src/paimon/core/mergetree/external_sort_buffer.cpp
Comment thread src/paimon/core/mergetree/external_sort_buffer.cpp Outdated
Comment thread src/paimon/core/mergetree/external_sort_buffer.cpp
Comment thread src/paimon/core/mergetree/sort_buffer_test.cpp Outdated
Comment thread src/paimon/core/memory/writer_memory_manager.h
Comment thread src/paimon/core/memory/writer_memory_manager.h
Comment thread src/paimon/core/memory/writer_memory_manager.cpp
Comment thread src/paimon/core/memory/writer_memory_manager_test.cpp
Comment thread src/paimon/core/memory/writer_memory_manager.h
zjw1111 and others added 3 commits April 29, 2026 15:16
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
@lucasfang

+1

@lxy-9602 lxy-9602 left a comment

+1

@lxy-9602 lxy-9602 merged commit 6695b84 into alibaba:main May 6, 2026
9 checks passed
4 participants