feat(spill): introduce external sort buffer and writer memory manager #257
Merged
lxy-9602 merged 8 commits into alibaba:main on May 6, 2026
… manager

Introduce ExternalSortBuffer that wraps an InMemorySortBuffer and spills sorted runs to disk via IOManager channels when the in-memory budget is reached. Spilled runs are merged back through the existing sort-merge reader path on flush.

Also add WriterMemoryManager to coordinate write-buffer memory across multiple BatchWriters by picking the largest writer to flush when the global budget is exceeded.

New CoreOptions:
- write-buffer-spillable
- write-buffer-spill.max-disk-size
- local-sort.max-num-file-handles
- spill-compression
- spill-compression.zstd-level (moved next to other spill options)

Tests:
- sort_buffer_test: external sort buffer spill & merge behaviour
- writer_memory_manager_test: memory accounting and shrink-to-limit
- extend sort_merge_reader_test / core_options_test / data_generator_test

Generated-by: Aone Copilot (claude-4.7-opus)
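The spill-and-merge flow in the commit message can be illustrated with a small, self-contained sketch. This is not the PR's `ExternalSortBuffer`: the class name `ToyExternalSortBuffer`, its methods, and the row-count budget are hypothetical, and "spilled" runs are kept in vectors as a stand-in for files written through `IOManager` channels. It only shows the general technique, assuming plain integer keys: sort and spill a run when the budget is reached, then k-way merge all runs with a min-heap.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical toy, not the paimon-cpp ExternalSortBuffer.
// Spilled runs live in vectors here; a real buffer would write them to disk.
class ToyExternalSortBuffer {
public:
    explicit ToyExternalSortBuffer(std::size_t max_in_memory_rows)
        : max_in_memory_rows_(max_in_memory_rows) {}

    // Buffer one key; spill a sorted run once the (assumed) row budget is reached.
    void Write(int64_t key) {
        in_memory_.push_back(key);
        if (in_memory_.size() >= max_in_memory_rows_) {
            Spill();
        }
    }

    std::size_t NumSpilledRuns() const { return spilled_runs_.size(); }

    // Merge every spilled run plus the remaining in-memory rows into one
    // sorted sequence using a k-way merge over a min-heap.
    std::vector<int64_t> MergeAll() const {
        std::vector<std::vector<int64_t>> runs = spilled_runs_;
        std::vector<int64_t> tail = in_memory_;
        std::sort(tail.begin(), tail.end());
        if (!tail.empty()) {
            runs.push_back(std::move(tail));
        }

        // Heap entries are (key, run index, offset within that run).
        using Entry = std::tuple<int64_t, std::size_t, std::size_t>;
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
        for (std::size_t i = 0; i < runs.size(); ++i) {
            heap.emplace(runs[i][0], i, 0);  // every run is non-empty
        }

        std::vector<int64_t> out;
        while (!heap.empty()) {
            auto [key, run, pos] = heap.top();
            heap.pop();
            out.push_back(key);
            if (pos + 1 < runs[run].size()) {
                heap.emplace(runs[run][pos + 1], run, pos + 1);
            }
        }
        return out;
    }

private:
    // Sort the in-memory rows and move them into a new run.
    void Spill() {
        std::sort(in_memory_.begin(), in_memory_.end());
        spilled_runs_.push_back(std::move(in_memory_));
        in_memory_.clear();
    }

    std::size_t max_in_memory_rows_;
    std::vector<int64_t> in_memory_;
    std::vector<std::vector<int64_t>> spilled_runs_;
};
```

The heap holds at most one entry per run, so merge memory grows with the number of runs rather than the number of rows. That is presumably why an option such as `local-sort.max-num-file-handles` exists, although the sketch does not model it.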
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.
lxy-9602
reviewed
Apr 29, 2026
Co-authored-by: Copilot <copilot@github.com>
Collaborator
+1
Purpose
Linked issue: #149
Introduce the spill-to-disk capability for the merge-tree write path and a global writer memory manager that coordinates write-buffer memory across multiple `BatchWriter`s.

- `ExternalSortBuffer`: wraps an `InMemorySortBuffer`. When the in-memory budget is exceeded, the sorted data is flushed to an on-disk file through `IOManager` channels. On `CreateReaders()`, the in-memory data and the spilled files are merged back via the existing `MergedKeyValueRecordReader` and `SortMergeReader` path.
- `WriterMemoryManager`: tracks per-writer memory usage and, when the global budget is exceeded, picks the largest writer to flush in order to shrink total memory back under the limit. Not thread-safe by design (in Paimon, the write process is single-threaded).
- Refactor the `CoreOptions` parser to support:
  - `ParseMemorySize`
  - `ParseTimeDuration`
  - `std::optional<std::string>`
- New `CoreOptions` exposed to users:
  - `write-buffer-spillable`
  - `write-buffer-spill.max-disk-size`
  - `local-sort.max-num-file-handles`
  - `spill-compression`
  - `spill-compression.zstd-level`

Tests

- `core/mergetree/sort_buffer_test.cpp` (new): covers `ExternalSortBuffer` spill + merge behaviour, including multiple spilled files, ordering guarantees and resource cleanup.
- `core/memory/writer_memory_manager_test.cpp` (new): covers register/unregister, memory accounting and shrink-to-limit flushing.
- Extend `core/mergetree/compact/sort_merge_reader_test.cpp`, `core/core_options_test.cpp`, `common/utils/string_utils_test.cpp` and `testing/utils/data_generator_test.cpp` to cover the new options and helpers used by the spill path.

API and Format

New public options are added to `include/paimon/defs.h` (`Options::WRITE_BUFFER_SPILLABLE`, `WRITE_BUFFER_SPILL_MAX_DISK_SIZE`, `LOCAL_SORT_MAX_NUM_FILE_HANDLES`, `SPILL_COMPRESSION`, `SPILL_COMPRESSION_ZSTD_LEVEL`). No storage format or protocol change.

Documentation

No new user-facing documentation is required beyond the option comments in `include/paimon/defs.h`.

Generative AI tooling
Generated-by: Aone Copilot (claude-4.7-opus) and GitHub Copilot (GPT-5.4)
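For readers who want a concrete picture of the `WriterMemoryManager` policy described under Purpose (flush the largest writer until total usage is back under the global budget), here is a minimal sketch. It is not the PR's API: `ToyWriterMemoryManager`, `Register`, `Unregister`, `Update`, and the callback-based flush are hypothetical names chosen for illustration, and byte accounting is assumed to be reported by the caller. Like the class in the PR, it is deliberately not thread-safe.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical toy, not the paimon-cpp WriterMemoryManager.
// Not thread-safe: it assumes a single-threaded write path.
class ToyWriterMemoryManager {
public:
    // Called when a writer is chosen for flushing; it should release the buffer.
    using FlushFn = std::function<void()>;

    explicit ToyWriterMemoryManager(uint64_t budget_bytes)
        : budget_bytes_(budget_bytes) {}

    void Register(const std::string& id, FlushFn flush) {
        Unregister(id);  // keep accounting consistent on re-registration
        writers_[id] = Writer{0, std::move(flush)};
    }

    void Unregister(const std::string& id) {
        auto it = writers_.find(id);
        if (it == writers_.end()) {
            return;
        }
        total_bytes_ -= it->second.used_bytes;
        writers_.erase(it);
    }

    // Record the current usage of one writer, then shrink back under budget.
    void Update(const std::string& id, uint64_t used_bytes) {
        auto it = writers_.find(id);
        if (it == writers_.end()) {
            return;  // unknown writers are ignored in this sketch
        }
        total_bytes_ -= it->second.used_bytes;
        it->second.used_bytes = used_bytes;
        total_bytes_ += used_bytes;
        ShrinkToLimit();
    }

    uint64_t TotalBytes() const { return total_bytes_; }

private:
    struct Writer {
        uint64_t used_bytes = 0;
        FlushFn flush;
    };

    // Repeatedly flush the writer holding the most memory until the total
    // fits the budget or nothing is left to flush.
    void ShrinkToLimit() {
        while (total_bytes_ > budget_bytes_) {
            Writer* largest = nullptr;
            for (auto& entry : writers_) {
                Writer& w = entry.second;
                if (w.used_bytes > 0 &&
                    (largest == nullptr || w.used_bytes > largest->used_bytes)) {
                    largest = &w;
                }
            }
            if (largest == nullptr) {
                break;
            }
            if (largest->flush) {
                largest->flush();
            }
            total_bytes_ -= largest->used_bytes;
            largest->used_bytes = 0;
        }
    }

    uint64_t budget_bytes_;
    uint64_t total_bytes_ = 0;
    std::unordered_map<std::string, Writer> writers_;
};
```

Picking the largest writer frees the most memory per flush, so fewer writers are interrupted each time the budget is exceeded. The linear scan is fine for a sketch; how the actual implementation selects the writer is not shown in this PR description.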