feat: support LookupMergeTreeCompactRewriter #186
Open
lszskye wants to merge 9 commits into alibaba:main
lszskye commented on Mar 19, 2026
Pull request overview
This PR adds lookup-based merge-tree compaction rewriting (via LookupMergeTreeCompactRewriter) and introduces per-level file-format selection, updating compaction/read/write paths and expanding unit tests accordingly.
Changes:
- Introduces LookupMergeTreeCompactRewriter + ChangelogMergeTreeRewriter to support lookup-driven rewrite/upgrade flows (including DV-aware paths).
- Adds file.format.per.level support in CoreOptions and updates call sites to use GetFileFormat() / GetWriteFileFormat(level).
- Adds FileStorePathFactoryCache and new tests for lookup rewrite behavior and wrapper logic.
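To make the per-level format change concrete, here is a minimal sketch of how a level-to-format lookup with a default fallback could work. It is not the PR's code: `LevelFormatOptions` is a hypothetical stand-in for the relevant part of CoreOptions, and the option syntax (comma-separated `level:format` pairs such as `0:avro,3:parquet`) is an assumption, since the PR text does not show it. Only the method names GetFileFormat() and GetWriteFileFormat(level) come from the overview above.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for the format-related part of CoreOptions.
class LevelFormatOptions {
 public:
  // per_level_spec is ASSUMED to look like "0:avro,3:parquet".
  // This sketch silently skips malformed entries; a real implementation
  // would more likely report an error for them.
  LevelFormatOptions(std::string default_format, const std::string& per_level_spec)
      : default_format_(std::move(default_format)) {
    std::istringstream entries(per_level_spec);
    std::string entry;
    while (std::getline(entries, entry, ',')) {
      const std::string::size_type colon = entry.find(':');
      if (colon == std::string::npos || colon == 0 || colon + 1 == entry.size()) {
        continue;  // missing level or missing format
      }
      const std::string level_text = entry.substr(0, colon);
      try {
        std::size_t parsed = 0;
        const int level = std::stoi(level_text, &parsed);
        if (parsed != level_text.size() || level < 0) {
          continue;  // trailing garbage or negative level
        }
        per_level_[level] = entry.substr(colon + 1);
      } catch (const std::exception&) {
        continue;  // level is not a number
      }
    }
  }

  // Table-wide default format, used by read paths and path-factory creation.
  const std::string& GetFileFormat() const { return default_format_; }

  // Format for files written at `level`; falls back to the default.
  const std::string& GetWriteFileFormat(int level) const {
    assert(level >= 0);
    const auto it = per_level_.find(level);
    return it == per_level_.end() ? default_format_ : it->second;
  }

 private:
  std::string default_format_;
  std::map<int, std::string> per_level_;
};
```

With a default of `orc` and the spec `0:avro,3:parquet`, levels 0 and 3 would resolve to their own formats and every other level to `orc`.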
Reviewed changes
Copilot reviewed 55 out of 55 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| src/paimon/core/utils/file_store_path_factory_test.cpp | Updates tests to use GetFileFormat() for path factory creation. |
| src/paimon/core/utils/file_store_path_factory_cache_test.cpp | Adds unit test for the new path-factory cache. |
| src/paimon/core/utils/file_store_path_factory_cache.h | Introduces a cache to reuse FileStorePathFactory by format identifier. |
| src/paimon/core/table/source/table_scan.cpp | Switches path factory creation to use GetFileFormat(). |
| src/paimon/core/table/source/table_read.cpp | Switches path factory creation to use GetFileFormat(). |
| src/paimon/core/postpone/postpone_bucket_writer.cpp | Updates write-format selection to GetWriteFileFormat(level). |
| src/paimon/core/postpone/postpone_bucket_file_store_write.h | Updates write-format selection to GetWriteFileFormat(level). |
| src/paimon/core/options/lookup_strategy.h | Adds LookupStrategy struct encapsulating lookup decision inputs. |
| src/paimon/core/operation/raw_file_split_read_test.cpp | Switches path factory creation to use GetFileFormat(). |
| src/paimon/core/operation/orphan_files_cleaner.cpp | Switches path factory creation to use GetFileFormat(). |
| src/paimon/core/operation/merge_file_split_read_test.cpp | Switches path factory creation to use GetFileFormat(). |
| src/paimon/core/operation/merge_file_split_read.h | Adds API to inject a merge-function wrapper and refactors wrapper retrieval. |
| src/paimon/core/operation/merge_file_split_read.cpp | Implements SetMergeFunctionWrapper. |
| src/paimon/core/operation/manifest_file_merger_test.cpp | Switches path factory creation to use GetFileFormat(). |
| src/paimon/core/operation/key_value_file_store_scan_test.cpp | Switches path factory creation to use GetFileFormat(). |
| src/paimon/core/operation/file_store_write.cpp | Uses GetWriteFileFormat(level) for write paths. |
| src/paimon/core/operation/file_store_commit.cpp | Uses GetFileFormat() and updates assertions accordingly. |
| src/paimon/core/operation/expire_snapshots_test.cpp | Switches path factory creation to use GetFileFormat(). |
| src/paimon/core/operation/append_only_file_store_write.cpp | Uses GetFileFormat() for writer creation. |
| src/paimon/core/migrate/file_meta_utils.cpp | Uses GetFileFormat() for migration commit message generation. |
| src/paimon/core/mergetree/merge_tree_writer.cpp | Updates write-format selection to GetWriteFileFormat(level). |
| src/paimon/core/mergetree/lookup_levels_test.cpp | Adds coverage for closing and tmp-dir cleanup behavior. |
| src/paimon/core/mergetree/lookup_levels.h | Adds Close() to clear lookup cache. |
| src/paimon/core/mergetree/compact/reducer_merge_function_wrapper.h | Changes GetResult() to reset wrapper state after producing a result. |
| src/paimon/core/mergetree/compact/merge_tree_compact_rewriter_test.cpp | Updates rewriter creation to use FileStorePathFactoryCache. |
| src/paimon/core/mergetree/compact/merge_tree_compact_rewriter.h | Refactors rewriter to use path-factory cache; adds wrapper factory injection points. |
| src/paimon/core/mergetree/compact/merge_tree_compact_rewriter.cpp | Implements per-level format writing and wrapper injection during merge-read. |
| src/paimon/core/mergetree/compact/lookup_merge_tree_compact_rewriter_test.cpp | Adds comprehensive tests for lookup-based rewrite/upgrade and DV behavior. |
| src/paimon/core/mergetree/compact/lookup_merge_tree_compact_rewriter.h | Introduces lookup-based rewriter interface and wrapper factories. |
| src/paimon/core/mergetree/compact/lookup_merge_tree_compact_rewriter.cpp | Implements lookup-based rewrite/upgrade decisions and DV updates. |
| src/paimon/core/mergetree/compact/lookup_merge_function_test.cpp | Adds tests for high-level selection and insertion ordering. |
| src/paimon/core/mergetree/compact/lookup_merge_function.h | Enhances merge function to track key, level-0 presence, and pick high-level candidate. |
| src/paimon/core/mergetree/compact/lookup_changelog_merge_function_wrapper_test.cpp | Adds tests for lookup-changelog wrapper behavior including DV. |
| src/paimon/core/mergetree/compact/lookup_changelog_merge_function_wrapper.h | Introduces wrapper that performs lookup/DV marking and merges candidates. |
| src/paimon/core/mergetree/compact/first_row_merge_function_wrapper_test.cpp | Adds tests for first-row lookup wrapper behavior. |
| src/paimon/core/mergetree/compact/first_row_merge_function_wrapper.h | Adds wrapper for first-row lookup behavior. |
| src/paimon/core/mergetree/compact/first_row_merge_function.h | Exposes ContainsHighLevel() for wrapper logic. |
| src/paimon/core/mergetree/compact/compact_rewriter.h | Changes Upgrade() to be non-const. |
| src/paimon/core/mergetree/compact/changelog_merge_tree_rewriter.h | Adds a base rewriter that can rewrite and/or produce changelog per strategy. |
| src/paimon/core/mergetree/compact/changelog_merge_tree_rewriter.cpp | Implements rewrite/upgrade routing based on strategy. |
| src/paimon/core/manifest/manifest_entry_writer_test.cpp | Updates write-format selection to GetWriteFileFormat(level). |
| src/paimon/core/key_value.h | Makes KeyValue copyable and changes value ownership to shared_ptr. |
| src/paimon/core/io/single_file_writer_test.cpp | Updates write-format selection to GetWriteFileFormat(level). |
| src/paimon/core/global_index/global_index_write_task.cpp | Switches path factory creation to use GetFileFormat(). |
| src/paimon/core/global_index/global_index_scan_impl.cpp | Switches path factory creation to use GetFileFormat(). |
| src/paimon/core/core_options_test.cpp | Adds coverage for per-level formats and lookup strategy; adds invalid-format tests. |
| src/paimon/core/core_options.h | Adds APIs: GetFileFormat(), GetWriteFileFormat(level), GetLookupStrategy(). |
| src/paimon/core/core_options.cpp | Implements per-level format parsing and lookup strategy computation. |
| src/paimon/core/append/append_only_writer.cpp | Uses GetFileFormat() for writer creation. |
| src/paimon/common/sst/sst_file_reader.cpp | Adds null checks for cached block reads and returns errors on failure. |
| src/paimon/common/io/cache/cache_manager.cpp | Handles null returns from cache get path. |
| src/paimon/common/defs.cpp | Adds new option key file.format.per.level. |
| src/paimon/common/data/generic_row.h | Changes data holder storage from unique_ptr to shared_ptr. |
| src/paimon/CMakeLists.txt | Registers new rewriter sources and new unit tests. |
| include/paimon/defs.h | Documents new file.format.per.level option. |
Comments suppressed due to low confidence (6)
src/paimon/core/utils/file_store_path_factory_cache_test.cpp:1 - This test accesses FileStorePathFactoryCache::format_to_path_factory_ and FileStorePathFactory::format_identifier_ directly. Those members are private in the new cache header (and likely private in FileStorePathFactory), so this will not compile. Prefer asserting the cache behavior via public surface: add a Size() accessor (or similar) on FileStorePathFactoryCache and validate format via a public getter on FileStorePathFactory (or by observing generated paths/extensions), or declare the test as a friend if you intentionally want to white-box test internals.
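The public-surface suggestion for the cache test could look roughly like the sketch below. `PathFactory` and `PathFactoryCache` are hypothetical, heavily simplified stand-ins for FileStorePathFactory and FileStorePathFactoryCache, and `GetOrCreate()`, `Size()`, and `FormatIdentifier()` are assumed names, not the PR's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for FileStorePathFactory.
class PathFactory {
 public:
  explicit PathFactory(std::string format) : format_(std::move(format)) {}
  // Public getter so tests do not need access to the private member.
  const std::string& FormatIdentifier() const { return format_; }

 private:
  std::string format_;
};

// Hypothetical, simplified stand-in for FileStorePathFactoryCache.
class PathFactoryCache {
 public:
  // Returns the cached factory for `format`, creating it on first use.
  std::shared_ptr<PathFactory> GetOrCreate(const std::string& format) {
    assert(!format.empty());
    const auto it = format_to_path_factory_.find(format);
    if (it != format_to_path_factory_.end()) {
      return it->second;
    }
    auto factory = std::make_shared<PathFactory>(format);
    format_to_path_factory_.emplace(format, factory);
    return factory;
  }

  // Number of distinct formats cached so far; lets tests check reuse
  // without reading format_to_path_factory_ directly.
  std::size_t Size() const { return format_to_path_factory_.size(); }

 private:
  std::map<std::string, std::shared_ptr<PathFactory>> format_to_path_factory_;
};
```

A test written against this surface can then assert that two requests for the same format return the same pointer and that `Size()` grows only when a new format is requested.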
src/paimon/core/mergetree/compact/reducer_merge_function_wrapper.h:1 - Reset() is only called on the success path. If merge_function_->GetResult() returns an error, PAIMON_ASSIGN_OR_RAISE will return early and skip resetting state, which can leave the wrapper in a partially-initialized state for subsequent use. Consider using a scope guard to ensure Reset() runs regardless of success/failure (or explicitly reset before returning the error).
src/paimon/core/mergetree/lookup_levels.h:1 - Correct the typo in the comment: 'TODDO' should be 'TODO'.
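For the Reset() comment on reducer_merge_function_wrapper.h, a scope guard might be sketched as follows. Everything here is illustrative: `ScopeGuard` and `SumWrapper` are hypothetical types, and the bool return with an out-parameter stands in for the project's result type and PAIMON_ASSIGN_OR_RAISE, which are not reproduced.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Minimal scope guard: runs the callback when the enclosing scope exits,
// whether by a normal return or an early error return.
class ScopeGuard {
 public:
  explicit ScopeGuard(std::function<void()> on_exit) : on_exit_(std::move(on_exit)) {}
  ~ScopeGuard() { on_exit_(); }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

 private:
  std::function<void()> on_exit_;
};

// Hypothetical wrapper that accumulates values and must be reset after
// every GetResult() call, successful or not.
class SumWrapper {
 public:
  // Negative input marks the accumulated state as invalid (models an error).
  void Add(int value) {
    if (value < 0) {
      failed_ = true;
      return;
    }
    sum_ += value;
  }

  // Returns false on error. The guard resets state on both paths.
  bool GetResult(int* out) {
    assert(out != nullptr);
    ScopeGuard reset_on_exit([this] { Reset(); });
    if (failed_) {
      return false;  // early return; Reset() still runs via the guard
    }
    *out = sum_;  // copied out before the guard fires
    return true;
  }

  bool IsReset() const { return sum_ == 0 && !failed_; }

 private:
  void Reset() {
    sum_ = 0;
    failed_ = false;
  }

  int sum_ = 0;
  bool failed_ = false;
};
```

The guard is declared before any early return, so the wrapper ends up in a clean state on either path. The alternative named in the comment, calling Reset() explicitly before returning the error, avoids the extra type at the cost of repeating the call on each exit path.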
520e16c to f47c354
baccb40 to 48a2190
Purpose
Support LookupMergeTreeCompactRewriter for the rewrite process with lookup.
Linked issue: #93
Tests
src/paimon/core/mergetree/compact/lookup_merge_tree_compact_rewriter_test.cpp