[feature](runtime-filter) Add adaptive global runtime filter tree publish#64851
[feature](runtime-filter) Add adaptive global runtime filter tree publish#64851BiteTheDDDDt wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
run buildall |
There was a problem hiding this comment.
I found two blocking issues in the adaptive runtime-filter tree publish path.
Critical checkpoint conclusions:
- Goal: the PR implements adaptive tree publish, but the current implementation is not equivalent to direct publish on error handling and is not safe for mixed-version BE rolling upgrades.
- Scope: the code is mostly focused on runtime-filter publish, session option forwarding, and protocol fields.
- Concurrency/lifecycle: the new async relay closure is self-owned, but downstream RPC failure lifecycle is only logged instead of reaching the query cancellation path.
- Configuration:
runtime_filter_tree_publish_max_send_bytesis forwarded to BE and0disables tree publish; existing review threads already cover default/fallback wording. - Compatibility: the new
forward_targetsprotobuf field needs receiver capability gating before a BE can be selected as a relay. - Parallel paths: direct publish and relay publish now diverge for RPC/status failure handling.
- Tests: helper unit tests cover splitting/fanout/task construction, but not relay failure propagation or mixed old/new BE fallback.
- Observability/performance: existing threads already cover INFO logging; no separate new comment from this pass.
User focus: no additional user-provided review focus was present.
Subagent conclusions: optimizer-rewrite produced MAIN-001 and duplicate notes for the capability and out-of-scope Nereids candidates. tests-session-config produced MAIN-002 and an out-of-scope Nereids candidate. The Nereids candidate was dismissed because GitHub's authoritative PR file list contains only the seven runtime-filter/session/protocol files. Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same current ledger/comment set.
e904aa9 to
2b24ec5
Compare
|
/review |
|
run buildall |
There was a problem hiding this comment.
Reviewed the PR diff and related runtime-filter paths. I am not adding new inline comments because the substantiated correctness concern I still see is already covered by the existing rolling-upgrade thread, and the other prior relay-error-policy concern has been addressed in the current head.
Critical checkpoint conclusions:
- Goal: the PR adds adaptive tree publishing for global runtime filters, controlled by
runtime_filter_tree_publish_max_send_bytes. The main direct, tree-root, and relay helper paths are present. - Scope: the GitHub PR surface is the seven files listed by the harness and GitHub API; local
git diff base..headcontains unrelated stacked FE/regression changes, so I kept this review scoped to the authoritative PR files. - Concurrency and lifecycle: async BRPC callbacks in the root publish path keep callback state in
GlobalMergeContext; relay child RPCs now keep their request/callback state inRuntimeFilterRelayRpcClosure. I did not find a new lifetime issue in the current code. - Config propagation: the new FE session variable is forwarded through
SessionVariable.toThrift()intoTQueryOptions, and BE disables tree publish if the field is not set or is set to0. - Compatibility: the rolling BE upgrade risk from forwarding through an old BE is still a valid concern, but it is already raised in the existing inline thread at
runtime_filter_mgr.cpp:658, so I did not duplicate it. - Parallel paths: direct publish remains the fallback when tree publish is disabled or target fragment IDs are unavailable; relay forwarding now uses the same query-aware callback type as direct publish for child RPC responses.
- Tests: the added BE unit tests cover fanout splitting, fanout calculation, and task construction. I did not run BE/FE tests in this runner because
thirdparty/installedis missing;git diff --checkpassed for the seven authoritative PR files. - Observability: the tree/relay publish messages are
VLOG_NOTICEin the current head; the earlier INFO-log concern is already in the existing review context.
Subagent conclusions:
optimizer-rewritereportedNO_NEW_VALUABLE_FINDINGSin the initial pass and again in convergence round 1.tests-session-configreportedNO_NEW_VALUABLE_FINDINGSin the initial pass and again in convergence round 1.- No subagent candidates became new inline comments; duplicate suppression kept the rolling-upgrade compatibility issue on the existing thread.
User focus: no additional user-provided review focus was present.
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 29778 ms |
TPC-DS: Total hot run time: 171915 ms |
ClickBench: Total hot run time: 25.41 s |
FE Regression Coverage ReportIncrement line coverage |
1 similar comment
FE Regression Coverage ReportIncrement line coverage |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
af5375f to
a03874a
Compare
a03874a to
ac55c5c
Compare
|
run buildall |
ac55c5c to
2cfe4b9
Compare
|
run buildall |
|
/review |
There was a problem hiding this comment.
Reviewed the current PR head 2cfe4b9d882fe5d913cd7157cf9961293d51ce24 against the seven authoritative changed files.
I am not adding new inline comments in this pass. The main correctness issue I can still substantiate is the rolling-upgrade compatibility risk for tree publish through the new PPublishFilterRequestV2.forward_targets relay field: an older selected relay BE can ignore descendants and only apply its own fragment_ids. That is already covered by existing inline thread discussion_r3473478336, so I am not duplicating it.
Critical checkpoint conclusions:
- Goal: the PR adds adaptive global runtime-filter tree publish controlled by
runtime_filter_tree_publish_max_send_bytes; the current implementation covers FE option propagation, protobuf relay fields, BE fanout/task construction, relay forwarding, and helper unit tests. - Scope/focus: the change is mostly focused on runtime-filter publish, with one FE session variable and protocol fields. No additional user-provided focus was supplied.
- Concurrency/lifecycle: relay RPC closures now keep request/response/controller state alive and use
HandleErrorBrpcCallback; no new non-duplicate lifecycle issue found. - Compatibility: the old-BE relay compatibility issue remains the key concern, but it is already raised inline. Target-fragment-id assumptions and fallback removal are also already active review-thread context.
- Configuration/protocol: the new session variable is non-negative checked, forwarded through
SessionVariable.toThrift(), and consumed fromTQueryOptions; protobuf fields are appended. - Parallel paths: current Nereids and legacy coordinator target builders both populate
TRuntimeFilterTargetParamsV2.target_fragment_ids; BE load paths without the optional query option leave tree publish disabled. - Tests: new BE unit coverage checks fanout, slicing, and task construction. I did not run FE/BE tests in this runner because
thirdparty/installedis absent;git diff --checkover the PR files passed.
Subagent conclusions:
optimizer-rewrite: no new valuable candidate; it verified runtime-filter publish/relay semantics and marked existing compatibility/error-policy concerns as already covered or fixed in current head.tests-session-config: no new valuable candidate; it verified session variable propagation, thrift/proto additions, diff hygiene, and noted the same local test limitation.- Convergence round 1: both live subagents replied
NO_NEW_VALUABLE_FINDINGSfor the same current ledger/comment set after the last main ledger update.
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 29548 ms |
TPC-DS: Total hot run time: 172355 ms |
ClickBench: Total hot run time: 25.01 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
…lish
Issue Number: N/A
Related PR: N/A
Problem Summary: Global runtime filters are currently published from the merge coordinator directly to every scan target. When a merged filter is large and the cluster has many BEs, the coordinator sends many large duplicate RPC attachments. This change adds adaptive tree publish for global runtime filters so each relay node forwards a bounded number of copies based on runtime_filter_tree_publish_max_send_bytes.
Add runtime_filter_tree_publish_max_send_bytes session variable to control adaptive global runtime filter tree publish. Set it to 0 to disable tree publish.
- Test:
- Manual test: git diff --cached --check
- Manual test: build-support/check-format.sh
- Manual test: DORIS_HOME=/mnt/disk8/xiaolei/codespaces/doris2 DORIS_THIRDPARTY=/mnt/disk8/xiaolei/codespaces/doris2/thirdparty ninja -C be/build_Release src/exec/CMakeFiles/Exec.dir/runtime_filter/runtime_filter_mgr.cpp.o src/service/CMakeFiles/Service.dir/internal_service.cpp.o
- Manual test: ./build.sh --fe
- Behavior changed: Yes. Large global runtime filters can be published through an adaptive relay tree instead of direct coordinator fanout.
- Does this need documentation: No
2cfe4b9 to
75171cb
Compare
|
run buildall |
|
/review |
There was a problem hiding this comment.
Automated review summary for PR 64851 at head 75171cb.
I did not add new inline comments because the substantiated issues from this pass are already covered by existing inline threads:
- discussion_r3473478336: tree publish still depends on the new forward_targets field being understood by relay BEs; mixed-version BEs can drop all descendants behind an old relay.
- discussion_r3503788011: async relay child RPC failures still go through the generic HandleErrorBrpcCallback log path without query_id, filter_id, or target address, despite the reply saying this is done.
- discussion_r3473478331: relay child RPC error policy was improved in the current head with HandleErrorBrpcCallback and synchronous stub-init failure propagation; I did not find a distinct new error-policy issue beyond the existing thread.
Critical checkpoints:
- Goal/behavior: adaptive global runtime filter tree publish is implemented and the helper tests cover split/fanout/task construction, but mixed-version relay capability remains unresolved in the existing thread.
- Scope: the change is focused across BE runtime-filter publish, internal-service relay forwarding, FE session variable propagation, and proto/thrift IDL.
- Concurrency/lifecycle: no new lock-order issue found; async request/response/controller lifetime appears owned by the relay closure and callback shared pointers.
- Config/compatibility: runtime_filter_tree_publish_max_send_bytes is forwarded through TQueryOptions; compatibility remains the main risk because forward_targets is ignored by older BEs.
- Parallel paths: both Nereids RuntimeFiltersThriftBuilder and legacy Coordinator populate target_fragment_ids; I did not find a distinct parallel join/aggregate routing bug.
- Testing: helper BE unit tests were added, but I could not run tests or regenerate proto/thrift outputs in this checkout because thirdparty/installed and thirdparty/installed/bin/protoc are missing.
- Observability: the existing failed-forward-target logging thread still appears not fully addressed for async RPC failures.
- User focus: no additional user-provided review focus was supplied.
Subagent conclusions:
- optimizer-rewrite: no candidates appended; convergence round 1 returned NO_NEW_VALUABLE_FINDINGS.
- tests-session-config: no candidates appended; convergence round 1 returned NO_NEW_VALUABLE_FINDINGS; test/protoc execution was also blocked by missing thirdparty dependencies.
TPC-H: Total hot run time: 30097 ms |
TPC-DS: Total hot run time: 173927 ms |
ClickBench: Total hot run time: 25.25 s |
What problem does this PR solve?
Issue Number: N/A
Related PR: N/A
Problem Summary: Global runtime filters are currently published from the merge coordinator directly to every scan target. When a merged filter is large and the cluster has many BEs, the coordinator sends many large duplicate RPC attachments. This PR adds adaptive tree publish for global runtime filters so each relay node forwards a bounded number of copies based on runtime_filter_tree_publish_max_send_bytes.
Release note
Add runtime_filter_tree_publish_max_send_bytes session variable to control adaptive global runtime filter tree publish. Set it to 0 to disable tree publish.
Check List (For Author)