[multistage] Support forcing colocated set operations to avoid data shuffle using query hint #18804

Open
yashmayya wants to merge 1 commit into apache:master from yashmayya:colocated-set-op-hint

@yashmayya

Contributor

Adds a new query hint setOpOptions(is_colocated_by_set_op_keys='...') that forces (or disables) colocated, pre-partitioned exchanges for set operations (UNION / UNION ALL / INTERSECT / EXCEPT) in order to avoid a data shuffle when the inputs are already partitioned compatibly. This is the set-operation equivalent of the existing joinOptions(is_colocated_by_join_keys='...') join hint and the windowOptions(is_partitioned_by_window_keys='...') window hint (#17395).

By default the planner inserts a hash exchange (on the full output row) below every input of a set operation. When the inputs are already co-partitioned, that shuffle is unnecessary; this hint lets the user assert colocation so the planner emits a direct (1-to-1, no-shuffle) exchange instead. This also registers the setOpOptions hint strategy (HintPredicates.SETOP), which was previously not registered at all.

Like the equivalent join / window hints, this is opt-in and trusts the user's assertion. Because a set operation matches rows on the entire output row, forcing is_colocated_by_set_op_keys='true' is only correct when every input is partitioned the same way (same partition function and count) on one or more of the projected columns, so that rows that are equal across all projected columns land on the same worker. Forcing it on data that is not actually colocated will produce incorrect results for INTERSECT, EXCEPT and distinct UNION (UNION ALL only concatenates, so it is always safe). The hint is honored by the V1 query planner; the V2 physical optimizer determines colocation on its own and ignores it.
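
The precondition above can be illustrated with a small simulation. This is only a sketch, not Pinot code: workers are modeled as dictionary buckets, `place_mod2` and `place_shifted` are hypothetical partition functions, and `local_intersect` stands in for an INTERSECT evaluated per worker behind a direct (no-shuffle) exchange.

```python
from collections import defaultdict

NUM_WORKERS = 2

def partition(rows, place):
    """Group rows by the worker index that `place` assigns to each row."""
    workers = defaultdict(list)
    for row in rows:
        workers[place(row)].append(row)
    return workers

def local_intersect(left, right):
    """INTERSECT evaluated independently on each worker, with no shuffle."""
    result = set()
    for worker in range(NUM_WORKERS):
        result |= set(left[worker]) & set(right[worker])
    return result

def place_mod2(row):
    # Hypothetical partition function on the first projected column.
    return row[0] % NUM_WORKERS

def place_shifted(row):
    # A different, incompatible function on the same column.
    return (row[0] + 1) % NUM_WORKERS

a = [(1, "x"), (2, "y"), (3, "z")]
b = [(1, "x"), (3, "z"), (4, "w")]
expected = set(a) & set(b)  # the correct global INTERSECT

# Same function and count on a projected column: equal rows meet on one worker.
colocated = local_intersect(partition(a, place_mod2), partition(b, place_mod2))

# Incompatible placement: equal rows sit on different workers and never meet.
misplaced = local_intersect(partition(a, place_mod2), partition(b, place_shifted))
```

Here `colocated` matches `expected`, while `misplaced` silently loses both matching rows, which is the kind of wrong result described above for INTERSECT, EXCEPT and distinct UNION.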

Hint placement. Unlike a join/window node, a set operation is an ancestor of its branch SELECTs, so a hint on the leading SELECT does not naturally attach to it. The hint is therefore resolved from either the set operation itself or its first branch, supporting two placements:

  • Inline on the first branch: SELECT /*+ setOpOptions(is_colocated_by_set_op_keys='true') */ col FROM a UNION ALL SELECT col FROM b
  • On an outer SELECT wrapping the set operation: SELECT /*+ setOpOptions(is_colocated_by_set_op_keys='true') */ * FROM (SELECT col FROM a UNION ALL SELECT col FROM b)
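
As a rough mental model of that resolution, here is a toy sketch. The `Node` class and `resolve_set_op_hint` function are hypothetical and are not the planner rule's actual code. Checking the set operation before its first branch is an assumption; the PR only states that both locations are consulted and that the first input wins when branches conflict.

```python
from dataclasses import dataclass, field
from typing import Optional

HINT_KEY = "is_colocated_by_set_op_keys"

@dataclass
class Node:
    """Hypothetical plan node: either a SELECT branch or a set operation."""
    hints: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)

def resolve_set_op_hint(set_op: Node) -> Optional[str]:
    # Assumed order: a hint attached to the set operation itself
    # (the outer-wrap placement) is checked first.
    if HINT_KEY in set_op.hints:
        return set_op.hints[HINT_KEY]
    # Otherwise only the first branch is consulted (the inline placement),
    # so a conflicting value on a later branch is ignored.
    if set_op.inputs and HINT_KEY in set_op.inputs[0].hints:
        return set_op.inputs[0].hints[HINT_KEY]
    return None

inline = Node(inputs=[Node(hints={HINT_KEY: "true"}), Node(hints={HINT_KEY: "false"})])
wrapped = Node(hints={HINT_KEY: "true"}, inputs=[Node(), Node()])
unhinted = Node(inputs=[Node(), Node()])

inline_value = resolve_set_op_hint(inline)
wrapped_value = resolve_set_op_hint(wrapped)
unhinted_value = resolve_set_op_hint(unhinted)
```

In this model both placements resolve to `'true'`, and the `'false'` on the second branch of `inline` has no effect.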

Two limitations worth calling out:

  • Plain distinct UNION is rewritten to an aggregate over UNION ALL before the exchange rule runs, so the inline hint does not apply to it (use UNION ALL, or the outer-wrap form).
  • For deeply-nested INTERSECT/EXCEPT the inline hint only colocates the innermost level (a safe degradation — the outer levels shuffle); the outer-wrap form covers all levels.

Tests added:

  • Planner unit tests in QueryCompilationTest asserting the hint forces / disables a pre-partitioned exchange across UNION ALL / INTERSECT / EXCEPT, both the inline and outer-wrap placements, the no-hint baseline, auto-detection, the ='false' override, and first-input-wins precedence when branches carry conflicting values.
  • A before/after physical-plan contrast pair in ExplainPhysicalPlans.json showing the hint turn a full shuffle into a [PARTITIONED] (direct, 1-to-1) exchange.
  • Runtime, H2-validated cases in QueryHints.json on physically-partitioned tables: INTERSECT / EXCEPT / UNION ALL with ='true', the ='false' override, a multi-column set op colocated on a subset (the partition column) of the projected columns, and a mismatched-partition-count case where the planner cannot form a direct exchange and safely falls back to a shuffle.
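
The mismatched-partition-count fallback in the last test case can be pictured as follows. This is a simplified sketch under stated assumptions: `PartitionSpec`, `exchange_when_forced`, and the `"direct"` / `"hash"` labels are illustrative names, not Pinot's API.

```python
from typing import NamedTuple

class PartitionSpec(NamedTuple):
    """Hypothetical description of how one input table is partitioned."""
    function: str
    count: int

def exchange_when_forced(specs):
    """Exchange chosen for a set op hinted with is_colocated_by_set_op_keys='true'."""
    # A direct exchange pairs worker i of one input with worker i of the others,
    # so it is only possible when every input shares one function and count.
    return "direct" if len(set(specs)) == 1 else "hash"

matched = exchange_when_forced(
    [PartitionSpec("modulo", 4), PartitionSpec("modulo", 4)])
mismatched = exchange_when_forced(
    [PartitionSpec("modulo", 2), PartitionSpec("modulo", 4)])

# Why the mismatch matters: the same key maps to different worker indexes.
key = 3
placement = (key % 2, key % 4)
```

With counts 2 and 4, key 3 lands on worker 1 of one input and worker 3 of the other, so a 1-to-1 pairing would not bring equal rows together; falling back to a hash exchange keeps the result correct.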

Follow-up: user-facing documentation for the new hint will be added to the pinot-docs repo, mirroring the existing is_colocated_by_join_keys entry (scope, the V1-only note, and the partitioning precondition under which 'true' is safe).

@yashmayya added the multi-stage label (Related to the multi-stage query engine) on Jun 18, 2026
@yashmayya force-pushed the colocated-set-op-hint branch from ea0f413 to 73180d9 on June 18, 2026 20:15
@yashmayya requested a review from Jackie-Jiang on June 18, 2026 20:20
@codecov-commenter

codecov-commenter commented Jun 18, 2026

Codecov Report

❌ Patch coverage is 88.23529% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.80%. Comparing base (d469cb1) to head (73180d9).
⚠️ Report is 8 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...pache/pinot/calcite/rel/hint/PinotHintOptions.java | 66.66% | 1 Missing ⚠️ |
| ...te/rel/rules/PinotSetOpExchangeNodeInsertRule.java | 92.30% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18804      +/-   ##
============================================
+ Coverage     64.78%   64.80%   +0.01%     
  Complexity     1309     1309              
============================================
  Files          3381     3386       +5     
  Lines        209967   210166     +199     
  Branches      32891    32923      +32     
============================================
+ Hits         136020   136188     +168     
- Misses        62979    63014      +35     
+ Partials      10968    10964       -4     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 64.80% <88.23%> (+0.01%) ⬆️ |
| temurin | 64.80% <88.23%> (+0.01%) ⬆️ |
| unittests | 64.79% <88.23%> (+0.01%) ⬆️ |
| unittests1 | 56.98% <88.23%> (+0.02%) ⬆️ |
| unittests2 | 37.26% <5.88%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.
