Support GROUP BY GROUPING SETS / ROLLUP / CUBE in the multi-stage engine #18664

Open
xiangfu0 wants to merge 1 commit into apache:master from xiangfu0:claude/modest-wright-82ac58

@xiangfu0 xiangfu0 commented Jun 3, 2026

Contributor

Builds on the single-stage support (#18662, merged). Adds leaf pushdown for GROUP BY ROLLUP / CUBE / GROUPING SETS in the multi-stage engine.

A pushable grouping-set aggregate compiles to a single single-stage-engine (SSE) leaf scan that emits a synthetic $groupingId discriminator (a per-set participation mask, one bit per union group column). The multi-stage final stage groups on [groupCols, $groupingId] to merge partials while keeping the sets distinct, then a top project drops $groupingId. Non-pushable cases (exact SQL DISTINCT, joins, >31 group columns, window functions, LIMIT/OFFSET below the aggregate, multi-value grouping keys) fall back to a UNION ALL of ordinary aggregates.
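As a rough illustration of the per-set participation mask described above, the sketch below computes one mask per grouping set of a ROLLUP. The class and method names are hypothetical, and the bit convention (bit i set when the i-th union group column participates in the set) is an assumption; the PR's own encoding may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: names are hypothetical, not Pinot's API.
final class ParticipationMaskSketch {
  // Assumption: bit i is set when the i-th union group column participates in the set.
  static int participationMask(List<String> unionGroupCols, List<String> groupingSet) {
    // Assumption: the mask is a 32-bit int; the description lists >31 group columns
    // as a case that falls back instead of pushing down.
    if (unionGroupCols.size() > 31) {
      throw new IllegalArgumentException("too many group columns for an int mask");
    }
    int mask = 0;
    for (int i = 0; i < unionGroupCols.size(); i++) {
      if (groupingSet.contains(unionGroupCols.get(i))) {
        mask |= 1 << i;
      }
    }
    return mask;
  }

  // ROLLUP(c0, ..., cn-1) expands to the prefixes (c0..cn-1), (c0..cn-2), ..., ().
  static List<Integer> rollupMasks(List<String> cols) {
    List<Integer> masks = new ArrayList<>();
    for (int len = cols.size(); len >= 0; len--) {
      masks.add(participationMask(cols, cols.subList(0, len)));
    }
    return masks;
  }

  public static void main(String[] args) {
    // ROLLUP(country, city) -> sets (country, city), (country), ()
    System.out.println(rollupMasks(List.of("country", "city"))); // prints [3, 1, 0]
  }
}
```

Under this convention every grouping set of one query gets a distinct mask, which is what lets a single leaf scan tag each partial row with the set it belongs to.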

  • Any non-DISTINCT aggregate pushes down (SUM/AVG/COUNT/MIN/MAX, DISTINCTCOUNT, sketches, percentiles) by reusing the standard LEAF/FINAL split machinery.
  • GROUPING() / GROUPING_ID() indicators push down too — computed in the top project from $groupingId via integer arithmetic, matching GroupingSets.groupingValue.
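The indicator computation in the second bullet can be sketched as follows. This is a minimal sketch, not the PR's implementation: the names are hypothetical, and it assumes a participation mask in which bit i is set when the i-th union group column is part of the current grouping set. It follows the standard SQL semantics, where GROUPING(col) is 1 when the column is rolled up and GROUPING_ID packs those bits with the leftmost argument as the most significant bit.

```java
// Illustrative only: names are hypothetical, not Pinot's API.
final class GroupingIndicatorSketch {
  // GROUPING(col): 1 when the column is rolled up (absent from the set), else 0.
  // Uses only division, modulo and subtraction, so it could be written as a plain
  // integer expression over a non-negative mask.
  static int grouping(int participationMask, int colIndex) {
    int participates = (participationMask / (1 << colIndex)) % 2;
    return 1 - participates;
  }

  // GROUPING_ID(c1, ..., cn): the GROUPING bits packed left to right, c1 most significant.
  static int groupingId(int participationMask, int... colIndexes) {
    int id = 0;
    for (int colIndex : colIndexes) {
      id = id * 2 + grouping(participationMask, colIndex);
    }
    return id;
  }

  public static void main(String[] args) {
    // Union group columns: index 0 = country, index 1 = city.
    int countryOnly = 0b01; // the (country) subtotal set: city is rolled up
    System.out.println(grouping(countryOnly, 0));      // prints 0
    System.out.println(grouping(countryOnly, 1));      // prints 1
    System.out.println(groupingId(countryOnly, 0, 1)); // prints 1
    System.out.println(groupingId(0b00, 0, 1));        // prints 3 (grand total)
  }
}
```

Because the indicators depend only on the mask, they need no aggregation state of their own and can be evaluated per output row after the final merge.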

Testing

  • Plan-level single-scan assertions (pushable → one scan; fallback cases → UNION ALL).
  • In-process value tests (SUM/AVG/GROUPING/GROUPING_ID, null handling).
  • A both-engine cluster integration test (GroupingSetsQueriesTest) asserting identical single-stage and multi-stage results, including the real-NULL vs rolled-up-NULL distinction and a DISTINCTCOUNTHLL sketch merge.
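To make the real-NULL vs rolled-up-NULL distinction concrete, here is a small sketch of a final-stage merge keyed on the group values plus the discriminator. The row layout, class names, and sample data are all hypothetical, and the mask values assume bit i is set when the i-th group column participates in the set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: names and the row layout are hypothetical, not Pinot's runtime types.
final class FinalMergeSketch {
  // A partial row from one leaf: group values, the grouping-set discriminator, a partial SUM.
  static final class Partial {
    final List<String> groupValues;
    final int groupingId;
    final long partialSum;

    Partial(List<String> groupValues, int groupingId, long partialSum) {
      this.groupValues = groupValues;
      this.groupingId = groupingId;
      this.partialSum = partialSum;
    }
  }

  // Merge partial sums using [groupValues..., groupingId] as the key.
  static Map<List<Object>, Long> merge(List<Partial> partials) {
    Map<List<Object>, Long> merged = new LinkedHashMap<>();
    for (Partial p : partials) {
      List<Object> key = new ArrayList<>(p.groupValues);
      key.add(p.groupingId);
      merged.merge(key, p.partialSum, Long::sum);
    }
    return merged;
  }

  // Columns are (country, city). Both row shapes carry city = null, for different reasons.
  static List<Partial> samplePartials() {
    return Arrays.asList(
        new Partial(Arrays.asList("US", null), 0b11, 5L),  // leaf 1: city really is NULL
        new Partial(Arrays.asList("US", null), 0b01, 12L), // leaf 1: city rolled up
        new Partial(Arrays.asList("US", null), 0b11, 2L),  // leaf 2: city really is NULL
        new Partial(Arrays.asList("US", null), 0b01, 7L)); // leaf 2: city rolled up
  }

  public static void main(String[] args) {
    merge(samplePartials()).forEach((key, sum) -> System.out.println(key + " -> " + sum));
  }
}
```

Running `main` prints `[US, null, 3] -> 7` and `[US, null, 1] -> 19`. Without the discriminator in the key, all four partial rows would collapse into a single `("US", null)` group with sum 26, mixing the real-NULL bucket with the subtotal.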

🤖 Generated with Claude Code

codecov-commenter commented Jun 3, 2026


Codecov Report

❌ Patch coverage is 80.38278% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.82%. Comparing base (ff7df48) to head (2140079).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...apache/pinot/calcite/rel/GroupingSetsExpander.java | 79.03% | 10 Missing and 3 partials ⚠️ |
| ...not/calcite/rel/GroupingSetsPushdownTransform.java | 71.73% | 10 Missing and 3 partials ⚠️ |
| ...el/rules/PinotAggregateExchangeNodeInsertRule.java | 83.54% | 9 Missing and 4 partials ⚠️ |
| ...he/pinot/query/planner/plannode/AggregateNode.java | 71.42% | 0 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18664      +/-   ##
============================================
+ Coverage     64.75%   64.82%   +0.07%     
  Complexity     1319     1319              
============================================
  Files          3391     3393       +2     
  Lines        210891   211092     +201     
  Branches      33105    33149      +44     
============================================
+ Hits         136552   136837     +285     
+ Misses        63320    63225      -95     
- Partials      11019    11030      +11     

| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 64.82% <80.38%> (+0.07%) ⬆️ |
| temurin | 64.82% <80.38%> (+0.07%) ⬆️ |
| unittests | 64.82% <80.38%> (+0.07%) ⬆️ |
| unittests1 | 57.06% <80.38%> (+0.11%) ⬆️ |
| unittests2 | 37.14% <4.30%> (-0.04%) ⬇️ |


@xiangfu0 xiangfu0 force-pushed the claude/modest-wright-82ac58 branch from 61c2096 to 94aab3e Compare June 3, 2026 10:48
@xiangfu0 xiangfu0 marked this pull request as draft June 10, 2026 03:53
@xiangfu0 xiangfu0 force-pushed the claude/modest-wright-82ac58 branch from 94aab3e to 84e96de Compare June 20, 2026 10:05
@xiangfu0 xiangfu0 changed the title from "Support SQL GROUPING SETS / ROLLUP / CUBE on both query engines" to "Support GROUP BY GROUPING SETS / ROLLUP / CUBE in the multi-stage engine" Jun 20, 2026
@xiangfu0 xiangfu0 marked this pull request as ready for review June 20, 2026 10:08
Builds on the single-stage support (apache#18662). A pushable grouping-set aggregate compiles to a
single single-stage-engine (SSE) leaf scan that emits a synthetic $groupingId discriminator
(per-set participation mask, one bit per union group column); the multi-stage final stage groups
by [groupCols, $groupingId] to merge partials while keeping the sets distinct, and a top project
drops $groupingId. Non-pushable cases (exact SQL DISTINCT, joins, >31 group columns, window
functions, LIMIT/OFFSET below the aggregate, multi-value grouping keys, and aggregation-free or
indicator-only grouping sets) fall back to a UNION ALL of ordinary aggregates.

- Any non-DISTINCT aggregate pushes down (SUM/AVG/COUNT/MIN/MAX, DISTINCTCOUNT, sketches,
  percentiles) as long as at least one real, partially-mergeable aggregate is present, by reusing
  the standard LEAF/FINAL split machinery.
- GROUPING() / GROUPING_ID() are computed in the top project from $groupingId via integer
  arithmetic, matching GroupingSets.groupingValue; when they are the only aggregates the query
  falls back to UNION ALL (the SSE leaf cannot run a grouping-set scan with no aggregation).
- pinot-common: plan.proto AggregateNode.groupingSetMasks + shared GroupingSets.participationMask.
- pinot-query-planner: GroupingSetsPushdownTransform (shared isGroupingIndicator gate),
  GroupingSetsExpander (fallback), PinotAggregateExchangeNodeInsertRule, RelToPlanNodeConverter,
  AggregateNode + serde, PinotOperatorTable, RowExpressionValidationVisitor, QueryEnvironment.
- pinot-query-runtime: ServerPlanRequestVisitor.

Validated by plan-level single-scan tests, in-process value tests (SUM/AVG/GROUPING/GROUPING_ID,
null handling), and a both-engine cluster integration test in which every case runs on both the
single-stage and multi-stage engines, with per-engine assertions where the engines diverge
(multi-value grouping keys and aggregation-free grouping sets).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@xiangfu0 xiangfu0 force-pushed the claude/modest-wright-82ac58 branch from 84e96de to 2140079 Compare June 20, 2026 11:01