Support GROUP BY GROUPING SETS / ROLLUP / CUBE in the multi-stage engine#18664
Open
xiangfu0 wants to merge 1 commit into
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## master #18664 +/- ##
============================================
+ Coverage 64.75% 64.82% +0.07%
Complexity 1319 1319
============================================
Files 3391 3393 +2
Lines 210891 211092 +201
Branches 33105 33149 +44
============================================
+ Hits 136552 136837 +285
+ Misses 63320 63225 -95
- Partials 11019 11030 +11
Builds on the single-stage support (apache#18662).

A pushable grouping-set aggregate compiles to a single single-stage-engine (SSE) leaf scan that emits a synthetic $groupingId discriminator (per-set participation mask, one bit per union group column); the multi-stage final stage groups by [groupCols, $groupingId] to merge partials while keeping the sets distinct, and a top project drops $groupingId. Non-pushable cases (exact SQL DISTINCT, joins, >31 group columns, window functions, LIMIT/OFFSET below the aggregate, multi-value grouping keys, and aggregation-free or indicator-only grouping sets) fall back to a UNION ALL of ordinary aggregates.

- Any non-DISTINCT aggregate pushes down (SUM/AVG/COUNT/MIN/MAX, DISTINCTCOUNT, sketches, percentiles) as long as at least one real, partially-mergeable aggregate is present, by reusing the standard LEAF/FINAL split machinery.
- GROUPING() / GROUPING_ID() are computed in the top project from $groupingId via integer arithmetic, matching GroupingSets.groupingValue; when they are the only aggregates the query falls back to UNION ALL (the SSE leaf cannot run a grouping-set scan with no aggregation).
- pinot-common: plan.proto AggregateNode.groupingSetMasks + shared GroupingSets.participationMask.
- pinot-query-planner: GroupingSetsPushdownTransform (shared isGroupingIndicator gate), GroupingSetsExpander (fallback), PinotAggregateExchangeNodeInsertRule, RelToPlanNodeConverter, AggregateNode + serde, PinotOperatorTable, RowExpressionValidationVisitor, QueryEnvironment.
- pinot-query-runtime: ServerPlanRequestVisitor.

Validated by plan-level single-scan tests, in-process value tests (SUM/AVG/GROUPING/GROUPING_ID, null handling), and a both-engine cluster integration test in which every case runs on both the single-stage and multi-stage engines, with per-engine assertions where the engines diverge (multi-value grouping keys and aggregation-free grouping sets).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
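The participation mask and the integer arithmetic behind GROUPING() / GROUPING_ID() described in the commit message can be illustrated with a small standalone sketch. This is not the code from the PR: the class name `GroupingMaskSketch`, the helper names, and in particular the bit order (bit i set when union group column i participates) are assumptions made for illustration, and the actual `GroupingSets.participationMask` and `GroupingSets.groupingValue` may use a different layout. The only SQL semantics relied on are that GROUPING(col) is 1 when the column is rolled up in the current set, and that GROUPING_ID packs those bits with the first argument as the most significant bit.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupingMaskSketch {
  // ASSUMPTION: bit i (from the least significant bit) is set when union
  // group column i participates in the grouping set. The PR may order bits differently.
  static int participationMask(List<String> unionCols, List<String> set) {
    int mask = 0;
    for (int i = 0; i < unionCols.size(); i++) {
      if (set.contains(unionCols.get(i))) {
        mask |= 1 << i;
      }
    }
    return mask;
  }

  // GROUPING(col): 1 when the column is rolled up (absent from the set), else 0.
  static int grouping(int mask, int colIndex) {
    return 1 - ((mask >> colIndex) & 1);
  }

  // GROUPING_ID(c0, c1, ...): GROUPING bits packed with the first argument most significant.
  static int groupingId(int mask, int... colIndexes) {
    int id = 0;
    for (int idx : colIndexes) {
      id = id * 2 + grouping(mask, idx);
    }
    return id;
  }

  // ROLLUP(c0, ..., cn) expands to the prefixes (c0..cn), (c0..cn-1), ..., ().
  static List<List<String>> rollup(List<String> cols) {
    List<List<String>> sets = new ArrayList<>();
    for (int n = cols.size(); n >= 0; n--) {
      sets.add(new ArrayList<>(cols.subList(0, n)));
    }
    return sets;
  }

  public static void main(String[] args) {
    List<String> cols = List.of("country", "city"); // hypothetical column names
    for (List<String> set : rollup(cols)) {
      int mask = participationMask(cols, set);
      System.out.println(set + " mask=" + mask + " GROUPING_ID=" + groupingId(mask, 0, 1));
    }
  }
}
```

Under these assumptions, ROLLUP(country, city) yields masks 3, 1 and 0 for the sets (country, city), (country) and (), with GROUPING_ID(country, city) values 0, 1 and 3. Because the top project can derive every indicator from the mask alone, the leaf only needs to ship the single discriminator column.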
Builds on the single-stage support (#18662, merged). Adds leaf pushdown for GROUP BY ROLLUP / CUBE / GROUPING SETS in the multi-stage engine.

A pushable grouping-set aggregate compiles to a single single-stage-engine (SSE) leaf scan that emits a synthetic $groupingId discriminator (a per-set participation mask, one bit per union group column). The multi-stage final stage groups on [groupCols, $groupingId] to merge partials while keeping the sets distinct, then a top project drops $groupingId. Non-pushable cases (exact SQL DISTINCT, joins, >31 group columns, window functions, LIMIT/OFFSET below the aggregate, multi-value grouping keys) fall back to a UNION ALL of ordinary aggregates.

- Any non-DISTINCT aggregate pushes down (SUM/AVG/COUNT/MIN/MAX, DISTINCTCOUNT, sketches, percentiles) by reusing the standard LEAF/FINAL split machinery.
- GROUPING() / GROUPING_ID() indicators push down too: they are computed in the top project from $groupingId via integer arithmetic, matching GroupingSets.groupingValue.

Testing

- Integration test (GroupingSetsQueriesTest) asserting identical single-stage and multi-stage results, including the real-NULL vs rolled-up-NULL distinction and a DISTINCTCOUNTHLL sketch merge.

🤖 Generated with Claude Code
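The real-NULL vs rolled-up-NULL distinction is the reason the final stage has to key on $groupingId as well as the group columns. The sketch below is a toy model of that merge step, not Pinot code: `FinalMergeSketch`, `Partial`, the column names, and the mask values (3 for the set (country, city), 1 for the set (country)) are all hypothetical, and a partial SUM stands in for whatever intermediate result a real leaf would ship.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FinalMergeSketch {
  /** Hypothetical partial row emitted by one leaf scan. */
  static final class Partial {
    final String country;
    final String city;     // null is either a real NULL or a rolled-up column
    final int groupingId;  // ASSUMED mask value identifying the grouping set
    final long sum;        // partial SUM from one server

    Partial(String country, String city, int groupingId, long sum) {
      this.country = country;
      this.city = city;
      this.groupingId = groupingId;
      this.sum = sum;
    }
  }

  // Merge partial sums; optionally include the discriminator in the merge key.
  static Map<List<Object>, Long> merge(List<Partial> partials, boolean keyOnGroupingId) {
    Map<List<Object>, Long> out = new LinkedHashMap<>();
    for (Partial p : partials) {
      // Arrays.asList tolerates nulls, unlike List.of.
      List<Object> key = new ArrayList<>(Arrays.asList(p.country, p.city));
      if (keyOnGroupingId) {
        key.add(p.groupingId);
      }
      out.merge(key, p.sum, Long::sum);
    }
    return out;
  }

  static List<Partial> samplePartials() {
    return Arrays.asList(
        new Partial("US", null, 3, 5L),    // server 1, set (country, city): city is a real NULL
        new Partial("US", null, 3, 7L),    // server 2, same set
        new Partial("US", null, 1, 40L),   // server 1, set (country): city is rolled up
        new Partial("US", null, 1, 60L));  // server 2, same set
  }

  public static void main(String[] args) {
    System.out.println(merge(samplePartials(), true));   // two groups, kept apart
    System.out.println(merge(samplePartials(), false));  // one group, sets wrongly merged
  }
}
```

With the discriminator in the key, the real-NULL city group sums to 12 and the rolled-up subtotal to 100. Without it, both collapse into a single ("US", null) row with 112, which is the incorrect result that grouping on [groupCols, $groupingId] avoids.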