
branch-4.0: [fix](agg) Fix incorrect aggregate merge with duplicate aliases #65025 #65056

Open: github-actions[bot] wants to merge 1 commit into branch-4.0 from auto-pick-65025-branch-4.0

github-actions[bot] commented on Jul 1, 2026


Cherry-picked from #65025

Related PR: #31811

Problem Summary:

A nested aggregate query could return incorrect results when multiple
outer group-by aliases referenced the same inner grouping expression.

For example, after alias resolution, the inner grouping keys were `(a, b)`
while the outer grouping keys became `(a, a)`. `MergeAggregate` decided
whether the two groupings were identical by comparing only their list
sizes. Because both lists contained two elements, it wrongly treated them
as equivalent.
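The size-only check described above can be reproduced in a few lines. This is a minimal sketch, assuming grouping expressions can be modeled as plain strings after alias resolution; the class and method names are hypothetical and this is not the actual `MergeAggregate` code.

```java
import java.util.List;

// Hypothetical sketch, NOT the Doris MergeAggregate implementation:
// grouping expressions are modeled as strings after alias resolution.
public class SizeOnlyGroupingCheck {

    // The flawed check: two groupings are treated as identical
    // whenever their lists have the same number of elements.
    static boolean sameGroupingBySize(List<String> inner, List<String> outer) {
        return inner.size() == outer.size();
    }

    public static void main(String[] args) {
        List<String> inner = List.of("a", "b"); // inner GROUP BY a, b
        List<String> outer = List.of("a", "a"); // both outer aliases resolve to a
        // Prints true even though the outer layer no longer groups by b.
        System.out.println(sameGroupingBySize(inner, outer));
    }
}
```

Because only the element count is compared, any two groupings of equal length match, regardless of which expressions they actually contain.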

This allowed `SUM(COUNT(DISTINCT c))` to be merged into `COUNT(DISTINCT c)`
while grouping key `b` was dropped. A value of `c` that appeared in several
different `b` groups was then counted only once instead of once per group,
producing an undercount.
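To make the undercount concrete, the following sketch evaluates both plans over three made-up rows `(a, b, c)`: the correct two-level plan counts distinct `c` per `(a, b)` and then sums per `a`, while the wrongly merged plan counts distinct `c` per `a` only. The data, class, and method names are illustrative assumptions, not taken from the PR or its regression tests.

```java
import java.util.*;

// Hypothetical illustration (requires Java 16+ for records): compares
// SUM(COUNT(DISTINCT c)) over GROUP BY (a, b), then GROUP BY a, against
// the incorrectly merged COUNT(DISTINCT c) over GROUP BY a.
public class DistinctMergeUndercount {

    record Row(String a, String b, int c) {}

    // Correct nested result: distinct c per (a, b), summed per a.
    static Map<String, Long> nested(List<Row> rows) {
        Map<String, Map<String, Set<Integer>>> perAB = new TreeMap<>();
        for (Row r : rows) {
            perAB.computeIfAbsent(r.a(), k -> new TreeMap<>())
                 .computeIfAbsent(r.b(), k -> new HashSet<>())
                 .add(r.c());
        }
        Map<String, Long> out = new TreeMap<>();
        perAB.forEach((a, byB) -> out.put(a,
                byB.values().stream().mapToLong(Set::size).sum()));
        return out;
    }

    // Wrongly merged result: distinct c per a, with grouping key b dropped.
    static Map<String, Long> merged(List<Row> rows) {
        Map<String, Set<Integer>> perA = new TreeMap<>();
        for (Row r : rows) {
            perA.computeIfAbsent(r.a(), k -> new HashSet<>()).add(r.c());
        }
        Map<String, Long> out = new TreeMap<>();
        perA.forEach((a, cs) -> out.put(a, (long) cs.size()));
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("x", "b1", 1),
                new Row("x", "b1", 2),
                new Row("x", "b2", 1)); // c = 1 repeats under a different b
        System.out.println(nested(rows)); // {x=3}
        System.out.println(merged(rows)); // {x=2}
    }
}
```

With `c = 1` appearing under both `b1` and `b2`, the nested plan yields 3 for `a = x`, but the merged plan yields 2.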

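This change instead compares the sets of unique grouping expressions after
projection replacement, so a duplicated grouping such as `(a, a)` collapses
to `{a}` and no longer matches `{a, b}`. Aggregate layers containing
`DISTINCT` are now merged only when their grouping semantics are actually
identical.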
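A set-based comparison along the lines described above might look like the sketch below. It assumes, as a simplification, that grouping expressions are strings that have already had outer aliases replaced; the class and method names are hypothetical and do not come from the Doris codebase.

```java
import java.util.*;

// Hypothetical sketch of the fixed check, NOT the actual Doris code:
// compare the sets of unique grouping expressions, so duplicates and
// ordering do not affect the result but a missing key does.
public class SetGroupingCheck {

    static boolean sameGroupingBySet(List<String> inner, List<String> outer) {
        return new HashSet<>(inner).equals(new HashSet<>(outer));
    }

    public static void main(String[] args) {
        // Both outer aliases resolve to a, so b is lost: must not merge.
        System.out.println(sameGroupingBySet(List.of("a", "b"), List.of("a", "a"))); // false
        // Same keys in another order, plus a duplicate: grouping is unchanged.
        System.out.println(sameGroupingBySet(List.of("a", "b"), List.of("b", "a", "a"))); // true
    }
}
```

Using sets matches the grouping semantics of `GROUP BY`, where key order and repeated keys do not change which rows fall into the same group.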

### Release note

Fix incorrect query results for nested aggregates that combine duplicate
group-by aliases with DISTINCT aggregate functions.
github-actions[bot] requested a review from morningman as a code owner on July 1, 2026 02:11
@hello-stephen


Thank you for your contribution to Apache Doris.
Not sure what to do next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was modified, and what impact the change might have.
  3. What features were added, and why they were added.
  4. Which code was refactored, and why it was refactored.
  5. Which functions were optimized, and what the difference is before and after the optimization.

@hello-stephen


run buildall

@hello-stephen


FE UT Coverage Report

Increment line coverage 87.50% (7/8) 🎉
Increment coverage report
Complete coverage report
