Skip to content

branch-4.1: [fix](nereids) partition topn opt requires chosen window func partition key to be a subset of co-located ones#65073

Open
CalvinKirs wants to merge 1 commit into
branch-4.1from
4.1-64764
Open

branch-4.1: [fix](nereids) partition topn opt requires chosen window func partition key to be a subset of co-located ones#65073
CalvinKirs wants to merge 1 commit into
branch-4.1from
4.1-64764

Conversation

@CalvinKirs

Copy link
Copy Markdown
Member

cherry-pick #64764

…on key to be a subset of co-located ones (#64764)

When a single `LogicalWindow` holds multiple window functions, a filter
like `rn <= k` may be converted into a `partitionTopN` and pushed
**below the whole window node**. The generated `partitionTopN` keeps the
per-partition top-k of the *chosen* window function, so it prunes the
input rows that are **shared by all co-located window functions**.

This is only correct when the chosen window function's partition key is
a **subset** of every other co-located window function's partition key
(i.e. the chosen one is *coarser*). Otherwise the pruning drops rows the
other windows still need and produces a wrong result, e.g.

```sql
-- independent partitions: chosen rn1(g1) prunes rows rn2(g2) needs
row_number() over (partition by g1 order by ord_key) as rn1,  -- chosen
row_number() over (partition by g2 order by ord_key) as rn2

-- chosen is finer than a co-located window: chosen rn(c1,c2) prunes rows rk(c1) needs
row_number() over (partition by c1, c2 order by c3) as rn,    -- chosen (finer)
rank()       over (partition by c1 order by c3) as rk         -- coarser
```

`LogicalWindow.getPushDownWindowFuncAndLimit()` already required the
**order keys** of all co-located window functions to be compatible
(#56622), but it never checked the **partition keys**. This PR
additionally requires
`windowFunc.getPartitionKeys().containsAll(chosenWindowFunc.getPartitionKeys())`
for every co-located window function; otherwise the partition-topn
optimization is disabled.

The pruning keeps the per-`P0` (chosen partition) top-k by the order
key. For another window function `W` partitioned by `P1`:

- **`P0 ⊆ P1` (chosen coarser) → safe.** Any row that could change `W`'s
value for a surviving row `r` is in the same `P1` partition with a
smaller order value; being in the same `P1` it is also in the same `P0`
partition with order `<= r`, so its `P0`-rank is within top-k and it is
**kept**. Nothing `W` needs is pruned.
- **`P0 ⊋ P1` or independent → unsafe.** A finer/independent `P0` can
drop a row that ranks early in `P1`, corrupting `W`.

So equality is unnecessarily strict; the precise safe condition is the
**subset** relation, expressed with `containsAll` (which also makes the
check order-insensitive in the partition-key list, matching the set
semantics of `PARTITION BY`).

Single window, multiple windows with the **same** partition key, and the
**subset** case above all keep firing. Example where the chosen
`rank(partition by g1)` is coarser than `row_number(partition by g1,
g2)`, so a `VPartitionTopN(partition by g1)` is still generated:

```sql
select id, g1, g2, ord_key, rk, rn
from (
  select id, g1, g2, ord_key,
    rank()       over (partition by g1 order by ord_key) as rk,   -- chosen (coarser)
    row_number() over (partition by g1, g2 order by ord_key) as rn
  from multi_window_cases
) q
where rk <= 2;
```

```
  6:VANALYTIC          partition by: g1, g2   order by: ord_key   <- computes rn
  4:VANALYTIC          partition by: g1       order by: ord_key   <- computes rk
  1:VPartitionTopN     partition by: g1       order by: ord_key   partition limit: 2   <- safe (g1 ⊆ {g1,g2})
  0:VOlapScanNode
```

```sql
CREATE TABLE multi_window_cases (
  id      INT,
  g1      VARCHAR(8),
  g2      VARCHAR(8),
  ord_key INT,
  amt     INT
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

INSERT INTO multi_window_cases VALUES
(1,'A','X',1,10),(2,'A','X',2,20),(3,'A','Y',3,30),(4,'B','X',4,40),
(5,'B','Y',5,50),(6,'B','Y',6,60),(7,'C','X',7,70),(8,'C','Z',8,80);

SELECT id, g1, g2, ord_key, rn1, rn2
FROM (
  SELECT id, g1, g2, ord_key,
    row_number() OVER (PARTITION BY g1 ORDER BY ord_key) AS rn1,
    row_number() OVER (PARTITION BY g2 ORDER BY ord_key) AS rn2
  FROM multi_window_cases
) q
WHERE rn1 <= 1
ORDER BY id;
```

Wrong result (before), `rn2` should be `1,3,4`:

```
+----+----+----+---------+-----+-----+
| id | g1 | g2 | ord_key | rn1 | rn2 |
+----+----+----+---------+-----+-----+
|  1 | A  | X  |       1 |   1 |   1 |
|  4 | B  | X  |       4 |   1 |   2 |   <- wrong (should be 3)
|  7 | C  | X  |       7 |   1 |   3 |   <- wrong (should be 4)
+----+----+----+---------+-----+-----+
```

Correct result (after, matches MySQL 8.4):

```
+------+------+------+---------+-----+-----+
| id   | g1   | g2   | ord_key | rn1 | rn2 |
+------+------+------+---------+-----+-----+
|    1 | A    | X    |       1 |   1 |   1 |
|    4 | B    | X    |       4 |   1 |   3 |
|    7 | C    | X    |       7 |   1 |   4 |
+------+------+------+---------+-----+-----+
```

A `VPartitionTopN(partition by g1)` is inserted **below both analytic
nodes**, so it prunes rows before `rn2` is computed:

```
  8:VSORT              order by: id
  7:VANALYTIC          partition by: g2,  order by: ord_key   <- computes rn2
  |  predicates: (rn1 <= 1)
  6:VSORT              order by: g2, ord_key
  4:VANALYTIC          partition by: g1,  order by: ord_key   <- computes rn1
  3:VSORT              order by: g1, ord_key
  1:VPartitionTopN     partition by: g1,  order by: ord_key   <- prunes input (WRONG)
  0:VOlapScanNode
```

No `VPartitionTopN`; both window functions are computed over the full
input and `rn1 <= 1` stays as an ordinary predicate above them:

```
  7:VSORT              order by: id
  6:VANALYTIC          partition by: g2,  order by: ord_key   <- computes rn2 (full input)
  |  predicates: (rn1 <= 1)
  5:VSORT              order by: g2, ord_key
  3:VANALYTIC          partition by: g1,  order by: ord_key   <- computes rn1
  2:VSORT              order by: g1, ord_key
  0:VOlapScanNode
```

Fix wrong result of multiple window functions
(`row_number`/`rank`/`dense_rank`) with incompatible partition keys when
a top-n filter (e.g. `rn <= k`) is applied; the partition-topn pushdown
is now restricted to the cases where it is provably safe (the chosen
function's partition key is a subset of the co-located ones).

- [x] Test
- [x] Regression test
(`regression-test/suites/query_p0/partition_topn/check_partitionkey.groovy`,
`.../push_down_filter_through_window/push_down_multi_filter_through_window.groovy`)
- [x] Unit test (`GeneratePartitionTopnFromWindowTest`:
`testMultipleWindowsWithDifferentPartitions`,
`testMultipleWindowsSubsetPartitionGeneratesTopn`)

- [x] Behavior changed:
- [x] Function behavior changed (returns correct results for the cases
above)
@CalvinKirs CalvinKirs requested a review from yiguolei as a code owner July 1, 2026 06:17
@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@CalvinKirs

Copy link
Copy Markdown
Member Author

run buildall

@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants