branch-4.1: [fix](nereids) partition topn opt requires chosen window func partition key to be a subset of co-located ones#65073
Open
CalvinKirs wants to merge 1 commit into
Open
branch-4.1: [fix](nereids) partition topn opt requires chosen window func partition key to be a subset of co-located ones#65073CalvinKirs wants to merge 1 commit into
CalvinKirs wants to merge 1 commit into
Conversation
…on key to be a subset of co-located ones (#64764) When a single `LogicalWindow` holds multiple window functions, a filter like `rn <= k` may be converted into a `partitionTopN` and pushed **below the whole window node**. The generated `partitionTopN` keeps the per-partition top-k of the *chosen* window function, so it prunes the input rows that are **shared by all co-located window functions**. This is only correct when the chosen window function's partition key is a **subset** of every other co-located window function's partition key (i.e. the chosen one is *coarser*). Otherwise the pruning drops rows the other windows still need and produces a wrong result, e.g. ```sql -- independent partitions: chosen rn1(g1) prunes rows rn2(g2) needs row_number() over (partition by g1 order by ord_key) as rn1, -- chosen row_number() over (partition by g2 order by ord_key) as rn2 -- chosen is finer than a co-located window: chosen rn(c1,c2) prunes rows rk(c1) needs row_number() over (partition by c1, c2 order by c3) as rn, -- chosen (finer) rank() over (partition by c1 order by c3) as rk -- coarser ``` `LogicalWindow.getPushDownWindowFuncAndLimit()` already required the **order keys** of all co-located window functions to be compatible (#56622), but it never checked the **partition keys**. This PR additionally requires `windowFunc.getPartitionKeys().containsAll(chosenWindowFunc.getPartitionKeys())` for every co-located window function; otherwise the partition-topn optimization is disabled. The pruning keeps the per-`P0` (chosen partition) top-k by the order key. For another window function `W` partitioned by `P1`: - **`P0 ⊆ P1` (chosen coarser) → safe.** Any row that could change `W`'s value for a surviving row `r` is in the same `P1` partition with a smaller order value; being in the same `P1` it is also in the same `P0` partition with order `<= r`, so its `P0`-rank is within top-k and it is **kept**. Nothing `W` needs is pruned. - **`P0 ⊋ P1` or independent → unsafe.** A finer/independent `P0` can drop a row that ranks early in `P1`, corrupting `W`. So equality is unnecessarily strict; the precise safe condition is the **subset** relation, expressed with `containsAll` (which also makes the check order-insensitive in the partition-key list, matching the set semantics of `PARTITION BY`). Single window, multiple windows with the **same** partition key, and the **subset** case above all keep firing. Example where the chosen `rank(partition by g1)` is coarser than `row_number(partition by g1, g2)`, so a `VPartitionTopN(partition by g1)` is still generated: ```sql select id, g1, g2, ord_key, rk, rn from ( select id, g1, g2, ord_key, rank() over (partition by g1 order by ord_key) as rk, -- chosen (coarser) row_number() over (partition by g1, g2 order by ord_key) as rn from multi_window_cases ) q where rk <= 2; ``` ``` 6:VANALYTIC partition by: g1, g2 order by: ord_key <- computes rn 4:VANALYTIC partition by: g1 order by: ord_key <- computes rk 1:VPartitionTopN partition by: g1 order by: ord_key partition limit: 2 <- safe (g1 ⊆ {g1,g2}) 0:VOlapScanNode ``` ```sql CREATE TABLE multi_window_cases ( id INT, g1 VARCHAR(8), g2 VARCHAR(8), ord_key INT, amt INT ) DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ("replication_num" = "1"); INSERT INTO multi_window_cases VALUES (1,'A','X',1,10),(2,'A','X',2,20),(3,'A','Y',3,30),(4,'B','X',4,40), (5,'B','Y',5,50),(6,'B','Y',6,60),(7,'C','X',7,70),(8,'C','Z',8,80); SELECT id, g1, g2, ord_key, rn1, rn2 FROM ( SELECT id, g1, g2, ord_key, row_number() OVER (PARTITION BY g1 ORDER BY ord_key) AS rn1, row_number() OVER (PARTITION BY g2 ORDER BY ord_key) AS rn2 FROM multi_window_cases ) q WHERE rn1 <= 1 ORDER BY id; ``` Wrong result (before), `rn2` should be `1,3,4`: ``` +----+----+----+---------+-----+-----+ | id | g1 | g2 | ord_key | rn1 | rn2 | +----+----+----+---------+-----+-----+ | 1 | A | X | 1 | 1 | 1 | | 4 | B | X | 4 | 1 | 2 | <- wrong (should be 3) | 7 | C | X | 7 | 1 | 3 | <- wrong (should be 4) +----+----+----+---------+-----+-----+ ``` Correct result (after, matches MySQL 8.4): ``` +------+------+------+---------+-----+-----+ | id | g1 | g2 | ord_key | rn1 | rn2 | +------+------+------+---------+-----+-----+ | 1 | A | X | 1 | 1 | 1 | | 4 | B | X | 4 | 1 | 3 | | 7 | C | X | 7 | 1 | 4 | +------+------+------+---------+-----+-----+ ``` A `VPartitionTopN(partition by g1)` is inserted **below both analytic nodes**, so it prunes rows before `rn2` is computed: ``` 8:VSORT order by: id 7:VANALYTIC partition by: g2, order by: ord_key <- computes rn2 | predicates: (rn1 <= 1) 6:VSORT order by: g2, ord_key 4:VANALYTIC partition by: g1, order by: ord_key <- computes rn1 3:VSORT order by: g1, ord_key 1:VPartitionTopN partition by: g1, order by: ord_key <- prunes input (WRONG) 0:VOlapScanNode ``` No `VPartitionTopN`; both window functions are computed over the full input and `rn1 <= 1` stays as an ordinary predicate above them: ``` 7:VSORT order by: id 6:VANALYTIC partition by: g2, order by: ord_key <- computes rn2 (full input) | predicates: (rn1 <= 1) 5:VSORT order by: g2, ord_key 3:VANALYTIC partition by: g1, order by: ord_key <- computes rn1 2:VSORT order by: g1, ord_key 0:VOlapScanNode ``` Fix wrong result of multiple window functions (`row_number`/`rank`/`dense_rank`) with incompatible partition keys when a top-n filter (e.g. `rn <= k`) is applied; the partition-topn pushdown is now restricted to the cases where it is provably safe (the chosen function's partition key is a subset of the co-located ones). - [x] Test - [x] Regression test (`regression-test/suites/query_p0/partition_topn/check_partitionkey.groovy`, `.../push_down_filter_through_window/push_down_multi_filter_through_window.groovy`) - [x] Unit test (`GeneratePartitionTopnFromWindowTest`: `testMultipleWindowsWithDifferentPartitions`, `testMultipleWindowsSubsetPartitionGeneratesTopn`) - [x] Behavior changed: - [x] Function behavior changed (returns correct results for the cases above)
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Member
Author
|
run buildall |
Contributor
FE UT Coverage ReportIncrement line coverage |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
cherry-pick #64764