[SPARK-46367][SQL] Fix KeyedPartitioning not remapped through column aliases in ProjectExec by naveenp2708 · Pull Request #55475 · apache/spark

naveenp2708 · 2026-04-22T06:27:06Z

What changes were proposed in this pull request?

Fix for SPARK-46367. When ProjectExec aliases a column (e.g. id AS new_id), KeyedPartitioning from outputPartitioning still references the old column's ExprId. EnsureRequirements cannot match ClusteredDistribution on the aliased column and inserts an unnecessary Exchange shuffle.

This fix adds direct ExprId-based remapping of KeyedPartitioning expressions through column aliases in PartitioningPreservingUnaryExecNode. Two new helpers:

buildExprIdAliasMap: builds ExprId → Attribute map from alias entries
remapKeyedPartitioning: substitutes attributes in KeyedPartitioning expressions via the alias map, recursing into transform expressions

Non-aliased attributes absent from the output set cause the partitioning to be dropped, consistent with existing filter logic.
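
The remapping described above can be sketched on a toy model. This is only an illustration under stated assumptions: the classes below (`ExprId`, `Attr`, `Transform`, `Lit`, `Alias`, `KeyedPartitioning`) are simplified stand-ins written for this example, not Spark's Catalyst types, and the two helper bodies are a guess at the shape of the logic, not the code from this PR.

```scala
// Toy stand-ins for Catalyst concepts. These are NOT Spark's classes;
// they only model what is needed to show ExprId-based remapping.
final case class ExprId(id: Long)

sealed trait Expr
final case class Attr(name: String, exprId: ExprId) extends Expr
final case class Lit(value: Int) extends Expr
// A partition transform such as bucket(4, col).
final case class Transform(name: String, args: Seq[Expr]) extends Expr

// `child AS name`: the projection emits a new attribute with a fresh ExprId.
final case class Alias(child: Expr, name: String, exprId: ExprId) {
  def toAttribute: Attr = Attr(name, exprId)
}

// Simplified: only the partitioning expressions are modelled here.
final case class KeyedPartitioning(expressions: Seq[Expr])

object AliasRemap {
  // Map the ExprId of each directly aliased attribute to the alias's output attribute.
  def buildExprIdAliasMap(aliases: Seq[Alias]): Map[ExprId, Attr] =
    aliases.collect {
      case a @ Alias(child: Attr, _, _) => child.exprId -> a.toAttribute
    }.toMap

  // Rewrite one expression. None means it references a column the projection dropped.
  // Assumption of this sketch: an alias takes precedence over a pass-through column.
  private def remapExpr(
      e: Expr,
      aliasMap: Map[ExprId, Attr],
      output: Set[ExprId]): Option[Expr] = e match {
    case a: Attr =>
      aliasMap.get(a.exprId).orElse(if (output.contains(a.exprId)) Some(a) else None)
    case Transform(name, args) =>
      // Recurse into the transform's arguments.
      val remapped = args.map(remapExpr(_, aliasMap, output))
      if (remapped.forall(_.isDefined)) Some(Transform(name, remapped.flatten)) else None
    case l: Lit => Some(l)
  }

  // Remap every partitioning expression; drop the partitioning if any one fails.
  def remapKeyedPartitioning(
      p: KeyedPartitioning,
      aliasMap: Map[ExprId, Attr],
      output: Set[ExprId]): Option[KeyedPartitioning] = {
    val remapped = p.expressions.map(remapExpr(_, aliasMap, output))
    if (remapped.forall(_.isDefined)) Some(KeyedPartitioning(remapped.flatten)) else None
  }
}

object AliasRemapDemo {
  // SELECT id AS new_id over a child partitioned by bucket(4, id).
  val id: Attr = Attr("id", ExprId(1))
  val newId: Alias = Alias(id, "new_id", ExprId(2))
  val before: KeyedPartitioning = KeyedPartitioning(Seq(Transform("bucket", Seq(Lit(4), id))))
  val aliasMap: Map[ExprId, Attr] = AliasRemap.buildExprIdAliasMap(Seq(newId))

  // The projection outputs only new_id, so the partitioning is rewritten onto it.
  val after: Option[KeyedPartitioning] =
    AliasRemap.remapKeyedPartitioning(before, aliasMap, Set(ExprId(2)))

  // With no alias and `id` absent from the output, the partitioning is dropped.
  val dropped: Option[KeyedPartitioning] =
    AliasRemap.remapKeyedPartitioning(before, Map.empty, Set(ExprId(2)))
}
```

In this model `AliasRemapDemo.after` holds the partitioning rewritten to `bucket(4, new_id)`, while `AliasRemapDemo.dropped` is `None`, mirroring the rule that a non-aliased attribute missing from the output set causes the partitioning to be dropped.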

Why are the changes needed?

SPJ queries with column aliases followed by aggregation insert unnecessary shuffles, degrading performance. The bug has been present since Spark 3.5.0 and persists on current master after the KeyGroupedPartitioning → KeyedPartitioning refactor.

Does this PR introduce any user-facing change?

Yes. SPJ queries with column aliases will avoid unnecessary shuffles for downstream aggregations and dedup operations.

How was this patch tested?

Added reproduction test in KeyGroupedPartitioningSuite. All 211 related tests pass.

Was this patch authored or co-authored using generative AI tooling?

No.

…aliases in ProjectExec When ProjectExec aliases a column (e.g. id AS new_id), KeyedPartitioning from outputPartitioning still references the old column's ExprId. EnsureRequirements cannot match ClusteredDistribution on the aliased column and inserts an unnecessary Exchange shuffle. This fix adds direct ExprId-based remapping of KeyedPartitioning expressions through column aliases in PartitioningPreservingUnaryExecNode, preserving the partitionKeys and isGrouped fields while substituting attribute references.

naveenp2708 · 2026-04-22T06:28:28Z

@peter-toth @szehon-ho This is a fix for SPARK-46367, related to the KeyedPartitioning projection discussion on #54330. The issue has been open since Dec 2023, and I confirmed it still reproduces on current master after the KeyedPartitioning refactor.

peter-toth · 2026-04-22T17:53:40Z

@naveenp2708 , this doesn't seem correct to me, the new test should pass without any code change.

Let me show you tomorrow what I was thinking about in #54330.

naveenp2708 · 2026-04-22T20:21:44Z

@peter-toth You're right. I re-ran the test without my fix on clean master and it passes. The existing projectExpression path already handles KeyedPartitioning correctly for this case. Apologies for the false alarm. Looking forward to seeing your approach for the broader projection discussion tomorrow. Thank you for the guidance.

peter-toth · 2026-04-23T18:52:16Z

@naveenp2708 , I opened a draft PR: #55519, but I still need to wrap it up.

naveenp2708 · 2026-04-24T17:46:39Z

@peter-toth Thank you for the comprehensive fix in #55519! The per-position projection with narrowing support and the isNarrowed flag for groupedSatisfies gating is a much more complete approach. Closing this in favor of yours. Happy to help review or test when ready.

naveenp2708 closed this Apr 24, 2026

naveenp2708 wants to merge 1 commit into apache:master from naveenp2708:spark-46367-fix-keyed-partitioning-alias
