fix: handle complex projections in ordering validation #20362

Draft
adriangb wants to merge 1 commit into apache:main from pydantic:fix-complex-projection-ordering
@adriangb
Contributor

Summary

  • Replace ordered_column_indices_from_projection with resolve_sort_column_projection, which requires only the sort-column positions to resolve to simple Column expressions, instead of failing the entire projection when any expression is complex
  • Evaluate each ordering independently in get_projected_output_ordering: an ordering on simple column refs is validated with min/max statistics even when other projection expressions are complex (e.g. a + 1)
  • For an ordering whose sort column is itself a complex expression, fall back to the single-file-group check
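
To make the two validation paths above concrete, here is a minimal sketch of the per-ordering decision. It is a simplified model, not the code in this PR: `FileRange`, `SortColumns`, `files_do_not_overlap`, and `ordering_survives` are hypothetical names, statistics are reduced to a single `i64` sort column, files are assumed to be listed in their group order, and whether equal boundary values count as sorted (`<=` here) is an assumption.

```rust
/// Min/max statistics of the sort column for one file.
/// (Hypothetical type; real statistics are richer than a pair of i64 values.)
#[derive(Debug, Clone, Copy)]
struct FileRange {
    min: i64,
    max: i64,
}

/// How an ordering's sort columns resolved through the projection.
enum SortColumns {
    /// Every sort column maps to a plain column reference.
    Simple,
    /// At least one sort column is a complex expression such as `a + 1`.
    Complex,
}

/// Statistics check: each file must end before (or where) the next one begins.
/// Assumes `files` is already in the order the group will be read.
fn files_do_not_overlap(files: &[FileRange]) -> bool {
    files.windows(2).all(|pair| pair[0].max <= pair[1].min)
}

/// Decide whether one ordering can be kept for one file group.
fn ordering_survives(files: &[FileRange], sort_columns: SortColumns) -> bool {
    match sort_columns {
        // Simple sort columns: validate with min/max statistics.
        SortColumns::Simple => files_do_not_overlap(files),
        // Complex sort column: statistics cannot be used, so only a group
        // with at most one file is trusted to preserve the ordering.
        SortColumns::Complex => files.len() <= 1,
    }
}
```

In this model, a two-file group with ranges (0, 10) and (10, 20) keeps an ordering on a simple column, while the same group loses an ordering whose sort column is a complex expression, however the ranges look.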

Problem: After projection pushdown, complex expressions in ProjectionExprs are common (e.g. SELECT a + 1 AS x, b, c FROM t ORDER BY b). The old ordered_column_indices_from_projection was all-or-nothing: it failed on BinaryExpr(a+1) at index 0 and returned None for the entire projection, even though the ordering on b (index 1) maps cleanly to a simple Column. With multi-file groups, this caused valid orderings to be unnecessarily dropped.
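
The difference between the two resolution strategies can be illustrated with a toy model of the example query. This is a sketch under stated assumptions, not DataFusion's API: `ProjExpr`, `resolve_all_or_nothing`, `resolve_sort_columns`, and `example_projection` are hypothetical stand-ins, and the table `t` is assumed to have columns `a`, `b`, `c` at indices 0, 1, 2.

```rust
/// Toy stand-in for one expression in a projection.
/// (Hypothetical type for illustration; not DataFusion's physical expressions.)
#[derive(Debug, Clone, PartialEq)]
enum ProjExpr {
    /// Plain reference to a column of the underlying table, by index.
    Column(usize),
    /// Anything else, e.g. `a + 1`.
    Complex(&'static str),
}

/// All-or-nothing resolution: every projection entry must be a plain column,
/// otherwise the whole projection resolves to `None`.
fn resolve_all_or_nothing(projection: &[ProjExpr]) -> Option<Vec<usize>> {
    projection
        .iter()
        .map(|expr| match expr {
            ProjExpr::Column(index) => Some(*index),
            ProjExpr::Complex(_) => None,
        })
        .collect()
}

/// Sort-column-only resolution: only the projection positions used by the
/// ordering must be plain columns; other entries are never inspected.
fn resolve_sort_columns(
    projection: &[ProjExpr],
    sort_positions: &[usize],
) -> Option<Vec<usize>> {
    sort_positions
        .iter()
        .map(|&pos| match projection.get(pos)? {
            ProjExpr::Column(index) => Some(*index),
            ProjExpr::Complex(_) => None,
        })
        .collect()
}

/// Projection for `SELECT a + 1 AS x, b, c FROM t`.
fn example_projection() -> Vec<ProjExpr> {
    vec![
        ProjExpr::Complex("a + 1"),
        ProjExpr::Column(1),
        ProjExpr::Column(2),
    ]
}
```

With `ORDER BY b` (projection position 1), `resolve_all_or_nothing` gives up because of `a + 1` at position 0, whereas `resolve_sort_columns(&example_projection(), &[1])` still maps the sort column back to table column 1. An ordering on `x` (position 0) resolves to `None` in both models, which corresponds to the fallback case.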

Test plan

  • cargo test -p datafusion-datasource (97 tests pass)
  • cargo test -p datafusion-sqllogictest --test sqllogictests -- parquet_sorted_statistics (passes)

🤖 Generated with Claude Code

Previously, `get_projected_output_ordering` used
`ordered_column_indices_from_projection`, which was all-or-nothing: if any
expression in the projection wasn't a simple Column, it returned None for
the entire projection, even if the sort columns themselves were simple
column refs.

Replace it with `resolve_sort_column_projection`, which only requires
sort-column positions to resolve to simple Columns. Each ordering is now
evaluated independently: orderings on simple column refs get validated
with statistics even when other projection expressions are complex.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions (bot) added the `datasource` label (Changes to the datasource crate) on Feb 15, 2026