Skip to content

sql: post repr types: Rename output_type() to output_sql_type() on all scalar funcs #35219

Draft
mgree wants to merge 41 commits intoMaterializeInc:mainfrom
mgree:repr-type-canonicalize-xform-rename-output-type
Draft

sql: post repr types: Rename output_type() to output_sql_type() on all scalar funcs #35219
mgree wants to merge 41 commits intoMaterializeInc:mainfrom
mgree:repr-type-canonicalize-xform-rename-output-type

Conversation

@mgree
Copy link
Contributor

@mgree mgree commented Feb 25, 2026

Motivation

Cleanup stacked on #35084.

Description

The various scalar functions compute their SQL output types given SQL input types. This renames functions to clarify and adds a shim that converts repr types up to canonical SQL types and then converts the resulting SQL type back to a repr type.

Verification

All the tests.

mgree and others added 30 commits February 25, 2026 09:15
…f trusting `Get`s, which now have canonicalized types
These fire when repr-type canonicalization does not fully prevent type
mismatches, providing CI visibility without crashing production.

- scalar.rs: base_eq_or_repr_eq_for_assertion panic -> soft_panic_or_log
- context.rs: arrangement key fallback panic -> soft_panic_or_log
- relation.rs: try_union trace -> soft_panic_or_log
This tweaks `MirScalarExpr::typ` function case and
`try_col_with_input_cols`'s TableFunc and AggregateFunc cases,
to prevent function return types from reintroducing non-canonical
types, e.g. CastOidToRegProc or AclExplode.
…text

Add a ReduceContext enum to MirScalarExpr::reduce to distinguish between SQL-level callers (which need exact types like VarChar, RegProc) and optimizer callers (which should work with repr-canonical types).

When context is ReduceContext::Optimizer, column types are canonicalized via repr round-trip at the start of reduce(). This ensures consistency after ReprizeSqlTypes has run.

Note: The ReduceContext approach introduced in this commit has been superseded by the reduce_repr() approach in a later commit. The ReduceContext enum and its usage have been removed.
Add infrastructure:
- SqlRelationType::from_repr(&ReprRelationType) in src/repr/src/relation.rs
- MirRelationExpr::repr_typ() -> ReprRelationType in src/expr/src/relation.rs
- MirRelationExpr::repr_typ_with_input_types(&[ReprRelationType]) in src/expr/src/relation.rs

Switch the following transform files to work with ReprRelationType/ReprColumnType:
- column_knowledge.rs: optimize() now takes &[ReprColumnType], all .typ() sites
  produce Vec<ReprColumnType>, scalar.repr_typ() for output types.
- predicate_pushdown.rs: repr_typ() for nullable checks, extract_equal_or_both_null
  and helpers take &[ReprColumnType], uses repr_typ()/repr_typ_with_input_types
  throughout.
- fusion/filter.rs: uses input.repr_typ().column_types directly.
- join_implementation.rs: input_types is Vec<ReprRelationType> via repr_typ(),
  uses new_from_input_repr_types.
- redundant_join.rs: same pattern as join_implementation.rs.
- literal_constraints.rs: inp_typ switched to ReprRelationType, converts back to
  SqlRelationType only for MIR node construction (Constant, take_safely).

Convert back to SqlColumnType/SqlRelationType only where structurally required
(reduce() calls, MIR node construction).
Add take_safely_repr(Option<ReprRelationType>) and
take_safely_with_repr_col_types(Vec<ReprColumnType>) to
MirRelationExpr. These are thin wrappers that convert to SQL
types internally, providing a cleaner API for optimizer
transforms that work with repr types natively.

Update 4 call sites:
- equivalence_propagation.rs (2): eliminate
  SqlColumnType::from_repr conversions
- predicate_pushdown.rs: eliminate SqlRelationType::from_repr
- literal_constraints.rs: eliminate SqlRelationType::from_repr
Change the signature of on_unique and its helpers
(on_unique_ranking_window_funcs, on_unique_window_agg) from
&[SqlColumnType] to &[ReprColumnType]. Internally, convert to
SqlColumnType once at the top for self.typ() calls that need
SQL types. This eliminates the SqlColumnType::from_repr
conversion in reduce_elision.rs, which was the only external
caller.
Do the following easy switches from reduce(&[SqlColumnType]) to reduce_repr(&[ReprColumnType]):

- simplify_to_literal / simplify_to_literal_with_result: It's ok to do
  the constant folding in MIR, because the result is untyped.

- canonicalize.rs: 3 calls in canonicalize_equivalences, canonicalize_predicates,
  and replace_subexpr_and_reduce. These already had ReprColumnType available and
  were converting to SqlColumnType just to call reduce. The conversion is now
  removed entirely.

- plan_index_exprs in query.rs: Index key expressions should have repr types so
  optimizer code can find matching indexes.

- WebhookValidation::reduce_expression in plan.rs: Webhook validation
  expressions are used at runtime where repr types are the native currency.
Several transform files were storing SqlRelationType obtained via
typ() (which converts stored ReprRelationType to SqlRelationType),
only to convert back to ReprRelationType later. Switch these to
use repr_typ() and store ReprRelationType directly, eliminating
unnecessary round-trips.

Files changed: projection_lifting.rs, projection_pushdown.rs, cse/anf.rs, union_cancel.rs, demand.rs, dataflow.rs.
ggevay and others added 6 commits February 25, 2026 18:27
Switch Scope struct in expr-test-util to use ReprRelationType instead of
SqlRelationType. Update all insert/set/get/iter signatures accordingly.
build_get no longer needs conversion, build_let uses repr_typ(), and
reverse_syntax_override stores ReprRelationType directly.

Update doc comments in fusion/filter.rs and predicate_pushdown.rs to
construct MirRelationExpr::Constant directly with ReprRelationType
instead of using MirRelationExpr::constant() with SqlRelationType.
@github-actions
Copy link

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants