Snowflake Unparser dialect and UNNEST support#21593
Snowflake Unparser dialect and UNNEST support#21593yonatan-sevenai wants to merge 20 commits intoapache:mainfrom
Conversation
…gregate When the SQL unparser encountered a SubqueryAlias node whose direct child was an Aggregate (or other clause-building plan like Window, Sort, Limit, Union), it would flatten the subquery into a simple table alias, losing the aggregate entirely. For example, a plan representing: SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id = b.m would unparse to: SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m dropping the MAX aggregate and the subquery. Root cause: the SubqueryAlias handler in select_to_sql_recursively would call subquery_alias_inner_query_and_columns (which only unwraps Projection children) and unparse_table_scan_pushdown (which only handles TableScan/SubqueryAlias/Projection). When both returned nothing useful for an Aggregate child, the code recursed directly into the Aggregate, merging its GROUP BY into the outer SELECT instead of wrapping it in a derived subquery. The fix adds an early check: if the SubqueryAlias's direct child is a plan type that builds its own SELECT clauses (Aggregate, Window, Sort, Limit, Union), emit it as a derived subquery via self.derive() with the alias always attached, rather than falling through to the recursive path that would flatten it.
…gregate When the SQL unparser encountered a SubqueryAlias node whose direct child was an Aggregate (or other clause-building plan like Window, Sort, Limit, Union), it would flatten the subquery into a simple table alias, losing the aggregate entirely. For example, a plan representing: SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id = b.m would unparse to: SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m dropping the MAX aggregate and the subquery. Root cause: the SubqueryAlias handler in select_to_sql_recursively would call subquery_alias_inner_query_and_columns (which only unwraps Projection children) and unparse_table_scan_pushdown (which only handles TableScan/SubqueryAlias/Projection). When both returned nothing useful for an Aggregate child, the code recursed directly into the Aggregate, merging its GROUP BY into the outer SELECT instead of wrapping it in a derived subquery. The fix adds an early check: if the SubqueryAlias's direct child is a plan type that builds its own SELECT clauses (Aggregate, Window, Sort, Limit, Union), emit it as a derived subquery via self.derive() with the alias always attached, rather than falling through to the recursive path that would flatten it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The existing tests pass with broken SQL output — the SELECT list still uses DataFusion internal names (__unnest_placeholder) instead of Snowflake's alias.VALUE convention. Update expectations to the correct Snowflake SQL so these tests will drive the implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nd Projection When a table is accessed through a passthrough/virtual table mapping, DataFusion inserts a SubqueryAlias node between Unnest and its inner Projection. The FLATTEN rendering code assumed a direct Projection child and failed with "Unnest input is not a Projection: SubqueryAlias(...)". Peel through SubqueryAlias in three code paths that inspect unnest.input: try_unnest_to_lateral_flatten_sql, the inline-vs-table source check, and the general unnest recursion. Also fix a pre-existing collapsible_if clippy warning in check_unnest_placeholder_with_outer_ref. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Thanks @yonatan-sevenai. I think there is another PR that adds support for the Snowflake dialect from @goldmedal (#20648). Maybe you could collaborate together on one of the PRs to avoid duplicate work. |
Thanks! I believe the implementation I added covers many more use cases, but we'll see if we can collaborate on a single implementation. Hope we can figure out a single stable implementation! |
|
Thanks @yonatan-sevenai, I'll take a look at this PR |
goldmedal
left a comment
There was a problem hiding this comment.
@yonatan-sevenai Thanks for picking this up — I haven't had bandwidth to finish #20648 recently, but I'd like to help get this landed.
Before a detailed review, I want to discuss the design. LATERAL FLATTEN(INPUT => expr) is Snowflake-specific syntax, and this PR embeds that logic directly in the core unparser. I'd prefer delegating UNNEST-to-table-factor conversion to the Dialect trait — see my detailed comment below.
What do you think?
96642c7 to
70c717c
Compare
Replace the hardcoded FLATTEN_DEFAULT_ALIAS ("_unnest") with a
per-SelectBuilder counter that generates unique aliases (_unnest_1,
_unnest_2, …). This prevents alias collisions when multiple unnests
appear in the same query.
- Add flatten_alias_counter to SelectBuilder with next/current
accessor methods, scoped to one SELECT so subqueries get
independent counters
- Remove FLATTEN_DEFAULT_ALIAS constant, the dead alias_name()
method, and the default alias from FlattenRelationBuilder
- All three FLATTEN code paths (placeholder projection, display-name
projection, and Unnest handler) now coordinate through the
SelectBuilder to ensure SELECT items and FROM clause use the same
alias
- Use internal_datafusion_err! macro for FLATTEN error handling
- Migrate unnest tests from partial .contains() assertions to
insta::assert_snapshot! for full SQL verification
70c717c to
8e83e17
Compare
goldmedal
left a comment
There was a problem hiding this comment.
Thanks @yonatan-sevenai, Overall looks good to me 👍 Just some minor suggestions.
| /// (`_unnest_1`, `_unnest_2`, …). Each call returns a fresh name. | ||
| pub fn next_flatten_alias(&mut self) -> String { | ||
| self.flatten_alias_counter += 1; | ||
| format!("_unnest_{}", self.flatten_alias_counter) |
There was a problem hiding this comment.
It's better to use a constant to present the name. Something like
pub const UNNEST_PREFIX: &str = "__unnest_";
There was a problem hiding this comment.
Could you add a test for the multiple unnest case?
Which issue does this PR close?
Rationale for this change
The SQL unparser needs a Snowflake dialect. Basic dialect settings (identifier quoting,
NULLS FIRST/NULLS LAST, timestamp types) are straightforward, butUNNESTsupport required more than configuration.Snowflake has no
UNNESTkeyword. Its equivalent,LATERAL FLATTEN(INPUT => expr), is a table function in theFROMclause with output accessed viaalias."VALUE". This differs structurally from standard SQL: the unparser must emit aFROM-clause table factor with aCROSS JOINinstead of aSELECT-clause expression. It also must rewrite column references to point at the FLATTEN output, and handle several optimizer-produced plan shapes (intermediateLimit/Sortnodes,SubqueryAliaswrappers, composed expressions wrapping the unnest output, multi-expression projections). None of this can be expressed throughCustomDialectBuilder.What changes are included in this PR?
dialect.rs- NewSnowflakeDialectwith double-quote identifiers,NULLS FIRST/NULLS LAST, no empty select lists, no column aliases in table aliases, Snowflake timestamp types, andunnest_as_lateral_flatten(). Also wired intoCustomDialect/CustomDialectBuilder.ast.rs- NewFlattenRelationBuilderthat producesLATERAL FLATTEN(INPUT => expr, OUTER => bool)table factors, parallel to the existingUnnestRelationBuilder.utils.rs- Newunproject_unnest_expr_as_flatten_valuetransform that rewrites unnest placeholder columns to_unnest.VALUEreferences.plan.rs- Changes toselect_to_sql_recursively:Projectionhandler scans all expressions for unnest placeholders (not just single-expression projections), then branches into the FLATTEN path or the existing table-factor path.peel_to_unnest_with_modifierswalks throughLimit/Sortnodes betweenProjectionandUnnest, applying their SQL modifiers to the query builder. This handles an optimizer behavior where these nodes are inserted between the two.peel_to_inner_projectionwalks throughSubqueryAliasto find the innerProjectionthat feeds anUnnest.reconstruct_select_statementgained FLATTEN-aware expression rewriting and ahas_internal_unnest_aliaspredicate to strip internalUNNEST(...)display names.Unnesthandler rejects struct columns for the FLATTEN dialect with a clear error.Are these changes tested?
Yes. 18 new tests covering:
FROM(UNNEST in SELECT clause)LimitbetweenProjectionandUnnestSortbetweenProjectionandUnnestLimit+SubqueryAliascombinedCAST)LimitLimitSubqueryAliasbetweenUnnestand innerProjectionAre there any user-facing changes?
Yes. New public API surface:
SnowflakeDialectstruct and its constructorDialect::unnest_as_lateral_flatten()method (defaultfalse)CustomDialectBuilder::with_unnest_as_lateral_flatten()FlattenRelationBuilderandFLATTEN_DEFAULT_ALIASin the AST moduleNone of these are breaking changes, and all previous APIs should work.
New traits have default implementations to ease migrations.