Skip to content

Conversation

@VaggelisD
Copy link
Collaborator

@VaggelisD VaggelisD commented Jan 23, 2026

Fixes #6836

Hive hierarchy supports LATERAL VIEW POSEXPLODE i.e an enumerated EXPLODE with the following semantics:

  • Column aliases are required (query errors without them)
  • The pos column is 0-based and precedes the exploded values
  • The number of exploded values may differ depending on the ARRAY vs MAP input

An example:

# Array based: LATERAL appends result set [pos, col]
spark-sql (default)> WITH x AS (SELECT 'a,b,c' AS col) 
SELECT * FROM x LATERAL VIEW POSEXPLODE(SPLIT(col, ',')) t AS pos, col;

col     pos     col
a,b,c   0       a
a,b,c   1       b
a,b,c   2       c

# Map based: LATERAL appends result set [pos, key, value]
spark-sql (default)> with x as (select 'key' as col) 
SELECT * FROM x LATERAL VIEW POSEXPLODE(MAP(col, 'val', 'foo', 'val2')) AS pos, key, value;

col     pos     key     value
key     0       key     val
key     1       foo     val2

Currently, our transpilation towards Presto/Trino & DuckDB is broken because:

  1. The pos is reversed and
  2. The WITH ORDINALITY clause is 1-based:
# (Before) Transpiled DuckDB 
duckdb> WITH x AS (SELECT 'a,b,c' AS col) 
SELECT * FROM x CROSS JOIN UNNEST(STR_SPLIT_REGEX(col, ',')) WITH ORDINALITY AS t(pos, col);

┌─────────┬─────────┬───────┐
│   col   │   pos   │  col  │
│ varcharvarchar │ int64 │
├─────────┼─────────┼───────┤
│ a,b,c   │ c       │     3 │
│ a,b,c   │ b       │     2 │
│ a,b,c   │ a       │     1 │
└─────────┴─────────┴───────┘



# (Before) Transpiled Presto/Trino:
trino> WITH x AS (SELECT 'a,b,c' AS col) 
SELECT * FROM x CROSS JOIN UNNEST(REGEXP_SPLIT(col, ',')) WITH ORDINALITY AS t(pos, col);

  col  | pos | col
-------+-----+-----
 a,b,c | a   |   1
 a,b,c | b   |   2
 a,b,c | c   |   3

This PR fixes the transpilation by (1) reversing the columns to match the pos position and (2) subtracting 1 from the WITH ORDINALITY column:

# (After) Transpiled DuckDB 
duckdb> WITH x AS (SELECT 'a,b,c' AS col) 
SELECT * FROM x CROSS JOIN LATERAL (SELECT pos - 1 AS pos, col FROM UNNEST(STR_SPLIT_REGEX(col, ',')) WITH ORDINALITY AS t(col, pos));

┌─────────┬───────┬─────────┐
│   col   │  pos  │   col   │
│ varchar │ int64 │ varchar │
├─────────┼───────┼─────────┤
│ a,b,c   │     2 │ c       │
│ a,b,c   │     1 │ b       │
│ a,b,c   │     0 │ a       │
└─────────┴───────┴─────────┘



# (After) Transpiled Presto/Trino:
trino> WITH x AS (SELECT 'a,b,c' AS col) 
SELECT * FROM x CROSS JOIN LATERAL (SELECT pos - 1 AS pos, col FROM UNNEST(REGEXP_SPLIT(col, ',')) WITH ORDINALITY AS t(col, pos));

  col  | pos | col
-------+-----+-----
 a,b,c |   0 | a
 a,b,c |   1 | b
 a,b,c |   2 | c

Docs

Spark / DBX POSEXPLODE | Spark / DBX LATERAL VIEW | DuckDB WITH ORDINALITY | Presto/Trino WITH ORDINALITY

@github-actions
Copy link
Contributor

SQLGlot Integration Test Results

Comparing:

  • this branch (sqlglot:vaggelisd/explode_to_unnest, sqlglot version: vaggelisd/explode_to_unnest)
  • baseline (main, sqlglot version: 28.6.1.dev82)

⚠️ Limited to dialects: duckdb, snowflake, bigquery

By Dialect

dialect main sqlglot:vaggelisd/explode_to_unnest difference links
bigquery -> bigquery 2592/2624 passed (98.8%) 2592/2624 passed (98.8%) No change full result / delta
bigquery -> duckdb 1847/2623 passed (70.4%) 1846/2623 passed (70.4%) No change full result / delta
duckdb -> duckdb 4003/4003 passed (100.0%) 4003/4003 passed (100.0%) No change full result / delta
snowflake -> duckdb 615/847 passed (72.6%) 615/847 passed (72.6%) No change full result / delta
snowflake -> snowflake 847/847 passed (100.0%) 847/847 passed (100.0%) No change full result / delta

Overall

main: 10944 total, 9904 passed (pass rate: 90.5%), sqlglot version: 28.6.1.dev82

sqlglot:vaggelisd/explode_to_unnest: 10944 total, 9903 passed (pass rate: 90.5%), sqlglot version: vaggelisd/explode_to_unnest

Difference: No change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

POSEXPLODE transpiled incorrectly from spark to duckdb

3 participants