feat(exasol): auto-alias CTE projections and transpile JSON_OBJECT #7539

Closed
mkcorneli wants to merge 1 commit into tobymao:main from mkcorneli:feat/exasol-output-correctness

@mkcorneli
Contributor

Summary

Three orthogonal Exasol output-correctness fixes bundled together (all touch the Exasol generator):

  1. CTE literal auto-aliasing — Exasol rejects unaliased expressions inside CTE SELECT lists; this PR synthesizes _col_0, _col_1, ... aliases for them.
  2. JSON_OBJECT → CONCAT with NULL handling — Exasol has no native JSON_OBJECT; the base generator was emitting invalid colon syntax. This PR produces a CONCAT expression with per-type NULL handling.
  3. GROUP BY / HAVING alias test coverage — the existing _add_local_prefix_for_aliases preprocessor already handles MySQL → Exasol alias references correctly; this PR adds regression test coverage so the behavior is pinned.

Before / After

CTE unaliased literals

-- Before (Exasol rejects):
>>> sqlglot.transpile("WITH cte AS (SELECT 12345, 'val') SELECT * FROM cte", read="mysql", write="exasol")[0]
"WITH cte AS (SELECT 12345, 'val') SELECT * FROM cte"
-- Exasol: "must name expression in query cte with a column alias"

-- After:
'WITH cte AS (SELECT 12345 AS "_col_0", \'val\' AS "_col_1") SELECT * FROM cte'
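The aliasing rule can be modeled without sqlglot. The sketch below is only an illustration of the rule, not the PR's `_add_cte_column_aliases` code: the `Projection` type and its `kind` labels are hypothetical, and numbering aliases by projection position is an assumption (the example above is consistent with it but does not confirm it).

```python
from typing import List, NamedTuple


class Projection(NamedTuple):
    """Hypothetical stand-in for one item of a CTE SELECT list."""
    sql: str   # the projection as SQL text
    kind: str  # "alias", "column", "star", or "expr"


def add_cte_column_aliases(projections: List[Projection]) -> List[str]:
    """Alias only unaliased, non-column, non-star projections."""
    result = []
    for position, projection in enumerate(projections):
        if projection.kind == "expr":
            # Assumption: the alias index is the projection's position.
            result.append(f'{projection.sql} AS "_col_{position}"')
        else:
            # Aliases, bare columns, and stars pass through untouched,
            # so we never emit something like: SELECT * AS "_col_0"
            result.append(projection.sql)
    return result


cte_body = [Projection("12345", "expr"), Projection("'val'", "expr")]
print("SELECT " + ", ".join(add_cte_column_aliases(cte_body)))
```

For the two literals above this prints the same SELECT list as the "After" output; a `star` or `column` projection comes back unchanged.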

JSON_OBJECT

-- Before (Exasol rejects `:`):
>>> sqlglot.transpile("SELECT JSON_OBJECT('k', v)", read="mysql", write="exasol")[0]
"SELECT JSON_OBJECT('k': v)"
-- Exasol: "syntax error, unexpected ':'"

-- After (type-aware NULL handling via CONCAT/COALESCE/CASE WHEN):
"SELECT '{' || '\"k\": ' || COALESCE(CAST(v AS VARCHAR(100)), 'null') || '}'"

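To make the shape of the generated expression concrete, here is a rough, dependency-free model of the value branches. It is a sketch under assumptions, not the PR's `jsonobject_sql`: it builds plain strings where the PR uses sqlglot expression builders, the `kind` labels are hypothetical, and the `', '` separator between pairs is assumed.

```python
from typing import List, Tuple

DQ = "'\"'"  # SQL string literal holding a single double-quote character


def sql_literal(text: str) -> str:
    """Render text as a SQL string literal, doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def json_value_sql(value_sql: str, kind: str) -> str:
    """Pick the NULL-handling branch for one value (kind is hypothetical)."""
    quoted = f"{DQ} || {value_sql} || {DQ}"
    if kind == "string_literal":
        return quoted  # a literal can never be NULL
    if kind == "string_column":
        return f"CASE WHEN {value_sql} IS NULL THEN 'null' ELSE {quoted} END"
    # Numeric, date, and everything else: cast, then fall back to JSON null.
    return f"COALESCE(CAST({value_sql} AS VARCHAR(100)), 'null')"


def json_object_sql(pairs: List[Tuple[str, str, str]]) -> str:
    """Build the || chain for (key, value_sql, kind) triples."""
    if not pairs:
        return "'{}'"
    parts = ["'{'"]
    for index, (key, value_sql, kind) in enumerate(pairs):
        if index:
            parts.append("', '")  # assumed separator between pairs
        escaped_key = key.replace('"', '\\"')  # escape " inside the key
        parts.append(sql_literal(f'"{escaped_key}": '))
        parts.append(json_value_sql(value_sql, kind))
    parts.append("'}'")
    return " || ".join(parts)


print("SELECT " + json_object_sql([("k", "v", "other")]))
```

With the single untyped pair `('k', v)` this reproduces the "After" string shown above.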
GROUP BY alias

-- Unchanged by this PR (Exasol rejects a bare alias in GROUP BY, so the alias gets a LOCAL. prefix):
>>> sqlglot.transpile("SELECT city, COUNT(*) AS cnt FROM t GROUP BY cnt", read="mysql", write="exasol")[0]
'SELECT city, COUNT(*) AS cnt FROM t GROUP BY LOCAL.cnt'   -- already working, now pinned

Implementation notes

  • CTE preprocessor skips stars. _add_cte_column_aliases leaves exp.Alias, exp.Column, exp.Star, and any projection containing a Star (e.g. t.*) untouched. Wrapping a star in an alias would produce invalid SQL like SELECT * AS "_col_0". Regression tests cover SELECT *, SELECT t.*, and nested CTEs.

  • jsonobject_sql uses expression builders. No f-string SQL construction. Keys get their " characters escaped before being emitted as string literals. Value branches:

    • Literal strings → CONCAT('"', value, '"')
    • String-typed columns (via is_type(*exp.DataType.TEXT_TYPES), annotate-on-demand) → CASE WHEN v IS NULL THEN 'null' ELSE CONCAT('"', v, '"') END
    • Numeric/date/other → COALESCE(CAST(v AS VARCHAR(100)), 'null')
    • Empty args → '{}'
  • exp.Concat → || chain. Exasol has CONCAT_COALESCE = True, so the builder renders || chains rather than CONCAT(...). Functionally equivalent.

  • TRANSFORMS override needed for exp.JSONObject. The base Generator.TRANSFORMS already maps exp.JSONObject → _jsonobject_sql, which shadows the auto-discovered jsonobject_sql method. A one-line TRANSFORMS entry routes through the new method.
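The shadowing described in the last note can be reproduced with a toy dispatcher. This is a simplified model of the lookup order (TRANSFORMS first, then a `<key>_sql` method), not sqlglot's actual `Generator`; every class and return value below is hypothetical.

```python
class JSONObject:
    """Hypothetical stand-in for exp.JSONObject."""
    key = "jsonobject"


class BaseGenerator:
    # The base table already has an entry for JSONObject.
    TRANSFORMS = {JSONObject: lambda self, node: self._jsonobject_sql(node)}

    def sql(self, node):
        transform = self.TRANSFORMS.get(type(node))
        if transform is not None:
            return transform(self, node)  # TRANSFORMS wins when present
        return getattr(self, f"{node.key}_sql")(node)  # name-based fallback

    def _jsonobject_sql(self, node):
        return "JSON_OBJECT('k': v)"


class ShadowedGenerator(BaseGenerator):
    # Defining the method alone is not enough: the inherited entry shadows it.
    def jsonobject_sql(self, node):
        return "'{}'"


class FixedGenerator(ShadowedGenerator):
    # The one-line override routes JSONObject through the new method.
    TRANSFORMS = {
        **BaseGenerator.TRANSFORMS,
        JSONObject: lambda self, node: self.jsonobject_sql(node),
    }


print(ShadowedGenerator().sql(JSONObject()))
print(FixedGenerator().sql(JSONObject()))
```

In this model the first call still goes through the base handler, while the second reaches the subclass method, which is the effect the TRANSFORMS entry is meant to have.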

Test plan

  • New test_cte_literal_auto_alias covers 9 scenarios (literals, mixed, existing alias, function calls/arithmetic, bare columns, bare *, qualified t.*, nested CTE, non-CTE subquery)
  • New test_json_object covers empty args, string-typed column (CASE WHEN), numeric-typed column (COALESCE/CAST), multi-pair commas
  • New test_group_by_alias_local covers bare alias, expression alias, non-alias column unchanged, HAVING alias
  • Full tests/dialects/ suite runs cleanly (no new failures vs. main)
  • ruff check + ruff format --check pass

Three orthogonal Exasol output-correctness fixes:

* Auto-inject synthetic column aliases (_col_0, _col_1, ...) for
  unaliased non-column projections inside CTE SELECT lists. Exasol
  rejects unaliased expressions in CTEs with
  "must name expression in query <cte> with a column alias".
  Bare column references and stars (including t.*) are left alone so
  the rewrite does not produce invalid SQL like `SELECT * AS "_col_0"`.

* Add `jsonobject_sql` to ExasolGenerator so `JSON_OBJECT(k, v, ...)`
  is transpiled to a CONCAT expression with per-value-type NULL
  handling. String-typed columns use
  `CASE WHEN v IS NULL THEN 'null' ELSE CONCAT('"', v, '"') END`;
  numeric/date/other types use
  `COALESCE(CAST(v AS VARCHAR(100)), 'null')`. Empty arg list emits
  the literal `'{}'`. Previously the base generator emitted
  `JSON_OBJECT('k': v)` with colon syntax, which Exasol rejects.

* Add test coverage for the existing `_add_local_prefix_for_aliases`
  preprocessor when transpiling GROUP BY / HAVING alias references
  from other dialects (e.g. `GROUP BY cnt` from MySQL becomes
  `GROUP BY LOCAL.cnt` on Exasol).
@georgesittas
Collaborator

Hey @mkcorneli, thanks for the PRs. Could you please split these across three different PRs, given that they're orthogonal? Reviewing will be easier that way.

@georgesittas
Collaborator

Closing this one to review the changes separately.
