Preserve original expression text as column name in unified SQL #5392
dai-chen wants to merge 1 commit into opensearch-project:main
Calcite's SqlToRelConverter names unnamed SELECT items EXPR$0, EXPR$1, etc. - surprising versus PostgreSQL/MySQL/Spark.

Fix by adding SelectItemAliasRewriter, a pre-validation SqlShuttle that wraps unnamed, non-identifier items with AS <text> via the existing LanguageSpec.postParseRules hook.

SELECT COUNT(*) FROM t -> `COUNT(*)` (was EXPR$0)
SELECT UPPER(name) -> `UPPER(name)`
SELECT x AS y, name, * -> unchanged

Resolves opensearch-project#5332

Signed-off-by: Chen Dai <daichen@amazon.com>
Description
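Instead of keeping Calcite's default synthetic names (`EXPR$0`, `EXPR$1`, etc.), this PR preserves the original expression text as the column name. It does so by adding a SqlNode rewriter to the existing `LanguageSpec.postParseRules` hook.

Examples:

- `SELECT COUNT(*) FROM t` -> `COUNT(*)` (was `EXPR$0`)
- `SELECT UPPER(name)` -> `UPPER(name)`
- `SELECT x AS y, name, *` -> unchanged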
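The aliasing rule above can be illustrated with a small, string-level sketch. This is not the PR's `SelectItemAliasRewriter` and does not use Calcite: the class name `SelectItemAliasSketch`, its methods, and the regular expressions are hypothetical, and a real implementation would inspect parsed `SqlNode`s rather than text. It only shows which select items would receive an `AS <text>` alias and which would be left alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, string-level model of the rule: alias unnamed, non-identifier items.
final class SelectItemAliasSketch {
    // A plain or dotted identifier such as name or t.name (assumed shape).
    private static final Pattern IDENTIFIER =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)*");

    // A trailing explicit alias such as "x AS y"; "CAST(x AS INT)" does not match
    // because the text after AS must be a bare (optionally quoted) identifier.
    private static final Pattern TRAILING_ALIAS =
        Pattern.compile("(?i)\\s+AS\\s+[`\"]?[A-Za-z_][A-Za-z0-9_]*[`\"]?$");

    // Returns the item unchanged, or the item wrapped with AS `<text>`.
    public static String rewriteItem(String item) {
        String text = item.trim();
        boolean isStar = text.equals("*") || text.endsWith(".*");
        boolean isIdentifier = IDENTIFIER.matcher(text).matches();
        boolean hasAlias = TRAILING_ALIAS.matcher(text).find();
        if (isStar || isIdentifier || hasAlias) {
            return text; // already named, or a star that expands to named columns
        }
        // Unnamed expression: use its own text as the column label.
        return text + " AS `" + text + "`";
    }

    public static List<String> rewrite(List<String> items) {
        List<String> result = new ArrayList<>();
        for (String item : items) {
            result.add(rewriteItem(item));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [COUNT(*) AS `COUNT(*)`, UPPER(name) AS `UPPER(name)`, x AS y, name, *]
        System.out.println(rewrite(List.of("COUNT(*)", "UPPER(name)", "x AS y", "name", "*")));
    }
}
```

The sketch does not escape backticks inside the expression text and does not canonicalize function names; both would need handling at the parse-tree level.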
Implementation notes:
Column-naming conventions vary across engines (see the comparison table). This PR produces canonical built-in function names with bare user identifiers (`COUNT(*)`, `SUM(MyCol)`, `UPPER(name)`).

Verbatim preservation (V2/PPL V3/MySQL) is debatable: SQL treats unquoted function names as case-insensitive (`COUNT` == `count`), so canonical labels are spec-consistent. Verbatim text also couples column names to source formatting: `SUM(x)+1` and `SUM(x) + 1` would produce different labels for the same query. We'll revisit verbatim preservation in [FEATURE] Unified SQL language across OpenSearch, Flint and Spark SQL #5346.

Related Issues
Resolves #5332
Check List
- Commits are signed per the DCO using `--signoff` or `-s`.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.