Feat(starrocks)!: improve some StarRocks sql generation by jaogoy · Pull Request #6737 · tobymao/sqlglot

jaogoy · 2026-01-14T03:34:46Z

Added StarRocks DDL support for three partition methods.
Enabled ALTER TABLE … RENAME for StarRocks.
Generated ORDER BY in place of CLUSTER BY in StarRocks output.
Added handling of REFRESH properties for StarRocks materialized views.
Updated and added tests for the new StarRocks behaviors.

VaggelisD

Hey @jaogoy, thank you for the PR! As a heads up, there are many different changes clustered together here; it'd be appreciated if we separated these into different PRs.

Leaving a few comments from an initial pass regardless:

sqlglot/dialects/starrocks.py

tests/dialects/test_starrocks.py

sqlglot/dialects/starrocks.py

VaggelisD · 2026-01-15T10:16:35Z

sqlglot/dialects/starrocks.py

+        def cluster_sql(self, expression: exp.Cluster) -> str:
+            """Generate StarRocks ORDER BY clause for clustering.
+
+            StarRocks uses ORDER BY instead of CLUSTER BY for table clustering.
+            This override ensures exp.Cluster generates the correct syntax.
+
+            Example:
+                exp.Cluster(expressions=[id, name]) → "ORDER BY (id, name)"
+            """
+            expressions = self.expressions(expression, flat=True)
+            return f"ORDER BY ({expressions})" if expressions else ""


Given that StarRocks does not support CLUSTER BY, this is about transpilation. Are we sure that the semantics of the input dialect are always equivalent to SR's ORDER BY?

For instance, coming from Spark it looks like CLUSTER BY = DISTRIBUTE BY + ORDER BY in MapReduce, whereas for Snowflake I get the feeling that it resembles StarRocks's PARTITION BY (?)

Clustering is a kind of data storage layout.
cluster = partition + distribution + order by.
If there is no partitioning and no distribution, it is the same as ORDER BY.
In Snowflake, clustering covers concepts similar to partitioning/distribution through its micro-partitions.
So for StarRocks there is no difference between CLUSTER BY and ORDER BY; that is, StarRocks has no separate clustering concept distinct from ORDER BY.
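
The string rule in the `cluster_sql` override above can be sketched without sqlglot. This is a minimal, hypothetical stand-in (the name `render_cluster_as_order_by` is an assumption and not part of sqlglot): it joins the clustering keys and wraps them in `ORDER BY (...)`, returning an empty string when there are no keys. It only illustrates the output shape and says nothing about whether the source dialect's CLUSTER BY semantics match.

```python
from typing import Sequence


def render_cluster_as_order_by(keys: Sequence[str]) -> str:
    """Hypothetical helper: render clustering keys as a StarRocks-style ORDER BY clause."""
    # Comma-separated, single-line list, similar in spirit to flat=True in the diff.
    joined = ", ".join(keys)
    # No keys means no clause at all, matching the override's empty-string fallback.
    return f"ORDER BY ({joined})" if joined else ""


print(render_cluster_as_order_by(["id", "name"]))  # ORDER BY (id, name)
```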

VaggelisD

Hey @jaogoy, sending a few more comments your way, appreciate your cooperation thus far!

sqlglot/dialects/starrocks.py

sqlglot/generator.py

sqlglot/dialects/starrocks.py

VaggelisD · 2026-01-16T12:30:10Z

sqlglot/dialects/starrocks.py

+            any_func_expr = any(isinstance(e, (exp.Func, exp.Anonymous)) for e in node.expressions) \
+                if isinstance(node, exp.Tuple) else False


Is this necessary, or is the VIEW lookup sufficient to know whether to add parentheses? If it is sufficient, we can remove the any_func_expr logic completely.

No, only RANGE(c1, c2) and LIST(c1, c2) can go without parentheses; the others need parentheses:

* for MV partitioning.
* for tables, but with column names only, such as (c1, c2).
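
The `any_func_expr` check discussed here can be illustrated with plain dataclasses standing in for expression nodes. This is only a sketch: `ColumnNode`, `FuncNode`, `TupleNode`, and `has_func_expr` are hypothetical names, not sqlglot classes. It shows the same shape of test as the diff: the flag is true only when the node is a tuple and at least one of its members is a function call.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ColumnNode:
    """Hypothetical stand-in for a plain column reference."""
    name: str


@dataclass
class FuncNode:
    """Hypothetical stand-in for a function call such as date_trunc('day', dt)."""
    name: str
    args: List[str] = field(default_factory=list)


@dataclass
class TupleNode:
    """Hypothetical stand-in for a parenthesized list of partition expressions."""
    expressions: List[Union[ColumnNode, FuncNode]]


def has_func_expr(node: object) -> bool:
    # Non-tuple nodes never set the flag, mirroring the `else False` branch in the diff.
    if not isinstance(node, TupleNode):
        return False
    # A single function call among the members is enough.
    return any(isinstance(e, FuncNode) for e in node.expressions)
```

With these stand-ins, a tuple of two columns yields `False`, while a tuple that mixes a function call with a column yields `True`.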

VaggelisD · 2026-01-16T12:32:58Z

sqlglot/dialects/starrocks.py

+            any_func_expr = any(isinstance(e, (exp.Func, exp.Anonymous)) for e in node.expressions) \
+                if isinstance(node, exp.Tuple) else False
+            create = expression.find_ancestor(exp.Create)
+            # SR needs `(...)` for MVs, with parens, and columns only


Is this comment relevant? #6737 (comment)

Yes. But VIEWs don't have partitions, so for simplicity, it doesn't matter.

VaggelisD · 2026-01-16T12:33:43Z

sqlglot/dialects/starrocks.py

+            if create and create.kind == "VIEW" or not any_func_expr:
+                return f"PARTITION BY ({partition_columns_str})"
+            else:
+                # SR doesn't support `(func(...), col2)` with parens for normal tables
+                return f"PARTITION BY {partition_columns_str}"


Suggested change

-            if create and create.kind == "VIEW" or not any_func_expr:
-                return f"PARTITION BY ({partition_columns_str})"
-            else:
-                # SR doesn't support `(func(...), col2)` with parens for normal tables
-                return f"PARTITION BY {partition_columns_str}"
+            if create and create.kind == "VIEW" or not any_func_expr:
+                partition_columns_str = f"({partition_columns_str})"
+            return f"PARTITION BY {partition_columns_str}"
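
To check that the suggested single-return form behaves like the original branching form, here is a small standalone sketch. The function names and the `create_kind` parameter (a string, or `None` when there is no enclosing CREATE) are assumptions made for illustration; the real code works on sqlglot expression objects. Note that `and` binds tighter than `or`, so the condition reads as "(kind is VIEW) or (no function expressions)".

```python
from typing import Optional


def partition_by_original(columns: str, create_kind: Optional[str], any_func_expr: bool) -> str:
    """Branching form, modeled on the diff under review."""
    if create_kind is not None and create_kind == "VIEW" or not any_func_expr:
        return f"PARTITION BY ({columns})"
    else:
        # Function expressions on a normal table are emitted without parentheses.
        return f"PARTITION BY {columns}"


def partition_by_suggested(columns: str, create_kind: Optional[str], any_func_expr: bool) -> str:
    """Single-return form, modeled on the suggested change."""
    if create_kind is not None and create_kind == "VIEW" or not any_func_expr:
        columns = f"({columns})"
    return f"PARTITION BY {columns}"


print(partition_by_suggested("c1, c2", "TABLE", False))  # PARTITION BY (c1, c2)
print(partition_by_suggested("date_trunc('day', dt), c2", "TABLE", True))  # PARTITION BY date_trunc('day', dt), c2
```

Both functions return the same string for every combination of `create_kind` and `any_func_expr`, so the suggestion is a pure simplification under these assumptions.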

VaggelisD · 2026-01-16T12:34:17Z

tests/dialects/test_starrocks.py

+        # LIST partitioning
+        list_partition = exp.PartitionByListProperty(
+            partition_expressions=[exp.column("c1")],
+            create_expressions=[
+                exp.Var(this="PARTITION p1 VALUES IN (1, 2)"),
+                exp.Var(this="PARTITION p2 VALUES IN ('US', 'CN')"),
+            ],
+        )


Let's not test RANGE/LIST partitioning by manually creating expressions; we can test them through SQL by adding them to the ddl_sqls list, as you did with the other new entries.

Do you mean using self.validate_identity(...)? But:

* PartitionByListProperty is not currently supported in the Parser for StarRocks.
* create_expressions can be defined either with a raw exp.Var() or with a structured expression, such as exp.Partition(...). With self.validate_identity(...), we can only test one of these ways through SQL statements, even once it is supported in the Parser. Right?

Also, do I need to move the corresponding code from Doris to the base Parser/Generator?

VaggelisD · 2026-01-16T12:35:33Z

tests/dialects/test_starrocks.py

+        # MV : Refresh trigger property
+        manual_refresh = exp.RefreshTriggerProperty(kind=exp.var("MANUAL"))
+        self.assertEqual(manual_refresh.sql(dialect="starrocks"), "REFRESH MANUAL")
+
+        async_refresh = exp.RefreshTriggerProperty(
+            method=exp.var("IMMEDIATE"),
+            kind=exp.var("ASYNC"),
+            starts=exp.Literal.string("2025-01-01 00:00:00"),
+            every=exp.Literal.number(5),
+            unit=exp.var("MINUTE"),
+        )
+        self.assertEqual(
+            async_refresh.sql(dialect="starrocks"),
+            "REFRESH IMMEDIATE ASYNC START ('2025-01-01 00:00:00') EVERY (INTERVAL 5 MINUTE)",
+        )


Ditto, let's not use manually created expressions; let's prefer to test DDLs that have these clauses through self.validate_identity.

But RefreshTriggerProperty is not currently supported in the Parser. How can we use validate_identity to test it?
I tried self.validate_identity("CREATE MATERIALIZED VIEW mv_name REFRESH MANUAL AS SELECT * FROM t;") to test it, but the result is not correct.
Do I need to move the corresponding code from Doris to the base Parser/Generator? There are a few differences between Doris and StarRocks.
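
For reference, the two expected strings in the test above follow a simple assembly order: optional method, then kind, then optional START and EVERY parts. The sketch below reproduces that order with a plain function. `render_refresh` and its parameters are hypothetical names chosen to match the keyword arguments in the test; this is not sqlglot's generator code.

```python
from typing import List, Optional


def render_refresh(
    kind: str,
    method: Optional[str] = None,
    starts: Optional[str] = None,
    every: Optional[int] = None,
    unit: Optional[str] = None,
) -> str:
    """Hypothetical helper: assemble a REFRESH clause in the order the test expects."""
    parts: List[str] = ["REFRESH"]
    if method:
        parts.append(method)  # e.g. IMMEDIATE
    parts.append(kind)  # e.g. MANUAL or ASYNC
    if starts:
        parts.append(f"START ('{starts}')")
    if every is not None and unit:
        parts.append(f"EVERY (INTERVAL {every} {unit})")
    return " ".join(parts)


print(render_refresh("MANUAL"))  # REFRESH MANUAL
print(
    render_refresh(
        "ASYNC", method="IMMEDIATE", starts="2025-01-01 00:00:00", every=5, unit="MINUTE"
    )
)  # REFRESH IMMEDIATE ASYNC START ('2025-01-01 00:00:00') EVERY (INTERVAL 5 MINUTE)
```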

jaogoy · 2026-01-20T01:22:03Z

@VaggelisD Thank you for your meticulous review.
I have fixed most of them according to your feedback.

Do I need to submit another PR, since this PR was closed by Tobymao? Or is there anything else I need to do?

georgesittas · 2026-01-20T10:56:41Z

Hey @jaogoy, the PR was closed by mistake, due to a git operation that was used to clean up old caches.

Can you please open a new PR after you rebase off of main? Thanks!

jaogoy · 2026-01-21T09:03:19Z

@georgesittas @VaggelisD I've created another PR (#6827). It depends on #6804, submitted by @petrikoro a day ago, which also supports parsing and generating partitioning.

Please take a look. Thanks.

fivetran-kwoodbeck and others added 30 commits December 3, 2025 10:14

feat(optimizer)!: Annotate type for ZIPF (tobymao#6453)

d44bda3

feat(optimizer)!: Annotate type for XMLGET (tobymao#6457)

34dbd47

feat(databricks): add support for ?:: operator (tobymao#6469)

ff3f0f9

* support ?:: * include base COLUMN_OPERATORS

feat(snowflake)!: annotate type for MODE function snowflake (tobymao#…

0d211f2

…6447) * annotate type for MODE function snowflake * support multiple semantics * address test comment

feat(snowflake)!: annotate type for PERCENTILE_CONT in Snowflake (tob…

cc4c8ab

…ymao#6470) * annotate type for PERCENTILE_CONT snowflake * fix format * fix test

chore(optimizer): add tests for snowflake CAST function (tobymao#6471)

4d77500

Co-authored-by: Michael Lee <michael.lee@michael.lee-FMF6J19R7N>

feat(snowflake)!: annotation support for CURRENT REGION. Return type …

7dbc242

…VARCHAR (tobymao#6473)

feat(snowflake)!: annotation support for CURRENT_ORGANIZATION_NAME. R…

43a6a5c

…eturn type VARCHAR (tobymao#6475) # Conflicts: # sqlglot/typing/snowflake.py # tests/fixtures/optimizer/annotate_functions.sql

feat(snowflake)!: annotation support for CURRENT_ORGANIZATION_USER. (t…

f1f7c6a

…obymao#6476)

chore(optimizer)!: annotate type for snowflake func TO_BINARY (tobyma…

88dfd26

…o#6474) * chore(optimizer)!: annotate type for snowflake func TO_BINARY * remove unnecessary function * add test to test_dialect --------- Co-authored-by: Michael Lee <michael.lee@michael.lee-FMF6J19R7N>

chore: clean up TO_BINARY tests

483318b

fix(tsql): CEILING generation (tobymao#6477)

7021d54

chore: remove duckdb TO_BINARY 2 arg test

bff7084

chore: starrocks TO_BINARY tests

80591f9

feat(snowflake)!: annotation support for CURRENT_ROLE_TYPE (tobymao#6479

d268203

)

Feat(BigQuery)!: Add support for coercing STRING literals to temporal…

e6adba7

… types (tobymao#6482)

chore(exasol): implementing the last day function in exasol sql diale…

01e5a05

…ct (tobymao#6483)

feat(snowflake)!: annotate type for REGR_* functions (tobymao#6452)

68a5e61

* Type annotation for REGR_* functions * removed unrequired change * added tests for other databases and made all REGR classes inherit from AggFunc * removed unsupported databases

Feat(snowflake)!: annotate type for VECTOR_INNER_PRODUCT (tobymao#6486)

1531a67

Fix!: REGEXP_EXTRACT position arg overflow (tobymao#6458)

df4c1d3

* wrap bq -> duckdb REGEXP_EXTRACT SUBSTRING() call in NULLIF * Use dialect-specific constant for position overflow semantics

feat(snowflake)!: support padside argument for BIT[OR|AND|XOR] (tobym…

f6b2b3b

…ao#6487)

Feat(BigQuery): Add support for the NET.HOST function (tobymao#6480)

e891397

* Feat(BigQuery)!: Add support for the NET.HOST function * PR feedback

chore(duckdb): tests for MAX_BY and MIN_BY (tobymao#6489)

aacc981

feat(singlestore): support dcolonqmark (tobymao#6485)

2cc67cd

* support dcolonqmark * add testcases

fix(optimizer)!: support ORDER / LIMIT expressions for BigQuery ARRAY…

5a49c3f

…_AGG / STRING_AGG functions (tobymao#6463)

feat(snowflake)!: Annotated type for ARRAY_CONSTRUCT_COMPACT tobymao#…

ef130f1

…6496

Fix!: wrap connectives generated due to transpiling LIKE ANY closes t…

1b6076b

…obymao#6493

VaggelisD and others added 11 commits January 13, 2026 16:58

Chore(optimizer): Remove duplicate INITCAP annotation from Snowflake

2893ac3

feat(optimizer)!: Annotate ATAN2 for Spark & DBX (tobymao#6725)

09fa467

feat(optimizer)!: Annotate TANH for Spark & DBX (tobymao#6726)

b59b3bf

* annotate the TANH * feat(optimizer): Refactor Tanh and Atan2 annotations for consistency * feat(optimizer): Add Tanh annotation for Hive * feat(optimizer): Update TANH dialect support to include Hive

feat(snowflake)!: Type annotate for Snowflake Kurtosis (tobymao#6720)

e50a97e

* Annotate Type for Kurtosis * missing test * made sure type annotation was correct for decfloat, double, number * Small cleanup --------- Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>

Feat(postgres): support index predicate in conflict INSERT clause c…

197943f

…loses tobymao#6727

Chore!: bump sqlglotrs to 0.11.0

33b8a5d

Update CHANGELOG.md for v28.6.0 [skip ci]

2654620

add test cases for starrocks

f316649

Signed-off-by: jaogoy <jaogoy@gmail.com>

Merge branch 'tobymao:main' into improv.sr

58252b5

optimize test cases for starrocks

5dd0dc4

Signed-off-by: jaogoy <jaogoy@gmail.com>

jaogoy mentioned this pull request Jan 14, 2026

Feat: Add StarRocks engine support TobikoData/sqlmesh#5658

Open

VaggelisD reviewed Jan 14, 2026

jaogoy added 2 commits January 15, 2026 10:16

refine code to remove unnecessary funcs

168833e

Signed-off-by: jaogoy <jaogoy@gmail.com>

refine code

9352255

VaggelisD reviewed Jan 15, 2026

georgesittas force-pushed the main branch from 5068bc7 to 14f9e93 Compare January 16, 2026 09:54

jaogoy added 3 commits January 16, 2026 19:26

optimize code using find_ancestor() for partition sql

a5fa3f2

also, mv test cases into TestStarocks' test_ddl Signed-off-by: jaogoy <jaogoy@gmail.com>

optimize code

572efe4

Signed-off-by: jaogoy <jaogoy@gmail.com>

optimize code

ef56e49

Signed-off-by: jaogoy <jaogoy@gmail.com>

VaggelisD reviewed Jan 16, 2026

tobymao closed this Jan 19, 2026

tobymao force-pushed the main branch from 426bce2 to 8634a8a Compare January 19, 2026 03:38

jaogoy mentioned this pull request Jan 21, 2026

Feat(starrocks)!: improve some starrocks properties generation #6827

Merged

20 participants
