
@petrikoro (Contributor) commented Jan 19, 2026

Fixes #6803

Description

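This PR adds comprehensive support for StarRocks partitioning syntax, including:

- Expression-based partitioning (single and multiple expressions)
- LIST partitioning (single and multi-column)
- RANGE partitioning with explicit VALUES LESS THAN bounds, optionally on an expression
- RANGE partitioning with START/END/EVERY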

Previously, multi-expression partitioning and LIST partitioning were not supported.

Examples

Expression-based partitioning:

-- Single expression
CREATE TABLE t (col DATE) PARTITION BY DATE_TRUNC('DAY', col)

-- Multiple expressions  
CREATE TABLE t (col1 STRING, col2 BIGINT) PARTITION BY FROM_UNIXTIME(col2, '%Y%m%d'), col1

LIST partitioning:

-- Single column
CREATE TABLE t (city STRING) PARTITION BY LIST (city) (
    PARTITION pLA VALUES IN ('Los Angeles'),
    PARTITION pSF VALUES IN ('San Francisco')
)

-- Multi-column
CREATE TABLE t (dt DATE, city STRING) PARTITION BY LIST (dt, city) (
    PARTITION p1 VALUES IN (('2022-04-01', 'LA'), ('2022-04-01', 'SF'))
)
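As a rough mental model (an assumption for illustration; this PR only parses and generates the syntax), a LIST partition accepts a row when the row's partition-column value, or the tuple of values in the multi-column form, appears in that partition's VALUES IN list. The `route_list` helper and the partition dictionaries below are hypothetical and simply mirror the two statements above.

```python
# Hypothetical illustration (not sqlglot or StarRocks code): find the LIST
# partition whose VALUES IN entries contain a row's partition key.
from typing import Dict, Hashable, List, Optional


def route_list(partitions: Dict[str, List[Hashable]], key: Hashable) -> Optional[str]:
    """Return the name of the partition whose value list contains `key`, or None."""
    for name, values in partitions.items():
        if key in values:
            return name
    return None  # how StarRocks handles unmatched rows is not modeled here


# Single-column LIST: keys are plain values.
city_partitions = {
    "pLA": ["Los Angeles"],
    "pSF": ["San Francisco"],
}

# Multi-column LIST: keys are tuples in (dt, city) column order.
dt_city_partitions = {
    "p1": [("2022-04-01", "LA"), ("2022-04-01", "SF")],
}

print(route_list(city_partitions, "San Francisco"))          # pSF
print(route_list(dt_city_partitions, ("2022-04-01", "LA")))  # p1
print(route_list(city_partitions, "Chicago"))                # None
```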

RANGE partitioning with explicit values:

CREATE TABLE t (col DATE) PARTITION BY RANGE (col) (
    PARTITION p1 VALUES LESS THAN ('2020-01-31'),
    PARTITION p2 VALUES LESS THAN ('2020-02-29'),
    PARTITION p_max VALUES LESS THAN (MAXVALUE)
)

-- With expression
CREATE TABLE t (col STRING) PARTITION BY RANGE (STR2DATE(col, '%Y-%m-%d')) (
    PARTITION p1 VALUES LESS THAN ('2021-01-01'),
    PARTITION p2 VALUES LESS THAN ('2021-01-02')
)
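Similarly, a minimal sketch of how VALUES LESS THAN bounds are usually read, assuming MySQL-style semantics: each bound is the exclusive upper limit of its partition, the previous partition's bound is the inclusive lower limit, and MAXVALUE means no upper limit. `route_range` is a hypothetical helper; for the STR2DATE variant the function would be applied to the row value first, which is not modeled here.

```python
# Hypothetical illustration: route a key to the first RANGE partition whose
# exclusive upper bound is greater than the key.
from typing import List, Optional, Tuple

# (partition name, exclusive upper bound); None stands in for MAXVALUE.
Bound = Tuple[str, Optional[str]]


def route_range(partitions: List[Bound], key: str) -> Optional[str]:
    """Return the first partition whose upper bound is greater than `key`.

    Assumes `partitions` is in ascending bound order, as in the DDL, and that
    keys are ISO-8601 date strings, which compare correctly as plain strings.
    """
    for name, upper in partitions:
        if upper is None or key < upper:
            return name
    return None  # key is beyond the last bound and there is no MAXVALUE partition


partitions: List[Bound] = [
    ("p1", "2020-01-31"),
    ("p2", "2020-02-29"),
    ("p_max", None),  # VALUES LESS THAN (MAXVALUE)
]

print(route_range(partitions, "2020-01-15"))  # p1
print(route_range(partitions, "2020-01-31"))  # p2: bounds are exclusive
print(route_range(partitions, "2024-06-01"))  # p_max
```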

RANGE partitioning with START/END/EVERY:

CREATE TABLE t (col DATE) PARTITION BY RANGE (col) (
    START ('2019-01-01') END ('2021-01-01') EVERY (INTERVAL 1 YEAR),
    START ('2021-01-01') END ('2021-05-01') EVERY (INTERVAL 1 MONTH)
)
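A minimal sketch of what the two START/END/EVERY clauses above describe, assuming each clause expands into contiguous, half-open [lower, upper) ranges from START up to END in EVERY-sized steps. `add_months` and `expand` are hypothetical helpers; partition naming and an uneven final step are not modeled.

```python
# Hypothetical illustration: expand START/END/EVERY clauses into the
# half-open [lower, upper) ranges they describe (YEAR and MONTH units only).
from datetime import date
from typing import List, Tuple


def add_months(d: date, months: int) -> date:
    """Shift `d` by whole months (assumes the day exists in the target month, e.g. the 1st)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)


def expand(start: date, end: date, every: int, unit: str) -> List[Tuple[date, date]]:
    """Expand one START/END/EVERY clause into half-open [lower, upper) ranges."""
    if unit == "YEAR":
        step = every * 12
    elif unit == "MONTH":
        step = every
    else:
        raise ValueError(f"unsupported interval unit: {unit}")
    if step <= 0:
        raise ValueError("EVERY must be a positive interval")

    ranges = []
    lower = start
    while lower < end:
        upper = add_months(lower, step)  # assumes END - START is a whole number of steps
        ranges.append((lower, upper))
        lower = upper
    return ranges


clauses = [
    (date(2019, 1, 1), date(2021, 1, 1), 1, "YEAR"),   # EVERY (INTERVAL 1 YEAR)
    (date(2021, 1, 1), date(2021, 5, 1), 1, "MONTH"),  # EVERY (INTERVAL 1 MONTH)
]
for start, end, every, unit in clauses:
    for lower, upper in expand(start, end, every, unit):
        print(f"[{lower}, {upper})")
```

Under that assumption, the statement yields six ranges: two yearly ranges covering 2019 and 2020, then four monthly ranges from January through April 2021.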

See tests/dialects/test_starrocks.py for more examples.

Testing

All syntax variations have been validated against a local StarRocks instance (tested on StarRocks 4.0.2 and 3.5.0).

@georgesittas requested a review from geooo109 on January 20, 2026 13:31
@petrikoro force-pushed the feat/add-full-support-for-starrocks-partitions branch from f93c63c to 09aef20 on January 20, 2026 13:38
@geooo109 (Collaborator) commented Jan 20, 2026

@petrikoro thank you for the PR, great work.

I have some suggestions.

  1. There is a similar implementation in doris.py that we should check in order to factor out some shared code (relevant commit: 73c2894).
  2. As far as I checked, MySQL has similar PARTITION BY syntax that we currently don't cover (the same cases you posted, minus the dynamic one, which MySQL doesn't support; e.g. https://dev.mysql.com/doc/refman/8.4/en/partitioning-range.html). We can implement this in the MySQL dialect and have the derived dialects (Doris, StarRocks) inherit it and add their extra logic, which adds functionality to the MySQL dialect and removes duplicated code from the derived dialects.
  3. Patterns shared by Doris and StarRocks that don't exist in MySQL can be factored out into the Dialect class.

I will add some extra inline comments for help.
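A minimal sketch of the layering suggested in points 2 and 3, using hypothetical toy classes rather than sqlglot's real parser: the MySQL-level class owns the explicit PARTITION definitions, and the Doris and StarRocks subclasses only declare the keyword that opens their dynamic form (FROM and START, per the later review note about checking "FROM" or "START").

```python
# Hypothetical sketch: shared RANGE handling lives in the MySQL-level class;
# derived dialects only add the keyword for their dynamic (batch) form.
from typing import Tuple


class MySQLParser:
    """Owns the shared RANGE handling: explicit PARTITION ... VALUES LESS THAN lists."""

    # Keywords that open a dynamic partition clause; MySQL has none.
    DYNAMIC_KEYWORDS: Tuple[str, ...] = ()

    def parse_range_body(self, first_token: str) -> str:
        if first_token == "PARTITION":
            return "explicit"
        if first_token in self.DYNAMIC_KEYWORDS:
            return "dynamic"
        raise SyntaxError(f"unexpected token in RANGE partition list: {first_token}")


class DorisParser(MySQLParser):
    DYNAMIC_KEYWORDS = ("FROM",)  # Doris dynamic clause starts with FROM


class StarRocksParser(MySQLParser):
    DYNAMIC_KEYWORDS = ("START",)  # START ... END ... EVERY


print(StarRocksParser().parse_range_body("START"))  # dynamic
print(DorisParser().parse_range_body("PARTITION"))  # explicit
```

The point of the design is that the explicit-partition path is written once, and MySQL gains it too, while each derived dialect overrides only what differs.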

jaogoy added a commit to jaogoy/sqlglot that referenced this pull request Jan 21, 2026
Depends on tobymao#6804

Please review/merge tobymao#6804 first.
This PR only contains changes on top of that PR.

- Import expression partitioning for MV.
- Enabled ALTER TABLE … RENAME for StarRocks.
- Emitted ORDER BY via CLUSTER BY for StarRocks outputs.
- Added MV (REFRESH) properties handling for StarRocks materialized
views.
- And, tests updated/added for the new StarRocks behaviors.

Signed-off-by: jaogoy <jaogoy@gmail.com>
@georgesittas (Collaborator)

Hey @petrikoro 👋

Are you planning to take this to the finish line?

@petrikoro (Contributor, Author)

Hi 👋

Sure, I plan to get back to the PR tomorrow. Thanks for the suggestions @geooo109!

@petrikoro (Contributor, Author)

Hi! Take a look at a477f75. Did I get that right?

@geooo109 (Collaborator)

@petrikoro Thank you very much, I will check it soon.

@geooo109 (Collaborator) left a comment

Nice work!! Left some comments.

Comment on lines 87 to 108
if self._match_text_seq("RANGE"):
    partition_expressions = self._parse_wrapped_csv(self._parse_assignment)
    self._match_l_paren()

    if self._match_text_seq("FROM", advance=False):
        create_expressions = self._parse_csv(
            self._parse_partitioning_granularity_dynamic
        )
    elif self._match_text_seq("PARTITION", advance=False):
        create_expressions = self._parse_csv(self._parse_partition_definition)
    else:
        create_expressions = None

    self._match_r_paren()

    return self.expression(
        exp.PartitionByRangeProperty,
        partition_expressions=partition_expressions,
        create_expressions=create_expressions,
    )

return self._parse_partitioned_by()
@geooo109 (Collaborator) commented Jan 23, 2026

Let's use this ^ to factor out the logic here for both Doris and StarRocks.

  1. For Doris, rename _parse_partition_definition to _parse_partition_range_value (the parent class method).
  2. Apply https://github.com/tobymao/sqlglot/pull/6804/changes#r2721199101
  3. Move this method into the base dialect, where we can check for "FROM" or "START".
  4. Also use the previous logic:
if not self._match_text_seq("RANGE"):
    return super()._parse_partitioned_by()

to avoid the extra if-nesting.
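A toy sketch of the control flow suggested in points 3 and 4, under simplifying assumptions: tokens are plain strings, the partition column list and parentheses that the real code parses before the check are omitted, and `ToyParser`, `match`, and the returned strings are hypothetical stand-ins for sqlglot's parser, `_match_text_seq`, and expression nodes.

```python
# Hypothetical toy parser: early return when the next keyword is not RANGE,
# then choose the create-expression kind by peeking at FROM/START or PARTITION.
from typing import List


class ToyParser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.index = 0

    def match(self, keyword: str, advance: bool = True) -> bool:
        """True if the next token is `keyword`; consume it only when `advance` is True."""
        if self.index < len(self.tokens) and self.tokens[self.index] == keyword:
            if advance:
                self.index += 1
            return True
        return False

    def parse_partitioned_by(self) -> str:
        return "PartitionedByProperty"  # stand-in for the generic fallback

    def parse_partition_by(self) -> str:
        # Guard clause: anything that is not RANGE goes to the generic path,
        # so the RANGE logic below needs no extra level of nesting.
        if not self.match("RANGE"):
            return self.parse_partitioned_by()

        # Peek (advance=False) so the sub-parser still sees the keyword.
        if self.match("FROM", advance=False) or self.match("START", advance=False):
            kind = "dynamic"
        elif self.match("PARTITION", advance=False):
            kind = "explicit"
        else:
            kind = "empty"
        return f"PartitionByRangeProperty({kind})"


print(ToyParser(["RANGE", "START"]).parse_partition_by())           # PartitionByRangeProperty(dynamic)
print(ToyParser(["RANGE", "PARTITION", "p1"]).parse_partition_by())  # PartitionByRangeProperty(explicit)
print(ToyParser(["col1"]).parse_partition_by())                      # PartitionedByProperty
```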

@geooo109 (Collaborator) commented Jan 23, 2026

@petrikoro Let's do 1, 2, and 4 here; 3 may be complex, so I will do it in a separate PR.

Contributor Author

@geooo109 Thanks for the feedback! Feel free to take another look whenever you have a chance: e424f88


return unnest

def _parse_partition_property(
Collaborator

Comment on lines 80 to 85
if self._match_text_seq("LIST"):
    return self.expression(
        exp.PartitionByListProperty,
        partition_expressions=self._parse_wrapped_csv(self._parse_assignment),
        create_expressions=self._parse_wrapped_csv(self._parse_partition_list_value),
    )
Collaborator

Suggested change
-if self._match_text_seq("LIST"):
-    return self.expression(
-        exp.PartitionByListProperty,
-        partition_expressions=self._parse_wrapped_csv(self._parse_assignment),
-        create_expressions=self._parse_wrapped_csv(self._parse_partition_list_value),
-    )
+if self._match_text_seq("LIST", advance=False):
+    return super()._parse_partition_property()
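A toy illustration of why the suggestion peeks with `advance=False` before delegating: in this sketch the parent method matches (and consumes) the LIST keyword itself, so the subclass must leave it in place. All class and method names are hypothetical stand-ins for sqlglot's parser and `_match_text_seq`.

```python
# Hypothetical toy classes: peeking keeps LIST available for the parent
# method; consuming it first makes the parent miss it.
from typing import List


class Cursor:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.index = 0

    def match(self, keyword: str, advance: bool = True) -> bool:
        if self.index < len(self.tokens) and self.tokens[self.index] == keyword:
            if advance:
                self.index += 1
            return True
        return False


class BaseParser(Cursor):
    def parse_partition_property(self) -> str:
        if self.match("LIST"):  # the parent consumes LIST itself
            return "PartitionByListProperty"
        return "other"


class StarRocksParser(BaseParser):
    def parse_partition_property(self) -> str:
        if self.match("LIST", advance=False):  # peek only, then delegate
            return super().parse_partition_property()
        return "starrocks-specific"


class BuggyParser(BaseParser):
    def parse_partition_property(self) -> str:
        if self.match("LIST"):  # consumes LIST, so the parent no longer sees it
            return super().parse_partition_property()
        return "starrocks-specific"


tokens = ["LIST", "(", "city", ")"]
print(StarRocksParser(tokens).parse_partition_property())  # PartitionByListProperty
print(BuggyParser(tokens).parse_partition_property())      # other
```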

Comment on lines 761 to 769
self._match_text_seq("VALUES", "LESS", "THAN")
values = self._parse_wrapped_csv(self._parse_expression)

if (
    len(values) == 1
    and isinstance(values[0], exp.Column)
    and values[0].name.upper() == "MAXVALUE"
):
    values = [exp.var("MAXVALUE")]
Collaborator

Let's use a parsing helper here so we can reuse it in the similar Doris function _parse_partition_range_value.
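A minimal sketch of such a shared helper, assuming it only needs to normalize a single bare MAXVALUE, as in the snippet above. `Column`, `Var`, and `normalize_less_than_values` are hypothetical stand-ins for `exp.Column`, `exp.var`, and whatever helper name the PR ends up using; both the StarRocks code and Doris's `_parse_partition_range_value` would call it after parsing the wrapped values.

```python
# Hypothetical shared helper: normalize "VALUES LESS THAN (MAXVALUE)" so that
# a lone MAXVALUE parsed as a column becomes a keyword-like Var node.
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Column:  # stand-in for exp.Column
    name: str


@dataclass
class Var:  # stand-in for exp.var(...)
    this: str


Node = Union[Column, Var, str]  # str stands in for literal values such as '2020-01-31'


def normalize_less_than_values(values: List[Node]) -> List[Node]:
    """Replace a single bare MAXVALUE column with a MAXVALUE keyword; keep everything else."""
    if (
        len(values) == 1
        and isinstance(values[0], Column)
        and values[0].name.upper() == "MAXVALUE"
    ):
        return [Var("MAXVALUE")]
    return values


print(normalize_less_than_values([Column("maxvalue")]))  # [Var(this='MAXVALUE')]
print(normalize_less_than_values(["'2020-01-31'"]))      # ["'2020-01-31'"]
```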

def partitionedbyproperty_sql(self, expression: exp.PartitionedByProperty) -> str:
    this = expression.this
    partition_cols = this.expressions if isinstance(this, exp.Schema) else [this]
    is_cols = all(isinstance(col, (exp.Column, exp.Identifier)) for col in partition_cols)
Collaborator

Why do we need this check here?

Contributor Author

The check distinguishes between two different StarRocks partitioning syntaxes:

  1. Column-based partitioning needs parentheses, especially on StarRocks < 3.4 (see the note below the parameters in https://docs.starrocks.io/docs/table_design/data_distribution/expression_partitioning/#parameters-1):

    PARTITION BY (col1, col2)
  2. Expression-based partitioning takes no parentheses:

    PARTITION BY date_trunc('day', ts), col1

When the partition list consists only of plain column/identifier references, StarRocks expects PARTITION BY (columns) with parentheses. When it contains expressions such as date_trunc() or str2date(), the syntax is PARTITION BY expr, without wrapping parentheses.
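A minimal sketch of the generation rule described above, using hypothetical `Column` and `Func` classes in place of sqlglot expressions: parenthesize the list only when every item is a plain column, otherwise emit the items bare.

```python
# Hypothetical sketch of the PARTITION BY generation rule: wrap in parentheses
# only when every partition item is a plain column reference.
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Column:  # stand-in for exp.Column / exp.Identifier
    name: str

    def sql(self) -> str:
        return self.name


@dataclass
class Func:  # stand-in for a function expression such as date_trunc(...)
    name: str
    args: List[str]

    def sql(self) -> str:
        return f"{self.name}({', '.join(self.args)})"


def partition_by_sql(items: List[Union[Column, Func]]) -> str:
    body = ", ".join(item.sql() for item in items)
    if all(isinstance(item, Column) for item in items):
        return f"PARTITION BY ({body})"  # column list: parenthesized
    return f"PARTITION BY {body}"  # any expression present: no wrapping parens


print(partition_by_sql([Column("col1"), Column("col2")]))
# PARTITION BY (col1, col2)
print(partition_by_sql([Func("date_trunc", ["'day'", "ts"]), Column("col1")]))
# PARTITION BY date_trunc('day', ts), col1
```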

Contributor Author

I added some comments for this in e424f88

Successfully merging this pull request may close these issues.

StarRocks PARTITION BY expression not generated correctly
