-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(starrocks): add full support for partitions #6804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(starrocks): add full support for partitions #6804
Conversation
f93c63c to
09aef20
Compare
|
@petrikoro thank you for the PR, great work. I have some suggestions.
I will add some extra inline comments for help. |
Depends on tobymao#6804 Please review/merge tobymao#6804 first. This PR only contains changes on top of that PR. - Import expression partitioning for MV. - Enabled ALTER TABLE … RENAME for StarRocks. - Emitted ORDER BY via CLUSTER BY for StarRocks outputs. - Added MV (REFRESH) properties handling for StarRocks materialized views. - And, tests updated/added for the new StarRocks behaviors. Signed-off-by: jaogoy <jaogoy@gmail.com>
|
Hey @petrikoro 👋 Are you planning to take this to the finish line? |
Hi 👋 Sure, I plan to get back to PR tomorrow. Thanks for the suggestions @geooo109! |
Hi! Take a look at a477f75, did I get that right? |
|
@petrikoro thank you very much, will check it soon. |
geooo109
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!! left some comments.
sqlglot/dialects/doris.py
Outdated
| if self._match_text_seq("RANGE"): | ||
| partition_expressions = self._parse_wrapped_csv(self._parse_assignment) | ||
| self._match_l_paren() | ||
|
|
||
| if self._match_text_seq("FROM", advance=False): | ||
| create_expressions = self._parse_csv( | ||
| self._parse_partitioning_granularity_dynamic | ||
| ) | ||
| elif self._match_text_seq("PARTITION", advance=False): | ||
| create_expressions = self._parse_csv(self._parse_partition_definition) | ||
| else: | ||
| create_expressions = None | ||
|
|
||
| self._match_r_paren() | ||
|
|
||
| return self.expression( | ||
| exp.PartitionByRangeProperty, | ||
| partition_expressions=partition_expressions, | ||
| create_expressions=create_expressions, | ||
| ) | ||
|
|
||
| return self._parse_partitioned_by() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use this ^ to factor out the logic here both for doris and starrock.
- For doris rename
_parse_partition_definitionto_parse_partition_range_value(parent class method) - Apply https://github.com/tobymao/sqlglot/pull/6804/changes#r2721199101
- And put this method in the base dialect , we can check for "FROM" OR "START"
- also use the previous logic
if not self._match_text_seq("RANGE"):
return super()._parse_partitioned_by()
to avoid the extra if-nesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@petrikoro Let's do the 1., 2., and .4., because 3. may be complex, and I will do it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| return unnest | ||
|
|
||
| def _parse_partition_property( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sqlglot/dialects/doris.py
Outdated
| if self._match_text_seq("LIST"): | ||
| return self.expression( | ||
| exp.PartitionByListProperty, | ||
| partition_expressions=self._parse_wrapped_csv(self._parse_assignment), | ||
| create_expressions=self._parse_wrapped_csv(self._parse_partition_list_value), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if self._match_text_seq("LIST"): | |
| return self.expression( | |
| exp.PartitionByListProperty, | |
| partition_expressions=self._parse_wrapped_csv(self._parse_assignment), | |
| create_expressions=self._parse_wrapped_csv(self._parse_partition_list_value), | |
| ) | |
| if self._match_text_seq("LIST", advance=False): | |
| return super()._parse_partition_property() |
sqlglot/dialects/mysql.py
Outdated
| self._match_text_seq("VALUES", "LESS", "THAN") | ||
| values = self._parse_wrapped_csv(self._parse_expression) | ||
|
|
||
| if ( | ||
| len(values) == 1 | ||
| and isinstance(values[0], exp.Column) | ||
| and values[0].name.upper() == "MAXVALUE" | ||
| ): | ||
| values = [exp.var("MAXVALUE")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use here a parsing helper in order to reuse it in the similar _parse_partition_range_value doris function.
| def partitionedbyproperty_sql(self, expression: exp.PartitionedByProperty) -> str: | ||
| this = expression.this | ||
| partition_cols = this.expressions if isinstance(this, exp.Schema) else [this] | ||
| is_cols = all(isinstance(col, (exp.Column, exp.Identifier)) for col in partition_cols) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need this check here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check distinguishes between two different StarRocks partitioning syntaxes:
-
Column-based partitioning - needs parentheses, especially if it's StarRocks < 3.4, see https://docs.starrocks.io/docs/table_design/data_distribution/expression_partitioning/#parameters-1 (note bellow parameters):
PARTITION BY (col1, col2)
-
Expression-based partitioning - no parentheses:
PARTITION BY date_trunc('day', ts), col1
When it's simple column/identifier references, StarRocks expects PARTITION BY (columns) with parens. But when you use expressions like date_trunc() or str2date(), the syntax is PARTITION BY expr without wrapping parens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments for this in e424f88
Fixes #6803
Description
This PR adds comprehensive support for StarRocks partitioning syntax including:
PARTITION BY expr1, expr2): https://docs.starrocks.io/docs/table_design/data_distribution/expression_partitioning/PARTITION BY LIST (cols) (...)): https://docs.starrocks.io/docs/table_design/data_distribution/list_partitioning/PARTITION BY RANGE (cols) (PARTITION ... VALUES LESS THAN ...)): https://docs.starrocks.io/docs/table_design/data_distribution/dynamic_partitioning/#examplePARTITION BY RANGE (cols) (START ... END ... EVERY ...)): https://docs.starrocks.io/docs/table_design/data_distribution/dynamic_partitioning/#examplePreviously, multi-expression partitioning and LIST partitioning were not supported.
Examples
Expression-based partitioning:
LIST partitioning:
RANGE partitioning with explicit values:
RANGE partitioning with START/END/EVERY:
See more in
tests/dialects/test_starrocks.pyTesting
All syntax variations have been validated against a local StarRocks instance (tested on StarRocks 4.0.2 and 3.5.0).