HIVE-29457: HiveSortExchangePullUpConstantsRule doesn't remove constant column from distribution keys #6316
soumyakanti3578 merged 3 commits into apache:master
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelDistribution.java
SELECT col1 FROM test
WHERE col2 = 'a'
DISTRIBUTE BY col1, col2
SORT BY col1, col2;
Can we drop the SORT BY to minimize the repro?
SELECT col1, col2 FROM test
WHERE col2 = 'a'
DISTRIBUTE BY col1, col2
Unfortunately, dropping the SORT BY while still selecting only col1 fails with:
EXPLAIN CBO
SELECT col1 FROM test
WHERE col2 = 'a'
DISTRIBUTE BY col1, col2
fname=distribution_key_constant_value.q
See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs.
org.apache.hadoop.hive.ql.parse.SemanticException: Line 6:20 Invalid table alias or column reference 'col2': (possible column names are: col1)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5224)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5154)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$OrderByRelBuilder.getOrderByExpression(CalcitePlanner.java:5475)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$OrderByRelBuilder.genSortByKey(CalcitePlanner.java:5441)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$OrderByRelBuilder.addRelDistribution(CalcitePlanner.java:5507)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSBLogicalPlan(CalcitePlanner.java:3945)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4975)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1611)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1553)
at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:140)
at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:936)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:191)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:135)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1331)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:588)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13222)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:481)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:187)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
I think this is a bug and should be resolved in another ticket.
If you use the query I shared above (with col2 in the SELECT list), it doesn't throw the "Invalid table alias" error.
Yes, that worked, thanks! I have updated the test in the latest commit.
zabetak left a comment
We could do a bit of refactoring in HiveSortPullUpConstantsRule, but it does not necessarily have to happen as part of this PR; we could leave it for a follow-up.
private RelDistribution applyToDistribution(
    RelDistribution distribution, Mappings.TargetMapping mapping) {
  List<Integer> newKeys = new ArrayList<>();
  for (int key : distribution.getKeys()) {
    final int target = mapping.getTargetOpt(key);
    if (target < 0) {
      // It is a constant, we can ignore it
      continue;
    }
    newKeys.add(target);
  }

  return new HiveRelDistribution(distribution.getType(), newKeys);
}
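For illustration, here is a minimal, Calcite-free sketch of the key remapping that applyToDistribution performs. It assumes the mapping is modeled as a plain int[] where targetOf[source] is the new field index and -1 marks a field that was pulled up as a constant, mirroring the target < 0 check above. The class and method names are hypothetical and not part of Hive.

```java
import java.util.ArrayList;
import java.util.List;

class DistributionKeyRemap {

  /**
   * Hypothetical helper: remaps distribution key indices through targetOf.
   * targetOf[source] is the new field index, or -1 when the source field
   * was pulled up as a constant and no longer exists in the input.
   */
  static List<Integer> remapKeys(List<Integer> keys, int[] targetOf) {
    List<Integer> newKeys = new ArrayList<>();
    for (int key : keys) {
      int target = targetOf[key];
      if (target < 0) {
        // Constant column: it has the same value in every row, so it does
        // not change which rows are co-located; drop it.
        continue;
      }
      newKeys.add(target);
    }
    return newKeys;
  }

  public static void main(String[] args) {
    // Input fields: 0 = col1, 1 = col2 (fixed to 'a' by the WHERE clause).
    // After col2 is pulled up, only col1 remains, at index 0.
    int[] targetOf = {0, -1};
    List<Integer> keys = List.of(0, 1); // DISTRIBUTE BY col1, col2
    System.out.println(remapKeys(keys, targetOf)); // prints [0]
  }
}
```

With col2 pinned to 'a', DISTRIBUTE BY col1, col2 reduces to distribution on col1 alone, which matches the behavior described in the PR title.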
This is very similar to applyToFieldCollations; it would be nice to see if we can refactor the common parts together.
I tried moving the for loop that checks whether each key is present in the mapping into a new method. However, this doesn't really simplify applyToFieldCollations, because we still need the outer loop: for (RelFieldCollation fc : relCollation.getFieldCollations()).
Maybe we can revisit this in a follow-up.
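One possible shape for that follow-up, sketched without Calcite types: a generic helper that remaps each item's field index and drops items whose field became a constant, so that the per-item loop lives in one place for both distribution keys and field collations. The names RemapHelper, remapOrDrop, and the simplified Collation class are hypothetical stand-ins (Collation plays the role of RelFieldCollation); this is not the approach adopted in the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.IntUnaryOperator;
import java.util.function.ToIntFunction;

class RemapHelper {

  /** Hypothetical stand-in for a field collation: field index plus direction. */
  static final class Collation {
    final int field;
    final boolean descending;

    Collation(int field, boolean descending) {
      this.field = field;
      this.descending = descending;
    }

    @Override
    public String toString() {
      return field + (descending ? " DESC" : " ASC");
    }
  }

  /**
   * Hypothetical shared helper: remaps each item's field index through
   * targetOf and drops items whose field has no target (-1), i.e. fields
   * that were pulled up as constants.
   */
  static <T> List<T> remapOrDrop(
      List<T> items,
      ToIntFunction<T> fieldOf,
      BiFunction<T, Integer, T> withField,
      IntUnaryOperator targetOf) {
    List<T> result = new ArrayList<>();
    for (T item : items) {
      int target = targetOf.applyAsInt(fieldOf.applyAsInt(item));
      if (target < 0) {
        continue; // constant field: skip it
      }
      result.add(withField.apply(item, target));
    }
    return result;
  }

  public static void main(String[] args) {
    int[] mapping = {0, -1, 1}; // field 1 was pulled up as a constant
    IntUnaryOperator targetOf = i -> mapping[i];

    // Distribution keys: the item is the index itself.
    List<Integer> keys = remapOrDrop(List.of(0, 1, 2), k -> k, (k, t) -> t, targetOf);
    System.out.println(keys); // prints [0, 1]

    // Collations: keep the direction, replace only the field index.
    List<Collation> cols = remapOrDrop(
        List.of(new Collation(2, true), new Collation(1, false)),
        c -> c.field,
        (c, t) -> new Collation(t, c.descending),
        targetOf);
    System.out.println(cols); // prints [1 DESC]
  }
}
```

The trade-off is the one raised in the thread: the caller still has to supply how to read and rebuild each item, so the savings over two small loops are modest.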
What changes were proposed in this pull request, and why are the changes needed?
Explained in https://issues.apache.org/jira/browse/HIVE-29457
Does this PR introduce any user-facing change?
No
How was this patch tested?