HIVE-29433: ClassCastException in FilterLongColumnBetween.evaluate when vectorization is enabled: DecimalColumnVector cannot be cast to class LongColumnVector #6366
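The exception in the title is an unchecked downcast failing at runtime. A filter specialized for long-backed column vectors is handed a decimal-backed one. The sketch below reproduces that failure with minimal, hypothetical stand-in classes. `Vec`, `LongVec`, `DecimalVec`, and `countBetween` are illustrative only. They are not Hive's `LongColumnVector`, `DecimalColumnVector`, or `FilterLongColumnBetween`. For illustration, it also assumes that a DECIMAL(15,1) value is carried as a long scaled by 10.

```java
import java.math.BigDecimal;

// Minimal, hypothetical stand-ins (NOT Hive classes) reproducing the failure shape.
class CastFailureSketch {
    static abstract class Vec {}

    // Long-backed vector, playing the role LongColumnVector plays for Decimal64 data.
    static final class LongVec extends Vec {
        final long[] values;
        LongVec(long... values) { this.values = values; }
    }

    // Object-backed vector, playing the role DecimalColumnVector plays for wide decimals.
    static final class DecimalVec extends Vec {
        final BigDecimal[] values;
        DecimalVec(BigDecimal... values) { this.values = values; }
    }

    // A BETWEEN filter that, like a long-column filter, assumes long-backed input
    // and downcasts without checking.
    static int countBetween(Vec input, long lo, long hi) {
        LongVec col = (LongVec) input; // ClassCastException if input is a DecimalVec
        int count = 0;
        for (long v : col.values) {
            if (v >= lo && v <= hi) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption for illustration: DECIMAL(15,1) values as longs scaled by 10,
        // so 100.0 -> 1000 and 1000.0 -> 10000.
        System.out.println(countBetween(new LongVec(0L, 5000L), 1000L, 10000L)); // 1
        try {
            countBetween(new DecimalVec(new BigDecimal("0.0")), 1000L, 10000L);
        } catch (ClassCastException ex) {
            System.out.println("ClassCastException: " + ex.getMessage());
        }
    }
}
```

The filter is correct whenever the planner is right about the input representation. The bug is therefore in the planning decision, not in the filter itself.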
Commits: b4cc8d3, 78bcca6, e02bba3, 73f664c
New test file (hunk `@@ -0,0 +1,26 @@`):

```sql
CREATE TABLE test_stats0 (e decimal(38,10)) stored as orc;
insert into test_stats0 (e) values (0.0);

set hive.vectorized.execution.enabled=false;
select count(*) from test_stats0 where CAST(e as DECIMAL(15,1)) BETWEEN 100.0 AND 1000.0;

set hive.vectorized.execution.enabled=true;
EXPLAIN VECTORIZATION DETAIL select count(*) from test_stats0 where CAST(e as DECIMAL(15,1)) BETWEEN 100.0 AND 1000.0;
select count(*) from test_stats0 where CAST(e as DECIMAL(15,1)) BETWEEN 100.0 AND 1000.0;
```
Contributor: This patch needs an EXPLAIN VECTORIZATION DETAIL for the same query.

Contributor (Author): Yes, I have added it in commit 78bcca6. The only difference in …
Without this patch, predicateExpression is: …
With this patch, predicateExpression is: …

Contributor: Thanks, I believe this shows exactly the expected behavior, which is that

abstractdog marked this conversation as resolved.
```sql
EXPLAIN VECTORIZATION DETAIL select count(*) from test_stats0 where CAST(e as DECIMAL(30,1)) BETWEEN 100.0 AND 1000.0;
select count(*) from test_stats0 where CAST(e as DECIMAL(30,1)) BETWEEN 100.0 AND 1000.0;


CREATE TABLE test_stats1 (int_col INT) stored as orc;
insert into test_stats1 (int_col) values (0);

set hive.vectorized.execution.enabled=false;
select count(*) from test_stats1 where CAST(int_col as DECIMAL(15,1)) BETWEEN 100.0 AND 1000.0;

set hive.vectorized.execution.enabled=true;
EXPLAIN VECTORIZATION DETAIL select count(*) from test_stats1 where CAST(int_col as DECIMAL(15,1)) BETWEEN 100.0 AND 1000.0;
select count(*) from test_stats1 where CAST(int_col as DECIMAL(15,1)) BETWEEN 100.0 AND 1000.0;

EXPLAIN VECTORIZATION DETAIL select count(*) from test_stats1 where CAST(int_col as DECIMAL(30,1)) BETWEEN 100.0 AND 1000.0;
select count(*) from test_stats1 where CAST(int_col as DECIMAL(30,1)) BETWEEN 100.0 AND 1000.0;
```
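The test matrix pairs two sources with two cast targets. The sources are a wide decimal (`decimal(38,10)`) and an `INT`. The targets are a narrow `DECIMAL(15,1)` and a wide `DECIMAL(30,1)`. The following is a minimal sketch of why the split matters. It assumes that the Decimal64 (long-backed) path applies only to precision 18 or below, because any 18-digit unscaled value fits in a signed 64-bit long. `fitsDecimal64` and `unscaled` are hypothetical helpers, not Hive APIs.

```java
import java.math.BigDecimal;

class Decimal64Matrix {
    // Assumption: a DECIMAL(p,s) qualifies for a long-backed (Decimal64) representation
    // when p <= 18, because every 18-digit unscaled value fits in a signed 64-bit long
    // (Long.MAX_VALUE is 9223372036854775807, a 19-digit number).
    static final int MAX_DECIMAL64_PRECISION = 18;

    static boolean fitsDecimal64(int precision) {
        return precision <= MAX_DECIMAL64_PRECISION;
    }

    // Unscaled long for a literal at a given scale, e.g. ("100.0", 1) -> 1000.
    // setScale(int) throws ArithmeticException if the literal would need rounding.
    static long unscaled(String literal, int scale) {
        return new BigDecimal(literal).setScale(scale).unscaledValue().longValueExact();
    }

    public static void main(String[] args) {
        // {source type, source decimal precision (-1 for non-decimal), target precision}
        Object[][] cases = {
            {"decimal(38,10)", 38, 15},
            {"decimal(38,10)", 38, 30},
            {"int", -1, 15},
            {"int", -1, 30},
        };
        for (Object[] c : cases) {
            int src = (Integer) c[1];
            int tgt = (Integer) c[2];
            String srcFits = src >= 0 ? String.valueOf(fitsDecimal64(src)) : "n/a (not decimal)";
            System.out.printf("CAST(%s AS DECIMAL(%d,1)): target fits=%b, decimal input fits=%s%n",
                c[0], tgt, fitsDecimal64(tgt), srcFits);
        }
        // The BETWEEN bounds as scaled longs at scale 1.
        System.out.println(unscaled("100.0", 1) + " " + unscaled("1000.0", 1));
    }
}
```

Under that assumption, `CAST(e as DECIMAL(15,1))` is the interesting case. Its result type looks Decimal64-eligible, while its decimal input column is not.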
Hm, while I understand the patch, I can see that the same check is actually performed a few lines below, see hive/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java, lines 1730 to 1752 in 01d9111.
Couldn't that be used here?
Hi @abstractdog, thanks for checking this.
I understand that the same check exists in the for loop that follows. For such casts, however, the code flow currently never reaches that loop. It returns true directly (see Code Link), which is not correct for these decimal-to-decimal casts and causes the issue.
Also, if we removed the whole `udf instanceof GenericUDFToDecimal` check so that the code falls through to the logic below, I believe it would cause a performance regression for other casts. For example, with an integer column in the same situation, `CAST(integer_column as DECIMAL(15,1))`, the check would then tend to return false, which IMO would be a performance regression. Please let me know what you think.
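Here is a simplified model of the three behaviors discussed above, assuming the fix works as this thread describes. The names `Type`, `beforePatch`, `fallThroughToChildren`, and `afterPatch` are hypothetical. The target-precision test stands in for whatever checks the real code performs before reaching this point, and the precision limit of 18 is likewise an assumption.

```java
class CastToDecimalCheckSketch {
    // Assumption: Decimal64 (long-backed) decimals are limited to precision <= 18.
    static final int MAX_DECIMAL64_PRECISION = 18;

    // Hypothetical, simplified type descriptor (not a Hive class).
    static final class Type {
        final boolean isDecimal;
        final int precision;
        private Type(boolean isDecimal, int precision) {
            this.isDecimal = isDecimal;
            this.precision = precision;
        }
        static Type decimal(int precision) { return new Type(true, precision); }
        static Type nonDecimal() { return new Type(false, 0); }
        boolean fitsDecimal64() { return isDecimal && precision <= MAX_DECIMAL64_PRECISION; }
    }

    // Pre-patch behavior as described: a cast to decimal returns true
    // without ever inspecting its child.
    static boolean beforePatch(Type target, Type child) {
        return target.fitsDecimal64();
    }

    // Rejected alternative: drop the special case and require the child itself
    // to be Decimal64, which also rejects CAST(int_col AS DECIMAL(15,1)).
    static boolean fallThroughToChildren(Type target, Type child) {
        return target.fitsDecimal64() && child.fitsDecimal64();
    }

    // Patched behavior as described: only a decimal-family child that does not fit
    // Decimal64 disqualifies the cast; non-decimal children such as INT still pass.
    static boolean afterPatch(Type target, Type child) {
        if (!target.fitsDecimal64()) return false;
        if (child.isDecimal && !child.fitsDecimal64()) return false;
        return true;
    }

    public static void main(String[] args) {
        Type dec38 = Type.decimal(38);
        Type dec15 = Type.decimal(15);
        Type intType = Type.nonDecimal();
        System.out.println(beforePatch(dec15, dec38));             // true: the bug
        System.out.println(afterPatch(dec15, dec38));              // false
        System.out.println(afterPatch(dec15, intType));            // true
        System.out.println(fallThroughToChildren(dec15, intType)); // false: the regression
    }
}
```

In this model, the special case is kept but narrowed. Dropping it entirely (`fallThroughToChildren`) would also send `CAST(int_col AS DECIMAL(15,1))` down the slower path, which is the case the author expects to regress.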
I wish I had a better understanding here. I believe the `checkExprNodeDescForDecimal64` method should have a good javadoc. Maybe the one I already found in this class should be improved and moved here:
Am I right to assume the following?
- `checkExprNodeDescForDecimal64` checks an expression and its children to determine whether they are DECIMAL64 compatible.
- For an `ExprNodeGenericFuncDesc`, it first checks some specific cases (is it `GenericUDFToDecimal`? does it have the annotation?), then recursively checks the child nodes by calling `checkExprNodeDescForDecimal64`.
- It works by checking every known scenario in which the expression is NOT decimal64 compatible (returning false in that case), and returns true otherwise.

I'm just thinking aloud. I would appreciate it if you could confirm my understanding and add a proper javadoc with the above snippet plus my understanding.
I believe we should never have added non-trivial conditions without explanation, like the one below :)
Assuming that we're on the same page about `checkExprNodeDescForDecimal64`, I have one more (maybe last) question, which may be related to the scenario you mentioned, `CAST(integer_column as DECIMAL(15,1))`. I assume `isDecimalFamily` returns false for an integer, in which case it simply falls through to the `return true` branch. Is that correct and expected?
Whatever the answer, I would appreciate one more "EXPLAIN VECTORIZATION DETAIL" and query in the q file with an integer, or anything that hits this codepath:
@abstractdog My understanding completely aligns with yours here. I have added a javadoc for `checkExprNodeDescForDecimal64()` in commit e02bba3.
Regarding the integer column, you are absolutely right: `isDecimalFamily()` returns false, and true is returned from the code point you pointed out. I have added queries to the same qtest to validate this in commit 73f664c.
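The behavior agreed on above can be sketched as a recursive walk: reject every known incompatible case, recurse into function children, and otherwise return true, with an `INT` input to a decimal cast falling through to true. Everything here is a hypothetical model. `Leaf`, `CastToDecimal`, and `Func` are not Hive classes, the `supportsDecimal64` flag stands in for the annotation check, and the precision limit of 18 is an assumption.

```java
import java.util.Arrays;
import java.util.List;

class Decimal64WalkSketch {
    // Assumption: long-backed (Decimal64) decimals are limited to precision <= 18.
    static final int MAX_DECIMAL64_PRECISION = 18;

    // Hypothetical expression model; none of these classes exist in Hive.
    static abstract class Expr {}

    // A column or constant with a fixed type.
    static final class Leaf extends Expr {
        final boolean isDecimal;
        final int precision;
        Leaf(boolean isDecimal, int precision) {
            this.isDecimal = isDecimal;
            this.precision = precision;
        }
    }

    // CAST(child AS DECIMAL(targetPrecision, s)); the scale does not matter here.
    static final class CastToDecimal extends Expr {
        final int targetPrecision;
        final Expr child;
        CastToDecimal(int targetPrecision, Expr child) {
            this.targetPrecision = targetPrecision;
            this.child = child;
        }
    }

    // A function call; supportsDecimal64 stands in for the annotation check.
    static final class Func extends Expr {
        final boolean supportsDecimal64;
        final boolean returnsDecimal;
        final List<Expr> children;
        Func(boolean supportsDecimal64, boolean returnsDecimal, Expr... children) {
            this.supportsDecimal64 = supportsDecimal64;
            this.returnsDecimal = returnsDecimal;
            this.children = Arrays.asList(children);
        }
    }

    static boolean isDecimalFamily(Expr e) {
        if (e instanceof Leaf) return ((Leaf) e).isDecimal;
        if (e instanceof CastToDecimal) return true;
        return ((Func) e).returnsDecimal;
    }

    // Returns false for each known incompatible case and true otherwise.
    static boolean isDecimal64Compatible(Expr e) {
        if (e instanceof Leaf) {
            Leaf leaf = (Leaf) e;
            return leaf.isDecimal && leaf.precision <= MAX_DECIMAL64_PRECISION;
        }
        if (e instanceof CastToDecimal) {
            CastToDecimal cast = (CastToDecimal) e;
            if (cast.targetPrecision > MAX_DECIMAL64_PRECISION) return false;
            // A decimal-family input must itself be compatible; a non-decimal
            // input such as INT falls through to true.
            if (isDecimalFamily(cast.child)) return isDecimal64Compatible(cast.child);
            return true;
        }
        Func f = (Func) e;
        if (!f.supportsDecimal64) return false;
        for (Expr child : f.children) {
            if (!isDecimal64Compatible(child)) return false; // recurse into arguments
        }
        return true;
    }

    // x BETWEEN 100.0 AND 1000.0; the literals are modeled as decimal(4,1) and decimal(5,1).
    static Expr between(Expr x) {
        return new Func(true, false, x, new Leaf(true, 4), new Leaf(true, 5));
    }

    public static void main(String[] args) {
        Expr e = new Leaf(true, 38);      // e decimal(38,10)
        Expr intCol = new Leaf(false, 0); // int_col INT
        System.out.println(isDecimal64Compatible(between(new CastToDecimal(15, e))));      // false
        System.out.println(isDecimal64Compatible(between(new CastToDecimal(30, e))));      // false
        System.out.println(isDecimal64Compatible(between(new CastToDecimal(15, intCol)))); // true
        System.out.println(isDecimal64Compatible(between(new CastToDecimal(30, intCol)))); // false
    }
}
```

The four calls in `main` mirror the four vectorized predicates in the q file. Only the `INT` to `DECIMAL(15,1)` case stays on the Decimal64 path in this model.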