Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -1700,6 +1700,16 @@ private VectorExpression getIdentityExpression(List<ExprNodeDesc> childExprList)
return ve;
}

/**
* Determines whether an expression and its children are compatible with the DECIMAL_64 vectorization execution path.
* The method evaluates the expression against known scenarios where DECIMAL_64 is not supported,
* returning false if any incompatibility is found. For UDFs, it handles specific implementations
* (e.g., GenericUDFToDecimal), checks for the @VectorizedExpressionsSupportDecimal64 annotation,
* and recursively verifies all child nodes. If all checks pass, it returns true.
* @param exprNodeDesc expression node to be evaluated.
* @return true if the expression and its children are DECIMAL_64 compatible, false otherwise.
* @throws HiveException
*/
private boolean checkExprNodeDescForDecimal64(ExprNodeDesc exprNodeDesc) throws HiveException {
if (exprNodeDesc instanceof ExprNodeColumnDesc) {
int colIndex = getInputColumnIndex((ExprNodeColumnDesc) exprNodeDesc);
Expand All @@ -1717,6 +1727,10 @@ private boolean checkExprNodeDescForDecimal64(ExprNodeDesc exprNodeDesc) throws
GenericUDF udf = ((ExprNodeGenericFuncDesc) exprNodeDesc).getGenericUDF();
Class<?> udfClass = udf.getClass();
if (udf instanceof GenericUDFToDecimal) {
ExprNodeDesc child = exprNodeDesc.getChildren().get(0);
if (isDecimalFamily(child.getTypeString())) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, while I understand the patch, I can see that the same check is actually performed a few lines below, see:

// Carefully check the children to make sure they are Decimal64.
List<ExprNodeDesc> children = exprNodeDesc.getChildren();
for (ExprNodeDesc childExprNodeDesc : children) {
// Some cases were converted before calling getVectorExpressionForUdf.
// So, emulate those cases first.
if (childExprNodeDesc instanceof ExprNodeConstantDesc) {
DecimalTypeInfo childDecimalTypeInfo =
decimalTypeFromCastToDecimal(childExprNodeDesc, returnDecimalType);
if (childDecimalTypeInfo == null) {
return false;
}
if (!checkTypeInfoForDecimal64(childDecimalTypeInfo)) {
return false;
}
continue;
}
// Otherwise, recurse.
if (!checkExprNodeDescForDecimal64(childExprNodeDesc)) {
return false;
}

couldn't that be used here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @abstractdog , Thanks for checking this.
I understand that we have the same check in the code following right after in the for loop, but currently in such casts, the code flow does not reach the for loop and directly returns true from Code Link, which is not correct for such decimal to decimal casts and causing issue.

Also, if we remove the whole check of udf instanceof GenericUDFToDecimal to make the code flow fall through the code logic ahead, i believe it will cause performance regression for other casts. For exm: if we have an integer column here in same case: CAST(integer_column as DECIMAL(15,1)) , this would tend to return false in such a case and IMO would be a performance regression.

Please do let me know what are your thoughts.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I had a better understanding here, I believe checkExprNodeDescForDecimal64 method should have a good javadoc here, maybe the one I already found in this class should be improved and moved to this place:

      /*
       * For columns, we check decimal columns for DECIMAL_64 DataTypePhysicalVariation.
       * For UDFs, we check for @VectorizedExpressionsSupportDecimal64 annotation, etc.
       */
      final boolean isExprDecimal64 = checkExprNodeDescForDecimal64(childExpr);

Am I right to assume the following:

  1. checkExprNodeDescForDecimal64 checks an expression and its children, whether they are DECIMAL64 compatible
  2. in case of ExprNodeGenericFuncDesc, it first checks some specific cases: is GenericUDFToDecimal? has annotation? then recursively checks children nodes by calling checkExprNodeDescForDecimal64

the way it works is that it checks every known scenario when the expression is NOT decimal64 compatible (in which case it returns false), and returns true otherwise

I'm just thinking aloud, I would appreciate if you can confirm my understanding, and add a proper javadoc with the above snippet + my understanding

I believe we should have never added non-trivial conditions without explanation like below :)

      if (udf instanceof GenericUDFToDecimal) {
        return true;
      }

assming that we're on the same page with checkExprNodeDescForDecimal64, I have one more (maybe last) quesion, which is maybe related to the same scenario you mentioned: CAST(integer_column as DECIMAL(15,1)): for an integer isDecimalFamily returns false, I assume, in which case it simply falls to the return true branch, is it correct and expected?
whatever is the answer, I would appreciate one more "EXPLAIN VECTORIZATION DETAIL" and query in the q file with integer, or anything that hits this codepath:

      if (udf instanceof GenericUDFToDecimal) {
        ExprNodeDesc child = exprNodeDesc.getChildren().get(0);
        if (isDecimalFamily(child.getTypeString())) {
         return checkExprNodeDescForDecimal64(child);
        }
        return true; <----- THIS
      }

Copy link
Copy Markdown
Contributor Author

@tanishq-chugh tanishq-chugh Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abstractdog My understanding of the same completely aligns with yours here. I have added a javadoc for checkExprNodeDescForDecimal64() in this commit - e02bba3

Regarding the integer column part as well, you are absolutely right. isDecimalFamily() returns false, and true is returned from the code point you pointed out. Have added queries in the same qtest for the validation in commit - 73f664c

return checkExprNodeDescForDecimal64(child);
}
return true;
}
// We have a class-level annotation that says whether the UDF's vectorization expressions
Expand Down
26 changes: 26 additions & 0 deletions ql/src/test/queries/clientpositive/cast_decimal_vectorized.q
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
CREATE TABLE test_stats0 (e decimal(38,10)) stored as orc;
insert into test_stats0 (e) values (0.0);

set hive.vectorized.execution.enabled=false;
select count(*) from test_stats0 where CAST(e as DECIMAL(15,1)) BETWEEN 100.0 AND 1000.0;

set hive.vectorized.execution.enabled=true;
EXPLAIN VECTORIZATION DETAIL select count(*) from test_stats0 where CAST(e as DECIMAL(15,1)) BETWEEN 100.0 AND 1000.0;
select count(*) from test_stats0 where CAST(e as DECIMAL(15,1)) BETWEEN 100.0 AND 1000.0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this patch needs an EXPLAIN VECTORIZATION DETAIL for the same
can you also check if EXPLAIN VECTORIZATION DETAIL shows the difference between pre/post patch?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Have added the same in commit - 78bcca6

The only difference in EXPLAIN VECTORIZATION DETAIL with and without this patch lies in the predicateExpression.

Without this patch, predicateExpression is:
predicateExpression: FilterDecimal64ColumnBetween(col 3:decimal(15,1)/DECIMAL_64, decimal64LeftVal 1000, decimalLeftVal 1000, decimal64RightVal 10000, decimalRightVal 10000)(children: CastDecimalToDecimal(col 0:decimal(38,10)) -> 3:decimal(15,1))

With this patch, predicateExpression is:
predicateExpression: FilterDecimalColumnBetween(col 3:decimal(15,1), left 100, right 1000)(children: CastDecimalToDecimal(col 0:decimal(38,10)) -> 3:decimal(15,1))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I believe this exactly shows the expected behavior, which is that FilterDecimalColumnBetween is used, otherwise, the column vector produced by CastDecimalToDecimal (which is an instance of DecimalColumnVector) cannot be used from the parent expression FilterDecimal64ColumnBetween which expects a LongColumnVector (which represents the decimal64)


EXPLAIN VECTORIZATION DETAIL select count(*) from test_stats0 where CAST(e as DECIMAL(30,1)) BETWEEN 100.0 AND 1000.0;
select count(*) from test_stats0 where CAST(e as DECIMAL(30,1)) BETWEEN 100.0 AND 1000.0;


CREATE TABLE test_stats1 (int_col INT) stored as orc;
insert into test_stats1 (int_col) values (0);

set hive.vectorized.execution.enabled=false;
select count(*) from test_stats1 where CAST(int_col as DECIMAL(15,1)) BETWEEN 100.0 AND 1000.0;

set hive.vectorized.execution.enabled=true;
EXPLAIN VECTORIZATION DETAIL select count(*) from test_stats1 where CAST(int_col as DECIMAL(15,1)) BETWEEN 100.0 AND 1000.0;
select count(*) from test_stats1 where CAST(int_col as DECIMAL(15,1)) BETWEEN 100.0 AND 1000.0;

EXPLAIN VECTORIZATION DETAIL select count(*) from test_stats1 where CAST(int_col as DECIMAL(30,1)) BETWEEN 100.0 AND 1000.0;
select count(*) from test_stats1 where CAST(int_col as DECIMAL(30,1)) BETWEEN 100.0 AND 1000.0;
Loading
Loading