
fix: Handle potential overflow in internal state for avg(decimal)#22714

Open
AdamGS wants to merge 1 commit into apache:main from AdamGS:adamg/fix-avg-decimal-overflow

AdamGS commented Jun 2, 2026

Which issue does this PR close?

Rationale for this change

Fixes a bug with avg that currently prevents us from running TPC-DS q1. I think this issue is masked by Parquet, because the current implementation infers that column as Decimal128.

What changes are included in this PR?

  1. Marks Decimal32/64 as R in sqllogictest, like the larger decimal types, and fixes some tests that used ? (this could be a separate PR, but it seems very small).
  2. Adds a test for Decimal32 overflow.
  3. DecimalAvgAccumulator now takes another type for its inner sum accumulator, which can differ from the input/output type.
  4. Decimal32 and Decimal64 accumulate in i64 and i128 (respectively) to prevent overflow (should the i128-backed Decimal128 use i256 here?).
  5. Adds some unit tests for the AVG implementation's building blocks.
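A minimal sketch of the widening idea in items 3 and 4, assuming Decimal32 values are handled as their raw unscaled i32 integers and ignoring the scale adjustment the real DecimalAverager performs. The function names are hypothetical and are not DataFusion APIs; the point is only the contrast between summing in the input type and summing in a wider type.

```rust
use std::convert::TryFrom;

/// Hypothetical sketch: average raw Decimal32-style values (unscaled i32)
/// by accumulating the sum in i64, then narrowing the average back to i32.
/// Returns None for empty input or if the average does not fit in i32.
fn avg_i32_widened(values: &[i32]) -> Option<i32> {
    if values.is_empty() {
        return None;
    }
    // Fewer than 2^32 i32 values always sum without overflowing i64.
    let sum: i64 = values.iter().map(|&v| i64::from(v)).sum();
    i32::try_from(sum / values.len() as i64).ok()
}

/// For contrast: accumulating in the input type itself with wrapping addition,
/// which silently corrupts the sum once it exceeds i32::MAX.
fn avg_i32_wrapping(values: &[i32]) -> Option<i32> {
    if values.is_empty() {
        return None;
    }
    let sum = values.iter().fold(0i32, |acc, &v| acc.wrapping_add(v));
    Some(sum / values.len() as i32)
}
```

With four rows of i32::MAX, the widened version returns Some(i32::MAX), while the wrapping version returns Some(-1) because the running sum wrapped to -4.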

Are these changes tested?

Additional SLT test that would've overflowed internally, and more focused unit tests for AvgGroupsAccumulator

Are there any user-facing changes?

None, other than some queries that currently fail now working.

kosiew left a comment

@AdamGS

Thanks for working on this. The widening approach for Decimal32/64 looks good, but I think there are still correctness gaps around Decimal128 accumulation that need to be addressed before this can land. I also have a couple of non-blocking suggestions around SLT normalization and reducing duplication in the decimal averaging paths.

```rust
target_precision: *target_precision,
target_scale: *target_scale,
})),
) => Ok(Box::new(DecimalAvgAccumulator::<Decimal128Type>::new(
```
I think there is still a correctness issue here for Decimal128 AVG. The accumulator state remains Decimal128Type, and the new summation path uses add_wrapping (for example at lines 651, 672, and 947). That means intermediate overflow can still silently wrap before DecimalAverager gets a chance to run.

For example, avg(arrow_cast('999999999999999999999999999999.9999', 'Decimal128(34, 4)')) over roughly 20,000 rows has a valid Decimal128(38, 8) result, but the intermediate Decimal128 sum exceeds i128::MAX and wraps along the way. At that point the average is already corrupted even though the final result would be representable.
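The arithmetic behind this example can be checked with plain i128 math. The sketch below assumes each row holds the unscaled value 10^34 - 1 (the Decimal128(34, 4) encoding of 999999999999999999999999999999.9999); the helper names are hypothetical. With checked addition the running sum first overflows at row 17,015, while wrapping addition yields a negative "average" after 20,000 rows, even though the true average rescaled to scale 8 (multiplied by 10^4) still fits in an i128.

```rust
/// Unscaled i128 value of 999999999999999999999999999999.9999 as
/// Decimal128(34, 4), i.e. 10^34 - 1 (34 nines).
const ROW_VALUE: i128 = 10i128.pow(34) - 1;

/// Hypothetical helper: the 1-based row at which a checked i128 running sum
/// of `rows` copies of `value` first overflows, or None if it never does.
fn first_overflowing_row(value: i128, rows: u64) -> Option<u64> {
    let mut sum: i128 = 0;
    for row in 1..=rows {
        match sum.checked_add(value) {
            Some(next) => sum = next,
            None => return Some(row),
        }
    }
    None
}

/// Hypothetical helper: the "average" produced when the running sum uses
/// wrapping addition, as an add_wrapping-based accumulator would.
/// `rows` must be non-zero.
fn wrapping_avg(value: i128, rows: u64) -> i128 {
    let mut sum: i128 = 0;
    for _ in 0..rows {
        sum = sum.wrapping_add(value);
    }
    sum / i128::from(rows)
}
```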

Could we widen the Decimal128 accumulation state (for example to Decimal256) or otherwise use checked/compensated accumulation so intermediate overflow does not invalidate averages whose final result fits?
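One way to widen the Decimal128 state, sketched with std only so it stays self-contained: a signed 256-bit running sum stored as two limbs, with a checked divide-and-narrow step at the end. arrow-rs already provides an i256 native type for Decimal256, which real code would more likely use; WideSum and checked_avg are hypothetical names for illustration.

```rust
use std::convert::TryFrom;

/// Hypothetical sketch, std only: a signed 256-bit running sum stored as two
/// limbs, value = hi * 2^128 + lo, standing in for an i256/Decimal256 state.
#[derive(Clone, Copy, Debug, Default)]
struct WideSum {
    hi: i128,
    lo: u128,
}

impl WideSum {
    /// Adds one i128 value: the low limb absorbs its two's-complement bits and
    /// the high limb receives its sign extension plus the low-limb carry.
    fn add(&mut self, x: i128) {
        let (lo, carry) = self.lo.overflowing_add(x as u128);
        let sign_ext: i128 = if x < 0 { -1 } else { 0 };
        self.lo = lo;
        // Assumes the high limb itself never overflows (about 2^127 additions).
        self.hi = self.hi + sign_ext + carry as i128;
    }

    /// Divides the sum by `count`, truncating toward zero like i128 division,
    /// and narrows to i128. Returns None if count is 0 or the quotient overflows.
    fn checked_avg(self, count: u64) -> Option<i128> {
        if count == 0 {
            return None;
        }
        let negative = self.hi < 0;
        // Magnitude of the 256-bit value via two's-complement negation.
        let (mag_hi, mag_lo) = if negative {
            let lo = (!self.lo).wrapping_add(1);
            let hi = ((!self.hi) as u128).wrapping_add((lo == 0) as u128);
            (hi, lo)
        } else {
            (self.hi as u128, self.lo)
        };
        // Schoolbook long division over four 64-bit limbs, most significant first.
        let mask = u64::MAX as u128;
        let limbs = [mag_hi >> 64, mag_hi & mask, mag_lo >> 64, mag_lo & mask];
        let divisor = count as u128;
        let mut rem: u128 = 0;
        let mut quot = [0u128; 4];
        for (i, &limb) in limbs.iter().enumerate() {
            // rem < divisor <= 2^64 - 1, so shifting by 64 cannot overflow.
            let cur = (rem << 64) | limb;
            quot[i] = cur / divisor;
            rem = cur % divisor;
        }
        if quot[0] != 0 || quot[1] != 0 {
            return None; // quotient needs more than 128 bits
        }
        let mag = (quot[2] << 64) | quot[3];
        if !negative {
            i128::try_from(mag).ok()
        } else if mag <= i128::MAX as u128 {
            Some(-(mag as i128))
        } else if mag == 1u128 << 127 {
            Some(i128::MIN)
        } else {
            None
        }
    }
}
```

Summing 20,000 rows of 10^34 - 1 into a WideSum and calling checked_avg(20_000) recovers 10^34 - 1 exactly, where the i128 running sum would have wrapped.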

```rust
(
Decimal32(_, scale),
Decimal32(target_precision, target_scale),
) => Ok(Box::new(DecimalDistinctAvgAccumulator::<Decimal32Type>::with_decimal_params(
```
I think the same issue still exists for AVG(DISTINCT) on decimals. The distinct path constructs DecimalDistinctAvgAccumulator::<Decimal32Type>, Decimal64Type, and Decimal128Type, and those accumulators still sum distinct values in the native type using wrapping arithmetic.

AVG(DISTINCT) is still an average, so it can hit the same intermediate overflow problem whenever the average is representable but the sum is not. Could the widening/state-type fix be applied here as well? Otherwise it would be good to explicitly narrow the supported contract and add tests that document the unsupported behavior.
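Applying the same widening to AVG(DISTINCT) could look like the sketch below, assuming Decimal32-style raw i32 values and ignoring scale. distinct_avg_i32 is a hypothetical name, and the real DecimalDistinctAvgAccumulator state is not modeled here; the sketch only shows that deduplication does not remove the need for a wider sum.

```rust
use std::collections::HashSet;
use std::convert::TryFrom;

/// Hypothetical sketch: AVG(DISTINCT) over raw Decimal32-style values
/// (unscaled i32), deduplicating first and summing the distinct values in i64.
fn distinct_avg_i32(values: &[i32]) -> Option<i32> {
    let distinct: HashSet<i32> = values.iter().copied().collect();
    if distinct.is_empty() {
        return None;
    }
    // There are at most 2^32 distinct i32 values, so an i64 sum cannot overflow.
    let sum: i64 = distinct.iter().map(|&v| i64::from(v)).sum();
    i32::try_from(sum / distinct.len() as i64).ok()
}
```

Two distinct values near i32::MAX already overflow an i32 sum, yet their average fits comfortably in i32.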

```rust
DataType::Float16
| DataType::Float32
| DataType::Float64
| DataType::Decimal32(_, _)
```
Small suggestion: mapping Decimal32/Decimal64 globally to DFColumnType::Float makes SLT comparisons approximate for these decimal types and could hide formatting or rounding regressions that would otherwise be caught.

If the motivation is only the affected power/round queries, would it make sense to keep exact/text comparisons there instead? At minimum, it may be worth documenting why all Decimal32/64 SLT output is now treated approximately.
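To illustrate the concern, here is a hypothetical comparison that is not sqllogictest's actual normalization logic and uses an arbitrarily chosen relative tolerance: a scale/formatting change such as 1.2300 becoming 1.23, or a last-digit rounding change, passes a float comparison but fails an exact text comparison.

```rust
/// Hypothetical illustration only (not sqllogictest's real normalization):
/// exact text comparison of a result cell against the expected output.
fn exact_match(actual: &str, expected: &str) -> bool {
    actual == expected
}

/// Float comparison with a relative tolerance; below magnitude 1.0 the
/// tolerance acts as an absolute bound of `rel_tol`.
fn approx_match(actual: &str, expected: &str, rel_tol: f64) -> bool {
    match (actual.parse::<f64>(), expected.parse::<f64>()) {
        (Ok(a), Ok(e)) => (a - e).abs() <= rel_tol * e.abs().max(1.0),
        _ => false,
    }
}
```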

```diff
@@ -365,17 +370,27 @@
Decimal32(_sum_precision, sum_scale),
```
Small refactoring suggestion: the Decimal32 and Decimal64 branches both follow the same pattern of creating a wider DecimalAverager, dividing by a widened count, and then try_from-ing back to the output native type.

It might be nice to factor that into a helper such as avg_decimal_with_wider_sum. That would encode the widening invariant in one place and help keep the accumulator and group-accumulator paths aligned over time.
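A hypothetical shape for such a helper, written generically over a narrow native type N and a wider sum type W using only std traits. It multiplies by an assumed rescaling factor before dividing and uses plain (non-checked) arithmetic in W on the assumption that W is wide enough; it sketches the widening invariant, not DataFusion's DecimalAverager.

```rust
use std::convert::TryFrom;
use std::ops::{Add, Div, Mul};

/// Hypothetical helper named after the review suggestion; not a DataFusion API.
/// Sums narrow native values N in a wider type W, multiplies by `scale_mul`
/// (assumed to be 10^(target_scale - input_scale)), divides by the row count,
/// and narrows the result back to N. Returns None for empty input or if the
/// result does not fit in N.
fn avg_decimal_with_wider_sum<N, W>(values: &[N], scale_mul: W) -> Option<N>
where
    N: Copy + TryFrom<W>,
    W: Copy + Default + From<N> + TryFrom<u64> + Add<Output = W> + Mul<Output = W> + Div<Output = W>,
{
    if values.is_empty() {
        return None;
    }
    // W::default() is 0 for the std integer types. Plain arithmetic assumes W
    // is wide enough for the sum and rescale; a production version would use
    // checked operations instead.
    let sum = values
        .iter()
        .fold(W::default(), |acc, &v| acc + <W as From<N>>::from(v));
    let count = <W as TryFrom<u64>>::try_from(values.len() as u64).ok()?;
    <N as TryFrom<W>>::try_from(sum * scale_mul / count).ok()
}
```

Instantiating it as avg_decimal_with_wider_sum::<i32, i64> or ::<i64, i128> would cover both the Decimal32 and Decimal64 branches with one code path.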


Labels: functions (Changes to functions implementation), sqllogictest (SQL Logic Tests (.slt))


Development

Successfully merging this pull request may close these issues.

Decimal average can overflow because its inner intermediate sum state overflows its storage size
