chore(audit): audit Average and expand tests#4439

Open
andygrove wants to merge 3 commits into apache:main from andygrove:autonomous-audit/avg

@andygrove
Member

Which issue does this PR close?

N/A. Autonomous audit pass.

Rationale for this change

Audit of the Average (avg) aggregate expression against Spark 3.4.3, 3.5.8, and 4.0.1. The aggregate logic is identical across all three versions (4.0.1 only changes a QueryContext import path). The Comet serde and the Rust Avg / AvgDecimal accumulators correctly handle numeric and decimal inputs, including ANSI-mode decimal overflow. The audit found one inaccurate user-facing string in the serde and several uncovered edge cases that are now exercised.

What changes are included in this PR?

  • Audit sub-bullets in spark_expressions_support.md recording dates and the per-version finding for 3.4.3, 3.5.8, and 4.0.1.
  • Corrected CometAverage.getIncompatibleReasons text. The previous text claimed "Falls back to Spark in ANSI mode. Supports all numeric inputs except decimal types", neither of which is accurate: ANSI mode is wired through to the native AvgDecimal accumulator, and decimal inputs are supported via avgDataTypeSupported. The new text describes the real caveat: Comet falls back to Spark for YearMonthIntervalType and DayTimeIntervalType inputs (which Spark supports since 3.4).
  • Expanded expressions/aggregate/avg.sql with new SQL test cases: single-row group; tinyint and smallint inputs; all-NULL groups; empty input; double NaN / +Infinity / -Infinity mixes; Long boundary values; negative-only inputs; decimal at precision 20; cross-check against count.
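
The edge cases listed above follow from how a sum-and-count average behaves under SQL NULL handling and IEEE 754 arithmetic. The following is a minimal sketch of those semantics for double inputs only. It is not the Comet `Avg` accumulator: `AvgState` and `avg` are hypothetical names introduced for illustration, and NULL is modeled as `None`.

```rust
/// Hypothetical running state for a floating-point average:
/// a sum and a count of the non-NULL inputs seen so far.
#[derive(Debug, Default)]
struct AvgState {
    sum: f64,
    count: u64,
}

impl AvgState {
    /// Fold one input into the state. `None` models SQL NULL and is skipped.
    fn update(&mut self, value: Option<f64>) {
        if let Some(v) = value {
            self.sum += v;
            self.count += 1;
        }
    }

    /// Returns `None` (SQL NULL) when no non-NULL input was seen,
    /// otherwise sum / count.
    fn evaluate(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

/// Convenience helper (hypothetical): average a slice of nullable doubles.
fn avg(values: &[Option<f64>]) -> Option<f64> {
    let mut state = AvgState::default();
    for v in values {
        state.update(*v);
    }
    state.evaluate()
}
```

Under this model, empty input and an all-NULL group both yield NULL, a NaN input makes the result NaN, and mixing `+Infinity` with `-Infinity` also yields NaN because their sum is NaN. These are the behaviors the new SQL cases compare against Spark.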

How are these changes tested?

  • ./mvnw test -DwildcardSuites=CometSqlFileTestSuite -Dsuites="org.apache.comet.CometSqlFileTestSuite avg" -Dtest=none (passes locally; all new queries match Spark)

Scaffolded by the audit-comet-expression-autonomous skill.

andygrove added 3 commits May 26, 2026 09:29
Interval inputs fall through avgDataTypeSupported and convert() returns
None, so they are unsupported (no native code path), not incompatible
(no allowIncompatible opt-in).
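
The distinction drawn in this commit message can be sketched as a three-way classification. This is an illustrative model only, not Comet's serde API: the `Support` and `InputType` enums and both functions are hypothetical, and the real check lives in the Scala serde (`avgDataTypeSupported` and `convert()`).

```rust
/// Hypothetical classification of an input type for an aggregate.
#[derive(Debug, PartialEq)]
enum Support {
    /// A native code path exists and matches Spark.
    Compatible,
    /// A native code path exists but may differ from Spark; needs an opt-in.
    Incompatible(&'static str),
    /// No native code path at all; always falls back to Spark.
    Unsupported(&'static str),
}

/// Hypothetical subset of input types, for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum InputType {
    Int,
    Double,
    Decimal,
    YearMonthInterval,
    DayTimeInterval,
}

/// Models the finding in this PR: interval inputs are unsupported,
/// numeric and decimal inputs are handled natively.
fn avg_support(input: InputType) -> Support {
    match input {
        InputType::YearMonthInterval | InputType::DayTimeInterval => {
            Support::Unsupported("interval inputs have no native code path")
        }
        InputType::Int | InputType::Double | InputType::Decimal => Support::Compatible,
    }
}

/// An opt-in flag can enable an incompatible path, but it cannot
/// conjure a native path for an unsupported input.
fn runs_natively(support: &Support, allow_incompatible: bool) -> bool {
    match support {
        Support::Compatible => true,
        Support::Incompatible(_) => allow_incompatible,
        Support::Unsupported(_) => false,
    }
}
```

In this model, setting the opt-in flag changes nothing for interval inputs, which is why describing them as "incompatible" would be misleading.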
@andygrove andygrove marked this pull request as ready for review May 27, 2026 13:18
Contributor

@comphead comphead left a comment

For avg, once the window PR (#4209) is merged, should we also include window avg tests here so that they are audited properly?

@andygrove
Member Author

#4209

Good point. I think we should add tests for window support now so that we can catch regressions once window support is enabled. I will add them to this PR.
