Skip to content

fix: handle NULLs in sliding SUM(DISTINCT) window frames#22755

Open
kumarUjjawal wants to merge 2 commits into
apache:mainfrom
kumarUjjawal:fix/sliding-distinct-sum-nulls
Open

fix: handle NULLs in sliding SUM(DISTINCT) window frames#22755
kumarUjjawal wants to merge 2 commits into
apache:mainfrom
kumarUjjawal:fix/sliding-distinct-sum-nulls

Conversation

@kumarUjjawal
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

SUM(DISTINCT) over bounded/sliding window frames did not handle NULL values correctly.

It could read values from null slots in the Arrow buffer, and it returned 0 instead of NULL when a frame had no non-null values.

What changes are included in this PR?

  • skip NULL rows in sliding SUM(DISTINCT) update
  • skip NULL rows in sliding SUM(DISTINCT) retract
  • return NULL when the frame has no non-null distinct values
  • reuse the same helper for distinct-value updates during merge
  • add regression tests for accumulator behavior with null slots
  • add sqllogictest coverage for sliding window SQL behavior, including when a frame becomes all NULL

Are these changes tested?

Yes

Are there any user-facing changes?

SUM(DISTINCT) over bounded/sliding window frames now handles NULL values correctly:

  • NULL inputs are ignored
  • frames with no non-null values return NULL instead of 0
  • No API Change

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Jun 4, 2026
Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kumarUjjawal
Thanks for the fix. This looks good and I only have one small suggestion.

if *cnt == 0 {
// first occurrence in window
self.sum = self.sum.wrapping_add(v);
if arr.null_count() == 0 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup overall. One small thought: the NULL-aware iteration logic looks very similar between update_batch and retract_batch. Would it make sense to extract a small helper that keeps the current no-null fast path and accepts the value operation, something like apply_valid_values(values, Self::update_value) and apply_valid_values(values, Self::retract_value)?

That would keep the invariant that only valid slots affect counts and sum in one place and help avoid the two paths drifting apart over time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks you @kosiew

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SUM(DISTINCT) over a bounded window frame ignores NULLs' validity mask and returns 0 instead of NULL

2 participants