fix: Enable sliding window execution for covar_pop, covar_samp, and corr #22764

Open
pchintar wants to merge 1 commit into apache:main from pchintar:fix-sliding-covar-corr

@pchintar pchintar commented Jun 4, 2026

Which issue does this PR close?

Rationale for this change

Bounded sliding window queries using covar_pop, covar_samp, and corr currently fail with a retract_batch is not implemented error, preventing these aggregates from being used with sliding window frames.

What changes are included in this PR?

  • Implemented supports_retract_batch() for the covariance and correlation accumulators.
  • Added SQL logic tests covering bounded sliding window execution for covariance and correlation aggregates.

Are these changes tested?

Yes.

Added SQL logic tests covering:

  • Single-row bounded sliding frames
  • Multi-row bounded sliding frames

for covar_pop, covar_samp, and corr.

Are there any user-facing changes?

Yes.

covar_pop, covar_samp, and corr can now be used with bounded sliding window frames, which previously failed. No public APIs were changed.

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Jun 4, 2026
@pchintar
Author

pchintar commented Jun 4, 2026

cc @kumarUjjawal @Jefffrey

@kumarUjjawal
Contributor

Hi @pchintar, thank you for working on this. Can you fix the CI?

@pchintar pchintar force-pushed the fix-sliding-covar-corr branch from a3800fd to 7090ba9 Compare June 5, 2026 03:38
@pchintar
Author

pchintar commented Jun 5, 2026

Hi @pchintar, thank you for working on this. Can you fix the CI?

Hi @kumarUjjawal it was a typo that is now fixed and re-submitted, so could you kindly pls re-run the CI checks? thnx

@pchintar
Author

pchintar commented Jun 5, 2026

Hi @kumarUjjawal it was a typo that is now fixed and re-submitted, so could you kindly pls re-run the CI checks? thnx

it's already running, so ignore this

@pchintar pchintar force-pushed the fix-sliding-covar-corr branch from 7090ba9 to 71b5da1 Compare June 5, 2026 04:20
@pchintar
Author

pchintar commented Jun 5, 2026

Hi @kumarUjjawal, sorry for the repeated rerun requests.

Previously I had only verified the output in datafusion-cli, where the results are displayed as 0.0, 25.0, 50.0, etc. The CI failure turned out to be due to the SQL logic test expectations, which represent the same values as 0, 25, 50, etc. This time I also verified the updated test case locally with an isolated sqllogictest run. Thanks.

@kumarUjjawal
Contributor

Hi @kumarUjjawal, sorry for the repeated rerun requests.

No Worries!

@pchintar
Author

pchintar commented Jun 5, 2026

@kumarUjjawal all checks pass now, thnx!

Contributor

Both tests pass the same column twice (covar_pop(column2, column2)). Covariance of a column with itself is just variance, and correlation of a column with itself is always 1, so the "two different columns" math is never really tested. Please use two distinct columns.

Across both tests, a row actually leaves the window only once (first test, last row). In the second test the window only ever grows, so nothing is ever removed.

There are no NULLs and no case where the window becomes empty in the middle of the data.
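To illustrate the first point, here is a minimal two-pass sketch in plain Rust. It is not DataFusion code: the helpers `mean`, `covar_pop`, and `corr` are hypothetical and assume non-empty, NULL-free, equal-length inputs. It shows that passing one column twice collapses covariance into variance and pins correlation at 1, so a bug in the cross-column terms would go unnoticed.

```rust
/// Arithmetic mean of a non-empty slice (hypothetical helper).
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Two-pass population covariance (hypothetical helper, not DataFusion's).
fn covar_pop(xs: &[f64], ys: &[f64]) -> f64 {
    assert_eq!(xs.len(), ys.len());
    let (mx, my) = (mean(xs), mean(ys));
    xs.iter()
        .zip(ys)
        .map(|(x, y)| (x - mx) * (y - my))
        .sum::<f64>()
        / xs.len() as f64
}

/// Pearson correlation built from the covariance above.
/// Assumes both inputs have non-zero variance.
fn corr(xs: &[f64], ys: &[f64]) -> f64 {
    covar_pop(xs, ys) / (covar_pop(xs, xs) * covar_pop(ys, ys)).sqrt()
}
```

With `x = [1.0, 2.0, 3.0, 4.0]`, `covar_pop(&x, &x)` is simply the population variance of `x` and `corr(&x, &x)` is 1, whereas pairing `x` with a distinct column such as `[2.0, 1.0, 4.0, 3.0]` gives values that depend on both inputs.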

Contributor

let new_count = self.count - 1;
let delta1 = self.mean1 - value1;
let new_mean1 = delta1 / new_count as f64 + self.mean1;
let delta2 = self.mean2 - value2;
let new_mean2 = delta2 / new_count as f64 + self.mean2;

When the window is holding a single row and that row leaves, the count drops to 0 and the division produces NaN. The internal running values then stay NaN forever, so every later window result silently comes out as NaN instead of the right number.

This is reachable with NULL gaps, e.g. a 2-row sliding window over 10.0, NULL, NULL, 30.0, 40.0, 50.0 — once the window slides onto the NULL section it empties, and all results after that are NaN.

My suggestion: when the count is about to reach 0, reset the state back to its initial values (count 0, running values 0.0) and skip the division.
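A rough sketch of that reset-on-empty idea follows. `CovarState` is a hypothetical stand-in, not the accumulator type in covariance.rs; the field names `count`, `mean1`, and `mean2` follow the snippet quoted above, while `algo_const` (the running co-moment) and `covar_pop` are names assumed for illustration.

```rust
/// Hypothetical running state for an incremental covariance.
#[derive(Debug, Default)]
struct CovarState {
    count: u64,
    mean1: f64,
    mean2: f64,
    /// Running co-moment: sum of (x - mean1) * (y - mean2) over the window.
    algo_const: f64,
}

impl CovarState {
    /// Add one valid (non-NULL) pair to the window.
    fn update(&mut self, value1: f64, value2: f64) {
        let new_count = self.count + 1;
        let delta1 = value1 - self.mean1;
        let new_mean1 = delta1 / new_count as f64 + self.mean1;
        let delta2 = value2 - self.mean2;
        let new_mean2 = delta2 / new_count as f64 + self.mean2;
        self.algo_const += delta1 * (value2 - new_mean2);
        self.count = new_count;
        self.mean1 = new_mean1;
        self.mean2 = new_mean2;
    }

    /// Remove one pair that is leaving the window.
    fn retract(&mut self, value1: f64, value2: f64) {
        // Removing the last remaining row would divide by zero below,
        // so reset to the initial state and skip the arithmetic.
        if self.count <= 1 {
            *self = Self::default();
            return;
        }
        let new_count = self.count - 1;
        let delta1 = self.mean1 - value1;
        let new_mean1 = delta1 / new_count as f64 + self.mean1;
        let delta2 = self.mean2 - value2;
        let new_mean2 = delta2 / new_count as f64 + self.mean2;
        self.algo_const -= delta1 * (new_mean2 - value2);
        self.count = new_count;
        self.mean1 = new_mean1;
        self.mean2 = new_mean2;
    }

    /// Population covariance; `None` stands in for SQL NULL on an empty window.
    fn covar_pop(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.algo_const / self.count as f64)
        }
    }
}
```

With the early return in place, a window that empties and later refills starts again from clean zeros rather than carrying NaN forward.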

Contributor

if mean1.is_nan() && mean2.is_nan() {
    return Ok(ScalarValue::Float64(Some(f64::NAN)));
}

Same concern as in covariance.rs. The check above, in the result computation, treats "both internal averages are NaN" as meaning "the input contained NaN values".

An emptied sliding window also makes the averages NaN, so this check gets falsely triggered and corr returns NaN even for an empty window, where it should return NULL.
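One possible way to keep the two cases apart is to look at the row count before looking at the means. The sketch below is only an illustration under assumptions: `corr_result` and its parameters are hypothetical, `Option<f64>` stands in for `ScalarValue::Float64` with `None` playing the role of NULL, and other edge cases (such as zero variance) are deliberately left out.

```rust
/// Hypothetical, simplified result computation for corr.
fn corr_result(
    count: u64,
    mean1: f64,
    mean2: f64,
    covar: f64,
    stddev1: f64,
    stddev2: f64,
) -> Option<f64> {
    // An empty window has no rows to correlate: report NULL before
    // inspecting the means, whatever values they currently hold.
    if count == 0 {
        return None;
    }
    // With rows present, NaN means are attributed to the input values.
    if mean1.is_nan() && mean2.is_nan() {
        return Some(f64::NAN);
    }
    Some(covar / (stddev1 * stddev2))
}
```

Checking the count first means the NaN branch is only reached when the window actually holds rows.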

@kumarUjjawal
Contributor

The correlation accumulator internally uses two standard-deviation trackers, and their row-removal logic has the same divide-by-zero-into-NaN problem:

let new_count = self.count - 1;
let delta1 = self.mean - value;
let new_mean = delta1 / new_count as f64 + self.mean;

corr would still return NaN through this path. The same reset-on-empty fix is needed here too.
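For the variance side, a small self-contained sketch can replay the NULL-gap example from the covariance comment (a 2-row window over 10.0, NULL, NULL, 30.0, 40.0, 50.0). `VarState`, its `m2` field, and `sliding_var_pop` are hypothetical names, and the window is assumed to be the equivalent of ROWS BETWEEN 1 PRECEDING AND CURRENT ROW; this is not the tracker in DataFusion.

```rust
/// Hypothetical running state for an incremental (Welford-style) variance.
#[derive(Debug, Default)]
struct VarState {
    count: u64,
    mean: f64,
    /// Running sum of squared deviations from the mean.
    m2: f64,
}

impl VarState {
    fn update(&mut self, value: f64) {
        let new_count = self.count + 1;
        let delta1 = value - self.mean;
        let new_mean = delta1 / new_count as f64 + self.mean;
        let delta2 = value - new_mean;
        self.m2 += delta1 * delta2;
        self.count = new_count;
        self.mean = new_mean;
    }

    fn retract(&mut self, value: f64) {
        // Reset instead of dividing by zero when the last row leaves.
        if self.count <= 1 {
            *self = Self::default();
            return;
        }
        let new_count = self.count - 1;
        let delta1 = self.mean - value;
        let new_mean = delta1 / new_count as f64 + self.mean;
        let delta2 = new_mean - value;
        self.m2 -= delta1 * delta2;
        self.count = new_count;
        self.mean = new_mean;
    }

    /// Population variance; `None` stands in for SQL NULL.
    fn var_pop(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.m2 / self.count as f64)
        }
    }
}

/// Slide a window of `width` rows over `values`, skipping NULLs (`None`).
fn sliding_var_pop(values: &[Option<f64>], width: usize) -> Vec<Option<f64>> {
    let mut state = VarState::default();
    let mut out = Vec::with_capacity(values.len());
    for (i, v) in values.iter().enumerate() {
        if let Some(x) = v {
            state.update(*x);
        }
        // The row that falls out of the frame is `width` positions back.
        if i >= width {
            if let Some(x) = values[i - width] {
                state.retract(x);
            }
        }
        out.push(state.var_pop());
    }
    out
}
```

In this sketch the frame covering only the two NULLs yields `None`, and the frames after the gap produce finite values again because the state was reset rather than left holding NaN.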

@kumarUjjawal
Contributor

@pchintar I left some comments; let me know what you think.

@pchintar pchintar force-pushed the fix-sliding-covar-corr branch from 71b5da1 to a7aeaf0 Compare June 5, 2026 07:32
@pchintar
Author

pchintar commented Jun 5, 2026

@pchintar I left some comments; let me know what you think.

@kumarUjjawal Thanks for pointing this out. I updated the retract logic in both the covariance and variance accumulators to explicitly reset their internal state when the final valid row is removed from the sliding window.

I also expanded the regression coverage to exercise the specific edge cases involved here:

  • row removals that transition the window to an empty state,
  • accumulation after a reset,
  • NULL-gap scenarios where valid rows leave and later re-enter the frame,
  • rows where only one covariance input is NULL (invalid pairs should be ignored),
  • direct variance/stddev coverage for the same retract-to-empty path.

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Sliding window execution fails for covar_pop, covar_samp, and corr

2 participants