Skip to content

feat: support LEAD and LAG window functions with IGNORE NULLS#3876

Open
viirya wants to merge 7 commits intoapache:mainfrom
viirya:feat-window-ignore-nulls
Open

feat: support LEAD and LAG window functions with IGNORE NULLS#3876
viirya wants to merge 7 commits intoapache:mainfrom
viirya:feat-window-ignore-nulls

Conversation

@viirya
Copy link
Copy Markdown
Member

@viirya viirya commented Apr 1, 2026

Which issue does this PR close?

Closes #.

Rationale for this change

Support Lead and Lag window function with its ignoreNulls flag.

What changes are included in this PR?

  • Add ignore_nulls field to WindowExpr proto message
  • Serialize Lag window function with its ignoreNulls flag in CometWindowExec
  • Handle Lead case in windowExprToProto alongside Lag
  • Extend find_df_window_function to also look up WindowUDFs (not just AggregateUDFs)
  • Pass ignore_nulls to DataFusion's create_window_expr
  • Enable previously-ignored LAG tests
  • Add tests for LAG with IGNORE NULLS and LEAD with IGNORE NULLS

How are these changes tested?

- Add ignore_nulls field to WindowExpr proto message
- Serialize Lag window function with its ignoreNulls flag in CometWindowExec
- Extend find_df_window_function to also look up WindowUDFs (not just AggregateUDFs)
- Pass ignore_nulls to DataFusion's create_window_expr
- Enable previously-ignored LAG tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@viirya viirya marked this pull request as draft April 1, 2026 19:16
@viirya viirya marked this pull request as draft April 1, 2026 19:16
viirya and others added 3 commits April 1, 2026 12:21
ORDER BY b alone has ties, causing Spark and DataFusion to produce
different but both-valid row orderings. Add c as a secondary sort key
so tie-breaking is deterministic and results are comparable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CometWindowExec is marked Incompatible by default. Add
allowIncompatible=true config so LAG tests actually run via Comet
and checkSparkAnswerAndOperator can verify native execution.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LAG/LEAD (FrameLessOffsetWindowFunction) support arbitrary partition
and order specs. The existing validatePartitionAndSortSpecsForWindowFunc
check (which requires partition columns == order columns) is only needed
for aggregate window functions, not offset functions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@viirya viirya marked this pull request as ready for review April 2, 2026 06:27
spark.read.parquet(dir.toString).createOrReplaceTempView("window_test")
val df = sql("""
SELECT a, b, c,
LAG(c) OVER (PARTITION BY a ORDER BY b, c) as lag_c
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this testing IGNORE NULL?

Copy link
Copy Markdown
Member Author

@viirya viirya Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, missed it. Added another test with IGNORE NULLs now.

} else {
(None, exprToProto(windowExpr.windowFunction, output))
windowExpr.windowFunction match {
case lag: Lag =>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Lead also be handled the same way?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Since you asked, I just added Lead in this PR too.

}

if (op.partitionSpec.nonEmpty && op.orderSpec.nonEmpty &&
val hasOnlyOffsetFunctions = winExprs.nonEmpty &&
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment explaining the logic here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, added a comment.

viirya and others added 2 commits April 2, 2026 11:07
- Handle Lead case in windowExprToProto alongside Lag
- Add comment explaining hasOnlyOffsetFunctions guard
- Add test for LAG IGNORE NULLS
- Enable LEAD tests with allowIncompatible config and deterministic ORDER BY

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@viirya viirya changed the title feat: support LAG window function with IGNORE NULLS feat: support LEAD and LAG window functions with IGNORE NULLS Apr 2, 2026
FROM window_test
""")
checkSparkAnswerAndOperator(df)
test("window: LAG with offset 2 and default value") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nice, the PR also fixes this test although it is not related to IGNORE NULLS

}

if (op.partitionSpec.nonEmpty && op.orderSpec.nonEmpty &&
// Offset window functions (LAG, LEAD) support arbitrary partition and order specs, so skip
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if FIRST_VALUE, LAST_VALUE, NTH are also offset window function, cause they also access the data within frame by some offset (FiRST_VALUE by 1, etc) ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in Spark

Lag / Lead → inherit FrameLessOffsetWindowFunction
NthValue → inherit AggregateWindowFunction with OffsetWindowFunction
First / Last → inherit DeclarativeAggregate

Only FrameLessOffsetWindowFunction doesn't require frame, currently in Spark only Lag / Lead are FrameLessOffsetWindowFunction.

Copy link
Copy Markdown
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good to me, thanks @viirya may I ask you to add sql tests like in #3891

@viirya
Copy link
Copy Markdown
Member Author

viirya commented Apr 3, 2026

The PR looks good to me, thanks @viirya may I ask you to add sql tests like in #3891

Thanks for review. I will try to add sql tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @viirya it is LGTM

@viirya
Copy link
Copy Markdown
Member Author

viirya commented Apr 3, 2026

Thank you @comphead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants