
[AURON #2155] Date-part extraction functions missing timezone handling for Timestamp inputs #2156

Open · ShreyeshArangath wants to merge 6 commits into apache:master from ShreyeshArangath:bug/ts-aware



@ShreyeshArangath
Contributor

Which issue does this PR close?

Closes #2155

Rationale for this change

Five date-part extraction functions in NativeConverters.scala (listed below) use buildExtScalarFunction, which does not pass the session timezone to the native Rust implementation.

By contrast, Hour, Minute, Second, and WeekOfYear correctly use buildTimePartExt, which passes sessionLocalTimeZone for TimestampType inputs.

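This inconsistency can cause incorrect results for timestamp inputs whose session-local date differs from their UTC date, that is, near day boundaries in non-UTC timezones. For example, with the session timezone set to America/New_York, the instant 2024-01-01 02:00:00 UTC is 2023-12-31 21:00:00 local time, so year() should return 2023, dayofmonth() 31 and quarter() 4; an implementation that evaluates the UTC date returns 2024, 1 and 1 instead.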
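To make the boundary problem concrete, here is a minimal Rust sketch (not the PR's code) of mapping a timestamp to a Date32 day number as seen in a given timezone. It assumes microsecond timestamps and a fixed UTC offset in seconds; `local_days` and `MICROS_PER_DAY` are hypothetical names. A real IANA zone such as America/New_York observes DST, so the offset would have to be resolved per instant (the PR's commit notes mention calling chrono's `.fix()` on a `TzOffset` for this).

```rust
/// Microseconds per day. Spark timestamps are microseconds since the Unix epoch.
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Hypothetical helper: maps a UTC timestamp (microseconds since
/// 1970-01-01T00:00:00Z) to a day number (days since 1970-01-01, the Date32
/// encoding) as observed in a zone with a FIXED offset of `offset_secs`.
/// Assumption: a fixed offset stands in for a full timezone (no DST lookup).
fn local_days(ts_micros: i64, offset_secs: i32) -> i32 {
    // Shift the instant into local wall-clock time.
    let local_micros = ts_micros + i64::from(offset_secs) * 1_000_000;
    // div_euclid rounds toward negative infinity, so instants before 1970
    // still land on the correct (earlier) day.
    local_micros.div_euclid(MICROS_PER_DAY) as i32
}
```

For 2024-01-01T02:00:00Z (`1_704_074_400_000_000` microseconds), `local_days(ts, 0)` is 19723 (2024-01-01), but with New York's winter offset `local_days(ts, -5 * 3600)` is 19722 (2023-12-31), so every date part extracted from the day number changes.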

Affected functions:

  • Year (Spark_Year) — not timezone-aware
  • Month (Spark_Month) — not timezone-aware
  • DayOfMonth (Spark_Day) — not timezone-aware
  • DayOfWeek (Spark_DayOfWeek) — not timezone-aware
  • Quarter (Spark_Quarter) — not timezone-aware

What changes are included in this PR?

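This PR routes Year, Month, DayOfMonth, DayOfWeek and Quarter through buildTimePartExt so that SQLConf.sessionLocalTimeZone is passed for TimestampType inputs, and updates the corresponding native Rust functions to convert timestamps to a local date in that timezone before extracting the date part. Date inputs remain timezone-invariant.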

Are there any user-facing changes?

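Yes, a correctness fix: for TimestampType inputs in a non-UTC session timezone, year, month, dayofmonth, dayofweek and quarter now return the parts of the session-local date.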

How was this patch tested?

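Unit tests: Spark-level tests in AuronFunctionSuite.scala and native tests in spark_dates.rs covering timezone-sensitive boundary cases.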

Copilot AI left a comment

Pull request overview

This PR fixes incorrect date-part extraction for Timestamp inputs in non-UTC session timezones by ensuring the session timezone is passed from Spark to the native Rust implementations and applied before extracting date components.

Changes:

  • Switch Spark expression conversion for year/month/dayofmonth/dayofweek/quarter to use buildTimePartExt, which passes SQLConf.sessionLocalTimeZone for TimestampType.
  • Update native Rust implementations (Spark_Year, Spark_Month, Spark_Day, Spark_DayOfWeek, Spark_Quarter) to interpret timestamp inputs in the provided timezone by converting to a local Date32 prior to extraction.
  • Add Spark-level and Rust-level unit tests to cover timezone-sensitive boundary cases and ensure date inputs remain timezone-invariant.
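Once a timestamp has been reduced to a local Date32 value, extracting the parts no longer involves the timezone, which is also why Date32 inputs stay timezone-invariant. The sketch below is a hypothetical illustration, not the code in spark_dates.rs: `DateParts`, `civil_from_days` and `date_parts` are illustrative names, and the calendar conversion uses Howard Hinnant's days-to-civil algorithm. It follows Spark's `dayofweek` convention of 1 = Sunday through 7 = Saturday.

```rust
/// Hypothetical container for the five date parts extracted from a Date32.
#[derive(Debug, PartialEq)]
struct DateParts {
    year: i32,
    month: u32,
    day: u32,
    quarter: u32,
    day_of_week: u32, // Spark convention: 1 = Sunday, ..., 7 = Saturday
}

/// Converts days since 1970-01-01 to a proleptic Gregorian (year, month, day)
/// using Howard Hinnant's civil_from_days algorithm.
fn civil_from_days(days: i32) -> (i32, u32, u32) {
    let z = i64::from(days) + 719_468; // re-base the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // index of the 400-year era
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // month index: March = 0, ..., February = 11
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the following civil year.
    let year = (yoe + era * 400 + i64::from(month <= 2)) as i32;
    (year, month, day)
}

/// Extracts all five date parts from a (local) Date32 day number.
fn date_parts(days: i32) -> DateParts {
    let (year, month, day) = civil_from_days(days);
    DateParts {
        year,
        month,
        day,
        quarter: (month - 1) / 3 + 1,
        // Day 0 (1970-01-01) was a Thursday, which is dayofweek 5 in Spark.
        day_of_week: ((i64::from(days) + 4).rem_euclid(7) + 1) as u32,
    }
}
```

For example, `date_parts(19722)` (2023-12-31, a Sunday) yields year 2023, month 12, day 31, quarter 4 and day_of_week 1, while the next day, 19723, yields 2024, 1, 1, 1 and 2.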

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeConverters.scala: Routes more date-part expressions through the timezone-aware ext-function builder so the session timezone is provided for timestamps.
  • spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronFunctionSuite.scala: Adds Spark integration tests intended to validate correct behavior under non-UTC timezones.
  • native-engine/datafusion-ext-functions/src/spark_dates.rs: Implements timezone-aware timestamp→local-date conversion for date-part extraction and adds native unit tests for boundary-crossing scenarios.


Follow-up commit messages:

- Remove unused chrono::prelude::* import in spark_dates.rs
- Fix timezone test to insert under UTC and query under America/New_York so the test actually exercises the boundary-crossing bug

The previous commit removed chrono::prelude::*, but the Offset trait is needed for calling .fix() on TzOffset.
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.



Successfully merging this pull request may close these issues.

Date-part extraction functions missing timezone handling for Timestamp inputs
