@gaogaotiantian
Contributor

What changes were proposed in this pull request?

This PR introduces a new mode, controlled by the SQLConf "spark.sql.session.enforceTimeZoneMatch", that enforces a timezone check when converting timestamps.

Under this mode, only a timezone-aware datetime can be converted from/to TimestampType(), and only a naive datetime can be converted from/to TimestampNTZType().
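To illustrate the rule above, here is a minimal sketch in plain Python. It is not the code in this PR: `is_aware` and `check_timezone_match` are hypothetical helpers, and the type names are passed as strings so the snippet runs without pyspark. The awareness test follows the definition in the Python `datetime` documentation.

```python
from datetime import datetime, timezone


def is_aware(dt: datetime) -> bool:
    """Return True if dt is timezone-aware, per the Python datetime docs."""
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None


def check_timezone_match(dt: datetime, type_name: str) -> datetime:
    """Hypothetical check: reject a datetime whose awareness does not match the type."""
    aware = is_aware(dt)
    if type_name == "TimestampType" and not aware:
        raise ValueError("TimestampType requires a timezone-aware datetime")
    if type_name == "TimestampNTZType" and aware:
        raise ValueError("TimestampNTZType requires a naive datetime")
    return dt


aware_dt = datetime(2026, 1, 3, 12, 0, tzinfo=timezone.utc)
naive_dt = datetime(2026, 1, 3, 12, 0)

check_timezone_match(aware_dt, "TimestampType")     # accepted
check_timezone_match(naive_dt, "TimestampNTZType")  # accepted
```

Passing `naive_dt` with "TimestampType", or `aware_dt` with "TimestampNTZType", raises `ValueError` in this sketch instead of converting silently.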

To make this work in UDF workers where SQLConf does not exist, a new class variable is introduced in DatetimeType as the fallback config. We set this class variable when we instantiate a worker to control the behavior.
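The fallback mechanism described above could look roughly like the sketch below. All names here (`DatetimeTypeSketch`, `_enforce_timezone_match_fallback`, `init_worker`) are assumptions for illustration, and a plain dict stands in for SQLConf; the actual attribute and worker hook in this PR may differ.

```python
from typing import Optional

CONF_KEY = "spark.sql.session.enforceTimeZoneMatch"


class DatetimeTypeSketch:
    # Hypothetical class-level fallback, read when no SQLConf is reachable
    # (for example inside a UDF worker process).
    _enforce_timezone_match_fallback: bool = False

    @classmethod
    def enforce_timezone_match(cls, session_conf: Optional[dict] = None) -> bool:
        # Prefer the session config when it is available and sets the key.
        if session_conf is not None and CONF_KEY in session_conf:
            return str(session_conf[CONF_KEY]).lower() == "true"
        # Otherwise fall back to the value set at worker start-up.
        return cls._enforce_timezone_match_fallback


def init_worker(enforce: bool) -> None:
    """Hypothetical worker start-up hook that copies the driver-side setting."""
    DatetimeTypeSketch._enforce_timezone_match_fallback = enforce


default_value = DatetimeTypeSketch.enforce_timezone_match()  # no conf, fallback used
init_worker(True)
worker_value = DatetimeTypeSketch.enforce_timezone_match()   # fallback now enabled
```

A class variable keeps the lookup cheap on the conversion path, at the cost of being process-wide state that must be set before any conversion runs in the worker.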

The current implementation is a PoC. Once the direction is approved, I'll fill the gaps.

TODO:

  • Other Python runners besides vanilla UDF
  • Better exception error class/message
  • Tests
  • Documentation

Why are the changes needed?

We currently have too many timezone-related issues. It is not even possible to define how timestamps should work in Spark. Python interprets naive datetimes in the local machine's timezone, which makes UDF workers highly unpredictable. Spark also has a session-local timezone config, which complicates the situation further.
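As a small standard-library illustration of the naive-datetime behavior mentioned above (no Spark involved, and the chosen date is arbitrary), the same wall-clock value maps to a fixed instant when it is timezone-aware, but to a machine-dependent instant when it is naive:

```python
from datetime import datetime, timezone

wall_clock = (2025, 1, 1, 12, 0)

aware = datetime(*wall_clock, tzinfo=timezone.utc)
naive = datetime(*wall_clock)

# An aware datetime maps to one fixed instant on every machine.
aware_epoch = aware.timestamp()

# A naive datetime is interpreted in the machine's local timezone, so this
# value changes with the timezone setting of the process.
naive_epoch = naive.timestamp()

# astimezone() with no argument attaches the system local timezone.
local_offset = naive.astimezone().utcoffset()

print(aware_epoch, naive_epoch, local_offset)
```

The two epoch values differ by exactly the local UTC offset, so two workers in different timezones would disagree about `naive_epoch` while agreeing on `aware_epoch`.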

The only way to make it explainable and consistent is to never mix timezone-aware and timezone-naive timestamps. If the user just wants a timestamp without a timezone, they need to use TimestampNTZType(), period.

Does this PR introduce any user-facing change?

This PR is backward compatible. It introduces a new config to change the behavior.

How was this patch tested?

For now, I tested locally that an error is raised. Tests will be written later.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions
github-actions bot commented Jan 3, 2026

JIRA Issue Information

=== Improvement SPARK-54890 ===
Summary: Allow user to enforce timezone match in conversion
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

@gaogaotiantian
Contributor Author

@HyukjinKwon and @cloud-fan could you take a look at this? We have too many timezone-related issues, and I think we should provide a way to solve them for good.
