[SPARK-40437][SS][PYTHON] Support string representation of durationMs in GroupState.setTimeoutDuration#56178

Open
brijrajk wants to merge 1 commit into apache:master from brijrajk:SPARK-40437-groupstate-string-duration

@brijrajk

Contributor

What changes were proposed in this pull request?

GroupState.setTimeoutDuration previously accepted only an integer milliseconds value. This PR
extends it to also accept a Spark interval string (e.g. "5 minutes", "1 hour 30 minutes",
"1.5 seconds"), matching the behaviour of the Scala API's
GroupStateImpl.setTimeoutDuration(String) overload.

Changes:

  • Added _parse_timeout_duration(duration: str) -> int helper in
    python/pyspark/sql/streaming/state.py that converts a Spark interval string to milliseconds.
    Parsing behaviour mirrors Scala's IntervalUtils.stringToInterval and IntervalUtils.getDuration
    (31 days/month convention for structured streaming watermarks).
  • Updated setTimeoutDuration to accept Union[int, str] and call the helper when a string is
    passed.
  • Added INVALID_TIMEOUT_DURATION_STRING error class to
    python/pyspark/errors/error-conditions.json.
  • Added python/pyspark/sql/tests/streaming/test_state.py with 27 unit tests covering: all
    supported units, months/years (31-day convention), negative component offsets, fractional seconds,
    leading-dot decimals (.5 seconds), explicit +/- signs, whitespace between sign and
    quantity, the interval keyword prefix, compound durations, case-insensitivity, and various
    invalid-input cases.
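
The parsing rules listed above can be sketched roughly as follows. This is an illustrative stand-in, not the helper from this PR: the function name `parse_timeout_duration_sketch`, the unit table, the regular expression, and the use of plain `ValueError` (rather than a PySpark error class) are all assumptions made for the example. It follows the behaviours named in the description: optional `interval` prefix, case-insensitivity, signed components, fractional seconds, and the 31-days-per-month convention.

```python
import re
from decimal import Decimal

# Microseconds per unit. Months and years use the 31-days-per-month
# convention mentioned in the PR description. The table itself is an
# assumption of this sketch, not code taken from the PR.
_MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000
_UNIT_MICROS = {
    "year": 12 * 31 * _MICROS_PER_DAY,
    "month": 31 * _MICROS_PER_DAY,
    "week": 7 * _MICROS_PER_DAY,
    "day": _MICROS_PER_DAY,
    "hour": 3_600_000_000,
    "minute": 60_000_000,
    "second": 1_000_000,
    "millisecond": 1_000,
    "microsecond": 1,
}

# One component: optional sign, optional whitespace, a number
# (including a leading-dot decimal such as ".5"), then a unit word.
_COMPONENT = re.compile(r"\s*([+-]?)\s*(\d+(?:\.\d*)?|\.\d+)\s*([a-z]+)")


def parse_timeout_duration_sketch(duration: str) -> int:
    """Convert an interval string to whole milliseconds (hypothetical helper)."""
    text = duration.strip().lower()
    # Drop an optional leading "interval" keyword.
    text = re.sub(r"^interval\s+", "", text)

    pos = 0
    total_micros = 0
    while pos < len(text):
        match = _COMPONENT.match(text, pos)
        if match is None:
            raise ValueError(f"Invalid timeout duration string: {duration!r}")
        sign, number, unit = match.groups()
        # Accept both singular and plural unit names.
        unit = unit[:-1] if unit.endswith("s") else unit
        if unit not in _UNIT_MICROS:
            raise ValueError(f"Unknown unit in duration string: {duration!r}")
        # Assumption of this sketch: only seconds may carry a fraction.
        if "." in number and unit != "second":
            raise ValueError(f"Fractional value not allowed for {unit!r}")
        micros = int(Decimal(number) * _UNIT_MICROS[unit])
        total_micros += -micros if sign == "-" else micros
        pos = match.end()

    total_ms = total_micros // 1000
    # Empty input and non-positive totals are both rejected here.
    if total_ms <= 0:
        raise ValueError(f"Timeout duration must be positive: {duration!r}")
    return total_ms
```

Under these assumptions, `parse_timeout_duration_sketch("1 hour 30 minutes")` yields 5400000 and `parse_timeout_duration_sketch("1.5 seconds")` yields 1500. Accumulating in integer microseconds and dividing once at the end avoids floating-point rounding for fractional seconds.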

Why are the changes needed?

The Scala API supports both setTimeoutDuration(long durationMs) and
setTimeoutDuration(String duration). The Python implementation only supported the integer form,
leaving users unable to use human-readable interval strings as described in SPARK-40437.

Does this PR introduce any user-facing change?

Yes. GroupState.setTimeoutDuration now also accepts a Spark interval string such as
"5 minutes" or "1 hour 30 minutes". The integer form continues to work unchanged.
This change is relative to the unreleased master branch.
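
To make the user-facing behaviour concrete, here is a minimal stand-in that accepts either form. `GroupStateSketch`, its constructor arguments, and the tiny single-unit table are hypothetical and exist only for this illustration; the real `GroupState` lives in `python/pyspark/sql/streaming/state.py` and its internals may differ. The string branch here handles only a single `"<n> <unit>"` pair to keep the example short.

```python
from typing import Union

# Hypothetical single-unit table for this sketch only (milliseconds per unit).
_MS_PER_UNIT = {"second": 1_000, "minute": 60_000, "hour": 3_600_000}


class GroupStateSketch:
    """Stand-in for illustration; this is not pyspark's GroupState."""

    def __init__(
        self, batch_processing_time_ms: int, processing_time_timeout: bool = True
    ) -> None:
        self._batch_processing_time_ms = batch_processing_time_ms
        self._processing_time_timeout = processing_time_timeout
        self.timeout_timestamp_ms = -1

    def setTimeoutDuration(self, durationMs: Union[int, str]) -> None:
        if isinstance(durationMs, str):
            # String form: convert to milliseconds before the usual checks.
            quantity, unit = durationMs.split()
            unit = unit.lower()
            unit = unit[:-1] if unit.endswith("s") else unit
            durationMs = int(quantity) * _MS_PER_UNIT[unit]
        if not self._processing_time_timeout:
            raise RuntimeError("Processing time timeout must be enabled.")
        if durationMs <= 0:
            raise ValueError("Timeout duration must be positive.")
        # The timeout fires this many milliseconds after the batch time.
        self.timeout_timestamp_ms = self._batch_processing_time_ms + durationMs


state = GroupStateSketch(batch_processing_time_ms=1_000)
state.setTimeoutDuration("5 minutes")   # string form
from_string = state.timeout_timestamp_ms
state.setTimeoutDuration(300_000)       # integer form, unchanged
from_int = state.timeout_timestamp_ms
```

Both calls leave the stand-in with the same timeout timestamp, which is the equivalence the PR description promises between the two forms.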

How was this patch tested?

27 new pure-Python unit tests in python/pyspark/sql/tests/streaming/test_state.py, covering
both positive cases (all units, compound durations, fractional seconds, edge-case signs and
whitespace) and negative cases (invalid strings, non-positive durations, wrong timeout mode).

Tests can be run without a full Spark build:

source .venv/bin/activate
PYTHONPATH=python python3 -m unittest pyspark.sql.tests.streaming.test_state -v

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

@brijrajk brijrajk force-pushed the SPARK-40437-groupstate-string-duration branch 3 times, most recently from 1b06aaa to d063818 Compare May 29, 2026 12:35
@brijrajk

Contributor Author

Could a committer please review this? It extends GroupState.setTimeoutDuration to accept a Spark interval string (e.g. "5 minutes", "1 hour 30 minutes") in addition to integer milliseconds, matching the existing Scala API overload (SPARK-40437).

cc @zhengruifeng @itholic

@zhengruifeng zhengruifeng changed the title [SPARK-40437][PYTHON] Support string representation of durationMs in GroupState.setTimeoutDuration [SPARK-40437][SS][PYTHON] Support string representation of durationMs in GroupState.setTimeoutDuration Jun 3, 2026
@zhengruifeng

Contributor

I think @HyukjinKwon and @HeartSaVioR should have more context as per the discussion in https://issues.apache.org/jira/browse/SPARK-40437

def setTimeoutDuration(self, durationMs: Union[int, str]) -> None:
    """
    Set the timeout duration in ms for this key.
    Processing time timeout must be enabled.

Contributor

Shall we add a versionchanged note to the docstring saying that str is now supported?

Contributor Author

Done, added versionchanged:: 4.3.0 (corrected from 5.0.0; since this change would normally be backported, the right version is 4.3.0 on branch-4.x).

@brijrajk brijrajk force-pushed the SPARK-40437-groupstate-string-duration branch from d063818 to 646a31c Compare June 3, 2026 13:07
@brijrajk brijrajk requested a review from zhengruifeng June 15, 2026 07:31
…GroupState.setTimeoutDuration

Allow `setTimeoutDuration` to accept a Spark interval string (e.g. '5 seconds',
'1 hour 30 minutes') in addition to an integer millisecond value, matching
the Scala-side overload. A Python parser converts supported time units
(weeks, days, hours, minutes, seconds, milliseconds, microseconds) to
milliseconds; month/year units and invalid strings raise PySparkValueError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@brijrajk brijrajk force-pushed the SPARK-40437-groupstate-string-duration branch from 646a31c to 4fff739 Compare June 22, 2026 17:46
@HyukjinKwon

Member

cc @HeartSaVioR


3 participants