@jakechirsch

This PR adds a new threshold argument to to_datetime that specifies the minimum fraction of a value's datetime components that must be valid for parsing to succeed. Here, "successful" parsing means the function returns either a valid Timestamp or NaT instead of raising an exception. In effect, the threshold decides whether a partially invalid value produces NaT or raises an error. This lets users tolerate partially invalid dates while keeping strict behavior as the default (threshold=1.0, where any invalid component raises).

Summary of changes

  • Added threshold argument to to_datetime.
  • Implemented validation logic and clamping of threshold values to [0.0, 1.0].
  • Updated parsing internals to compute the fraction of valid components.
  • Added tests for valid, invalid, and boundary threshold behavior.
  • Added documentation: explanation, parameter description, and example.
  • Added type annotations to all new argument signatures.
  • Ensured all code checks and pre-commit hooks pass.
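
To make the described semantics concrete, here is a minimal standalone sketch of the per-value decision. It is not the PR's implementation and does not touch pandas internals: parse_ymd and clamp_threshold are hypothetical helpers, only a fixed YYYY-MM-DD layout is handled, year, month and day each count as one component, and None stands in for NaT.

```python
import math
from datetime import datetime
from typing import Optional


def clamp_threshold(threshold: float) -> float:
    """Clamp threshold into [0.0, 1.0]; NaN is rejected (hypothetical helper)."""
    if math.isnan(threshold):
        raise ValueError("threshold must not be NaN")
    return min(max(threshold, 0.0), 1.0)


def _in_range(text: str, lo: int, hi: int) -> bool:
    # A component is valid if it is all decimal digits and falls in [lo, hi].
    return text.isdecimal() and lo <= int(text) <= hi


def parse_ymd(value: str, threshold: float = 1.0) -> Optional[datetime]:
    """Hypothetical parser for 'YYYY-MM-DD' strings.

    Returns a datetime when every component is valid, None (a stand-in for NaT)
    when the fraction of valid components is at least threshold, and raises
    ValueError otherwise.
    """
    threshold = clamp_threshold(threshold)
    parts = value.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    year_s, month_s, day_s = parts
    checks = [
        _in_range(year_s, 1, 9999),
        _in_range(month_s, 1, 12),
        _in_range(day_s, 1, 31),
    ]
    if all(checks):
        try:
            return datetime(int(year_s), int(month_s), int(day_s))
        except ValueError:
            # e.g. "2025-02-30": every field is in range, but the day does
            # not exist in that month, so treat the day as invalid.
            checks[2] = False
    fraction_valid = sum(checks) / len(checks)
    if fraction_valid >= threshold:
        return None  # stand-in for NaT
    raise ValueError(
        f"{value!r}: only {fraction_valid:.0%} of components are valid "
        f"(threshold {threshold:.0%})"
    )
```

With the default threshold=1.0 any invalid component raises; with threshold=0.5, "2025-13-26" (two of three components valid) yields None, while "2025-13-99" (one of three valid) still raises.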

@jakechirsch jakechirsch marked this pull request as draft November 21, 2025 23:47
@jakechirsch jakechirsch force-pushed the main branch 5 times, most recently from f6c3fec to 542851d Compare November 25, 2025 19:56
@jakechirsch jakechirsch marked this pull request as ready for review November 26, 2025 00:34
@jbrockmendel
Member

I don't think this is something we want to support.

Member

@rhshadrach rhshadrach left a comment

Thanks for the PR. This does not seem to address the linked issue, which asks for a tolerance based on the percentage of all records; instead, it implements a tolerance on the components within a single record.

I'm negative on this change. I do not see the value in having pandas raise versus return NaT based on the number of invalid components.
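
For contrast, the record-level tolerance the linked issue describes can be approximated with existing API: pd.to_datetime(..., errors="coerce") turns unparseable records into NaT, and the caller then decides whether the share of failures is acceptable. This is a minimal sketch, not a pandas feature; to_datetime_with_tolerance and max_invalid_fraction are hypothetical names, and originally missing values are excluded from the failure count by assumption.

```python
import pandas as pd


def to_datetime_with_tolerance(values, fmt, max_invalid_fraction=0.0):
    """Hypothetical wrapper: coerce unparseable records to NaT, but raise if
    more than max_invalid_fraction of the non-missing records fail to parse."""
    series = pd.Series(values)
    # Existing pandas behavior: invalid records become NaT instead of raising.
    parsed = pd.to_datetime(series, format=fmt, errors="coerce")
    # Count only records that had a value but still came back as NaT.
    had_value = series.notna()
    failed = parsed.isna() & had_value
    n_present = int(had_value.sum())
    invalid_fraction = int(failed.sum()) / n_present if n_present else 0.0
    if invalid_fraction > max_invalid_fraction:
        raise ValueError(
            f"{invalid_fraction:.0%} of records failed to parse "
            f"(tolerance {max_invalid_fraction:.0%})"
        )
    return parsed


values = ["2025-11-21", "2025-11-26", "not a date", None]
result = to_datetime_with_tolerance(values, "%Y-%m-%d", max_invalid_fraction=0.5)
```

Here one of the three non-missing records fails to parse, so a tolerance of 0.5 returns the coerced Series, while a tolerance of 0.25 would raise.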

@rhshadrach rhshadrach closed this Nov 26, 2025
Linked issue

ENH: Make pd.to_datetime with format parameter more robust to dirty data

3 participants