fix: pd.to_numeric handling of datetime #62649
    else:
        ints[i] = np.datetime64(val).astype(np.int64)
        # because of pd.NaT, we may need to return in floats #GH 42380
        floats[i] = <float64_t>ints[i]
We definitely don't want to cast to float here.
This issue is really challenging and probably impossible to solve using the default (NumPy-backed) type system of pandas. I'm not sure there's anything to do unless that changes
@WillAyd, casting to float in addition to the other dtypes is the existing pattern for handling null values in maybe_convert_numeric, though. The float array is returned if there are null values in the series.
e.g.,
Line 2411 in 82a0a2d:
    floats[i] = complexes[i] = val
Line 2429 in 82a0a2d:
    floats[i] = uints[i] = ints[i] = bools[i] = val
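To make the pattern under discussion concrete, here is a minimal pure-Python sketch of "fill parallel buffers, return the float one if a null was seen". The function name `convert_sketch` and the `seen_null` variable are hypothetical and exist only for illustration; this is a simplified model, not the actual Cython code in maybe_convert_numeric.

```python
import math
from typing import List, Optional, Sequence, Union


def convert_sketch(values: Sequence[Optional[int]]) -> Union[List[int], List[float]]:
    """Hypothetical, simplified model of the parallel-buffer pattern."""
    ints = [0] * len(values)
    floats = [0.0] * len(values)
    seen_null = False

    for i, val in enumerate(values):
        if val is None:
            # A null has no integer representation, so only the float
            # buffer can hold it (as NaN).
            seen_null = True
            floats[i] = math.nan
        else:
            # Every non-null value is written to both buffers.
            ints[i] = val
            floats[i] = float(val)

    # The float buffer is returned only when a null forced it.
    return floats if seen_null else ints


no_nulls = convert_sketch([1, 2, 3])       # stays integer
with_null = convert_sketch([1, None, 3])   # falls back to float with NaN
```

The fallback is harmless for small integers, but it changes the result type for the whole array as soon as one null appears.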
The difference is that those all represent numeric operations, and having a precision loss may be an acceptable trade-off. I'm not sure that there's value in a precision loss with this operation; it's not like a user would apply a mathematical algorithm to the result, so I would expect that retaining an accurate time representation is important.
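The precision concern can be checked with plain Python, since Python floats are IEEE 754 doubles, the same representation as float64. The sketch below assumes timestamps are stored as integer nanoseconds since the Unix epoch; the helper `to_ns` is a hypothetical name used only for this illustration.

```python
from datetime import datetime, timezone


def to_ns(dt: datetime, extra_ns: int = 0) -> int:
    """Exact integer nanoseconds since the Unix epoch (hypothetical helper)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = dt - epoch
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000 + extra_ns


# One nanosecond past midnight UTC on 2024-01-01.
ns = to_ns(datetime(2024, 1, 1, tzinfo=timezone.utc), extra_ns=1)

# A double has a 53-bit significand, so integers above 2**53 are not all
# representable; at this magnitude adjacent doubles are 256 ns apart.
roundtrip = int(float(ns))
lost = ns - roundtrip  # the trailing nanosecond is rounded away
```

Any modern nanosecond timestamp is well above 2**53, so a cast to float silently rounds it to the nearest representable value.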
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Thanks for the PR but this PR has gone stale so closing
This PR handles all the cases of inconsistent datetime-related pd.to_numeric handling from Issue 42380 and is compatible with pandas/io/parsers/base_parser.py.
pandas/_libs/lib.pyx contains the core/common logic for datetime-related pd.to_numeric handling.
pandas/pandas/_libs/lib.pyx, line 2453 in 083f01a
pd.Timedelta will be returned as the direct value, and pd.Timestamp values with time zones are handled with the default np.datetime64 UTC conversion.
The removal of the branch elif lib.is_np_dtype(values_dtype, "mM"): values = values.view(np.int64) in pandas/core/tools/numeric.py addresses the bug where the int value of NaT is returned (Issue 42380).
Introduces a convert_datetime flag in maybe_convert_numeric, as this function is used by both pd.to_numeric and the base parsers. The behavior for pd.to_numeric on datetimes is to convert, while the behavior in the base parsers is to not convert.
pandas/pandas/_libs/lib.pyi, line 127 in 083f01a
pd.to_numeric has an inconsistent behavior for datetime objects #43280, and related linked PR discussions
doc/source/whatsnew/v3.0.0.rst file
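The NaT problem referenced in the description can be reproduced with NumPy alone. The sketch below assumes only documented NumPy behavior: reinterpreting a datetime64 buffer as int64 exposes the sentinel that encodes NaT. The masking step at the end is one illustrative way to avoid the leak, not necessarily what this PR implements.

```python
import numpy as np

# A datetime64 array holding one real timestamp and one missing value.
stamps = np.array(["2024-01-01", "NaT"], dtype="datetime64[ns]")

# Reinterpreting the buffer as int64 (the kind of view the PR removes)
# exposes NaT's internal sentinel, the minimum int64 value.
as_int = stamps.view(np.int64)
INT64_MIN = np.iinfo(np.int64).min
nat_leaks = bool(as_int[1] == INT64_MIN)

# Illustrative alternative: mask NaT explicitly. This forces a float64
# result with NaN, which is where the precision trade-off comes from.
as_float = np.where(np.isnat(stamps), np.nan, as_int.astype(np.float64))
```

The integer view keeps exact nanoseconds but turns NaT into a misleading large negative number, while the float result represents missing values correctly at the cost of nanosecond precision.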