Commit 6430b46

Merge branch 'main' into enh_python_scalars
2 parents 7eb94ff + 8813faf commit 6430b46

252 files changed

+3666
-1968
lines changed

.pre-commit-config.yaml

Lines changed: 4 additions & 4 deletions
@@ -19,7 +19,7 @@ ci:
   skip: [pyright, mypy]
 repos:
 - repo: https://github.com/astral-sh/ruff-pre-commit
-  rev: v0.14.3
+  rev: v0.14.7
   hooks:
   - id: ruff
     args: [--exit-non-zero-on-fix]
@@ -71,7 +71,7 @@ repos:
   hooks:
   - id: isort
 - repo: https://github.com/asottile/pyupgrade
-  rev: v3.21.0
+  rev: v3.21.2
   hooks:
   - id: pyupgrade
     args: [--py311-plus]
@@ -87,12 +87,12 @@ repos:
     types: [text] # overwrite types: [rst]
     types_or: [python, rst]
 - repo: https://github.com/sphinx-contrib/sphinx-lint
-  rev: v1.0.1
+  rev: v1.0.2
   hooks:
   - id: sphinx-lint
     args: ["--enable", "all", "--disable", "line-too-long"]
 - repo: https://github.com/pre-commit/mirrors-clang-format
-  rev: v21.1.2
+  rev: v21.1.6
   hooks:
   - id: clang-format
     files: ^pandas/_libs/src|^pandas/_libs/include

ci/code_checks.sh

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.Series.dt PR01" `# Accessors are implemented as classes, but we do not document the Parameters section` \
         -i "pandas.Period.freq GL08" \
         -i "pandas.Period.ordinal GL08" \
+        -i "pandas.errors.ChainedAssignmentError SA01" \
         -i "pandas.errors.IncompatibleFrequency SA01,SS06,EX01" \
         -i "pandas.api.extensions.ExtensionArray.value_counts EX01,RT03,SA01" \
         -i "pandas.api.typing.DataFrameGroupBy.plot PR02" \

doc/source/whatsnew/v3.0.0.rst

Lines changed: 64 additions & 11 deletions
@@ -117,6 +117,9 @@ process in more detail.

 `PDEP-7: Consistent copy/view semantics in pandas with Copy-on-Write <https://pandas.pydata.org/pdeps/0007-copy-on-write.html>`__

+Setting the option ``mode.copy_on_write`` no longer has any impact. The option is deprecated
+and will be removed in pandas 4.0.
+
 .. _whatsnew_300.enhancements.col:

 ``pd.col`` syntax can now be used in :meth:`DataFrame.assign` and :meth:`DataFrame.loc`
@@ -233,6 +236,9 @@ Other enhancements
 - Support reading Stata 102-format (Stata 1) dta files (:issue:`58978`)
 - Support reading Stata 110-format (Stata 7) dta files (:issue:`47176`)
 - Switched wheel upload to **PyPI Trusted Publishing** (OIDC) for release-tag pushes in ``wheels.yml``. (:issue:`61718`)
+- Added a new :meth:`DataFrame.from_arrow` method to import any Arrow-compatible
+  tabular data object into a pandas :class:`DataFrame` through the
+  `Arrow PyCapsule Protocol <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`__ (:issue:`59631`)

 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.notable_bug_fixes:
@@ -378,6 +384,8 @@ In cases with mixed-resolution inputs, the highest resolution is used:

 .. warning:: Many users will now get "M8[us]" dtype data in cases when they used to get "M8[ns]". For most use cases they should not notice a difference. One big exception is converting to integers, which will give integers 1000x smaller.

+Similarly, the :class:`Timedelta` constructor and :func:`to_timedelta` with a string input now defaults to a microsecond unit, using nanosecond unit only in cases that actually have nanosecond precision.
+
 .. _whatsnew_300.api_breaking.concat_datetime_sorting:

 :func:`concat` no longer ignores ``sort`` when all objects have a :class:`DatetimeIndex`
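
The warning in the hunk above notes that integer conversions come out 1000x smaller once data is stored as "M8[us]" instead of "M8[ns]". A minimal NumPy-only sketch of that arithmetic follows; the timestamp is an arbitrary example chosen for illustration, and nothing here depends on pandas 3.0 being installed.

```python
import numpy as np

# The same instant stored at two resolutions.
stamp_ns = np.array(["2024-01-01T00:00:00"], dtype="M8[ns]")
stamp_us = stamp_ns.astype("M8[us]")

# Casting to int64 yields the number of units elapsed since the epoch.
as_ns = int(stamp_ns.astype("int64")[0])
as_us = int(stamp_us.astype("int64")[0])

# Nanoseconds per microsecond: the factor the warning refers to.
ratio = as_ns // as_us
print(as_ns, as_us, ratio)
```

Code that relies on the integer value of a timestamp is therefore safer if it converts to an explicit unit first rather than assuming nanoseconds.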
@@ -544,29 +552,55 @@ small behavior differences as collateral:
 Changed treatment of NaN values in pyarrow and numpy-nullable floating dtypes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``), ``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others. This was done to make adoption easier, but caused some confusion (:issue:`32265`). In 3.0, an option ``"mode.nan_is_na"`` (default ``True``) controls whether to treat ``NaN`` as equivalent to :class:`NA`.
+Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``),
+``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others.
+This was done to make adoption easier, but caused some confusion (:issue:`32265`).
+In 3.0, this behaviour is made consistent to by default treat ``NaN`` as equivalent
+to :class:`NA` in all cases.

-With ``pd.set_option("mode.nan_is_na", True)`` (again, this is the default), ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__`` and be treated the same as :class:`NA`. The only change users will see is that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN`` entries produce :class:`NA` entries instead:
+By default, ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__``
+and will be treated the same as :class:`NA`. The only change users will see is
+that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN``
+entries produce :class:`NA` entries instead.

 *Old behavior:*

 .. code-block:: ipython

-    In [2]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    # NaN in input gets converted to NA
+    In [1]: ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    In [2]: ser
+    Out[2]:
+    0     0.0
+    1    <NA>
+    dtype: Float64
+    # NaN produced by arithmetic (0/0) remained NaN
     In [3]: ser / 0
     Out[3]:
     0     NaN
     1    <NA>
     dtype: Float64
+    # the NaN value is not considered as missing
+    In [4]: (ser / 0).isna()
+    Out[4]:
+    0    False
+    1     True
+    dtype: bool

 *New behavior:*

 .. ipython:: python

-    ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    ser
     ser / 0
+    (ser / 0).isna()

-By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always considered distinct and specifically as a floating-point value, so cannot be used with integer dtypes:
+In the future, the intention is to consider ``NaN`` and :class:`NA` as distinct
+values, and an option to control this behaviour is added in 3.0 through
+``pd.options.future.distinguish_nan_and_na``. When enabled, ``NaN`` is always
+considered distinct and specifically as a floating-point value. As a consequence,
+it cannot be used with integer dtypes.

 *Old behavior:*

@@ -580,13 +614,21 @@ By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always

 .. ipython:: python

-    pd.set_option("mode.nan_is_na", False)
-    ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
-    ser[1]
+    with pd.option_context("future.distinguish_nan_and_na", True):
+        ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
+        print(ser[1])
+
+If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in
+the latter example, this would raise, as a float ``NaN`` cannot be held by an
+integer dtype.

-If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in the latter example, this would raise, as a float ``NaN`` cannot be held by an integer dtype.
+With ``"future.distinguish_nan_and_na"`` enabled, ``ser.to_numpy()`` (and
+``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if
+:class:`NA` entries are present, where before they would coerce to
+``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan``
+to :meth:`Series.to_numpy`.

-With ``"mode.nan_is_na"`` set to ``False``, ``ser.to_numpy()`` (and ``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if :class:`NA` entries are present, where before they would coerce to ``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan`` to :meth:`Series.to_numpy`.
+Note that the option is experimental and subject to change in future releases.

 The ``__module__`` attribute now points to public modules
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
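
The NaN/NA hunk above recommends passing ``na_value=np.nan`` to :meth:`Series.to_numpy` to keep a float NumPy dtype. A minimal sketch of that call follows. It also passes ``dtype="float64"``, which the changelog text does not mention; that is an assumption made here so the result does not depend on the default of whichever pandas version is installed.

```python
import numpy as np
import pandas as pd

# A nullable float Series holding one missing entry.
ser = pd.Series([1.5, pd.NA], dtype=pd.Float64Dtype())

# Ask for a float array explicitly; NA entries are filled with NaN
# instead of forcing an object-dtype result.
arr = ser.to_numpy(dtype="float64", na_value=np.nan)
print(arr.dtype, arr)
```

Spelling out both arguments keeps downstream NumPy code working whether or not ``future.distinguish_nan_and_na`` is enabled.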
@@ -745,10 +787,16 @@ Other API changes
   the dtype of the resulting Index (:issue:`60797`)
 - :class:`IncompatibleFrequency` now subclasses ``TypeError`` instead of ``ValueError``. As a result, joins with mismatched frequencies now cast to object like other non-comparable joins, and arithmetic with indexes with mismatched frequencies align (:issue:`55782`)
 - :class:`Series` "flex" methods like :meth:`Series.add` no longer allow passing a :class:`DataFrame` for ``other``; use the DataFrame reversed method instead (:issue:`46179`)
+- :func:`date_range` and :func:`timedelta_range` no longer default to ``unit="ns"``, instead will infer a unit from the ``start``, ``end``, and ``freq`` parameters. Explicitly specify a desired ``unit`` to override these (:issue:`59031`)
 - :meth:`CategoricalIndex.append` no longer attempts to cast different-dtype indexes to the caller's dtype (:issue:`41626`)
 - :meth:`ExtensionDtype.construct_array_type` is now a regular method instead of a ``classmethod`` (:issue:`58663`)
+- Arithmetic operations between a :class:`Series`, :class:`Index`, or :class:`ExtensionArray` with a ``list`` now consistently wrap that list with an array equivalent to ``Series(my_list).array``. To do any other kind of type inference or casting, do so explicitly before operating (:issue:`62552`)
 - Comparison operations between :class:`Index` and :class:`Series` now consistently return :class:`Series` regardless of which object is on the left or right (:issue:`36759`)
 - Numpy functions like ``np.isinf`` that return a bool dtype when called on a :class:`Index` object now return a bool-dtype :class:`Index` instead of ``np.ndarray`` (:issue:`52676`)
+- Methods that can operate in-place (:meth:`~DataFrame.replace`, :meth:`~DataFrame.fillna`,
+  :meth:`~DataFrame.ffill`, :meth:`~DataFrame.bfill`, :meth:`~DataFrame.interpolate`,
+  :meth:`~DataFrame.where`, :meth:`~DataFrame.mask`, :meth:`~DataFrame.clip`) now return
+  the modified DataFrame or Series (``self``) instead of ``None`` when ``inplace=True`` (:issue:`63207`)

 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.deprecations:
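
The last added entry in the hunk above changes what ``inplace=True`` calls return. A minimal sketch of code that tolerates both return values is shown here; the fallback branch is an illustration for callers that must run on older and newer pandas alike, not something the changelog prescribes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan]})

# Older pandas returns None from an inplace call; per the entry above,
# pandas 3.0 returns the modified object itself.
result = df.fillna(0.0, inplace=True)

# Fall back to the original object when None comes back.
filled = df if result is None else result
print(filled["a"].tolist())
```

Either way ``df`` is modified in place, so only code that inspects the return value needs to change.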
@@ -1178,9 +1226,11 @@ MultiIndex
 I/O
 ^^^
 - Bug in :class:`DataFrame` and :class:`Series` ``repr`` of :py:class:`collections.abc.Mapping` elements. (:issue:`57915`)
+- Bug in :meth:`DataFrame.to_hdf` and :func:`read_hdf` with ``timedelta64`` dtypes with non-nanosecond resolution failing to round-trip correctly (:issue:`63239`)
 - Fix bug in ``on_bad_lines`` callable when returning too many fields: now emits
   ``ParserWarning`` and truncates extra fields regardless of ``index_col`` (:issue:`61837`)
 - Bug in :func:`pandas.json_normalize` inconsistently handling non-dict items in ``data`` when ``max_level`` was set. The function will now raise a ``TypeError`` if ``data`` is a list containing non-dict items (:issue:`62829`)
+- Bug in :func:`pandas.json_normalize` raising ``TypeError`` when ``meta`` contained a non-string key (e.g., ``int``) and ``record_path`` was specified, which was inconsistent with the behavior when ``record_path`` was ``None`` (:issue:`63019`)
 - Bug in :meth:`.DataFrame.to_json` when ``"index"`` was a value in the :attr:`DataFrame.column` and :attr:`Index.name` was ``None``. Now, this will fail with a ``ValueError`` (:issue:`58925`)
 - Bug in :meth:`.io.common.is_fsspec_url` not recognizing chained fsspec URLs (:issue:`48978`)
 - Bug in :meth:`DataFrame._repr_html_` which ignored the ``"display.float_format"`` option (:issue:`59876`)
@@ -1234,6 +1284,7 @@ Plotting
 - Bug in :meth:`Series.plot` preventing a line and bar from being aligned on the same plot (:issue:`61161`)
 - Bug in :meth:`Series.plot` preventing a line and scatter plot from being aligned (:issue:`61005`)
 - Bug in :meth:`Series.plot` with ``kind="pie"`` with :class:`ArrowDtype` (:issue:`59192`)
+- Bug in plotting with a :class:`TimedeltaIndex` with non-nanosecond resolution displaying incorrect labels (:issue:`63237`)

 Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1264,7 +1315,8 @@ Groupby/resample/rolling
 - Bug in :meth:`Rolling.apply` for ``method="table"`` where column order was not being respected due to the columns getting sorted by default. (:issue:`59666`)
 - Bug in :meth:`Rolling.apply` where the applied function could be called on fewer than ``min_period`` periods if ``method="table"``. (:issue:`58868`)
 - Bug in :meth:`Rolling.sem` computing incorrect results because it divided by ``sqrt((n - 1) * (n - ddof))`` instead of ``sqrt(n * (n - ddof))``. (:issue:`63180`)
-- Bug in :meth:`Rolling.skew` incorrectly computing skewness for windows following outliers due to numerical instability. The calculation now properly handles catastrophic cancellation by recomputing affected windows (:issue:`47461`)
+- Bug in :meth:`Rolling.skew` and in :meth:`Rolling.kurt` incorrectly computing skewness and kurtosis, respectively, for windows following outliers due to numerical instability. The calculation now properly handles catastrophic cancellation by recomputing affected windows (:issue:`47461`, :issue:`61416`)
+- Bug in :meth:`Rolling.skew` and in :meth:`Rolling.kurt` where results varied with input length despite identical data and window contents (:issue:`54380`)
 - Bug in :meth:`Series.resample` could raise when the date range ended shortly before a non-existent time. (:issue:`58380`)
 - Bug in :meth:`Series.resample` raising error when resampling non-nanosecond resolutions out of bounds for nanosecond precision (:issue:`57427`)
 - Bug in :meth:`Series.rolling.var` and :meth:`Series.rolling.std` computing incorrect results due to numerical instability. (:issue:`47721`, :issue:`52407`, :issue:`54518`, :issue:`55343`)
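
The :meth:`Rolling.sem` entry in the hunk above names two denominators. The pure-Python sketch below works both through for a single window. It assumes the quantity being divided is the square root of the sum of squared deviations, which the changelog does not state outright; ``sem`` and ``sem_buggy`` are hypothetical helper names, not pandas functions.

```python
import math


def _sum_sq_dev(window):
    """Sum of squared deviations from the window mean."""
    mean = sum(window) / len(window)
    return sum((x - mean) ** 2 for x in window)


def sem(window, ddof=1):
    """Standard error of the mean: std(ddof) / sqrt(n)."""
    n = len(window)
    return math.sqrt(_sum_sq_dev(window)) / math.sqrt(n * (n - ddof))


def sem_buggy(window, ddof=1):
    """Same computation with the denominator quoted as the bug."""
    n = len(window)
    return math.sqrt(_sum_sq_dev(window)) / math.sqrt((n - 1) * (n - ddof))


window = [1.0, 2.0, 3.0, 4.0]
print(sem(window), sem_buggy(window))
```

For this window the sum of squared deviations is 5, so the corrected form gives sqrt(5 / 12) while the buggy form gives sqrt(5 / 9), an overestimate.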
@@ -1302,6 +1354,7 @@ Sparse
 - Bug in :class:`SparseDtype` for equal comparison with na fill value. (:issue:`54770`)
 - Bug in :meth:`DataFrame.sparse.from_spmatrix` which hard coded an invalid ``fill_value`` for certain subtypes. (:issue:`59063`)
 - Bug in :meth:`DataFrame.sparse.to_dense` which ignored subclassing and always returned an instance of :class:`DataFrame` (:issue:`59913`)
+- Bug in :meth:`cumsum` for integer arrays Calling SparseArray.cumsum caused max recursion depth error. (:issue:`62669`)

 ExtensionArray
 ^^^^^^^^^^^^^^

pandas/_config/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -41,5 +41,5 @@ def using_python_scalars() -> bool:


 def is_nan_na() -> bool:
-    _mode_options = _global_config["mode"]
-    return _mode_options["nan_is_na"]
+    _mode_options = _global_config["future"]
+    return not _mode_options["distinguish_nan_and_na"]
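
The hunk above flips the sense of the option: ``is_nan_na()`` used to return ``mode.nan_is_na`` directly and now returns the negation of ``future.distinguish_nan_and_na``. A minimal sketch of that inversion follows. The plain dict is a hypothetical stand-in for pandas' private ``_global_config``, and the default of ``False`` is inferred from the changelog wording rather than taken from the option's registration.

```python
# Hypothetical stand-in for pandas' private _global_config mapping.
_global_config = {"future": {"distinguish_nan_and_na": False}}


def is_nan_na() -> bool:
    """True when NaN should be treated as equivalent to NA."""
    return not _global_config["future"]["distinguish_nan_and_na"]


# Default: NaN and NA are not distinguished, so NaN counts as NA.
default = is_nan_na()

# Opting in to the future behaviour makes NaN a distinct float value.
_global_config["future"]["distinguish_nan_and_na"] = True
opted_in = is_nan_na()

print(default, opted_in)
```

Existing callers of ``is_nan_na()`` keep their meaning; only the option they read from has changed.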

pandas/_libs/hashtable_class_helper.pxi.in

Lines changed: 2 additions & 2 deletions
@@ -1070,7 +1070,7 @@ cdef class StringHashTable(HashTable):
             val = values[i]

             if isinstance(val, str):
-                # GH#31499 if we have a np.str_ PyUnicode_AsUTF8 won't recognize
+                # GH#31499 if we have an np.str_ PyUnicode_AsUTF8 won't recognize
                 # it as a str, even though isinstance does.
                 v = PyUnicode_AsUTF8(<str>val)
             else:
@@ -1108,7 +1108,7 @@ cdef class StringHashTable(HashTable):
             val = values[i]

             if isinstance(val, str):
-                # GH#31499 if we have a np.str_ PyUnicode_AsUTF8 won't recognize
+                # GH#31499 if we have an np.str_ PyUnicode_AsUTF8 won't recognize
                 # it as a str, even though isinstance does.
                 v = PyUnicode_AsUTF8(<str>val)
             else:

pandas/_libs/index.pyx

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ cdef bint is_definitely_invalid_key(object val):

 cdef ndarray _get_bool_indexer(ndarray values, object val, ndarray mask = None):
     """
-    Return a ndarray[bool] of locations where val matches self.values.
+    Return an ndarray[bool] of locations where val matches self.values.

     If val is not NA, this is equivalent to `self.values == val`
     """

pandas/_libs/internals.pyi

Lines changed: 0 additions & 4 deletions
@@ -94,7 +94,3 @@ class BlockValuesRefs:
     def add_reference(self, blk: Block) -> None: ...
     def add_index_reference(self, index: Index) -> None: ...
     def has_reference(self) -> bool: ...
-
-class SetitemMixin:
-    def __setitem__(self, key, value) -> None: ...
-    def __delitem__(self, key) -> None: ...

pandas/_libs/internals.pyx

Lines changed: 0 additions & 50 deletions
@@ -1,9 +1,6 @@
 from collections import defaultdict
-import sys
-import warnings

 cimport cython
-from cpython cimport PY_VERSION_HEX
 from cpython.object cimport PyObject
 from cpython.pyport cimport PY_SSIZE_T_MAX
 from cpython.slice cimport PySlice_GetIndicesEx
@@ -23,9 +20,6 @@ from numpy cimport (
 cnp.import_array()

 from pandas._libs.algos import ensure_int64
-from pandas.compat import CHAINED_WARNING_DISABLED
-from pandas.errors import ChainedAssignmentError
-from pandas.errors.cow import _chained_assignment_msg

 from pandas._libs.util cimport (
     is_array,
@@ -1002,47 +996,3 @@ cdef class BlockValuesRefs:
             return self._has_reference_maybe_locked()
         ELSE:
             return self._has_reference_maybe_locked()
-
-
-cdef extern from "Python.h":
-    """
-    // python version < 3.14
-    #if PY_VERSION_HEX < 0x030E0000
-    // This function is unused and is declared to avoid a build warning
-    int __Pyx_PyUnstable_Object_IsUniqueReferencedTemporary(PyObject *ref) {
-        return Py_REFCNT(ref) == 1;
-    }
-    #else
-    #define __Pyx_PyUnstable_Object_IsUniqueReferencedTemporary \
-        PyUnstable_Object_IsUniqueReferencedTemporary
-    #endif
-    """
-    int PyUnstable_Object_IsUniqueReferencedTemporary\
-        "__Pyx_PyUnstable_Object_IsUniqueReferencedTemporary"(object o) except -1
-
-
-# Python version compatibility for PyUnstable_Object_IsUniqueReferencedTemporary
-cdef inline bint _is_unique_referenced_temporary(object obj) except -1:
-    if PY_VERSION_HEX >= 0x030E0000:
-        # Python 3.14+ has PyUnstable_Object_IsUniqueReferencedTemporary
-        return PyUnstable_Object_IsUniqueReferencedTemporary(obj)
-    else:
-        # Fallback for older Python versions using sys.getrefcount
-        return sys.getrefcount(obj) <= 1
-
-
-cdef class SetitemMixin:
-    # class used in DataFrame and Series for checking for chained assignment
-
-    def __setitem__(self, key, value) -> None:
-        cdef bint is_unique = 0
-        if not CHAINED_WARNING_DISABLED:
-            is_unique = _is_unique_referenced_temporary(self)
-            if is_unique:
-                warnings.warn(
-                    _chained_assignment_msg, ChainedAssignmentError, stacklevel=1
-                )
-        self._setitem(key, value)
-
-    def __delitem__(self, key) -> None:
-        self._delitem(key)

pandas/_libs/lib.pyx

Lines changed: 0 additions & 6 deletions
@@ -106,7 +106,6 @@ from pandas._libs.tslibs.nattype cimport (
 )
 from pandas._libs.tslibs.offsets cimport is_offset_object
 from pandas._libs.tslibs.period cimport is_period_object
-from pandas._libs.tslibs.timedeltas cimport convert_to_timedelta64
 from pandas._libs.tslibs.timezones cimport tz_compare

 # constants that will be compared to potentially arbitrarily large
@@ -2674,11 +2673,6 @@ def maybe_convert_objects(ndarray[object] objects,
         elif is_timedelta(val):
             if convert_non_numeric:
                 seen.timedelta_ = True
-                try:
-                    convert_to_timedelta64(val, "ns")
-                except OutOfBoundsTimedelta:
-                    seen.object_ = True
-                    break
                 break
             else:
                 seen.object_ = True

pandas/_libs/tslib.pyx

Lines changed: 1 addition & 1 deletion
@@ -120,7 +120,7 @@ def format_array_from_datetime(
     NPY_DATETIMEUNIT reso=NPY_FR_ns,
 ) -> np.ndarray:
     """
-    return a np object array of the string formatted values
+    return an np object array of the string formatted values

     Parameters
     ----------
