Commit 787ad72

API: rename mode.nan_is_na option to future.distinguish_nan_and_na (#63241)
1 parent 92a97b9 commit 787ad72

6 files changed

+62
-25
lines changed
doc/source/whatsnew/v3.0.0.rst

Lines changed: 44 additions & 10 deletions
@@ -552,29 +552,55 @@ small behavior differences as collateral:
 Changed treatment of NaN values in pyarrow and numpy-nullable floating dtypes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``), ``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others. This was done to make adoption easier, but caused some confusion (:issue:`32265`). In 3.0, an option ``"mode.nan_is_na"`` (default ``True``) controls whether to treat ``NaN`` as equivalent to :class:`NA`.
+Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``),
+``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others.
+This was done to make adoption easier, but caused some confusion (:issue:`32265`).
+In 3.0, this behaviour is made consistent to by default treat ``NaN`` as equivalent
+to :class:`NA` in all cases.

-With ``pd.set_option("mode.nan_is_na", True)`` (again, this is the default), ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__`` and be treated the same as :class:`NA`. The only change users will see is that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN`` entries produce :class:`NA` entries instead:
+By default, ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__``
+and will be treated the same as :class:`NA`. The only change users will see is
+that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN``
+entries produce :class:`NA` entries instead.

 *Old behavior:*

 .. code-block:: ipython

-    In [2]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    # NaN in input gets converted to NA
+    In [1]: ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    In [2]: ser
+    Out[2]:
+    0     0.0
+    1    <NA>
+    dtype: Float64
+    # NaN produced by arithmetic (0/0) remained NaN
     In [3]: ser / 0
     Out[3]:
     0     NaN
     1    <NA>
     dtype: Float64
+    # the NaN value is not considered as missing
+    In [4]: (ser / 0).isna()
+    Out[4]:
+    0    False
+    1     True
+    dtype: bool

 *New behavior:*

 .. ipython:: python

-    ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    ser
     ser / 0
+    (ser / 0).isna()

-By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always considered distinct and specifically as a floating-point value, so cannot be used with integer dtypes:
+In the future, the intention is to consider ``NaN`` and :class:`NA` as distinct
+values, and an option to control this behaviour is added in 3.0 through
+``pd.options.future.distinguish_nan_and_na``. When enabled, ``NaN`` is always
+considered distinct and specifically as a floating-point value. As a consequence,
+it cannot be used with integer dtypes.

 *Old behavior:*

@@ -588,13 +614,21 @@ By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always

 .. ipython:: python

-    pd.set_option("mode.nan_is_na", False)
-    ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
-    ser[1]
+    with pd.option_context("future.distinguish_nan_and_na", True):
+        ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
+        print(ser[1])
+
+If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in
+the latter example, this would raise, as a float ``NaN`` cannot be held by an
+integer dtype.

-If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in the latter example, this would raise, as a float ``NaN`` cannot be held by an integer dtype.
+With ``"future.distinguish_nan_and_na"`` enabled, ``ser.to_numpy()`` (and
+``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if
+:class:`NA` entries are present, where before they would coerce to
+``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan``
+to :meth:`Series.to_numpy`.

-With ``"mode.nan_is_na"`` set to ``False``, ``ser.to_numpy()`` (and ``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if :class:`NA` entries are present, where before they would coerce to ``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan`` to :meth:`Series.to_numpy`.
+Note that the option is experimental and subject to change in future releases.

 The ``__module__`` attribute now points to public modules
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pandas/_config/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -36,5 +36,5 @@ def using_string_dtype() -> bool:


 def is_nan_na() -> bool:
-    _mode_options = _global_config["mode"]
-    return _mode_options["nan_is_na"]
+    _mode_options = _global_config["future"]
+    return not _mode_options["distinguish_nan_and_na"]
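
Note that the new option has the opposite polarity to the old one, which is why the helper now negates the stored value. A minimal sketch of that mapping follows; it uses a plain dict as a stand-in for pandas' `_global_config`, and the `set_distinguish` helper is hypothetical, introduced only for illustration.

```python
# Stand-in for pandas' internal config: a plain dict, not the real object.
_global_config = {"future": {"distinguish_nan_and_na": False}}


def is_nan_na() -> bool:
    # Same logic as the patched helper: NaN is treated as NA unless the
    # future option asks for the two to be distinguished.
    return not _global_config["future"]["distinguish_nan_and_na"]


def set_distinguish(value: bool) -> None:
    # Hypothetical setter, standing in for pd.set_option in this sketch.
    _global_config["future"]["distinguish_nan_and_na"] = value


default_result = is_nan_na()   # option off (the default)
set_distinguish(True)
enabled_result = is_nan_na()   # option on
print(default_result, enabled_result)
```

Callers of `is_nan_na()` therefore keep their existing meaning; only the stored flag is inverted.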

pandas/conftest.py

Lines changed: 1 addition & 1 deletion
@@ -2127,5 +2127,5 @@ def monkeysession():
 @pytest.fixture(params=[True, False])
 def using_nan_is_na(request):
     opt = request.param
-    with pd.option_context("mode.nan_is_na", opt):
+    with pd.option_context("future.distinguish_nan_and_na", not opt):
         yield opt

pandas/core/config_init.py

Lines changed: 12 additions & 9 deletions
@@ -428,15 +428,6 @@ def is_terminal() -> bool:
         validator=is_one_of_factory([True, False, "warn"]),
     )

-    cf.register_option(
-        "nan_is_na",
-        os.environ.get("PANDAS_NAN_IS_NA", "1") == "1",
-        "Whether to treat NaN entries as interchangeable with pd.NA in "
-        "numpy-nullable and pyarrow float dtypes. See discussion in "
-        "https://github.com/pandas-dev/pandas/issues/32265",
-        validator=is_one_of_factory([True, False]),
-    )
-

 # user warnings
 chained_assignment = """
@@ -899,6 +890,18 @@ def register_converter_cb(key: str) -> None:
         validator=is_one_of_factory([True, False]),
     )

+    cf.register_option(
+        "distinguish_nan_and_na",
+        os.environ.get("PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA", "0") == "1",
+        "Whether to treat NaN entries as distinct from pd.NA in "
+        "numpy-nullable and pyarrow float dtypes. By default treats both "
+        "interchangeable as missing values (NaN will be coerced to NA). "
+        "See discussion in "
+        "https://github.com/pandas-dev/pandas/issues/32265",
+        validator=is_one_of_factory([True, False]),
+    )
+
+
 # GH#59502
 cf.deprecate_option("future.no_silent_downcasting", Pandas4Warning)
 cf.deprecate_option(
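
The registered default is computed from an environment variable, and only the exact string `"1"` turns the option on. A small sketch of that expression in isolation, assuming nothing beyond the standard library; the function name `distinguish_default` is hypothetical and is not part of pandas.

```python
import os


def distinguish_default(environ=os.environ) -> bool:
    # Mirrors the default expression in the registration above: the option
    # is enabled only when the variable is exactly "1"; unset or any other
    # value (including "true") leaves it disabled.
    return environ.get("PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA", "0") == "1"


unset = distinguish_default({})
enabled = distinguish_default({"PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA": "1"})
other = distinguish_default({"PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA": "true"})
print(unset, enabled, other)
```

The default is evaluated when the option is registered, so the variable would need to be set before pandas is imported.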

pandas/io/json/_json.py

Lines changed: 2 additions & 2 deletions
@@ -997,7 +997,7 @@ def _read_ujson(self) -> DataFrame | Series:
         else:
             obj = self._get_object_parser(self.data)
         if self.dtype_backend is not lib.no_default:
-            with option_context("mode.nan_is_na", True):
+            with option_context("future.distinguish_nan_and_na", False):
                 return obj.convert_dtypes(
                     infer_objects=False, dtype_backend=self.dtype_backend
                 )
@@ -1075,7 +1075,7 @@ def __next__(self) -> DataFrame | Series:
             raise ex

         if self.dtype_backend is not lib.no_default:
-            with option_context("mode.nan_is_na", True):
+            with option_context("future.distinguish_nan_and_na", False):
                 return obj.convert_dtypes(
                     infer_objects=False, dtype_backend=self.dtype_backend
                 )

pandas/io/json/_table_schema.py

Lines changed: 1 addition & 1 deletion
@@ -386,7 +386,7 @@ def parse_table_schema(json, precise_float: bool) -> DataFrame:
             'table="orient" can not yet read ISO-formatted Timedelta data'
         )

-    with option_context("mode.nan_is_na", True):
+    with option_context("future.distinguish_nan_and_na", False):
         df = df.astype(dtypes)

     if "primaryKey" in table["schema"]:
