Commit 6bc86ef

API: rename mode.nan_is_na option to future.distinguish_nan_and_na
1 parent fb517ba commit 6bc86ef

6 files changed, +60 -25 lines changed

doc/source/whatsnew/v3.0.0.rst

Lines changed: 44 additions & 10 deletions
@@ -547,29 +547,55 @@ small behavior differences as collateral:
 Changed treatment of NaN values in pyarrow and numpy-nullable floating dtypes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``), ``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others. This was done to make adoption easier, but caused some confusion (:issue:`32265`). In 3.0, an option ``"mode.nan_is_na"`` (default ``True``) controls whether to treat ``NaN`` as equivalent to :class:`NA`.
+Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``),
+``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others.
+This was done to make adoption easier, but caused some confusion (:issue:`32265`).
+In 3.0, this behaviour is made consistent to by default treat ``NaN`` as equivalent
+to :class:`NA` in all cases.

-With ``pd.set_option("mode.nan_is_na", True)`` (again, this is the default), ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__`` and be treated the same as :class:`NA`. The only change users will see is that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN`` entries produce :class:`NA` entries instead:
+By default, ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__``
+and will be treated the same as :class:`NA`. The only change users will see is
+that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN``
+entries produce :class:`NA` entries instead.

 *Old behavior:*

 .. code-block:: ipython

-    In [2]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    # NaN in input gets converted to NA
+    In [1]: ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    In [2]: ser
+    Out[2]:
+    0     0.0
+    1    <NA>
+    dtype: Float64
+    # NaN produced by arithmetic (0/0) remained NaN
     In [3]: ser / 0
     Out[3]:
     0     NaN
     1    <NA>
     dtype: Float64
+    # the NaN value is not considered as missing
+    In [4]: (ser / 0).isna()
+    Out[4]:
+    0    False
+    1     True
+    dtype: bool

 *New behavior:*

 .. ipython:: python

-    ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    ser
     ser / 0
+    (ser / 0).isna()

-By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always considered distinct and specifically as a floating-point value, so cannot be used with integer dtypes:
+In the future, the intention is to consider ``NaN`` and :class:`NA` as distinct
+values, and an option to control this behaviour is added in 3.0 through
+``pd.options.future.distinguish_nan_and_na``. When enabled, ``NaN`` is always
+considered distinct and specifically as a floating-point value. As a consequence,
+it cannot be used with integer dtypes.

 *Old behavior:*

@@ -583,13 +609,21 @@ By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always

 .. ipython:: python

-    pd.set_option("mode.nan_is_na", False)
-    ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
-    ser[1]
+    with pd.option_context("future.distinguish_nan_and_na", True):
+        ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
+        print(ser[1])
+
+If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in
+the latter example, this would raise, as a float ``NaN`` cannot be held by an
+integer dtype.

-If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in the latter example, this would raise, as a float ``NaN`` cannot be held by an integer dtype.
+With ``"future.distinguish_nan_and_na"`` enabled, ``ser.to_numpy()`` (and
+``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if
+:class:`NA` entries are present, where before they would coerce to
+``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan``
+to :meth:`Series.to_numpy`.

-With ``"mode.nan_is_na"`` set to ``False``, ``ser.to_numpy()`` (and ``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if :class:`NA` entries are present, where before they would coerce to ``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan`` to :meth:`Series.to_numpy`.
+Note that the option is experimental and subject to change in future releases.

 The ``__module__`` attribute now points to public modules
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pandas/_config/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -36,5 +36,5 @@ def using_string_dtype() -> bool:


 def is_nan_na() -> bool:
-    _mode_options = _global_config["mode"]
-    return _mode_options["nan_is_na"]
+    _mode_options = _global_config["future"]
+    return not _mode_options["distinguish_nan_and_na"]

pandas/conftest.py

Lines changed: 1 addition & 1 deletion
@@ -2127,5 +2127,5 @@ def monkeysession():
 @pytest.fixture(params=[True, False])
 def using_nan_is_na(request):
     opt = request.param
-    with pd.option_context("mode.nan_is_na", opt):
+    with pd.option_context("future.distinguish_nan_and_na", not opt):
         yield opt

pandas/core/config_init.py

Lines changed: 10 additions & 9 deletions
@@ -429,15 +429,6 @@ def is_terminal() -> bool:
         validator=is_one_of_factory([True, False, "warn"]),
     )

-    cf.register_option(
-        "nan_is_na",
-        os.environ.get("PANDAS_NAN_IS_NA", "1") == "1",
-        "Whether to treat NaN entries as interchangeable with pd.NA in "
-        "numpy-nullable and pyarrow float dtypes. See discussion in "
-        "https://github.com/pandas-dev/pandas/issues/32265",
-        validator=is_one_of_factory([True, False]),
-    )
-

 # user warnings
 chained_assignment = """
@@ -900,5 +891,15 @@ def register_converter_cb(key: str) -> None:
         validator=is_one_of_factory([True, False]),
     )

+    cf.register_option(
+        "distinguish_nan_and_na",
+        os.environ.get("PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA", "0") == "1",
+        "Whether to treat NaN entries as interchangeable with pd.NA in "
+        "numpy-nullable and pyarrow float dtypes. See discussion in "
+        "https://github.com/pandas-dev/pandas/issues/32265",
+        validator=is_one_of_factory([True, False]),
+    )
+
+
 # GH#59502
 cf.deprecate_option("future.no_silent_downcasting", Pandas4Warning)
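
The default of the newly registered option is read from an environment variable and is on only when that variable is exactly the string `"1"`. The sketch below isolates that rule. `distinguish_default` is a hypothetical helper name, and its `environ` parameter is an assumption added so the rule can be exercised without changing the real process environment:

```python
import os
from typing import Mapping, Optional

ENV_VAR = "PANDAS_FUTURE_DISTINGUISH_NAN_AND_NA"


def distinguish_default(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Mirror the default expression used in the register_option call."""
    env = os.environ if environ is None else environ
    # Unset falls back to "0"; any value other than the literal "1" is False.
    return env.get(ENV_VAR, "0") == "1"


print(distinguish_default({}))                 # variable unset
print(distinguish_default({ENV_VAR: "1"}))     # explicitly enabled
print(distinguish_default({ENV_VAR: "true"}))  # not the literal "1"
```

The removed `PANDAS_NAN_IS_NA` variable defaulted to `"1"`, so with neither variable set the effective behaviour is unchanged: NaN is still treated as NA.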

pandas/io/json/_json.py

Lines changed: 2 additions & 2 deletions
@@ -997,7 +997,7 @@ def _read_ujson(self) -> DataFrame | Series:
         else:
             obj = self._get_object_parser(self.data)
         if self.dtype_backend is not lib.no_default:
-            with option_context("mode.nan_is_na", True):
+            with option_context("future.distinguish_nan_and_na", False):
                 return obj.convert_dtypes(
                     infer_objects=False, dtype_backend=self.dtype_backend
                 )
@@ -1075,7 +1075,7 @@ def __next__(self) -> DataFrame | Series:
             raise ex

         if self.dtype_backend is not lib.no_default:
-            with option_context("mode.nan_is_na", True):
+            with option_context("future.distinguish_nan_and_na", False):
                 return obj.convert_dtypes(
                     infer_objects=False, dtype_backend=self.dtype_backend
                 )

pandas/io/json/_table_schema.py

Lines changed: 1 addition & 1 deletion
@@ -386,7 +386,7 @@ def parse_table_schema(json, precise_float: bool) -> DataFrame:
             'table="orient" can not yet read ISO-formatted Timedelta data'
         )

-    with option_context("mode.nan_is_na", True):
+    with option_context("future.distinguish_nan_and_na", False):
         df = df.astype(dtypes)

     if "primaryKey" in table["schema"]:
