
fix: support datetime variables in Dataset.interp #11081

Open

emmanuel-ferdman wants to merge 3 commits into pydata:main from emmanuel-ferdman:main

Conversation

@emmanuel-ferdman

PR Summary

Dataset.interp() silently dropped datetime64 and timedelta64 variables. Now they are interpolated by converting to float64 and back, with NaT handled like NaN.

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
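The round trip described in the summary can be sketched with plain NumPy (a minimal illustration of the idea, not the PR's actual code; second-resolution timestamps are used so the int64 values survive the float64 cast exactly):

```python
import numpy as np

# Sketch of the conversion described above: datetime64 values are
# reinterpreted as int64, NaT (stored as the minimum int64) becomes NaN,
# interpolation runs on the float64 view, and the result is cast back.
times = np.array(["2000-01-01", "NaT", "2000-01-03"], dtype="datetime64[s]")
ints = times.view(np.int64)
nat = np.iinfo(np.int64).min
as_float = np.where(ints == nat, np.nan, ints.astype(np.float64))
# ... interpolation would operate on as_float here ...
back = np.where(
    np.isnan(as_float), nat,
    np.round(np.nan_to_num(as_float)).astype(np.int64),
).view("datetime64[s]")
assert np.array_equal(back.view(np.int64), times.view(np.int64))
```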
@spencerkclark (Member) left a comment


Thanks @emmanuel-ferdman, this largely looks good.

Could you add "Mm" to the valid dtypes on the DataArray.interp side as well? There is now no need to raise in that circumstance, which is great.

One edge case to consider is what to do in the scenario that extrapolation leads to float values outside the range that can be represented by 64-bit integers. Maybe we punt on that for now though, since it could be messy to handle in a robust way.
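For concreteness, a hypothetical sketch of that edge case (the value is made up for illustration):

```python
import numpy as np

# Hypothetical illustration of the extrapolation edge case mentioned above:
# a float64 produced by extrapolation can fall outside the int64 range,
# so a blind .astype(np.int64) on the way back is not well defined.
extrapolated = np.float64(1e19)      # plausible out-of-range result
int64_max = np.iinfo(np.int64).max   # about 9.22e18
assert extrapolated > int64_max      # cannot be represented as int64
```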

Comment on lines +3945 to +3959
```python
var_indexers = {k: v for k, v in use_indexers.items() if k in var.dims}
int_data = var.data.view(np.int64)
nat = np.iinfo(np.int64).min
as_float = np.where(
    int_data == nat, np.nan, int_data.astype(np.float64)
)
result = missing.interp(
    var.copy(data=as_float), var_indexers, method, **kwargs
)
as_int = np.where(
    np.isnan(result.data),
    nat,
    np.round(np.nan_to_num(result.data)).astype(np.int64),
)
variables[name] = result.copy(data=as_int.view(var.dtype))
```
Member

I believe the following, by sticking completely to functions / methods defined for xarray objects, should be robust to all chunked array implementations, not just dask (`computation` comes from `from xarray.computation import computation`):

Suggested change

```diff
-var_indexers = {k: v for k, v in use_indexers.items() if k in var.dims}
-int_data = var.data.view(np.int64)
-nat = np.iinfo(np.int64).min
-as_float = np.where(
-    int_data == nat, np.nan, int_data.astype(np.float64)
-)
-result = missing.interp(
-    var.copy(data=as_float), var_indexers, method, **kwargs
-)
-as_int = np.where(
-    np.isnan(result.data),
-    nat,
-    np.round(np.nan_to_num(result.data)).astype(np.int64),
-)
-variables[name] = result.copy(data=as_int.view(var.dtype))
+var_indexers = {k: v for k, v in use_indexers.items() if k in var.dims}
+int_data = var.astype(np.int64)
+nat = np.iinfo(np.int64).min
+as_float = computation.where(
+    int_data == nat, np.nan, int_data.astype(np.float64)
+)
+result = missing.interp(
+    as_float, var_indexers, method, **kwargs
+)
+as_int = computation.where(
+    result.isnull(),
+    nat,
+    result.fillna(0).round().astype(np.int64)
+)
+variables[name] = as_int.astype(var.dtype)
```
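One NumPy-level detail worth noting (an observation, not part of the review itself): for datetime64 data, casting with `.astype(np.int64)` produces the same integers as reinterpreting the bits with `.view(np.int64)`, NaT included, which is why the suggestion can move to `.astype` at the xarray level:

```python
import numpy as np

# For datetime64 arrays, .astype(np.int64) and .view(np.int64) yield the
# same underlying integers, with NaT mapping to the minimum int64 value.
t = np.array(["2001-06-15", "NaT"], dtype="datetime64[s]")
assert np.array_equal(t.astype(np.int64), t.view(np.int64))
assert t.view(np.int64)[1] == np.iinfo(np.int64).min
```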

```python
    coords={"x": np.arange(5), "y": np.arange(5)},
).chunk({"x": 2, "y": 2})

result = ds.interp(x=[0.5, 1.5], y=[0.5, 1.5])
```
Member

It was passing with your previous implementation, but this is always good to include as a sanity check for dask tests:

Suggested change

```diff
-result = ds.interp(x=[0.5, 1.5], y=[0.5, 1.5])
+with raise_if_dask_computes():
+    result = ds.interp(x=[0.5, 1.5], y=[0.5, 1.5])
```



Development

Successfully merging this pull request may close these issues.

Dataset.interp() silently dropping time-like data arrays
