fix: support datetime variables in Dataset.interp #11081
emmanuel-ferdman wants to merge 3 commits into pydata:main

Conversation
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Force-pushed from 29d76d7 to e4bf80a
spencerkclark left a comment:
Thanks @emmanuel-ferdman, this largely looks good.
Could you add `"Mm"` to the valid dtype kinds on the `DataArray.interp` side as well? There is now no need to raise in that circumstance, which is great.
One edge case to consider is what to do in the scenario that extrapolation leads to float values outside the range that can be represented by 64-bit integers. Maybe we punt on that for now though, since it could be messy to handle in a robust way.
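To make the edge case concrete, here is a small numpy illustration (the numbers are hypothetical, not taken from the PR): datetime64 values are stored as int64 nanoseconds, so an extrapolated float can exceed the int64 range, and the round-trip back to int64 is then not well defined.

```python
import numpy as np

# Illustrative only: a sufficiently aggressive extrapolation of nanosecond
# timestamps can yield a float beyond what int64 can represent.
int64_max = np.iinfo(np.int64).max   # about 9.22e18
extrapolated = np.float64(1e19)      # hypothetical extrapolated value

# The float is larger than the largest representable int64, so casting it
# back (as the round-trip code does) would overflow.
print(extrapolated > int64_max)
```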
```python
var_indexers = {k: v for k, v in use_indexers.items() if k in var.dims}
int_data = var.data.view(np.int64)
nat = np.iinfo(np.int64).min
as_float = np.where(
    int_data == nat, np.nan, int_data.astype(np.float64)
)
result = missing.interp(
    var.copy(data=as_float), var_indexers, method, **kwargs
)
as_int = np.where(
    np.isnan(result.data),
    nat,
    np.round(np.nan_to_num(result.data)).astype(np.int64),
)
variables[name] = result.copy(data=as_int.view(var.dtype))
```
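The NaT round-trip used in this block can be sketched standalone on a plain numpy array (the names here are illustrative and the interpolation step is elided):

```python
import numpy as np

# Datetime64 values with a missing entry; NaT is stored as int64 min.
times = np.array(["2000-01-01", "NaT", "2000-01-03"], dtype="datetime64[ns]")

int_data = times.view(np.int64)
nat = np.iinfo(np.int64).min
# Map the NaT sentinel to NaN so it survives float interpolation.
as_float = np.where(int_data == nat, np.nan, int_data.astype(np.float64))

# ... interpolation would operate on as_float here ...

# Map NaN back to the NaT sentinel and reinterpret as datetime64.
as_int = np.where(np.isnan(as_float), nat,
                  np.round(np.nan_to_num(as_float)).astype(np.int64))
roundtrip = as_int.view(times.dtype)
print(np.array_equal(as_int, int_data))  # underlying integers round-trip
```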
I believe the following, by sticking entirely to functions and methods defined for xarray objects, should be robust to all chunked array implementations, not just dask (`computation` comes from `from xarray.computation import computation`):
```diff
 var_indexers = {k: v for k, v in use_indexers.items() if k in var.dims}
-int_data = var.data.view(np.int64)
+int_data = var.astype(np.int64)
 nat = np.iinfo(np.int64).min
-as_float = np.where(
+as_float = computation.where(
     int_data == nat, np.nan, int_data.astype(np.float64)
 )
 result = missing.interp(
-    var.copy(data=as_float), var_indexers, method, **kwargs
+    as_float, var_indexers, method, **kwargs
 )
-as_int = np.where(
-    np.isnan(result.data),
+as_int = computation.where(
+    result.isnull(),
     nat,
-    np.round(np.nan_to_num(result.data)).astype(np.int64),
+    result.fillna(0).round().astype(np.int64)
 )
-variables[name] = result.copy(data=as_int.view(var.dtype))
+variables[name] = as_int.astype(var.dtype)
```
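As a quick numpy-level sanity check on the `var.astype(np.int64)` change (assuming xarray's `astype` defers to numpy here): for datetime64 data, `astype(np.int64)` matches `view(np.int64)`, including the NaT sentinel.

```python
import numpy as np

# For datetime64 arrays, astype(np.int64) and view(np.int64) both expose
# the underlying integer representation, and NaT maps to int64 min.
t = np.array(["2000-01-01", "NaT"], dtype="datetime64[ns]")
print(np.array_equal(t.astype(np.int64), t.view(np.int64)))  # same integers
print(int(t.view(np.int64)[1]) == np.iinfo(np.int64).min)    # NaT sentinel
```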
```python
    coords={"x": np.arange(5), "y": np.arange(5)},
).chunk({"x": 2, "y": 2})

result = ds.interp(x=[0.5, 1.5], y=[0.5, 1.5])
```
It was passing with your previous implementation, but this is always good to include as a sanity check for dask tests:
```diff
-result = ds.interp(x=[0.5, 1.5], y=[0.5, 1.5])
+with raise_if_dask_computes():
+    result = ds.interp(x=[0.5, 1.5], y=[0.5, 1.5])
```
PR Summary

`Dataset.interp()` silently dropped `datetime64` and `timedelta64` variables. Now they are interpolated by converting to `float64` and back, with `NaT` handled like `NaN`.

Fixes: `Dataset.interp()` silently dropping time-like data arrays #10900