[pydap backend] enables downloading multiple dim arrays within single http request #10629
Open

Mikejmnez wants to merge 28 commits into pydata:main from Mikejmnez:pydap4_scale
+130 −23
Commits (28)
- ceff231 update PydapArrayWrapper to support backend batching (Mikejmnez)
- ec1282b rebase (Mikejmnez)
- e946e2a pydap-server it not necessary (Mikejmnez)
- 23d2ea7 set `batch=False` as default (Mikejmnez)
- af27f3b set `batch=False` as default in datatree (Mikejmnez)
- 3ed7421 set `batch=False` as default in open groups as dict (Mikejmnez)
- 22e2c60 for flaky, install pydap from repo for now (Mikejmnez)
- 85e51da initial tests - quantify cached url (Mikejmnez)
- 1d0954a adds tests to datatree backend to assert multiple dimensions download… (Mikejmnez)
- 4a02f3e update testing to show number of download urls (Mikejmnez)
- 1091c4f simplified logic (Mikejmnez)
- 0467347 specify cached session debug name to actually cache urls (Mikejmnez)
- 3a227ab fix for mypy (Mikejmnez)
- 3278938 user visible changes on `whats-new.rst` (Mikejmnez)
- 9abf10a impose sorted to `get_dimensions` method (Mikejmnez)
- dd1cd3c reformat `whats-new.rst` (Mikejmnez)
- ff20e7e revert to install pydap from conda and not from repo (Mikejmnez)
- 123dd82 expose checksum as user kwarg (Mikejmnez)
- b24800d include `checksums` optional argument in `whats-new` (Mikejmnez)
- 6b8c146 update to newest release of pydap via pip until conda install is avai… (Mikejmnez)
- 8629e61 use requests_cache session with retry-params when 500 errors occur (Mikejmnez)
- 14a9aa6 update env yml file to use new pydap release via conda (Mikejmnez)
- 23e0376 turn on testing on datatree from test.opendap.org (Mikejmnez)
- 58d9d50 rebase with main (Mikejmnez)
- 76dd268 update what`s new (Mikejmnez)
- 1be221c removes batch as arg - acts always but only on dimension data arrays (Mikejmnez)
- cc631f1 updates tests (Mikejmnez)
- 8a793bf update `whats new` (Mikejmnez)
Changes from all commits:

```diff
@@ -36,7 +36,7 @@
 class PydapArrayWrapper(BackendArray):
-    def __init__(self, array):
+    def __init__(self, array, checksums=True):
         self.array = array

     @property
@@ -54,12 +54,10 @@ def __getitem__(self, key):
     def _getitem(self, key):
-        result = robust_getitem(self.array, key, catch=ValueError)
-        # in some cases, pydap doesn't squeeze axes automatically like numpy
-        result = np.asarray(result)
+        result = np.asarray(result.data)
         axis = tuple(n for n, k in enumerate(key) if isinstance(k, integer_types))
         if result.ndim + len(axis) != self.array.ndim and axis:
             result = np.squeeze(result, axis)

         return result
```
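The squeeze step in `_getitem` exists because pydap can keep length-1 axes where numpy would have dropped them for integer indexing. A standalone sketch of that axis arithmetic; `squeeze_like_numpy` is a hypothetical helper written for illustration, not xarray API:

```python
import numpy as np

# Squeeze exactly the axes that were indexed by a plain integer, but only
# if the result still carries them (i.e. pydap did not squeeze already).
integer_types = (int, np.integer)

def squeeze_like_numpy(result, key, original_ndim):
    result = np.asarray(result)
    # positions in `key` holding a bare integer should have been dropped
    axis = tuple(n for n, k in enumerate(key) if isinstance(k, integer_types))
    if result.ndim + len(axis) != original_ndim and axis:
        result = np.squeeze(result, axis)
    return result

# pydap-style result of indexing a (3, 4) array with (1, :): axis 0 kept
raw = np.arange(12).reshape(3, 4)[1:2, :]      # shape (1, 4)
out = squeeze_like_numpy(raw, (1, slice(None)), original_ndim=2)
print(out.shape)  # (4,)
```

If the server already returned a squeezed array, the `ndim` check makes the helper a no-op, which is the case the original comment ("pydap doesn't squeeze axes automatically like numpy") guards against.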
```diff
@@ -82,7 +80,14 @@ class PydapDataStore(AbstractDataStore):
     be useful if the netCDF4 library is not available.
     """

-    def __init__(self, dataset, group=None):
+    def __init__(
+        self,
+        dataset,
+        group=None,
+        session=None,
+        protocol=None,
+        checksums=True,
+    ):
         """
         Parameters
         ----------
@@ -92,6 +97,8 @@ def __init__(self, dataset, group=None):
         """
         self.dataset = dataset
         self.group = group
+        self._protocol = protocol
+        self._checksums = checksums  # true by default

     @classmethod
     def open(
@@ -104,6 +111,7 @@ def open(
         timeout=None,
         verify=None,
         user_charset=None,
+        checksums=True,
     ):
         from pydap.client import open_url
         from pydap.net import DEFAULT_TIMEOUT
@@ -118,6 +126,7 @@ def open(
                 DeprecationWarning,
             )
             output_grid = False  # new default behavior
+
         kwargs = {
             "url": url,
             "application": application,
@@ -133,22 +142,37 @@ def open(
         elif hasattr(url, "ds"):
             # pydap dataset
             dataset = url.ds
-        args = {"dataset": dataset}
+        args = {"dataset": dataset, "checksums": checksums}
         if group:
             # only then, change the default
             args["group"] = group
+        if url.startswith(("http", "dap2")):
+            args["protocol"] = "dap2"
+        elif url.startswith("dap4"):
+            args["protocol"] = "dap4"
         return cls(**args)

     def open_store_variable(self, var):
-        data = indexing.LazilyIndexedArray(PydapArrayWrapper(var))
-        try:
+        if hasattr(var, "dims"):
             dimensions = [
                 dim.split("/")[-1] if dim.startswith("/") else dim for dim in var.dims
             ]
-        except AttributeError:
+        else:
             # GridType does not have a dims attribute - instead get `dimensions`
             # see https://github.com/pydap/pydap/issues/485
             dimensions = var.dimensions
+        if (
+            self._protocol == "dap4"
+            and var.name in dimensions
+            and hasattr(var, "dataset")  # only True for pydap>3.5.5
+        ):
+            var.dataset.enable_batch_mode()
+            data_array = self._get_data_array(var)
+            data = indexing.LazilyIndexedArray(data_array)
+            var.dataset.disable_batch_mode()
+        else:
+            # all non-dimension variables
+            data = indexing.LazilyIndexedArray(PydapArrayWrapper(var))

         return Variable(dimensions, data, var.attributes)

     def get_variables(self):
```
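The dispatch added in `open_store_variable` can be sketched in isolation: on a DAP4 store, dimension variables go through the dataset's batch mode (one http request fetching all dims), while everything else stays lazily wrapped. `FakeDataset`, `FakeVar`, and the string payloads below are illustrative stand-ins, and the enable/disable calls mirror (rather than exercise) the pydap>3.5.5 API:

```python
# Stand-in for a pydap dataset exposing batch mode.
class FakeDataset:
    def __init__(self):
        self.batched = False

    def enable_batch_mode(self):
        self.batched = True

    def disable_batch_mode(self):
        self.batched = False


# Stand-in for a pydap variable: a name, DAP4-style dims, parent dataset.
class FakeVar:
    def __init__(self, name, dims, dataset=None):
        self.name, self.dims, self.dataset = name, dims, dataset


def wrap(var, protocol):
    # strip any leading group path, as the diff does for fully qualified
    # DAP4 names like "/time"
    dims = [d.split("/")[-1] if d.startswith("/") else d for d in var.dims]
    if protocol == "dap4" and var.name in dims and var.dataset is not None:
        var.dataset.enable_batch_mode()
        data = f"batched:{var.name}"   # stands in for _get_data_array(var)
        var.dataset.disable_batch_mode()
    else:
        data = f"lazy:{var.name}"      # stands in for PydapArrayWrapper(var)
    return dims, data


ds = FakeDataset()
print(wrap(FakeVar("time", ["/time"], ds), "dap4"))         # (['time'], 'batched:time')
print(wrap(FakeVar("sst", ["/time", "/lat"], ds), "dap4"))  # (['time', 'lat'], 'lazy:sst')
```

A variable counts as a dimension exactly when its own name appears in its dims list, which is why only coordinate arrays take the batched path.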
```diff
@@ -166,6 +190,7 @@ def get_variables(self):
             # check the key is not a BaseType or GridType
             if not isinstance(self.ds[var], GroupType)
         ]
+
         return FrozenDict((k, self.open_store_variable(self.ds[k])) for k in _vars)

     def get_attrs(self):
@@ -177,18 +202,33 @@ def get_attrs(self):
             "libdap",
             "invocation",
             "dimensions",
+            "path",
+            "Maps",
         )
-        attrs = self.ds.attributes
-        list(map(attrs.pop, opendap_attrs, [None] * 6))
+        attrs = dict(self.ds.attributes)
+        list(map(attrs.pop, opendap_attrs, [None] * 8))
         return Frozen(attrs)

     def get_dimensions(self):
-        return Frozen(self.ds.dimensions)
+        return Frozen(sorted(self.ds.dimensions))

     @property
     def ds(self):
         return get_group(self.dataset, self.group)

+    def _get_data_array(self, var):
+        """gets dimension data all at once, storing the numpy
+        arrays within a cached dictionary
+        """
+        from pydap.client import get_batch_data
+
+        if not var._is_data_loaded():
+            # data has not been deserialized yet
+            # runs only once per store/hierarchy
+            get_batch_data(var, checksums=self._checksums)
+
+        return self.dataset[var.id].data
+

 class PydapBackendEntrypoint(BackendEntrypoint):
     """
```
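The `get_attrs` change copies the attribute mapping before popping server-side bookkeeping keys; the `map` over `dict.pop` with a `None` default drops keys that may be absent without raising `KeyError`. A standalone demo with made-up attribute values (the key tuple is only a subset of the backend's list):

```python
# Keys the store wants to hide from users; illustrative subset only.
opendap_attrs = ("libdap", "invocation", "dimensions", "path", "Maps")

server_attrs = {"title": "sst analysis", "path": "/data/sst", "Maps": ()}
attrs = dict(server_attrs)   # copy, so the pydap dataset is not mutated
# pop's second argument (None) silences missing keys; list() forces the map
list(map(attrs.pop, opendap_attrs, [None] * len(opendap_attrs)))
print(attrs)  # {'title': 'sst analysis'}
```

Using `len(opendap_attrs)` rather than a hard-coded count would also avoid the literal `6` → `8` bump the diff has to make when `"path"` and `"Maps"` are added.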
```diff
@@ -250,6 +290,7 @@ def open_dataset(
         timeout=None,
         verify=None,
         user_charset=None,
+        checksums=True,
     ) -> Dataset:
         store = PydapDataStore.open(
             url=filename_or_obj,
@@ -260,6 +301,7 @@ def open_dataset(
             timeout=timeout,
             verify=verify,
             user_charset=user_charset,
+            checksums=checksums,
         )
         store_entrypoint = StoreBackendEntrypoint()
         with close_on_error(store):
@@ -292,6 +334,7 @@ def open_datatree(
         timeout=None,
         verify=None,
         user_charset=None,
+        checksums=True,
     ) -> DataTree:
         groups_dict = self.open_groups_as_dict(
             filename_or_obj,
@@ -304,10 +347,11 @@ def open_datatree(
             decode_timedelta=decode_timedelta,
             group=group,
             application=None,
-            session=None,
-            timeout=None,
-            verify=None,
-            user_charset=None,
+            session=session,
+            timeout=timeout,
+            verify=application,
+            user_charset=user_charset,
+            checksums=checksums,
         )

         return datatree_from_dict_with_io_cleanup(groups_dict)
```

Contributor (on the `verify=application` line above): Why does this function take both …

```diff
@@ -329,6 +373,7 @@ def open_groups_as_dict(
         timeout=None,
         verify=None,
         user_charset=None,
+        checksums=True,
     ) -> dict[str, Dataset]:
         from xarray.core.treenode import NodePath

@@ -340,6 +385,7 @@ def open_groups_as_dict(
             timeout=timeout,
             verify=verify,
             user_charset=user_charset,
+            checksums=checksums,
         )

         # Check for a group and make it a parent if it exists
```
Contributor: As an aside, a `with var.dataset.batch_mode():` context manager would be nice API for this.
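The reviewer's suggestion can be sketched with `contextlib`. Note `batch_mode()` is the reviewer's hypothetical API, not something pydap currently exposes, so a fake dataset stands in here:

```python
from contextlib import contextmanager

@contextmanager
def batch_mode(dataset):
    # guarantee batch mode is switched off even if the body raises
    dataset.enable_batch_mode()
    try:
        yield dataset
    finally:
        dataset.disable_batch_mode()

# Stand-in for a pydap dataset exposing the enable/disable pair.
class FakeDataset:
    def __init__(self):
        self.batched = False

    def enable_batch_mode(self):
        self.batched = True

    def disable_batch_mode(self):
        self.batched = False

ds = FakeDataset()
with batch_mode(ds):
    assert ds.batched   # batched dimension fetches would happen here
print(ds.batched)  # False
```

Compared with the paired calls in `open_store_variable`, the `try/finally` means an exception inside the block cannot leave the dataset stuck in batch mode.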