4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,3 +1,7 @@
**05/17/2026:** Chunked `waterdata` calls that fail partway through are now resumable. Any sub-request failure (quota exhaustion, mid-pagination 5xx/429, transport error) raises `PartialResult` (or its `QuotaExhausted` subclass) carrying the combined partial DataFrame, a `BaseMetadata.partial_metadata` accessor, and a `ChunkManifest` recording how many sub-requests of the Cartesian-product plan completed. The same getter accepts the partial metadata via a new `resume_from=` kwarg; the chunker validates that the saved plan matches the fresh arguments and fetches only the remaining Cartesian-product combinations. Callers concatenate their saved partial DataFrame with the resume call's return value to reconstruct the full result. On successful chunked calls, the manifest is also attached to `BaseMetadata.chunk_manifest` for observability.
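
The resume contract above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: `Manifest`, `PartialFetch`, `fingerprint`, and `run_plan` are hypothetical stand-ins for `ChunkManifest`, `PartialResult`, and the chunker, and plain lists stand in for DataFrames. The manifest stores a fingerprint of the sub-request plan plus the count of completed sub-requests, so a resume can reject mismatched arguments and skip finished work.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Hypothetical stand-in for ChunkManifest: which plan, and how far it got."""

    plan_fingerprint: str
    total: int
    completed: int


class PartialFetch(Exception):
    """Hypothetical stand-in for PartialResult: this call's rows plus a manifest."""

    def __init__(self, rows, manifest):
        super().__init__(f"{manifest.completed}/{manifest.total} sub-requests completed")
        self.rows = rows
        self.manifest = manifest


def fingerprint(plan):
    # Stable hash of the sub-request plan, so a resume can detect changed arguments.
    blob = json.dumps(plan, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def run_plan(plan, fetch, resume_from=None):
    """Run sub-requests in order, skipping those a previous call already finished."""
    fp = fingerprint(plan)
    start = 0
    if resume_from is not None:
        if resume_from.plan_fingerprint != fp:
            raise ValueError("resume_from was produced by a different query")
        start = resume_from.completed
    rows = []
    for i in range(start, len(plan)):
        try:
            rows.extend(fetch(plan[i]))
        except Exception as exc:
            # Sub-requests 0..i-1 are done; index i is where a resume restarts.
            raise PartialFetch(rows, Manifest(fp, len(plan), i)) from exc
    return rows


# Usage: a fetch that fails once on its third call, then a bounded retry loop.
plan = [{"site": s} for s in ["A", "B", "C", "D"]]
calls = {"n": 0}


def flaky_fetch(sub_request):
    # Simulates a transient transport error on the third call only.
    calls["n"] += 1
    if calls["n"] == 3:
        raise ConnectionError("simulated transport error")
    return [sub_request["site"]]


saved, manifest = [], None
for attempt in range(3):
    try:
        saved += run_plan(plan, flaky_fetch, resume_from=manifest)
        break
    except PartialFetch as err:
        saved += err.rows  # keep what this attempt fetched before failing
        manifest = err.manifest
print(saved)  # ['A', 'B', 'C', 'D']
```

With real DataFrames the accumulation step would be `pd.concat` over the saved pieces rather than list concatenation. Bounding the loop (three attempts here) avoids spinning forever on a persistent failure.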

**05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB limit. A common chained-query pattern (pull a long site list from `get_monitoring_locations`, then feed it into `get_daily`) previously failed with HTTP 414 once the resulting URL grew past the limit; it now fans out across multiple sub-requests under the hood and returns one combined DataFrame. The chunker coordinates with the existing CQL `filter` chunker (long top-level-`OR` filters still split correctly when used alongside long multi-value lists), caps Cartesian-product plans at 1000 sub-requests (the default USGS hourly quota), and aborts mid-call with a structured `QuotaExhausted` exception, carrying the partial result and a resume offset, if `x-ratelimit-remaining` drops below a safety floor. Mirrors R `dataRetrieval`'s [#870](https://github.com/DOI-USGS/dataRetrieval/pull/870), generalized to N dimensions. Note the metadata-behavior changes for paginated/chunked calls: `BaseMetadata.url` still reflects the user's original query (unchanged), but `BaseMetadata.header` now carries the headers of the *last* page or sub-request (so `x-ratelimit-remaining` is current) rather than the first, and `BaseMetadata.query_time` is now the cumulative wall-clock time across pages instead of the first page's elapsed time.
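
The URL-length chunking described above can be approximated as follows. This is a sketch under stated assumptions, not the package's algorithm: the 8000-byte limit, comma-joined multi-value parameters, and the greedy "split the longest dimension" strategy are all assumptions, and `RequestTooLarge` here is a local stand-in for the exported exception. Because a query string's length is the sum of its parameters' encoded lengths, checking the longest chunk of each dimension together bounds every combination in the Cartesian product.

```python
import math
from itertools import product
from urllib.parse import quote_plus, urlencode

URL_BYTE_LIMIT = 8000    # assumed stand-in for the server's ~8 KB limit
MAX_SUB_REQUESTS = 1000  # the default hourly quota cited above


class RequestTooLarge(ValueError):
    """Local stand-in for the package's exception of the same name."""


def encoded_len(values):
    # Multi-value parameters are assumed to be sent comma-joined (a,b,c).
    return len(quote_plus(",".join(values)))


def url_length(base_url, params):
    query = urlencode(
        {k: ",".join(v) if isinstance(v, list) else v for k, v in params.items()}
    )
    return len(f"{base_url}?{query}".encode("utf-8"))


def split_evenly(values, n):
    # Split `values` into n contiguous chunks whose sizes differ by at most one.
    size, extra = divmod(len(values), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def plan_chunks(base_url, params):
    """Return one params dict per sub-request, each URL within the limit."""
    list_keys = [k for k, v in params.items() if isinstance(v, list)]
    n_chunks = {k: 1 for k in list_keys}

    def worst_case():
        # Pair each dimension's longest chunk: the longest possible combination.
        probe = dict(params)
        for k in list_keys:
            probe[k] = max(split_evenly(params[k], n_chunks[k]), key=encoded_len)
        return probe

    while url_length(base_url, worst_case()) > URL_BYTE_LIMIT:
        probe = worst_case()
        splittable = [k for k in list_keys if n_chunks[k] < len(params[k])]
        if not splittable:
            raise RequestTooLarge("a single combination already exceeds the limit")
        # Greedily split whichever dimension currently contributes the most bytes.
        n_chunks[max(splittable, key=lambda k: encoded_len(probe[k]))] += 1

    total = math.prod(n_chunks.values())
    if total > MAX_SUB_REQUESTS:
        raise RequestTooLarge(f"plan needs {total} sub-requests (cap {MAX_SUB_REQUESTS})")

    per_dim = [
        [(k, chunk) for chunk in split_evenly(params[k], n_chunks[k])]
        for k in list_keys
    ]
    return [{**params, **dict(combo)} for combo in product(*per_dim)]


# Usage: 2000 hypothetical site IDs against a placeholder endpoint.
sites = [f"USGS-{i:08d}" for i in range(2000)]
base = "https://example.invalid/collections/daily/items"
plan = plan_chunks(base, {"monitoring_location_id": sites, "parameter_code": "00060"})
```

Scalar parameters are copied into every sub-request unchanged, so each combined result covers exactly the original query.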

**05/16/2026:** Fixed silent truncation in the paginated `waterdata` request loops (`_walk_pages` and `get_stats_data`). Mid-pagination failures (HTTP 429, 5xx, network error) were previously swallowed — pagination would quietly stop and the function would return whatever rows it had collected, leaving callers with truncated DataFrames they had no way to detect. The loops now status-check every page like the initial request and raise `RuntimeError` on any failure, with the upstream exception chained as `__cause__` and a short menu of recovery actions (wait and retry, reduce the request, or obtain an API token) in the message. **Behavior change**: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception; retry the call (possibly with a smaller `limit` or narrower query).
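
A page loop with the behavior this entry describes, combined with the metadata semantics from the chunking entry (last page's headers, cumulative elapsed time), might look like the sketch below. `walk_pages`, `HTTPStatusError`, and the `get_page` callback returning `(status, headers, elapsed, rows, next_url)` are hypothetical; the real loop's signature is not shown here.

```python
from datetime import timedelta


class HTTPStatusError(Exception):
    """Hypothetical error for a non-2xx page response."""


def walk_pages(first_url, get_page):
    """Collect rows from every page, failing loudly instead of truncating.

    ``get_page(url)`` is a hypothetical callback returning
    ``(status, headers, elapsed, rows, next_url)``; ``next_url`` is None on
    the last page.
    """
    rows, headers, total_elapsed = [], {}, timedelta(0)
    url = first_url
    while url is not None:
        try:
            status, page_headers, elapsed, page_rows, next_url = get_page(url)
            if not 200 <= status < 300:
                raise HTTPStatusError(f"HTTP {status} for {url}")
        except Exception as exc:
            # Chain the upstream error as __cause__ and suggest recovery options.
            raise RuntimeError(
                f"Pagination failed after {len(rows)} rows. Wait and retry, "
                "reduce the request, or obtain an API token."
            ) from exc
        rows.extend(page_rows)
        headers = page_headers    # keep the *last* page's headers
        total_elapsed += elapsed  # cumulative wall-clock across pages
        url = next_url
    return rows, headers, total_elapsed
```

Catching `Exception` broadly mirrors the entry's "any failure" wording; a real implementation might narrow this to the HTTP client's exception types.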

**05/07/2026:** Bumped the declared minimum Python version from **3.8** to **3.9** (`pyproject.toml`'s `requires-python` and the ruff target). This brings the manifest in line with what was already being tested — CI's matrix has long covered only 3.9, 3.13, and 3.14, the `waterdata` test module already skipped itself on Python < 3.10, and several modules already use 3.9-only stdlib (e.g. `zoneinfo`). Users on 3.8 will no longer be able to install the package; please upgrade.
5 changes: 5 additions & 0 deletions dataretrieval/utils.py
@@ -230,6 +230,11 @@ def __init__(self, response) -> None:
self.query_time = response.elapsed
self.header = response.headers
self.comment = None
# Set by ``waterdata.chunking.multi_value_chunked`` when a request
# was split into sub-requests. ``None`` for non-chunked calls. See
# ``ChunkManifest`` for how callers use this to resume a partial
# query.
self.chunk_manifest = getattr(response, "chunk_manifest", None)

# # not sure what statistic_info is
# self.statistic_info = None
10 changes: 10 additions & 0 deletions dataretrieval/waterdata/__init__.py
@@ -29,6 +29,12 @@
get_stats_por,
get_time_series_metadata,
)
from .chunking import (
ChunkManifest,
PartialResult,
QuotaExhausted,
RequestTooLarge,
)
from .filters import FILTER_LANG
from .nearest import get_nearest_continuous
from .ratings import get_ratings
@@ -45,6 +51,10 @@
"PROFILES",
"PROFILE_LOOKUP",
"SERVICES",
"ChunkManifest",
"PartialResult",
"QuotaExhausted",
"RequestTooLarge",
"get_channel",
"get_codes",
"get_combined_metadata",
114 changes: 114 additions & 0 deletions dataretrieval/waterdata/api.py
@@ -57,6 +57,7 @@ def get_daily(
filter: str | None = None,
filter_lang: FILTER_LANG | None = None,
convert_type: bool = True,
resume_from: BaseMetadata | None = None,
) -> tuple[pd.DataFrame, BaseMetadata]:
"""Daily data provide one data value to represent water conditions for the
day.
@@ -189,6 +190,14 @@ def get_daily(
and the lexicographic-comparison pitfall.
convert_type : boolean, optional
If True, converts columns to appropriate types.
resume_from : BaseMetadata, optional
Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
exception raised by a previous call. The chunker consults its
``chunk_manifest`` to skip already-completed sub-requests and
fetch only the remainder. All other keyword arguments must match
the original call. See the
:ref:`waterdata-chunking-resume` user guide for a worked
retry-loop example.

Returns
-------
@@ -230,6 +239,21 @@ def get_daily(
... parameter_code="00060",
... last_modified="P7D",
... )

>>> # Chain queries: pull all stream sites in a state, then their
>>> # daily discharge for the last week. The site list can be hundreds
>>> # of values long — the request is transparently chunked across
>>> # multiple sub-requests so the URL stays under the server's byte
>>> # limit. Combined output looks like a single query.
>>> sites_df, _ = dataretrieval.waterdata.get_monitoring_locations(
... state_name="Ohio",
... site_type="Stream",
... )
>>> df, md = dataretrieval.waterdata.get_daily(
... monitoring_location_id=sites_df["monitoring_location_id"].tolist(),
... parameter_code="00060",
... time="P7D",
... )
"""
service = "daily"
output_id = "daily_id"
@@ -257,6 +281,7 @@ def get_continuous(
filter: str | None = None,
filter_lang: FILTER_LANG | None = None,
convert_type: bool = True,
resume_from: BaseMetadata | None = None,
) -> tuple[pd.DataFrame, BaseMetadata]:
"""
Continuous data provide instantaneous water conditions.
@@ -384,6 +409,14 @@ def get_continuous(
convert_type : boolean, optional
If True, the function converts the data to dates and qualifiers to
strings.
resume_from : BaseMetadata, optional
Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
exception raised by a previous call. The chunker consults its
``chunk_manifest`` to skip already-completed sub-requests and
fetch only the remainder. All other keyword arguments must match
the original call. See the
:ref:`waterdata-chunking-resume` user guide for a worked
retry-loop example.

Returns
-------
@@ -477,6 +510,7 @@ def get_monitoring_locations(
filter: str | None = None,
filter_lang: FILTER_LANG | None = None,
convert_type: bool = True,
resume_from: BaseMetadata | None = None,
) -> tuple[pd.DataFrame, BaseMetadata]:
"""Location information is basic information about the monitoring location
including the name, identifier, agency responsible for data collection, and
@@ -692,6 +726,14 @@ def get_monitoring_locations(
and the lexicographic-comparison pitfall.
convert_type : boolean, optional
If True, converts columns to appropriate types.
resume_from : BaseMetadata, optional
Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
exception raised by a previous call. The chunker consults its
``chunk_manifest`` to skip already-completed sub-requests and
fetch only the remainder. All other keyword arguments must match
the original call. See the
:ref:`waterdata-chunking-resume` user guide for a worked
retry-loop example.

Returns
-------
@@ -755,6 +797,7 @@ def get_time_series_metadata(
filter: str | None = None,
filter_lang: FILTER_LANG | None = None,
convert_type: bool = True,
resume_from: BaseMetadata | None = None,
) -> tuple[pd.DataFrame, BaseMetadata]:
"""Daily data and continuous measurements are grouped into time series,
which represent a collection of observations of a single parameter,
@@ -915,6 +958,14 @@ def get_time_series_metadata(
and the lexicographic-comparison pitfall.
convert_type : boolean, optional
If True, converts columns to appropriate types.
resume_from : BaseMetadata, optional
Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
exception raised by a previous call. The chunker consults its
``chunk_manifest`` to skip already-completed sub-requests and
fetch only the remainder. All other keyword arguments must match
the original call. See the
:ref:`waterdata-chunking-resume` user guide for a worked
retry-loop example.

Returns
-------
@@ -1012,6 +1063,7 @@ def get_combined_metadata(
filter: str | None = None,
filter_lang: FILTER_LANG | None = None,
convert_type: bool = True,
resume_from: BaseMetadata | None = None,
) -> tuple[pd.DataFrame, BaseMetadata]:
"""Get combined monitoring-location and time-series metadata.

@@ -1112,6 +1164,14 @@ def get_combined_metadata(
and the lexicographic-comparison pitfall.
convert_type : boolean, optional
If True, converts columns to appropriate types.
resume_from : BaseMetadata, optional
Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
exception raised by a previous call. The chunker consults its
``chunk_manifest`` to skip already-completed sub-requests and
fetch only the remainder. All other keyword arguments must match
the original call. See the
:ref:`waterdata-chunking-resume` user guide for a worked
retry-loop example.

Returns
-------
@@ -1200,6 +1260,7 @@ def get_latest_continuous(
filter: str | None = None,
filter_lang: FILTER_LANG | None = None,
convert_type: bool = True,
resume_from: BaseMetadata | None = None,
) -> tuple[pd.DataFrame, BaseMetadata]:
"""This endpoint provides the most recent observation for each time series
of continuous data. Continuous data are collected via automated sensors
@@ -1329,6 +1390,14 @@ def get_latest_continuous(
and the lexicographic-comparison pitfall.
convert_type : boolean, optional
If True, converts columns to appropriate types.
resume_from : BaseMetadata, optional
Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
exception raised by a previous call. The chunker consults its
``chunk_manifest`` to skip already-completed sub-requests and
fetch only the remainder. All other keyword arguments must match
the original call. See the
:ref:`waterdata-chunking-resume` user guide for a worked
retry-loop example.

Returns
-------
@@ -1395,6 +1464,7 @@ def get_latest_daily(
filter: str | None = None,
filter_lang: FILTER_LANG | None = None,
convert_type: bool = True,
resume_from: BaseMetadata | None = None,
) -> tuple[pd.DataFrame, BaseMetadata]:
"""Daily data provide one data value to represent water conditions for the
day.
@@ -1526,6 +1596,14 @@ def get_latest_daily(
and the lexicographic-comparison pitfall.
convert_type : boolean, optional
If True, converts columns to appropriate types.
resume_from : BaseMetadata, optional
Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
exception raised by a previous call. The chunker consults its
``chunk_manifest`` to skip already-completed sub-requests and
fetch only the remainder. All other keyword arguments must match
the original call. See the
:ref:`waterdata-chunking-resume` user guide for a worked
retry-loop example.

Returns
-------
@@ -1593,6 +1671,7 @@ def get_field_measurements(
filter: str | None = None,
filter_lang: FILTER_LANG | None = None,
convert_type: bool = True,
resume_from: BaseMetadata | None = None,
) -> tuple[pd.DataFrame, BaseMetadata]:
"""Field measurements are physically measured values collected during a
visit to the monitoring location. Field measurements consist of measurements
@@ -1714,6 +1793,14 @@ def get_field_measurements(
and the lexicographic-comparison pitfall.
convert_type : boolean, optional
If True, converts columns to appropriate types.
resume_from : BaseMetadata, optional
Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
exception raised by a previous call. The chunker consults its
``chunk_manifest`` to skip already-completed sub-requests and
fetch only the remainder. All other keyword arguments must match
the original call. See the
:ref:`waterdata-chunking-resume` user guide for a worked
retry-loop example.

Returns
-------
@@ -1777,6 +1864,7 @@ def get_field_measurements_metadata(
filter: str | None = None,
filter_lang: FILTER_LANG | None = None,
convert_type: bool = True,
resume_from: BaseMetadata | None = None,
) -> tuple[pd.DataFrame, BaseMetadata]:
"""Get field-measurement metadata: one row per (location, parameter) series.

@@ -1832,6 +1920,14 @@ def get_field_measurements_metadata(
and the lexicographic-comparison pitfall.
convert_type : boolean, optional
If True, converts columns to appropriate types.
resume_from : BaseMetadata, optional
Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
exception raised by a previous call. The chunker consults its
``chunk_manifest`` to skip already-completed sub-requests and
fetch only the remainder. All other keyword arguments must match
the original call. See the
:ref:`waterdata-chunking-resume` user guide for a worked
retry-loop example.

Returns
-------
@@ -1898,6 +1994,7 @@ def get_peaks(
filter: str | None = None,
filter_lang: FILTER_LANG | None = None,
convert_type: bool = True,
resume_from: BaseMetadata | None = None,
) -> tuple[pd.DataFrame, BaseMetadata]:
"""Get the annual peak streamflow / stage record for a monitoring location.

@@ -1956,6 +2053,14 @@ def get_peaks(
and the lexicographic-comparison pitfall.
convert_type : boolean, optional
If True, converts columns to appropriate types.
resume_from : BaseMetadata, optional
Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
exception raised by a previous call. The chunker consults its
``chunk_manifest`` to skip already-completed sub-requests and
fetch only the remainder. All other keyword arguments must match
the original call. See the
:ref:`waterdata-chunking-resume` user guide for a worked
retry-loop example.

Returns
-------
@@ -2695,6 +2800,7 @@ def get_channel(
filter: str | None = None,
filter_lang: FILTER_LANG | None = None,
convert_type: bool = True,
resume_from: BaseMetadata | None = None,
) -> tuple[pd.DataFrame, BaseMetadata]:
"""
Channel measurements taken as part of streamflow field measurements.
@@ -2808,6 +2914,14 @@ def get_channel(
convert_type : boolean, optional
If True, the function converts the data to dates and qualifiers to
strings.
resume_from : BaseMetadata, optional
Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
exception raised by a previous call. The chunker consults its
``chunk_manifest`` to skip already-completed sub-requests and
fetch only the remainder. All other keyword arguments must match
the original call. See the
:ref:`waterdata-chunking-resume` user guide for a worked
retry-loop example.

Returns
-------