DOI-USGS · thodson-usgs · May 17, 2026
diff --git a/NEWS.md b/NEWS.md
@@ -1,3 +1,7 @@
+**05/17/2026:** Chunked `waterdata` calls that fail partway through are now resumable. Any sub-request failure (quota exhaustion, mid-pagination 5xx/429, transport error) raises `PartialResult` (or its `QuotaExhausted` subclass), which carries the combined partial DataFrame, a `BaseMetadata.partial_metadata` accessor, and a `ChunkManifest` recording how many sub-requests of the cartesian-product plan completed. The same getter accepts the partial metadata via a new `resume_from=` kwarg; the chunker validates that the saved plan matches the new arguments and fetches only the remaining cartesian-product combinations. Callers concatenate their saved partial DataFrame with the resume call's return value to reconstruct the full result. The manifest is also attached to `BaseMetadata.chunk_manifest` on successful chunked calls for observability.
+
+**05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB limit. A common chained-query pattern (pull a long site list from `get_monitoring_locations`, then feed it into `get_daily`) previously failed with HTTP 414 once the resulting URL grew past the limit; it now fans out across multiple sub-requests under the hood and returns one combined DataFrame. The chunker coordinates with the existing CQL `filter` chunker (long top-level-`OR` filters still split correctly when used alongside long multi-value lists), caps cartesian-product plans at 1000 sub-requests (the default USGS hourly quota), and aborts mid-call with a structured `QuotaExhausted` exception, carrying the partial result and a resume offset, if `x-ratelimit-remaining` drops below a safety floor. Mirrors R `dataRetrieval`'s [#870](https://github.com/DOI-USGS/dataRetrieval/pull/870), generalized to N dimensions. Note one metadata-behavior change for paginated/chunked calls: `BaseMetadata.url` still reflects the user's original query (unchanged), but `BaseMetadata.header` now carries the *last* page's or sub-request's headers (so `x-ratelimit-remaining` is current) rather than the first, and `BaseMetadata.query_time` is now the cumulative wall-clock time across pages instead of the first page's elapsed time.
+
 **05/16/2026:** Fixed silent truncation in the paginated `waterdata` request loops (`_walk_pages` and `get_stats_data`). Mid-pagination failures (HTTP 429, 5xx, network error) were previously swallowed — pagination would quietly stop and the function would return whatever rows it had collected, leaving callers with truncated DataFrames they had no way to detect. The loops now status-check every page like the initial request and raise `RuntimeError` on any failure, with the upstream exception chained as `__cause__` and a short menu of recovery actions (wait and retry, reduce the request, or obtain an API token) in the message. **Behavior change**: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception; retry the call (possibly with a smaller `limit` or narrower query).
 
 **05/07/2026:** Bumped the declared minimum Python version from **3.8** to **3.9** (`pyproject.toml`'s `requires-python` and the ruff target). This brings the manifest in line with what was already being tested — CI's matrix has long covered only 3.9, 3.13, and 3.14, the `waterdata` test module already skipped itself on Python < 3.10, and several modules already use 3.9-only stdlib (e.g. `zoneinfo`). Users on 3.8 will no longer be able to install the package; please upgrade.

diff --git a/dataretrieval/utils.py b/dataretrieval/utils.py
@@ -230,6 +230,11 @@ def __init__(self, response) -> None:
         self.query_time = response.elapsed
         self.header = response.headers
         self.comment = None
+        # Set by ``waterdata.chunking.multi_value_chunked`` when a request
+        # was split into sub-requests. ``None`` for non-chunked calls. See
+        # ``ChunkManifest`` for how callers use this to resume a partial
+        # query.
+        self.chunk_manifest = getattr(response, "chunk_manifest", None)
 
         # # not sure what statistic_info is
         # self.statistic_info = None

diff --git a/dataretrieval/waterdata/__init__.py b/dataretrieval/waterdata/__init__.py
@@ -29,6 +29,12 @@
     get_stats_por,
     get_time_series_metadata,
 )
+from .chunking import (
+    ChunkManifest,
+    PartialResult,
+    QuotaExhausted,
+    RequestTooLarge,
+)
 from .filters import FILTER_LANG
 from .nearest import get_nearest_continuous
 from .ratings import get_ratings
@@ -45,6 +51,10 @@
     "PROFILES",
     "PROFILE_LOOKUP",
     "SERVICES",
+    "ChunkManifest",
+    "PartialResult",
+    "QuotaExhausted",
+    "RequestTooLarge",
     "get_channel",
     "get_codes",
     "get_combined_metadata",

diff --git a/dataretrieval/waterdata/api.py b/dataretrieval/waterdata/api.py
@@ -57,6 +57,7 @@ def get_daily(
     filter: str | None = None,
     filter_lang: FILTER_LANG | None = None,
     convert_type: bool = True,
+    resume_from: BaseMetadata | None = None,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """Daily data provide one data value to represent water conditions for the
     day.
@@ -189,6 +190,14 @@ def get_daily(
         and the lexicographic-comparison pitfall.
     convert_type : boolean, optional
         If True, converts columns to appropriate types.
+    resume_from : BaseMetadata, optional
+        Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
+        exception raised by a previous call. The chunker consults its
+        ``chunk_manifest`` to skip already-completed sub-requests and
+        fetch only the remainder. Pass all other kwargs exactly as in
+        the original call. See the
+        :ref:`waterdata-chunking-resume` user guide for a worked
+        retry-loop example.
 
     Returns
     -------
@@ -230,6 +239,21 @@ def get_daily(
         ...     parameter_code="00060",
         ...     last_modified="P7D",
         ... )
+
+        >>> # Chain queries: pull all stream sites in a state, then their
+        >>> # daily discharge for the last week. The site list can be hundreds
+        >>> # of values long, so the request is transparently chunked across
+        >>> # multiple sub-requests to keep each URL under the server's byte
+        >>> # limit. The combined output looks like the result of one query.
+        >>> sites_df, _ = dataretrieval.waterdata.get_monitoring_locations(
+        ...     state_name="Ohio",
+        ...     site_type="Stream",
+        ... )
+        >>> df, md = dataretrieval.waterdata.get_daily(
+        ...     monitoring_location_id=sites_df["monitoring_location_id"].tolist(),
+        ...     parameter_code="00060",
+        ...     time="P7D",
+        ... )
     """
     service = "daily"
     output_id = "daily_id"
@@ -257,6 +281,7 @@ def get_continuous(
     filter: str | None = None,
     filter_lang: FILTER_LANG | None = None,
     convert_type: bool = True,
+    resume_from: BaseMetadata | None = None,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """
     Continuous data provide instantanous water conditions.
@@ -384,6 +409,14 @@ def get_continuous(
     convert_type : boolean, optional
         If True, the function will convert the data to dates and qualifier to
         string vector
+    resume_from : BaseMetadata, optional
+        Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
+        exception raised by a previous call. The chunker consults its
+        ``chunk_manifest`` to skip already-completed sub-requests and
+        fetch only the remainder. Pass all other kwargs exactly as in
+        the original call. See the
+        :ref:`waterdata-chunking-resume` user guide for a worked
+        retry-loop example.
 
     Returns
     -------
@@ -477,6 +510,7 @@ def get_monitoring_locations(
     filter: str | None = None,
     filter_lang: FILTER_LANG | None = None,
     convert_type: bool = True,
+    resume_from: BaseMetadata | None = None,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """Location information is basic information about the monitoring location
     including the name, identifier, agency responsible for data collection, and
@@ -692,6 +726,14 @@ def get_monitoring_locations(
         and the lexicographic-comparison pitfall.
     convert_type : boolean, optional
         If True, converts columns to appropriate types.
+    resume_from : BaseMetadata, optional
+        Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
+        exception raised by a previous call. The chunker consults its
+        ``chunk_manifest`` to skip already-completed sub-requests and
+        fetch only the remainder. Pass all other kwargs exactly as in
+        the original call. See the
+        :ref:`waterdata-chunking-resume` user guide for a worked
+        retry-loop example.
 
     Returns
     -------
@@ -755,6 +797,7 @@ def get_time_series_metadata(
     filter: str | None = None,
     filter_lang: FILTER_LANG | None = None,
     convert_type: bool = True,
+    resume_from: BaseMetadata | None = None,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """Daily data and continuous measurements are grouped into time series,
     which represent a collection of observations of a single parameter,
@@ -915,6 +958,14 @@ def get_time_series_metadata(
         and the lexicographic-comparison pitfall.
     convert_type : boolean, optional
         If True, converts columns to appropriate types.
+    resume_from : BaseMetadata, optional
+        Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
+        exception raised by a previous call. The chunker consults its
+        ``chunk_manifest`` to skip already-completed sub-requests and
+        fetch only the remainder. Pass all other kwargs exactly as in
+        the original call. See the
+        :ref:`waterdata-chunking-resume` user guide for a worked
+        retry-loop example.
 
     Returns
     -------
@@ -1012,6 +1063,7 @@ def get_combined_metadata(
     filter: str | None = None,
     filter_lang: FILTER_LANG | None = None,
     convert_type: bool = True,
+    resume_from: BaseMetadata | None = None,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """Get combined monitoring-location and time-series metadata.
 
@@ -1112,6 +1164,14 @@ def get_combined_metadata(
         and the lexicographic-comparison pitfall.
     convert_type : boolean, optional
         If True, converts columns to appropriate types.
+    resume_from : BaseMetadata, optional
+        Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
+        exception raised by a previous call. The chunker consults its
+        ``chunk_manifest`` to skip already-completed sub-requests and
+        fetch only the remainder. Pass all other kwargs exactly as in
+        the original call. See the
+        :ref:`waterdata-chunking-resume` user guide for a worked
+        retry-loop example.
 
     Returns
     -------
@@ -1200,6 +1260,7 @@ def get_latest_continuous(
     filter: str | None = None,
     filter_lang: FILTER_LANG | None = None,
     convert_type: bool = True,
+    resume_from: BaseMetadata | None = None,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """This endpoint provides the most recent observation for each time series
     of continuous data. Continuous data are collected via automated sensors
@@ -1329,6 +1390,14 @@ def get_latest_continuous(
         and the lexicographic-comparison pitfall.
     convert_type : boolean, optional
         If True, converts columns to appropriate types.
+    resume_from : BaseMetadata, optional
+        Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
+        exception raised by a previous call. The chunker consults its
+        ``chunk_manifest`` to skip already-completed sub-requests and
+        fetch only the remainder. Pass all other kwargs exactly as in
+        the original call. See the
+        :ref:`waterdata-chunking-resume` user guide for a worked
+        retry-loop example.
 
     Returns
     -------
@@ -1395,6 +1464,7 @@ def get_latest_daily(
     filter: str | None = None,
     filter_lang: FILTER_LANG | None = None,
     convert_type: bool = True,
+    resume_from: BaseMetadata | None = None,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """Daily data provide one data value to represent water conditions for the
     day.
@@ -1526,6 +1596,14 @@ def get_latest_daily(
         and the lexicographic-comparison pitfall.
     convert_type : boolean, optional
         If True, converts columns to appropriate types.
+    resume_from : BaseMetadata, optional
+        Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
+        exception raised by a previous call. The chunker consults its
+        ``chunk_manifest`` to skip already-completed sub-requests and
+        fetch only the remainder. Pass all other kwargs exactly as in
+        the original call. See the
+        :ref:`waterdata-chunking-resume` user guide for a worked
+        retry-loop example.
 
     Returns
     -------
@@ -1593,6 +1671,7 @@ def get_field_measurements(
     filter: str | None = None,
     filter_lang: FILTER_LANG | None = None,
     convert_type: bool = True,
+    resume_from: BaseMetadata | None = None,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """Field measurements are physically measured values collected during a
     visit to the monitoring location. Field measurements consist of measurements
@@ -1714,6 +1793,14 @@ def get_field_measurements(
         and the lexicographic-comparison pitfall.
     convert_type : boolean, optional
         If True, converts columns to appropriate types.
+    resume_from : BaseMetadata, optional
+        Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
+        exception raised by a previous call. The chunker consults its
+        ``chunk_manifest`` to skip already-completed sub-requests and
+        fetch only the remainder. Pass all other kwargs exactly as in
+        the original call. See the
+        :ref:`waterdata-chunking-resume` user guide for a worked
+        retry-loop example.
 
     Returns
     -------
@@ -1777,6 +1864,7 @@ def get_field_measurements_metadata(
     filter: str | None = None,
     filter_lang: FILTER_LANG | None = None,
     convert_type: bool = True,
+    resume_from: BaseMetadata | None = None,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """Get field-measurement metadata: one row per (location, parameter) series.
 
@@ -1832,6 +1920,14 @@ def get_field_measurements_metadata(
         and the lexicographic-comparison pitfall.
     convert_type : boolean, optional
         If True, converts columns to appropriate types.
+    resume_from : BaseMetadata, optional
+        Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
+        exception raised by a previous call. The chunker consults its
+        ``chunk_manifest`` to skip already-completed sub-requests and
+        fetch only the remainder. Pass all other kwargs exactly as in
+        the original call. See the
+        :ref:`waterdata-chunking-resume` user guide for a worked
+        retry-loop example.
 
     Returns
     -------
@@ -1898,6 +1994,7 @@ def get_peaks(
     filter: str | None = None,
     filter_lang: FILTER_LANG | None = None,
     convert_type: bool = True,
+    resume_from: BaseMetadata | None = None,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """Get the annual peak streamflow / stage record for a monitoring location.
 
@@ -1956,6 +2053,14 @@ def get_peaks(
         and the lexicographic-comparison pitfall.
     convert_type : boolean, optional
         If True, converts columns to appropriate types.
+    resume_from : BaseMetadata, optional
+        Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
+        exception raised by a previous call. The chunker consults its
+        ``chunk_manifest`` to skip already-completed sub-requests and
+        fetch only the remainder. Pass all other kwargs exactly as in
+        the original call. See the
+        :ref:`waterdata-chunking-resume` user guide for a worked
+        retry-loop example.
 
     Returns
     -------
@@ -2695,6 +2800,7 @@ def get_channel(
     filter: str | None = None,
     filter_lang: FILTER_LANG | None = None,
     convert_type: bool = True,
+    resume_from: BaseMetadata | None = None,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """
     Channel measurements taken as part of streamflow field measurements.
@@ -2808,6 +2914,14 @@ def get_channel(
     convert_type : boolean, optional
         If True, the function will convert the data to dates and qualifier to
         string vector
+    resume_from : BaseMetadata, optional
+        Metadata carried by a ``PartialResult`` (or ``QuotaExhausted``)
+        exception raised by a previous call. The chunker consults its
+        ``chunk_manifest`` to skip already-completed sub-requests and
+        fetch only the remainder. Pass all other kwargs exactly as in
+        the original call. See the
+        :ref:`waterdata-chunking-resume` user guide for a worked
+        retry-loop example.
 
     Returns
     -------
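
The transparent chunking that the `get_daily` docstring example relies on can be illustrated with a small planner. This is a sketch of the general technique, not the code in `waterdata.chunking`: `split_by_budget`, `plan_requests`, the 40-character budget, the comma-joined encoding, and the site identifiers are assumptions made for illustration. Only the cap of 1000 sub-requests comes from the NEWS entry.

```python
from itertools import product

MAX_SUBREQUESTS = 1000  # cap stated in the NEWS entry


def split_by_budget(values, budget):
    """Greedily group values so each comma-joined group fits in `budget` chars."""
    chunks, current, size = [], [], 0
    for v in values:
        extra = len(v) + (1 if current else 0)  # +1 for the joining comma
        if current and size + extra > budget:
            chunks.append(current)
            current, size = [], 0
            extra = len(v)
        current.append(v)
        size += extra
    if current:
        chunks.append(current)
    return chunks


def plan_requests(params, budget):
    """Cartesian product of per-parameter chunks; one dict per sub-request."""
    names = list(params)
    per_dim = [split_by_budget(params[n], budget) for n in names]
    plan = [dict(zip(names, combo)) for combo in product(*per_dim)]
    if len(plan) > MAX_SUBREQUESTS:
        raise ValueError(f"plan needs {len(plan)} sub-requests")
    return plan


sites = [f"USGS-{i:08d}" for i in range(5)]  # made-up identifiers
plan = plan_requests(
    {"monitoring_location_id": sites, "parameter_code": ["00060", "00065"]},
    budget=40,
)
```

With these inputs the five site IDs split into three groups and the two parameter codes stay in one group, so the plan holds three sub-requests. A value longer than the budget is placed alone in its own group rather than rejected; a real implementation would have to decide how to handle that case.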