feat(waterdata): Add resume support for partial chunked queries #282

Draft
thodson-usgs wants to merge 6 commits into DOI-USGS:main from thodson-usgs:chunker-resume

@thodson-usgs
Collaborator

Stacked on top of #280. This PR's diff includes #280's chunker commit; rebase / squash after #280 merges.

When a chunked OGC waterdata call fails partway through (quota exhaustion, mid-pagination 5xx/429, transport error), the chunker now raises PartialResult (or its QuotaExhausted subclass) carrying the combined partial DataFrame plus a ChunkManifest that records how many sub-requests of the cartesian-product plan completed. The same getter accepts the partial metadata back via a new resume_from= kwarg; the chunker validates the saved plan matches the fresh args and fetches only the remaining cartesian-product combinations. Callers concatenate their saved partial frame with the resume call's return value to reconstruct the full result.

Caller flow

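import pandas as pd
from dataretrieval.waterdata import QuotaExhausted, get_daily

# `sites` is the caller's list of monitoring-location IDs.
try:
    df, md = get_daily(monitoring_location_id=sites, parameter_code="00060", time="P7D")
except QuotaExhausted as e:
    # The exception carries everything fetched before the quota ran out.
    partial_df, partial_md = e.partial_frame, e.partial_metadata
    # ...wait for hourly window to reset...
    rest_df, rest_md = get_daily(
        monitoring_location_id=sites,
        parameter_code="00060",
        time="P7D",
        resume_from=partial_md,
    )
    # The resume call returns only the remaining chunks, so stitch both parts together.
    df = pd.concat([partial_df, rest_df], ignore_index=True)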

Design

ChunkManifest is a frozen dataclass pinning the normalized cartesian-product plan and the completed count. Pinning the plan (not just the input args) lets resume detect when a caller has changed their inputs between the original call and the retry — same-looking args that chunk differently would silently re-fetch wrong sub-ranges. Resume rejects four invalid states with explicit error messages: no manifest on the metadata, current args don't chunk, current plan differs from the saved plan, or the saved manifest is already complete.
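
To make the manifest's shape concrete, here is a minimal sketch of a frozen dataclass exposing the properties this PR names (plan, completed, total, is_complete, remaining). The field layout, the nested-tuple plan representation, and the `normalize_plan_sketch` helper are assumptions for illustration; they are not the code in `chunking.py`.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List, Tuple

# Hypothetical plan shape: one (param_name, chunks) pair per chunked dimension,
# where chunks is a tuple of value-tuples. Tuples keep the manifest hashable.
Plan = Tuple[Tuple[str, Tuple[Tuple[str, ...], ...]], ...]


def normalize_plan_sketch(plan: Dict[str, List[List[str]]]) -> Plan:
    """Freeze a mutable plan into nested tuples, sorted by parameter name.

    Chunk order and value order inside each dimension are preserved, so two
    plans that split the same values differently compare unequal.
    """
    return tuple(
        (name, tuple(tuple(chunk) for chunk in chunks))
        for name, chunks in sorted(plan.items())
    )


@dataclass(frozen=True)
class ChunkManifestSketch:
    plan: Plan
    completed: int

    @property
    def total(self) -> int:
        # Size of the cartesian product across all chunked dimensions.
        return prod(len(chunks) for _, chunks in self.plan)

    @property
    def remaining(self) -> int:
        return self.total - self.completed

    @property
    def is_complete(self) -> bool:
        return self.completed >= self.total
```

Because the sketch compares whole plans rather than input args, a resume check reduces to an equality test between the saved plan and the freshly computed one.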

PartialResult is the new base exception; QuotaExhausted now subclasses it and carries the manifest instead of the bare completed_chunks/total_chunks counts of #280. Any sub-request exception (including PR #279's _walk_pages RuntimeError) is wrapped via raise PartialResult(...) from exc, so the original cause stays accessible via __cause__. On a first-chunk-failed scenario, the chunker synthesizes a minimal requests.Response carrying just the canonical URL + manifest so caller-side BaseMetadata.chunk_manifest always works.
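
The wrapping behaviour can be illustrated with a small self-contained sketch. `PartialResultSketch` and `run_chunks` are hypothetical stand-ins written for this example, with plain lists in place of DataFrames; only the `raise ... from exc` chaining mirrors what the PR describes.

```python
class PartialResultSketch(RuntimeError):
    """Hypothetical stand-in for the PR's PartialResult exception."""

    def __init__(self, partial_rows, completed, total):
        super().__init__(
            f"{completed}/{total} sub-requests completed before failure"
        )
        self.partial_rows = partial_rows
        self.completed = completed
        self.total = total


def run_chunks(chunks, fetch):
    """Fetch each chunk in order; on failure, raise with what was gathered."""
    rows = []
    for done, chunk in enumerate(chunks):
        try:
            rows.extend(fetch(chunk))
        except Exception as exc:
            # `from exc` sets __cause__, so the original error stays reachable.
            raise PartialResultSketch(rows, done, len(chunks)) from exc
    return rows
```

A caller catching `PartialResultSketch` can read `e.partial_rows` for the salvaged data and `e.__cause__` for the underlying error.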

The manifest is also attached to BaseMetadata.chunk_manifest on successful chunked calls so callers can log md.chunk_manifest to confirm fan-out and observe sub-request count.

Surface

  • resume_from: BaseMetadata | None = None added to all 11 chunked-getter signatures (get_daily, get_continuous, get_monitoring_locations, get_time_series_metadata, get_combined_metadata, get_latest_continuous, get_latest_daily, get_field_measurements, get_field_measurements_metadata, get_peaks, get_channel).
  • New module exports: ChunkManifest, PartialResult, _normalize_plan.

Tests

14 new chunker tests cover: manifest properties (total / is_complete / remaining), frozen-dataclass immutability and hashability, normalized-plan order sensitivity, manifest attached on successful chunked calls, no manifest on pass-through, BaseMetadata.chunk_manifest round-trip, PartialResult wrapping of fetch exceptions with __cause__ chained, empty-frame handling on first-chunk failure, partial_metadata lazy wrapping, resume happy path (skip completed chunks), and four resume rejection paths (mismatched plan / no manifest / already-complete / no chunking). Plus an end-to-end test that exhausts quota partway through, then resumes to complete the query and verifies frame concat reproduces a single-call equivalent.

All 59 chunker / utility tests pass, including the 42 chunker-specific tests.

thodson-usgs and others added 6 commits May 17, 2026 11:44
For multi-value waterdata queries (e.g. monitoring_location_id with
~300+ sites), the GET URL produced by PR DOI-USGS#233 blows past the server's
~8 KB nginx buffer and the API returns HTTP 414. This PR adds a
chunker that transparently splits long list params across sub-requests
so each URL fits the byte budget.

The chunker is a decorator applied to ``_fetch_once`` outside the
existing ``@filters.chunked`` (CQL chunker), so list-chunking is the
outer loop and filter-chunking is the inner loop:

  @chunking.multi_value_chunked(build_request=_construct_api_requests)
  @filters.chunked(build_request=_construct_api_requests)
  def _fetch_once(args): ...

Key design points:

- ``_plan_chunks`` greedy-halves the largest chunk across all
  dimensions until the worst-case sub-request fits ``url_limit``
  (URL + body, via ``_request_bytes``, so POST routes are sized
  correctly). Cartesian product of per-dim partitions becomes the
  sub-request set; capped at ``max_chunks=1000``.

- ``_filter_aware_probe_args`` coordinates with ``filters.chunked``:
  the planner probes URL length using a synthetic clause that matches
  the inner filter chunker's bail-floor size (longest single clause,
  scaled by worst-case URL encoding ratio). Without this coordination,
  the outer planner would raise ``RequestTooLarge`` on combinations
  the stacked chunkers can actually handle.

- ``QuotaExhausted`` mid-call guard reads ``x-ratelimit-remaining``
  after each sub-request; if it drops below ``quota_safety_floor=50``,
  the wrapper raises with the partial frame, completed-chunk offset,
  and last observed remaining quota — letting callers salvage or
  resume after the rate-limit window resets, rather than crash into a
  silent mid-pagination 429.

- ``RequestTooLarge`` is raised when the smallest reducible plan
  still exceeds ``url_limit`` (every multi-value param at a singleton
  chunk and any chunkable filter at the inner chunker's bail floor)
  or when the cartesian product exceeds ``max_chunks``.

- All defaults (``url_limit``, ``max_chunks``, ``quota_safety_floor``)
  resolve at call time, so monkey-patching
  ``filters._WATERDATA_URL_BYTE_LIMIT`` for tests / non-default quotas
  affects the decorator uniformly.

Public additions:

- ``dataretrieval.waterdata.chunking.multi_value_chunked``
- ``dataretrieval.waterdata.chunking.RequestTooLarge``
- ``dataretrieval.waterdata.chunking.QuotaExhausted`` (carries
  ``partial_frame``, ``partial_response``, ``completed_chunks``,
  ``total_chunks``, ``remaining``)

Tests (30 new):

- ``_filter_aware_probe_args`` worst-case-clause modelling
- ``_plan_chunks`` greedy halving, RequestTooLarge floor, filter-
  chunker coordination, ``max_chunks`` cap, lazy-default reads
- ``multi_value_chunked`` pass-through, cartesian-product shape,
  end-to-end with stacked filter chunker
- ``QuotaExhausted`` header parsing, mid-call abort, last-chunk no-
  abort, zero-floor disable
- ``RequestTooLarge`` message contents and triggering conditions

End-to-end correctness verified against the live API: identical
per-site cell-for-cell output between unchunked (single call) and
chunked (forced fan-out via patched limit) paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…queries

When a chunked OGC call fails partway through, the chunker now raises
PartialResult (or its QuotaExhausted subclass) carrying the combined
partial DataFrame plus a ChunkManifest recording how many sub-requests
of the cartesian-product plan completed. The same getter accepts the
partial metadata back via a new resume_from= kwarg; the chunker
validates the saved plan matches the fresh args and fetches only the
remaining combinations. Callers concatenate the saved partial frame
with the resume call's return value to reconstruct the full result.

- ChunkManifest dataclass (frozen, hashable) with plan / completed /
  total / is_complete / remaining
- PartialResult base exception with partial_frame, partial_response,
  manifest, and a lazy partial_metadata property; QuotaExhausted now
  subclasses it and carries the manifest instead of bare counts
- BaseMetadata.chunk_manifest exposes the manifest end-to-end
- multi_value_chunked wrapper validates resume_from, skips completed
  cartesian-product combinations via itertools.islice, wraps any
  sub-request exception as PartialResult with __cause__ preserved
- resume_from= added to all 11 chunked-getter signatures
- 14 new tests covering manifest properties, resume validation
  (mismatched plan / no manifest / already-complete / no chunking),
  PartialResult wrapping of fetch errors, end-to-end quota-exhaust-
  then-resume, and partial_metadata wrapping
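
The skip-completed step relies on the cartesian product having a deterministic order. A minimal sketch of that idea follows, assuming the plan is a dict of per-parameter chunk lists; ``remaining_combinations`` is a hypothetical helper name, not a function from this PR.

```python
from itertools import islice, product


def remaining_combinations(plan, completed):
    """Yield the sub-request arg sets a resume call still has to fetch.

    plan: dict of name -> list of chunks, iterated in insertion order.
    completed: number of combinations already fetched, in product order.
    """
    names = list(plan)
    combos = product(*(plan[name] for name in names))
    # Product order is deterministic, so skipping the first `completed`
    # combinations lands exactly on the first unfetched one.
    for combo in islice(combos, completed, None):
        yield dict(zip(names, combo))
```

This only works if the resume call rebuilds the same plan in the same order, which is why the saved plan is validated against the fresh args before any fetch.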
Worked example in get_daily's docstring showing the canonical
PartialResult catch / accumulate-partials / sleep-and-retry pattern,
capped at a one-hour deadline matched to the API's hourly rate-limit
window so structural failures surface rather than spin forever.
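
A rough sketch of that catch / accumulate / sleep-and-retry loop is shown here. It is not the docstring example itself: ``PartialResultStub`` and ``fetch_with_resume`` are hypothetical names, the ``sleep`` and ``clock`` parameters exist only so the loop can be exercised without waiting, and it assumes each failed call's ``partial_frame`` holds only the rows fetched by that call.

```python
import time


class PartialResultStub(Exception):
    """Hypothetical stand-in exposing the two attributes the loop needs."""

    def __init__(self, partial_frame, partial_metadata):
        super().__init__("chunked call stopped early")
        self.partial_frame = partial_frame
        self.partial_metadata = partial_metadata


def fetch_with_resume(getter, deadline_s=3600.0, wait_s=60.0,
                      sleep=time.sleep, clock=time.monotonic, **kwargs):
    """Call `getter` until it succeeds, resuming from each partial result."""
    frames = []          # partial frames gathered so far, in order
    resume_from = None   # metadata from the most recent partial failure
    give_up_at = clock() + deadline_s
    while True:
        try:
            frame, metadata = getter(resume_from=resume_from, **kwargs)
        except PartialResultStub as exc:
            frames.append(exc.partial_frame)
            resume_from = exc.partial_metadata
            if clock() + wait_s > give_up_at:
                # Past the deadline: surface the failure instead of spinning.
                raise
            sleep(wait_s)
            continue
        frames.append(frame)
        return frames, metadata
```

The caller would concatenate the returned frames to rebuild the full result.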

Module-level multi_value_chunked docstring and the per-getter
resume_from parameter doc now point to get_daily for the example.
The example doesn't belong in get_daily's docstring — it's a topical
explanation of an API contract that applies to every chunked getter,
not a usage demo of one function. Move it to a dedicated Sphinx user-
guide page (waterdata_chunking.rst) covering the chunker's resume
contract, the canonical retry-loop pattern with a one-hour deadline,
the four resume-validation failure modes, and how to inspect the
chunk manifest on successful calls.

multi_value_chunked's module docstring and the per-getter
resume_from parameter doc now cross-reference the new page.
…tial

Both abort sites (quota-exhausted bail and sub-request exception) and
the success path now share one helper that combines responses, restores
the canonical URL, builds the manifest, and attaches it to the
response — eliminating the three near-identical inline blocks. Message
formatting moves to a single _partial_result_message() so the three
"Catch ... to access .partial_frame and resume" strings collapse to
one. The "resume_from" kwarg literal becomes _RESUME_FROM_KEY for
consistency with _QUOTA_HEADER, and the args-strip uses the standard
dict()+pop() idiom. PR-number references that would rot in public
docstrings dropped.
ChunkManifest, PartialResult, QuotaExhausted, and RequestTooLarge are
now available from ``dataretrieval.waterdata`` directly. Callers
following the resume retry-loop pattern no longer need to reach into
the ``dataretrieval.waterdata.chunking`` submodule to catch
PartialResult — the public import matches where the getters live.

multi_value_chunked's docstring now states explicitly that the wrapper
catches every ``Exception`` (not just the three named example types)
and that ``BaseException`` subclasses propagate unchanged, so callers
know KeyboardInterrupt aborts a chunked call cleanly while a
programmer-error TypeError still gets wrapped with its partial state.
The userguide example uses the new top-level import.
Contributor

Copilot AI left a comment

Pull request overview

This PR adds resumable partial-result support for chunked OGC waterdata queries, building on the multi-value chunking and pagination failure handling work.

Changes:

  • Adds ChunkManifest, PartialResult, QuotaExhausted, and resume_from support for chunked calls.
  • Updates public waterdata getter signatures, metadata, docs, and NEWS.
  • Adds extensive chunking/resume tests.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| dataretrieval/waterdata/chunking.py | Implements chunk planning, manifests, partial exceptions, quota handling, and resume behavior. |
| dataretrieval/waterdata/utils.py | Wires multi-value chunking around OGC fetches and updates paginated response metadata aggregation. |
| dataretrieval/waterdata/filters.py | Shares filter encoding-budget logic and restores canonical URLs/headers on filter-chunked responses. |
| dataretrieval/waterdata/api.py | Adds resume_from to supported public getter signatures and docs. |
| dataretrieval/waterdata/__init__.py | Exports new chunking-related public classes. |
| dataretrieval/utils.py | Exposes chunk_manifest through BaseMetadata. |
| docs/source/userguide/waterdata_chunking.rst | Adds user documentation for chunking and resume workflow. |
| docs/source/userguide/index.rst | Adds the new chunking guide to the user-guide toctree. |
| NEWS.md | Documents the resumable chunked-query behavior. |
| tests/waterdata_test.py | Adds planner, manifest, partial-result, quota, and resume tests. |

Comments suppressed due to low confidence (1)

dataretrieval/waterdata/chunking.py:708

  • QuotaExhausted.partial_frame is assembled before the public getter's normal post-processing runs, so callers receive raw _fetch_once columns/dtypes for the saved partial data but post-processed columns/dtypes from the successful resume call. This breaks the advertised concat-to-reconstruct workflow for real getters; apply the same get_ogc_data post-processing to the exception's partial frame before it reaches callers, or raise the partial exception from a layer that already has processed frames.
                        raise QuotaExhausted(
                            partial_frame=partial_frame,
                            partial_response=partial_response,
                            manifest=manifest,
                            remaining=remaining,
                        )

Comment thread NEWS.md
@@ -1,3 +1,7 @@
**05/17/2026:** Chunked `waterdata` calls that fail partway through are now resumable. Any sub-request failure (quota exhaustion, mid-pagination 5xx/429, transport error) raises `PartialResult` (or its `QuotaExhausted` subclass) carrying the combined partial DataFrame, a `BaseMetadata.partial_metadata` accessor, and a `ChunkManifest` recording how many sub-requests of the cartesian-product plan completed. The same getter accepts the partial metadata via a new `resume_from=` kwarg; the chunker validates the saved plan matches the fresh args and fetches only the remaining cartesian-product combinations. Callers concatenate their saved partial DataFrame with the resume call's return value to reconstruct the full result. The manifest is also attached to `BaseMetadata.chunk_manifest` on successful chunked calls for observability.
Comment on lines +683 to +688
raise PartialResult(
    partial_frame=partial_frame,
    partial_response=partial_response,
    manifest=manifest,
    cause=exc,
) from exc