feat(catalog-rest): client support for server-side REST scan planning #2656
Draft
huan233usc wants to merge 2 commits into
…anPlanner injection)
Implements the client side of the Iceberg REST scan-planning protocol
(planTableScan / fetchPlanningResult / fetchScanTasks) so that, when a REST
catalog advertises the planning endpoints, table scans delegate planning to the
server instead of reading manifests locally.
Injection design, Variant B (narrow capability trait):
- New `iceberg::scan::ScanPlanner` trait + `ScanPlanRequest`;
`Table`/`TableScanBuilder` gain an optional `Arc<dyn ScanPlanner>`;
`TableScan::plan_files()` delegates to it and falls back to native planning on
`FeatureUnsupported`. The core `Catalog` trait is left untouched.
REST crate:
- `scan_planning` module: wire DTOs, endpoint negotiation (parses the
`endpoints` field of /v1/config and gates the scan-plan calls), the async
submit/poll/fetch state machine with exponential backoff and best-effort
Drop-based cancel, and conversion of wire content-files into FileScanTasks via
their public builders (no DataFile internals needed).
- `RestScanPlanner` is attached to every table the catalog returns.
- Per-task row predicate is the client's own bound scan filter (correct), and
the scan filter is pushed down as Iceberg expression JSON when encodable.
Tested: DTO/endpoint/expr unit tests, conversion, and end-to-end mockito tests
for completed-inline, submitted-then-polled, and plan-task fan-out paths.
DataFusion needs no changes.
…ning
Builds on the Variant B server-side scan planning so a server-planned scan can
actually read its data files end-to-end against Unity Catalog FGAC tables
(verified live: server applies column masks, client reads the masked rows).
- Vended storage credentials: REST `storage-credentials` (from load-table and
from scan-plan responses) are attached to a `FileIO` via the new
`FileIOBuilder::with_storage_credentials` / `StorageConfig` credential support,
resolved per object path (longest-prefix) by `OpenDalResolvingStorageFactory`.
- R10 (plan-scoped credentials): `ScanPlanner::plan_table_scan` now returns
`ServerScanPlan { tasks, file_io }`. The REST planner builds a plan-scoped
`FileIO` from the credentials the server vends in the plan/poll responses, and
`TableScan::to_arrow` reads through it. Server-planned tables typically vend
credentials only in the plan response, not at load-table time.
- PlanStatus / content type now also accept the SCREAMING_CASE forms UC emits
(e.g. `COMPLETED`, `DATA`) in addition to the kebab-case spec forms.
Note: UC's scan planning currently rejects `case-sensitive=true`; callers must
build scans with `with_case_sensitive(false)` against such servers.
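Two of the mechanisms in this commit message can be sketched in isolation: longest-prefix credential resolution and status parsing that accepts both the kebab-case and the uppercase forms. This is a minimal sketch using only the standard library, not the code in this PR. `StorageCredential`, `resolve_credential`, and `parse_plan_status` are hypothetical names, the four status values are an assumption about the spec's `PlanStatus` set, and normalizing by lowercasing is more permissive than accepting only the two spellings.

```rust
/// Hypothetical model of one vended credential: a storage prefix plus the
/// properties that apply to every object under that prefix.
#[derive(Debug, Clone, PartialEq)]
pub struct StorageCredential {
    pub prefix: String,
    pub config: Vec<(String, String)>,
}

/// Returns the credential whose prefix is the longest one matching `path`,
/// or `None` when no prefix matches at all.
pub fn resolve_credential<'a>(
    creds: &'a [StorageCredential],
    path: &str,
) -> Option<&'a StorageCredential> {
    creds
        .iter()
        .filter(|c| path.starts_with(c.prefix.as_str()))
        .max_by_key(|c| c.prefix.len())
}

/// Assumed set of plan statuses; the variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PlanStatus {
    Completed,
    Submitted,
    Cancelled,
    Failed,
}

/// Accepts `completed` as well as `COMPLETED` by normalizing the input
/// (lowercase, underscores to hyphens) before matching.
pub fn parse_plan_status(raw: &str) -> Option<PlanStatus> {
    match raw.to_ascii_lowercase().replace('_', "-").as_str() {
        "completed" => Some(PlanStatus::Completed),
        "submitted" => Some(PlanStatus::Submitted),
        "cancelled" => Some(PlanStatus::Cancelled),
        "failed" => Some(PlanStatus::Failed),
        _ => None,
    }
}
```

With credentials for `s3://bucket/` and `s3://bucket/warehouse/t1/`, a file under the table directory resolves to the more specific entry, while any other object in the bucket falls back to the bucket-wide one.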
Which issue does this PR close?
Part of #1690 — client support for REST server-side scan planning.
What changes are included in this PR?
A client implementation of the REST scan-planning protocol (`planTableScan`/`fetchPlanningResult`/`fetchScanTasks`). When a catalog advertises the planning endpoints, a table scan delegates planning to the server and consumes the returned `FileScanTask`s; otherwise it transparently falls back to native client-side planning.
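The delegate-then-fall-back behaviour described above can be illustrated with a simplified, synchronous sketch. It is not the code in this PR: the real trait is async and works on the crate's own `FileScanTask` and error types, whereas `PlanError`, `plan_files_native`, `UnsupportedPlanner`, and `FixedPlanner` here are hypothetical stand-ins.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the crate's scan task type.
#[derive(Debug, Clone, PartialEq)]
pub struct FileScanTask {
    pub data_file_path: String,
}

/// Hypothetical error type; only `FeatureUnsupported` triggers fallback.
#[derive(Debug, PartialEq)]
pub enum PlanError {
    FeatureUnsupported,
    Other(String),
}

/// Narrow capability trait: anything that can plan a scan for a table.
pub trait ScanPlanner: Send + Sync {
    fn plan_table_scan(&self, table: &str) -> Result<Vec<FileScanTask>, PlanError>;
}

pub struct TableScan {
    pub table: String,
    pub planner: Option<Arc<dyn ScanPlanner>>,
}

impl TableScan {
    /// Placeholder for native, manifest-based planning.
    fn plan_files_native(&self) -> Result<Vec<FileScanTask>, PlanError> {
        Ok(vec![FileScanTask {
            data_file_path: format!("native://{}/data-0.parquet", self.table),
        }])
    }

    /// Delegates to the injected planner when present; falls back to native
    /// planning only when the planner reports `FeatureUnsupported`.
    pub fn plan_files(&self) -> Result<Vec<FileScanTask>, PlanError> {
        if let Some(planner) = &self.planner {
            match planner.plan_table_scan(&self.table) {
                Err(PlanError::FeatureUnsupported) => {} // fall through to native
                other => return other, // success or a real error: no fallback
            }
        }
        self.plan_files_native()
    }
}

/// Stand-in for a server that does not advertise the planning endpoints.
pub struct UnsupportedPlanner;

impl ScanPlanner for UnsupportedPlanner {
    fn plan_table_scan(&self, _table: &str) -> Result<Vec<FileScanTask>, PlanError> {
        Err(PlanError::FeatureUnsupported)
    }
}

/// Stand-in for a server that returns a fixed plan.
pub struct FixedPlanner(pub Vec<FileScanTask>);

impl ScanPlanner for FixedPlanner {
    fn plan_table_scan(&self, _table: &str) -> Result<Vec<FileScanTask>, PlanError> {
        Ok(self.0.clone())
    }
}
```

Note that errors other than `FeatureUnsupported` are returned to the caller rather than masked by a silent native re-plan, which is one plausible reading of "transparently falls back".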
Design
The feature hangs off a single seam, `TableScan::plan_files()`, so execution (`to_arrow`, the Arrow reader) is untouched and DataFusion needs no changes.
Components
- `ScanPlanner` capability trait (`crates/iceberg/src/scan/planner.rs`); `Table` carries an optional `Arc<dyn ScanPlanner>`; `plan_files()` delegates and falls back on `FeatureUnsupported`. The core `Catalog` trait is untouched.
- `CatalogConfig` parses the `endpoints` field of `GET /v1/config`; scan-plan calls are gated by `Endpoint::check`.
- … lean content-file shape.
- … `fetchScanTasks`, with best-effort cancel on drop.
- … `FileScanTask` via the public builders; the scan's own bound filter is the per-task predicate, pushed down as Iceberg expression JSON when losslessly encodable.
- `ScanPlanner::plan_table_scan` returns `ServerScanPlan { tasks, file_io }`; the planner builds the plan-scoped `FileIO` from the vended `storage-credentials` using `FileIOBuilder::with_prefixed_props` from #2651 (feat(rest): support vended storage credentials with per-prefix FileIO), so there is no duplicate credential machinery.
Alternative injection design
The same feature with planning placed on the `Catalog` trait (`Catalog::plan_table_scan`, `Table` holding `Arc<dyn Catalog>`) is in huan233usc#2 for comparison. This PR uses the narrow-trait design because it keeps the central `Catalog` trait minimal and avoids giving every `Table` a back-reference to the full catalog.
Future work (design feedback welcome)
Native planning stays inline in `plan_files()` here; a follow-up could generalize `ScanPlanner` into the single planning abstraction (a `NativeScanPlanner` peer + a `FallbackScanPlanner` that composes server→native), making `plan_files()` a one-liner with no branch. Deferred to keep this PR focused.
Are these changes tested?
Yes: unit tests for the wire DTOs, endpoint codec, and expression-JSON serialization, conversion tests, and end-to-end `mockito` tests covering the completed-inline, submitted-then-polled, and recursive plan-task fan-out paths.
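For readers unfamiliar with the submitted-then-polled path, the following synchronous sketch shows the general shape of a poll loop with capped exponential backoff. It is an illustration under assumptions, not the PR's async state machine: `PollResult`, `backoff_delay`, `poll_until_done`, and the 100 ms base and 5 s cap are all hypothetical, and the sleep function is injected so the loop can be exercised without waiting.

```rust
use std::time::Duration;

/// Hypothetical outcome of one poll of the planning result.
#[derive(Debug, Clone, PartialEq)]
pub enum PollResult {
    Submitted,
    Completed(Vec<String>),
    Failed(String),
}

/// Delay before the next poll: `base * 2^attempt`, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Polls until the plan completes or fails, sleeping between polls.
/// `poll` and `sleep` are injected so callers (and tests) control I/O and time.
pub fn poll_until_done<P, S>(
    mut poll: P,
    mut sleep: S,
    max_attempts: u32,
) -> Result<Vec<String>, String>
where
    P: FnMut() -> PollResult,
    S: FnMut(Duration),
{
    let base = Duration::from_millis(100); // assumed starting delay
    let cap = Duration::from_secs(5); // assumed upper bound
    for attempt in 0..max_attempts {
        match poll() {
            PollResult::Completed(tasks) => return Ok(tasks),
            PollResult::Failed(msg) => return Err(msg),
            PollResult::Submitted => sleep(backoff_delay(attempt, base, cap)),
        }
    }
    Err(format!("plan still pending after {} polls", max_attempts))
}
```

A test in the style described above can pass a closure that replays canned responses and a `sleep` closure that only records the requested delays.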