REST Spec: add batch load endpoints for tables and views #15528
stevenzwu wants to merge 1 commit into apache:main

    description: Identifies a unique version of the table metadata
    type: string
    result:
      $ref: '#/components/schemas/LoadTableResult'
LoadTableResult would include both TableMetadata and credentials, which I think makes this a very expensive call in a batch context (even with a credential cache).
Reference: iceberg/open-api/rest-catalog-open-api.yaml, line 3500 in e947750
Is it possible to decouple credentials from this? I understand we might eventually need them.
If the client doesn't set the access delegation header, the server shouldn't need to return credentials. That behavior would be the same for the single and batch load endpoints.
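A minimal sketch of that rule, assuming the server inspects the `X-Iceberg-Access-Delegation` header (the header behind the spec's `data-access` parameter, whose value is a comma-separated list such as `vended-credentials` or `remote-signing`). The `AccessDelegation` class and its method are hypothetical, not Iceberg's Java API; the point is that the same check can gate both the single and the batch load handlers.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper: decides whether a load request (single or batch) asked for vended credentials.
final class AccessDelegation {
  static final String HEADER = "X-Iceberg-Access-Delegation";
  static final String VENDED_CREDENTIALS = "vended-credentials";

  private AccessDelegation() {}

  static boolean wantsVendedCredentials(Map<String, String> headers) {
    for (Map.Entry<String, String> header : headers.entrySet()) {
      // HTTP header names are case-insensitive.
      if (header.getKey().equalsIgnoreCase(HEADER)) {
        // The value is a comma-separated list of delegation modes.
        return Arrays.stream(header.getValue().split(","))
            .map(String::trim)
            .anyMatch(VENDED_CREDENTIALS::equals);
      }
    }
    // No header: the server does not need to return credentials at all.
    return false;
  }
}
```

A handler would call `wantsVendedCredentials` once per request and skip credential vending for every table in the batch when it returns `false`.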
Agreed, but what if the client does send it? I understand that even when the header is sent, the server can choose not to vend credentials. My question is whether we ever want to vend per-table credentials from this batch endpoint, because in the worst case each one is a remote STS call to the object store provider, and the batch request could easily time out.
I think the batch endpoint should allow credential vending. For example, if a query references a few tables, the batch load should return the metadata along with vended credentials for those tables.
@singhpk234 @flyrain and I chatted more about this. A catalog server can limit how much computation it performs for a batch request and move unprocessed items to the unprocessed list in the response payload for clients to retry later. That should alleviate the concern about the cost of credential vending: a catalog server has full control over the amount of work it chooses to do.
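To make the budgeting idea concrete, here is a minimal sketch, assuming a server that caps each batch at a fixed number of loads and reports the rest as unprocessed. `BatchLoadResult`, `BudgetedBatchLoader`, and the item-count budget are hypothetical names and policy; a real server could just as well budget by time or by the number of credential requests, and the actual response schema is whatever the spec in this PR defines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical response shape: loaded results keyed by identifier, plus identifiers left for a retry.
record BatchLoadResult<T>(Map<String, T> loaded, List<String> unprocessed) {}

// Hypothetical server-side helper that performs at most maxItems loads per request.
final class BudgetedBatchLoader<T> {
  private final int maxItems;
  private final Function<String, T> loadOne; // e.g. load metadata and, if requested, vend credentials

  BudgetedBatchLoader(int maxItems, Function<String, T> loadOne) {
    if (maxItems < 1) {
      throw new IllegalArgumentException("maxItems must be positive");
    }
    this.maxItems = maxItems;
    this.loadOne = loadOne;
  }

  BatchLoadResult<T> load(List<String> identifiers) {
    Map<String, T> loaded = new LinkedHashMap<>();
    List<String> unprocessed = new ArrayList<>();
    for (String id : identifiers) {
      if (loaded.size() < maxItems) {
        loaded.put(id, loadOne.apply(id));
      } else {
        // Over budget: defer the item instead of risking a timeout on expensive work such as STS calls.
        unprocessed.add(id);
      }
    }
    return new BatchLoadResult<>(loaded, unprocessed);
  }
}
```

Because the deferred identifiers are returned rather than dropped, the client can always make progress by resending them.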
Implement the Java side of the batch load REST endpoints
(POST /v1/{prefix}/tables/batch-load and views/batch-load).
This includes request/response models, custom JSON parsers,
serializer registration, routing, server-side handlers in
CatalogHandlers, RESTCatalogAdapter routing, and client-side
methods in RESTSessionCatalog returning CloseableIterable with
automatic retry of unprocessed tables.
The OpenAPI spec changes are tracked in a separate PR (apache#15528).
Made-with: Cursor
Model: claude-4.6-opus-high-thinking
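The client-side "automatic retry of unprocessed tables" could look roughly like this sketch. It uses plain `java.util` collections instead of Iceberg's `CloseableIterable`, and `Page`, `BatchCall`, `BatchLoadClient`, and the attempt limit are hypothetical; this is not the actual `RESTSessionCatalog` code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of one batch-load response.
record Page<T>(Map<String, T> loaded, List<String> unprocessed) {}

// Hypothetical transport: one POST to .../batch-load for the given identifiers.
interface BatchCall<T> {
  Page<T> call(List<String> identifiers);
}

final class BatchLoadClient {
  private BatchLoadClient() {}

  // Re-sends only the unprocessed identifiers until none remain or the attempt limit is reached.
  static <T> Map<String, T> loadAll(List<String> identifiers, BatchCall<T> batchCall, int maxAttempts) {
    Map<String, T> results = new LinkedHashMap<>();
    List<String> pending = new ArrayList<>(identifiers);
    for (int attempt = 0; attempt < maxAttempts && !pending.isEmpty(); attempt++) {
      Page<T> page = batchCall.call(pending);
      results.putAll(page.loaded());
      // Without the attempt limit, a server that never makes progress would loop forever.
      pending = new ArrayList<>(page.unprocessed());
    }
    if (!pending.isEmpty()) {
      throw new IllegalStateException("Still unprocessed after " + maxAttempts + " attempts: " + pending);
    }
    return results;
  }
}
```

Bounding the retries keeps the client from spinning if a server keeps returning the same unprocessed list.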
Add POST /v1/{prefix}/tables/batch-load and
POST /v1/{prefix}/views/batch-load endpoints.
Extract the inline snapshots query parameter into a shared
components/parameters/snapshots definition, referenced by both
the single table load and batch load endpoints.
Made-with: Cursor
Model: claude-4.6-opus-high-thinking
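As a rough illustration of why a shared `snapshots` definition helps, this sketch reuses one hypothetical `SnapshotMode` enum when building URIs for both the single-table load endpoint and the batch load endpoint. The `all`/`refs` values mirror the existing `snapshots` parameter of the single-table load; `LoadPaths`, the single-level namespace, and the lack of URL encoding are simplifying assumptions.

```java
import java.net.URI;

// Hypothetical shared parameter: the same values serve both load endpoints.
enum SnapshotMode {
  ALL("all"),
  REFS("refs");

  final String wireValue;

  SnapshotMode(String wireValue) {
    this.wireValue = wireValue;
  }
}

final class LoadPaths {
  private LoadPaths() {}

  // Single-table load: GET /v1/{prefix}/namespaces/{namespace}/tables/{table}
  // Assumes a single-level namespace and names that need no URL encoding.
  static URI loadTable(String base, String prefix, String namespace, String table, SnapshotMode mode) {
    return URI.create(base + "/v1/" + prefix + "/namespaces/" + namespace + "/tables/" + table
        + "?snapshots=" + mode.wireValue);
  }

  // Batch load: POST /v1/{prefix}/tables/batch-load, reusing the same snapshots parameter.
  static URI batchLoadTables(String base, String prefix, SnapshotMode mode) {
    return URI.create(base + "/v1/" + prefix + "/tables/batch-load?snapshots=" + mode.wireValue);
  }
}
```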

    operationId: batchLoadTables
    parameters:
      - $ref: '#/components/parameters/data-access'
      - $ref: '#/components/parameters/snapshots'
This is a query parameter in the single table load endpoint (GET). I adopted the same approach for the batch load endpoint (POST). Using a query parameter with an HTTP POST seems to be an acceptable practice.
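For illustration, this sketch uses the JDK's `java.net.http.HttpRequest` to build (but not send) a batch load POST that carries `snapshots` in the query string and the access delegation header, while the table list travels in the JSON body. `BatchLoadRequests` is hypothetical, and the body is passed in by the caller because the request schema is not shown here; the body used in the example below is a placeholder.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class BatchLoadRequests {
  private BatchLoadRequests() {}

  // Builds, but does not send, a POST: the snapshots filter lives in the query string,
  // and the caller-supplied JSON body lists the tables to load.
  static HttpRequest batchLoad(String endpoint, String snapshots, String jsonBody) {
    return HttpRequest.newBuilder(URI.create(endpoint + "?snapshots=" + snapshots))
        .header("Content-Type", "application/json")
        .header("X-Iceberg-Access-Delegation", "vended-credentials")
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();
  }
}
```

Keeping `snapshots` in the query string means the batch endpoint can reference the same parameter definition as the single-table load, while the potentially long list of tables stays in the body.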
Deprecated by the new PR: #15830
Created this draft PR because I thought it would be easier to review and comment on than content pasted into the Google doc: https://docs.google.com/document/d/1VW5hgaaajRWtp5KbOU3s83YyoyPi5WOSvHtoJ_yXzJs/edit?tab=t.0#heading=h.e6w7vgpr8t2f
Used an AI review to fix some minor wording problems and validation errors.