
REST Spec: add batch load endpoints for tables and views#15528

Closed
stevenzwu wants to merge 1 commit into apache:main from stevenzwu:rest-batch-load-endpoint


@stevenzwu (Contributor) commented Mar 8, 2026

Created this draft PR. I thought it would be easier to review and comment on here than to paste the content into the Google doc: https://docs.google.com/document/d/1VW5hgaaajRWtp5KbOU3s83YyoyPi5WOSvHtoJ_yXzJs/edit?tab=t.0#heading=h.e6w7vgpr8t2f

I used an AI review, which fixed some minor wording problems and validation errors.

@stevenzwu force-pushed the rest-batch-load-endpoint branch 3 times, most recently from 186e497 to 5727a90, March 8, 2026 05:32
@stevenzwu marked this pull request as ready for review March 8, 2026 06:35
description: Identifies a unique version of the table metadata
type: string
result:
  $ref: '#/components/schemas/LoadTableResult'
Contributor


LoadTableResult includes both the TableMetadata and the credentials, which I think makes this a very expensive call in a batch context (even discounting a credential cache).

storage-credentials:

Is it possible to decouple the credentials from this? I understand we might eventually need them.

Contributor Author


If the client doesn't set the access delegation header, the server shouldn't need to return credentials. That would be the same between the single and batch load endpoints.
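A minimal Python sketch of the per-table rule described above, assuming a hypothetical `build_load_result` helper and a plain dict result. The header name `X-Iceberg-Access-Delegation` and the `vended-credentials` value are the ones the existing spec's `data-access` parameter uses; the `vend` callable is a hypothetical stand-in for whatever credential service a catalog uses. The same check would run per table whether it is called from the single or the batch handler.

```python
from typing import Any, Callable, Mapping, Optional

# HTTP header names are case-insensitive, so compare in lowercase.
DELEGATION_HEADER = "x-iceberg-access-delegation"


def wants_vended_credentials(headers: Mapping[str, str]) -> bool:
    """Return True if the client asked for vended credentials via the delegation header."""
    normalized = {k.lower(): v for k, v in headers.items()}
    value = normalized.get(DELEGATION_HEADER, "")
    # The header carries a comma-separated list, e.g. "vended-credentials,remote-signing".
    modes = {m.strip() for m in value.split(",") if m.strip()}
    return "vended-credentials" in modes


def build_load_result(metadata: dict, headers: Mapping[str, str],
                      vend: Optional[Callable[[dict], list]] = None) -> dict:
    """Hypothetical helper: attach storage credentials only when the client requested them."""
    result: dict[str, Any] = {"metadata": metadata}
    if vend is not None and wants_vended_credentials(headers):
        # Only pay for credential vending (e.g., an STS round trip) when asked.
        result["storage-credentials"] = vend(metadata)
    return result
```

Without the header, `build_load_result` skips the `vend` call entirely, so a client that does not ask for credentials never triggers the expensive path.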

@singhpk234 (Contributor) Mar 8, 2026

Agreed, but what if the client does send it? I understand that even if the client sends the header, the server can choose not to vend credentials. My question is whether we ever want to vend credentials per table in this batch endpoint, because in the worst case each one is a remote STS call to the object store provider, and the request can easily time out.

Contributor Author


I think the batch endpoint should allow credential vending. For example, if a query references a few tables, the batch load should return the metadata along with vended credentials for those tables.

@stevenzwu (Contributor Author) Mar 10, 2026

@singhpk234 @flyrain and I chatted more about this. A catalog server can limit the amount of computation it performs for a batch request and move unprocessed items to the unprocessed list in the response payload for clients to retry later. That should alleviate the concern about the cost of credential vending. Basically, a catalog server has full control over the amount of work it chooses to do.
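One way a catalog server could cap its work per batch request is sketched below in Python, assuming a hypothetical per-request cost budget in which vending credentials costs more than loading metadata alone. The `results`/`unprocessed` field names, the cost constants, and the string identifiers are illustrative assumptions; the spec under review defines the actual response schema.

```python
from typing import Dict, List, Sequence

METADATA_COST = 1    # hypothetical cost of loading one table's metadata
CREDENTIAL_COST = 5  # hypothetical extra cost of vending credentials (e.g., an STS call)


def handle_batch_load(identifiers: Sequence[str], vend_credentials: bool,
                      budget: int) -> Dict[str, List]:
    """Process identifiers in order until the budget is spent; defer the rest."""
    per_item = METADATA_COST + (CREDENTIAL_COST if vend_credentials else 0)
    results: List[dict] = []
    unprocessed: List[str] = []
    spent = 0
    for ident in identifiers:
        # Always process at least one item so a retrying client makes progress.
        if results and spent + per_item > budget:
            # Over budget: hand the item back for the client to retry later.
            unprocessed.append(ident)
            continue
        spent += per_item
        entry = {"identifier": ident, "metadata": {"name": ident}}  # stand-in metadata
        if vend_credentials:
            entry["storage-credentials"] = [{"prefix": f"s3://bucket/{ident}"}]
        results.append(entry)
    return {"results": results, "unprocessed": unprocessed}
```

Because the budget is checked before each item, a request that asks for credentials simply gets fewer tables per round trip instead of timing out.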

stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request Mar 17, 2026
Implement the Java side of the batch load REST endpoints
(POST /v1/{prefix}/tables/batch-load and views/batch-load).

This includes request/response models, custom JSON parsers,
serializer registration, routing, server-side handlers in
CatalogHandlers, RESTCatalogAdapter routing, and client-side
methods in RESTSessionCatalog returning CloseableIterable with
automatic retry of unprocessed tables.

The OpenAPI spec changes are tracked in a separate PR (apache#15528).

Made-with: Cursor
Model: claude-4.6-opus-high-thinking
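The client-side behavior described in this commit (results returned lazily, with automatic retry of unprocessed tables) can be sketched in Python as a generator. This is not the Java `RESTSessionCatalog` code. `send_batch` is a hypothetical callable standing in for the HTTP POST, the `results`/`unprocessed` field names are assumptions, and `max_rounds` is an illustrative guard against a server that never makes progress.

```python
from typing import Callable, Dict, Iterator, List, Sequence

BatchResponse = Dict[str, List]


def batch_load_tables(identifiers: Sequence[str],
                      send_batch: Callable[[List[str]], BatchResponse],
                      max_rounds: int = 10) -> Iterator[dict]:
    """Yield load results lazily, re-requesting whatever the server left unprocessed."""
    pending = list(identifiers)
    rounds = 0
    while pending:
        if rounds >= max_rounds:
            raise RuntimeError(
                f"{len(pending)} tables still unprocessed after {max_rounds} rounds")
        rounds += 1
        response = send_batch(pending)
        # Hand results to the caller as soon as each round trip returns.
        yield from response.get("results", [])
        # Only the deferred identifiers are sent again on the next round.
        pending = list(response.get("unprocessed", []))
```

A production client would likely add backoff between rounds; the sketch omits it to keep the retry loop visible.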
@stevenzwu force-pushed the rest-batch-load-endpoint branch 4 times, most recently from 2ff370f to 1b5a4da, March 17, 2026 22:41
Add POST /v1/{prefix}/tables/batch-load and
POST /v1/{prefix}/views/batch-load endpoints.

Extract the inline snapshots query parameter into a shared
components/parameters/snapshots definition, referenced by both
the single table load and batch load endpoints.

Made-with: Cursor
Model: claude-4.6-opus-high-thinking
@stevenzwu force-pushed the rest-batch-load-endpoint branch from 1b5a4da to 5032ba7, March 17, 2026 22:58
operationId: batchLoadTables
parameters:
- $ref: '#/components/parameters/data-access'
- $ref: '#/components/parameters/snapshots'
@stevenzwu (Contributor Author) Mar 17, 2026

This is a query parameter in the single table load endpoint (GET). I adopted the same approach for the batch load endpoint (POST).

A query parameter on an HTTP POST seems like an acceptable practice.
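To illustrate a query parameter on a POST, the sketch below builds (but does not send) a batch-load request with Python's standard `urllib`. The path comes from the commit message above, and `snapshots` is restricted to the values the single-table load endpoint accepts (`all` or `refs`). The base URL, the prefix, and the JSON body with an `identifiers` list of TableIdentifier-shaped objects are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_batch_load_request(base_url: str, prefix: str, tables: list,
                             snapshots: str = "all",
                             vend_credentials: bool = False) -> Request:
    """Build a POST to /v1/{prefix}/tables/batch-load with ?snapshots=... in the query string."""
    if snapshots not in ("all", "refs"):
        raise ValueError("snapshots must be 'all' or 'refs'")
    query = urlencode({"snapshots": snapshots})
    url = f"{base_url.rstrip('/')}/v1/{prefix}/tables/batch-load?{query}"
    headers = {"Content-Type": "application/json"}
    if vend_credentials:
        # Same data-access header the single-table load uses.
        headers["X-Iceberg-Access-Delegation"] = "vended-credentials"
    # Hypothetical body shape: the identifiers to load, in the request payload.
    body = json.dumps({"identifiers": tables}).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")
```

The query parameter carries a per-request option (which snapshots to include), while the list of tables, which can be long, travels in the POST body.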

@stevenzwu (Contributor Author)

Superseded by the new PR: #15830.

@stevenzwu closed this Mar 30, 2026

2 participants