Skip to content

feat(rest): support vended storage credentials with per-prefix FileIO#2651

Open
plusplusjiajia wants to merge 3 commits into
apache:mainfrom
plusplusjiajia:feat/rest-vended-credentials
Open

feat(rest): support vended storage credentials with per-prefix FileIO#2651
plusplusjiajia wants to merge 3 commits into
apache:mainfrom
plusplusjiajia:feat/rest-vended-credentials

Conversation

@plusplusjiajia

@plusplusjiajia plusplusjiajia commented Jun 15, 2026

Copy link
Copy Markdown
Member

What

load_table / create_table now use the storage_credentials returned in
LoadTableResult to build a per-prefix FileIO. The field was exposed (its doc
says clients should prefer it over config) but never consumed.

Changes

  • Build one storage per credential prefix instead of merging all credentials into
    a single props map: FileIO routes each path to the longest-matching prefix
    (new FileIOBuilder::with_prefixed_props), so multi-location tables use the
    right credentials. Paths matching no prefix fall back to the default storage —
    same behavior as Java's S3FileIO.clientForStoragePath. delete_stream groups
    paths by routed storage.
  • Requesting delegation is opt-in via the existing header.* catalog config
    (e.g. header.X-Iceberg-Access-Delegation=vended-credentials), applied to every
    request like Java's RESTUtil.configHeaders — no dedicated prop, no hardcoded
    default.
  • update_table no longer reloads after commit; since CommitTableResponse
    carries no credentials, the transaction layer reuses the credentialed FileIO
    it already holds (new Table::with_file_io).

@plusplusjiajia plusplusjiajia force-pushed the feat/rest-vended-credentials branch from ac46430 to 894d5cf Compare June 15, 2026 12:01

@xanderbailey xanderbailey left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I had a quick look and left some comments / questions.

Comment thread crates/catalog/rest/src/catalog.rs Outdated
.client
.request(Method::GET, context.config.table_endpoint(table_ident))
// Opt in to vended storage credentials.
.header("X-Iceberg-Access-Delegation", "vended-credentials")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java will read these headers off the catalog config, should we do the same?

header.X-Iceberg-Access-Delegation

Link to Java

@plusplusjiajia plusplusjiajia Jun 15, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xanderbailey Thanks for the catch. Reworked to match the Java client: dropped the dedicated prop and rely on the existing generic header.* config (extra_headers). Vended credentials are opt-in via header.X-Iceberg-Access-Delegation=vended-credentials, applied to all requests like Java's RESTUtil.configHeaders — no hardcoded default.

Comment thread crates/catalog/rest/src/catalog.rs Outdated
let file_io = self
.load_file_io(Some(&response.metadata_location), None)
.await?;
.load_table(commit.identifier())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A full load table seems heavy to me. Can we not wire in the pre-credentialed FileIO into this method somehow?

@plusplusjiajia plusplusjiajia Jun 15, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xanderbailey Good point — update_table no longer reloads. It builds FileIO from the commit response, and the transaction layer reuses the credentialed FileIO it already holds (do_commit + new Table::with_file_io). Back to one GET per commit.

Comment thread crates/catalog/rest/src/catalog.rs Outdated
}

/// FileIO props: server `config`, then vended `storage_credentials` (longest prefix wins), then user props.
fn table_file_io_config(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also notice that Java constructs clientByPrefix Link which keeps credentials separate per prefix. So we can't support multi-location tables here I think? Also it would silently fail in those cases which I don't think is ideal? WDYT?

@plusplusjiajia plusplusjiajia Jun 15, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xanderbailey Thanks, agreed. Each vended credential now maps to its own storage instead of merging all into one props map. FileIO routes each path to the longest-matching prefix(with_prefixed_props/get_storage); removed table_file_io_config.
On silent failures: this matches Java's S3FileIO.clientForStoragePath — a path matching no credential prefix falls back to the default storage from base properties (longest-prefix-match), so a real access problem surfaces as a normal I/O error, not a silent no-op.

@plusplusjiajia plusplusjiajia force-pushed the feat/rest-vended-credentials branch from 61ed31d to 9591c53 Compare June 16, 2026 02:04
@plusplusjiajia plusplusjiajia changed the title feat(rest): use vended storage credentials from LoadTable response feat(rest): support vended storage credentials with per-prefix FileIO Jun 16, 2026
@plusplusjiajia plusplusjiajia force-pushed the feat/rest-vended-credentials branch from 9591c53 to d5720b7 Compare June 16, 2026 02:33
huan233usc added a commit to huan233usc/iceberg-rust that referenced this pull request Jun 16, 2026
…redentials apache#2651)

Client implementation of the REST scan-planning protocol, reworked to build on
the vended-credential support from apache#2651 instead of duplicating it.

- Reuses `FileIOBuilder::with_prefixed_props` (from apache#2651) to build a plan-scoped
  FileIO from the `storage-credentials` the server vends in the plan response;
  no separate credential-resolution machinery.
- Narrow `ScanPlanner` capability trait + `Table` handle; `plan_files()`
  delegates and falls back to native planning on `FeatureUnsupported`.
- Endpoint negotiation, submit/poll/fetch state machine, wire→FileScanTask
  conversion. DataFusion unchanged.

Stacked on apache#2651 (vended storage credentials); will rebase to drop those commits
once it merges.
}

// Otherwise group paths by routed storage, then batch-delete each group.
let mut groups: HashMap<String, Vec<String>> = HashMap::new();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This collects the entire stream before deleting. Since delete_stream is used by purge/cleanup paths, large tables can make this unbounded in memory.

Can we keep routing per prefix but flush bounded batches as we iterate, instead of materializing all paths first?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huan233usc Good catch, thanks. Reworked to flush bounded per-prefix batches as we iterate instead of materializing everything — memory's now bounded regardless of stream size, mirroring Java's S3FileIO.deleteFiles. Added a test crossing the batch boundary.

for cred in creds {
let mut prefixed = props.clone();
prefixed.extend(cred.config.clone());
builder = builder.with_prefixed_props(cred.prefix.clone(), prefixed);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: do we plan to handle credential refresh as follow up?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huan233usc Good question!— not in this PR. FileIO is built once from the LoadTable creds, so no refresh yet. Good follow-up: the opendal S3 backend already has a ProvideCredential hook (reqsign re-invokes on expires_in), so a provider re-fetching from loadCredentials before s3.session-token-expires-at-ms — like Java's VendedCredentialsProvider — fits there. Will track as a separate PR.

@plusplusjiajia plusplusjiajia force-pushed the feat/rest-vended-credentials branch from 00a51e4 to 47748e3 Compare June 17, 2026 09:46

let file_io = self.load_file_io(Some(metadata_location), None).await?;
let file_io = self
.load_file_io(Some(metadata_location), None, None)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth check if e2e for register table requires plumbing credentials. But it could be a separate pr if needed.

@huan233usc huan233usc left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants