[KYUUBI #7433][SPARK] Preserve Hive delegation tokens with non-empty service #7518

Open
Sunwoo-Shin wants to merge 1 commit into apache:master from Sunwoo-Shin:kyuubi-7433-multi-hms-hive-token

Why are the changes needed?

Closes #7433.

SparkTBinaryFrontendService#addHiveToken keeps only the Hive delegation token whose service field is empty (the one HiveMetaStoreClient selects by default) and silently drops every Hive token whose service is non-empty.

A Hive delegation token gets a non-empty service when it is bound to a specific metastore via hive.metastore.token.signature (the signature is stored in the token service). When an engine talks to multiple Hive metastores that use different Kerberos principals (e.g. two Iceberg catalogs, each backed by its own HMS), each metastore produces its own signature-bound token. Because these tokens are dropped before they reach the engine UGI, the engine fails to authenticate against the non-default metastore, and the failure surfaces as DIGEST-MD5: IO error acquiring password.
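The failure mode can be reproduced with a toy model. The sketch below is illustrative only: HiveToken, keepDefaultOnly and selectFor are hypothetical names standing in for Hadoop tokens, the old filtering behavior, and a metastore client's lookup by signature. They are not Kyuubi, Hive or Hadoop APIs, and the catalog_a / catalog_b signatures are made up.

```scala
// Toy model only: HiveToken stands in for a Hadoop delegation token.
// None of these names exist in Kyuubi, Hive or Hadoop.
final case class HiveToken(alias: String, service: String, issueDate: Long)

object DroppedTokenDemo {
  // Mimics the behavior described above: only empty-service tokens survive.
  def keepDefaultOnly(tokens: Seq[HiveToken]): Seq[HiveToken] =
    tokens.filter(_.service.isEmpty)

  // Mimics a metastore client picking its token by signature (the service field).
  def selectFor(signature: String, tokens: Seq[HiveToken]): Option[HiveToken] =
    tokens.find(_.service == signature)

  def main(args: Array[String]): Unit = {
    val incoming = Seq(
      HiveToken("hms-default", "", 100L),
      HiveToken("hms-a", "catalog_a", 100L), // hypothetical signature
      HiveToken("hms-b", "catalog_b", 100L)  // hypothetical signature
    )
    val kept = keepDefaultOnly(incoming)

    // The default metastore still finds its token.
    println(selectFor("", kept).map(_.alias))          // Some(hms-default)
    // The signature-bound token is gone, so this client has nothing to authenticate with.
    println(selectFor("catalog_a", kept).map(_.alias)) // None
  }
}
```

In this model the lookup for catalog_a returns nothing, which corresponds to the point where the real client would fail to find a password for the SASL handshake.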

This is the engine-side counterpart of #1091 (renewing delegation tokens for multiple Hive metastore clusters): even when the server pushes per-metastore tokens, the engine drops them.

Affected versions: 1.11.1.

This change partitions the incoming Hive tokens by their service field:

  • Tokens with a non-empty service are added to the engine credentials keyed by their alias, reusing the same issue-date downgrade guard the default path already applies.
  • The existing single-metastore URI matching now runs only over the default (empty-service) tokens, so behavior for the common single-HMS case is unchanged.
  • The No matching Hive token found ... warning is emitted only when a default-service token was actually expected, to avoid noise in metastore deployments that rely solely on signature-bound tokens.
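The partitioning and the issue-date guard described in the list above can be sketched roughly as follows. This is not the code in the patch: HiveToken, MergeSketch and mergeSignatureBound are hypothetical names, credentials are modeled as a plain Map rather than Hadoop Credentials, and treating an equal issue date as "keep the existing token" is an assumption.

```scala
// Simplified model; the real code works on Hadoop Token and Credentials objects.
final case class HiveToken(alias: String, service: String, issueDate: Long)

object MergeSketch {
  /**
   * Splits incoming Hive tokens on the service field.
   * Returns the updated alias -> token map plus the default (empty-service)
   * tokens, which would still go through the existing URI matching (not modeled here).
   */
  def mergeSignatureBound(
      existing: Map[String, HiveToken],
      incoming: Seq[HiveToken]): (Map[String, HiveToken], Seq[HiveToken]) = {
    val (signatureBound, defaults) = incoming.partition(_.service.nonEmpty)

    val merged = signatureBound.foldLeft(existing) { (creds, token) =>
      creds.get(token.alias) match {
        // Downgrade guard: never replace a token with one that is not newer.
        // Keeping the existing token on equal issue dates is an assumption.
        case Some(current) if current.issueDate >= token.issueDate => creds
        case _ => creds.updated(token.alias, token)
      }
    }
    (merged, defaults)
  }
}
```

Returning the default-service tokens separately keeps the single-HMS path isolated from the new branch, which mirrors the intent that behavior for that case stays unchanged.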

How was this patch tested?

Added unit tests in SparkTBinaryFrontendServiceSuite (the token-merging logic was extracted into mergeHiveTokens so it can be exercised without a SparkContext):

  • signature-bound tokens for multiple metastores are preserved, keyed by alias;
  • a signature-bound token with an earlier issue date is ignored, and a later one replaces the existing token;
  • the existing single-metastore matching for default-service tokens is unchanged;
  • signature-bound tokens are added without disturbing the default-service path.

build/mvn test -pl externals/kyuubi-spark-sql-engine -am \
  -Dtest=none \
  -DwildcardSuites=org.apache.kyuubi.engine.spark.SparkTBinaryFrontendServiceSuite

Was this patch authored or co-authored using generative AI tooling?

Assisted-by: Claude:claude-opus-4-8

…empty service

addHiveToken dropped Hive tokens whose service is non-empty, breaking auth
against non-default metastores in multi-HMS setups. Partition tokens on the
service field and add the signature-bound ones by alias.

Development

Successfully merging this pull request may close these issues.

[Bug] SparkTBinaryFrontendService#addHiveToken silently drops Hive delegation tokens with non-empty service field (multi-HMS scenario)