
feat(client): scaffolding for stateless requests-proxy auth (HC-1) #100

Open
saadqbal wants to merge 1 commit into develop from feat/requests-proxy-stateless-secret-rbac

saadqbal commented May 5, 2026

Summary

Chart-side scaffolding for the stateless requests-proxy auth path. Companion to client-runtime#26 (CR-1, proxy verify) and client-runtime#27 (CR-2, jobs-manager mint). Cutover is tracked in #99 (OPS-1).

  • New <release>-requests-proxy-keys Secret holds HMAC keys (active, v1, …). Its data is preserved across upgrades via lookup, and it carries helm.sh/resource-policy: keep, because losing it invalidates every JWT held by running training pods.
  • New <release>-requests-proxy-revoked ConfigMap holds the revoked-jti list. Same preservation pattern; without lookup, every helm upgrade would silently re-authorize previously-stopped jobs.
  • RBAC rules (added to both clusterScope paths in rbac.yaml) are resourceNames-scoped to those two specific objects so jobs-manager can't touch unrelated configmaps/secrets in the namespace.
  • Both deployments mount the keys Secret at /etc/proxy/keys/. The proxy also mounts the revoked ConfigMap at /etc/proxy/revoked/. New env vars: REQUESTS_PROXY_STATELESS, RELEASE_NAME, NAMESPACE, plus REQUESTS_PROXY_TOKEN_TTL_SECONDS and REVOKED_CONFIGMAP_NAME on jobs-manager.
  • New values: requestsProxy.statelessTokens (default false) and requestsProxy.tokenTtlSeconds (default 7776000 = 90d, aligned to the quarterly key-rotation cadence).
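
For reviewers less familiar with the runtime side, here is a minimal Python sketch of how the pieces listed above might interact: a token signed with one of several versioned HMAC keys, a TTL check, and a revoked-jti lookup. The token layout, the `kid` field, and the `mint` / `verify` names are illustrative assumptions, not the client-runtime implementation (CR-1, CR-2), which may differ.

```python
import base64
import hashlib
import hmac
import json

TOKEN_TTL_SECONDS = 90 * 24 * 60 * 60  # 7776000, the chart default (90 days)


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def mint(keys: dict, kid: str, jti: str, now: float, ttl: int = TOKEN_TTL_SECONDS) -> str:
    """Sign a payload with key version `kid` (hypothetical token layout)."""
    payload = _b64(json.dumps({"kid": kid, "jti": jti, "exp": now + ttl}).encode())
    sig = hmac.new(keys[kid], payload.encode(), hashlib.sha256).digest()
    return payload + "." + _b64(sig)


def verify(token: str, keys: dict, revoked: set, now: float) -> bool:
    """Stateless check: only the keys, the revoked list, and the clock are consulted."""
    try:
        payload, sig = token.split(".")
        claims = json.loads(_unb64(payload))
        key = keys[claims["kid"]]  # unknown key version -> KeyError -> reject
        expected = hmac.new(key, payload.encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _unb64(sig)):
            return False
        return claims["jti"] not in revoked and now < claims["exp"]
    except (ValueError, KeyError, TypeError):
        return False
```

In this model, replacing the bytes stored under a key version or dropping a jti from `revoked` changes the outcome of `verify` for tokens that are already in circulation, which is why both objects need to survive `helm upgrade`.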

The runtime path is gated entirely by the REQUESTS_PROXY_STATELESS env var on the pods, which is set from requestsProxy.statelessTokens. Chart-side resources always render, so OPS-1 is a one-line helm upgrade --set operation, not a chart-shape change.

Test plan

  • helm template clean against default values and --set requestsProxy.statelessTokens=true
  • helm lint clean
  • helm unittest ./client: 127/127 passing, including 9 new cases in requests_proxy_stateless_test.yaml covering the keys Secret, revoked ConfigMap, RBAC under both clusterScope=true and clusterScope=false, and both deployments.
  • After CR-1 + CR-2 merge in client-runtime: end-to-end smoke in staging — helm upgrade --set requestsProxy.statelessTokens=true, kill the proxy pod, verify in-flight training pods continue without 401s. Tracked under OPS-1 (ops: enable stateless requests-proxy in staging then prod #99).
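
The smoke test above relies on verification needing no in-memory state. A rough Python illustration of that property follows. It assumes each entry of the keys Secret appears as one file under the mount directory (standard Kubernetes Secret volume behaviour), and it uses hypothetical `sign` and `accepts` helpers in place of the real JWT code. A proxy process that starts after the token was issued still accepts it as long as the mounted keys are unchanged.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def load_keys(mount_dir: str) -> dict:
    """Read every file in the mount as a key version: {filename: bytes}."""
    return {p.name: p.read_bytes() for p in Path(mount_dir).iterdir() if p.is_file()}


def sign(keys: dict, version: str, message: bytes) -> str:
    """Hypothetical stand-in for token minting: HMAC-SHA256 hex digest."""
    return hmac.new(keys[version], message, hashlib.sha256).hexdigest()


def accepts(keys: dict, version: str, message: bytes, signature: str) -> bool:
    """Recompute the signature from the mounted keys only; no session store."""
    if version not in keys:
        return False
    return hmac.compare_digest(sign(keys, version, message), signature)


with tempfile.TemporaryDirectory() as mount:  # stands in for /etc/proxy/keys/
    Path(mount, "v1").write_bytes(b"example-key-material")

    issued = sign(load_keys(mount), "v1", b"job-42")  # jobs-manager side

    restarted_proxy_keys = load_keys(mount)  # fresh process, same mount
    survived = accepts(restarted_proxy_keys, "v1", b"job-42", issued)

    Path(mount, "v1").write_bytes(b"regenerated-key")  # models losing the Secret
    after_key_loss = accepts(load_keys(mount), "v1", b"job-42", issued)
```

Here `survived` models the expected staging result (no 401s after the proxy pod is killed), while `after_key_loss` models the failure that resource-policy: keep is meant to prevent.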

Closes #96


Note

Medium Risk
Adds new Helm-managed Secret/ConfigMap plus new RBAC and pod mounts/env vars; while gated by a flag, mistakes could break proxy authentication or leave behind persistent auth artifacts due to resource-policy: keep. Changes touch deployment manifests and permissions but not application logic.

Overview
Introduces chart scaffolding for stateless requests-proxy auth by adding a persistent <release>-requests-proxy-keys Secret (lookup-preserved, helm.sh/resource-policy: keep) and a <release>-requests-proxy-revoked ConfigMap to store revoked token IDs.

Updates jobs-manager and requests-proxy Deployments to mount these resources and inject new env vars (including REQUESTS_PROXY_STATELESS gated by requestsProxy.statelessTokens, plus release/namespace metadata and JWT TTL/revoked ConfigMap name). RBAC is extended (both cluster- and namespace-scoped paths) with resourceNames-scoped access to read the keys Secret and update the revoked ConfigMap.

Adds requestsProxy.statelessTokens and requestsProxy.tokenTtlSeconds to values.yaml/schema and includes Helm unittest coverage for the new templates and wiring.

Reviewed by Cursor Bugbot for commit 4974fb5.

Adds the chart-side resources required by the stateless JWT auth path
that CR-1 (proxy verify) and CR-2 (jobs-manager mint) land in
client-runtime: a multi-version HMAC keys Secret, a revoked-jti
ConfigMap, RBAC rules scoped to those two resources via resourceNames,
and the deployment mounts/env both pods need to find them.

Both new resources are rendered with helm.sh/resource-policy: keep, and
both use lookup() to preserve existing data across helm upgrade — losing
the keys Secret invalidates every JWT held by running training pods,
and resetting the revoked list silently re-authorizes previously-stopped
jobs.
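
The preservation behaviour described above can be modelled in a few lines of Python. This is only an illustration of the reuse-or-generate logic, not the chart's template: `render_keys_secret` and its `existing` argument are hypothetical stand-ins for the template and for the result of Helm's lookup, which is empty when the object does not exist yet. The initial `v1` entry is likewise an assumed layout.

```python
import base64
import secrets


def render_keys_secret(existing: dict) -> dict:
    """Return Secret data: reuse what is in the cluster, else generate a first key.

    `existing` models the lookup result: {} on first install, the live
    object's `data` map on upgrade.
    """
    if existing:
        return dict(existing)  # upgrade: keep every key version untouched
    first_key = base64.b64encode(secrets.token_bytes(32)).decode()
    return {"v1": first_key}  # hypothetical initial layout


install = render_keys_secret({})
upgrade = render_keys_secret(install)  # same data as the install render
naive_upgrade = render_keys_secret({})  # what a template without lookup would do
```

`upgrade` carries the same key material as `install`, whereas `naive_upgrade` produces a fresh key and would invalidate every token signed with the old one.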

The runtime path is gated by requestsProxy.statelessTokens (default
false). Flipping the flag is OPS-1.

Refs #96

saadqbal self-assigned this May 5, 2026