Migrate CLI OAuth flow to pkg/auth/dcr resolver#5250
Conversation
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
tgrunnagle
left a comment
There was a problem hiding this comment.
Multi-Agent Consensus Review
Agents consulted: oauth-security, go-correctness, architecture, concurrency, test-coverage, general-code-quality (codex cross-review skipped — CLI not installed)
Consensus Summary
| # | Finding | Consensus | Severity | Action |
|---|---|---|---|---|
| 1 | PublicClient absent from singleflight + cache keys |
9/10 | MEDIUM | Discuss |
| 2 | applyExplicitEndpointOverrides bypasses URL validation |
7/10 | MEDIUM | Fix |
| 3 | Two-call consumeResolution/applyResolutionToOAuth2Config invariant not type-enforced |
8/10 | MEDIUM | Discuss |
| 4 | SanitizeErrorForLog is http(s)-only — redis-go error chain |
7/10 | MEDIUM | Discuss |
| 5 | DCRKey.Issuer doc contradicts new CLI usage |
8/10 | MEDIUM | Fix |
| 6 | _ = store.Close() swallows error without comment/log |
7/10 | LOW | Fix |
| 7 | TestNewDCRRequest missing env-var IAT branch |
7/10 | LOW | Fix |
| 8 | TestConsumeResolution does not pin nil-input no-op |
6/10 | LOW | Optional |
| 9 | LogStepError "upstream" attribute shape varies per consumer |
7/10 | LOW | Discuss |
| 10 | Direct-registration + PublicClient=true error msg is misleading |
6/10 | LOW | Optional |
| 11 | Wall-clock ClientSecretExpiresAt comparison lacks skew tolerance |
6/10 | LOW | Optional |
| 12 | PR body still reads "DRAFT - not ready for review" | 7/10 | LOW | Fix |
| 13 | Option-(b) rationale duplicated across two function docs | 6/10 | LOW | Optional |
Overall
This is a well-executed migration of the CLI OAuth flow onto the shared pkg/auth/dcr resolver. The package boundary is structurally clean (resolver no longer imports any authserver/discovery types — verified), the new dcr.Request envelope is genuinely profile-neutral, and the new inherited-property tests (S256 gate, redirect refusal, singleflight dedup) pin exactly the security-relevant behaviours that motivated the migration. The PR description, doc comments, and orientation prose are unusually good — a cold reviewer can pick up the architectural picture in a single pass.
The consensus findings cluster around one theme: correctness invariants encoded in comments rather than types. The singleflight + cache keys do not include PublicClient, the two-call resolution invariant relies on review, the sanitiser is scheme-specific despite the generic name, and DCRKey.Issuer's field doc still claims "local issuer ... NOT the upstream's" even though the CLI now puts the upstream issuer there. Each one is a small refactor away from being structural. Current call sites do not exhibit the failure modes by construction; the risk is forward-looking, not active.
None of the findings rise to a hard blocker. Recommended priorities are #1 (PublicClient in keys), #3 (two-call consolidation), and #5 (DCRKey doc fix) — all cheap, all close defense-in-depth gaps the author already calls out in inline comments.
For consistency, please note: the authenticated reviewer and PR author are the same user, so this review uses COMMENT rather than REQUEST_CHANGES — this is not a draft-blocker assessment, it's an artefact of GitHub's policy.
Additional findings (file not in this PR's diff, or line outside diff hunks)
- [MEDIUM]
DCRKey.Issuerdoc drift —pkg/authserver/storage/types.go:122-130documentsIssueras "the local issuer of the embedded authorization server that performed the registration, NOT the upstream's". The new CLI call site atpkg/auth/discovery/discovery.go:712-713puts the upstream's issuer URL there. The newdcr.Request.Issuerdoc accommodates both consumers, but the underlying storage type's field doc still claims the old single-consumer semantic. UpdateDCRKey.Issuerto reflect the dual semantic, e.g. "For the embedded authserver this is its own (local) issuer; for the CLI flow this is the upstream's issuer because the CLI has no separate logical issuer of its own. The cache key (Issuer, RedirectURI, ScopesHash) keeps the two consumers' entries apart via the RedirectURI component." - [MEDIUM]
SanitizeErrorForLogis http(s)-only —pkg/auth/dcr/resolver.go:481-538(outside this PR's diff hunks). The function name reads generic, but the implementation only strips userinfo/query/fragment fromhttp(s)://URLs. On the embedded-authserver path,cache.Get/Puterrors fromstorage/redis.goflow intoLogStepError. If a future operator (or redis-go release) puts credentials in a sentinel/cluster URL, those credentials hit the slog.Error attribute. Either rename tosanitizeHTTPURLsso the scope is unambiguous, or extend the regex to also matchredis(s)?://. Add a small test asserting a redis-style error string with credentials is sanitised. - [LOW]
LogStepError"upstream" attribute is inconsistent per consumer —pkg/auth/dcr/resolver.go:452-479(outside this PR's diff hunks). Embedded-authserver path passesrc.Name(operator-defined short name); CLI path passesissuer(upstream issuer URL). Operators reading mixed logs see two different shapes under the same key. Either rename the parameter toupstreamIDand document it as opaque, or acceptslog.Attrpairs so callers can label ("upstream_name"vs"upstream_issuer"). - [LOW]
ClientSecretExpiresAtwall-clock comparison lacks skew tolerance —pkg/auth/dcr/resolver.go:641(outside this PR's diff hunks).time.Now().After(cached.ClientSecretExpiresAt)is exact. NTP step on resume or AS-clock-ahead skew leads to either serving an expired secret (already broken at the AS) or refetching too early. Considertime.Now().Add(30 * time.Second).After(cached.ClientSecretExpiresAt)as defense-in-depth. - [LOW] PR body still reads "DRAFT - not ready for review" — The PR is in draft state and the body header says "DRAFT - not ready for review". Since you are explicitly requesting a review now, either flip out of draft or remove the header so other reviewers landing on this PR aren't confused about its status.
Dropped findings appendix (consensus < 7)
The following observations were raised by individual agents but did not meet the consensus threshold. Listed for auditability only — no action requested:
- Regex
(?i)https?://[^\s"']+can match across a comma (security LOW, score 5). Failure mode is "did not sanitise", not corrupted output. selectTokenEndpointAuthMethodwith emptyserverSupported+publicClient=truetrusts upstream to accept "none" (security LOW, score 5).CloseableCredentialStoreloses Close capability when widened to base interface (architecture LOW, score 6). No occurrence in this PR.- Hardcoded
CallbackPort: 8765in CLI tests (test INFO, score 5). Tests don't bind the port — acceptable today. dcr.Keyis generically named (quality INFO, score 5).flightKeyOf(key)is a free function where a method would read more idiomatically (quality LOW, score 5).- Long doc comments on
Request/Resolutionpartially restate code (quality LOW, score 5). handleDynamicRegistrationdoc preamble is 26 lines for a 45-line body (quality LOW, score 5).ResolveCredentialstakes*Requestby pointer; could take by value (architecture LOW, score 5).PublicClient booltri-state suggestion to fail-loud on unset (security INFO, score 5).- CLI cross-invocation expiry refetch not in resolver's loop (architecture INFO, score 6) — explicitly documented as option-(b) trade-off.
Generated with Claude Code
fd66a1f to
1b5aa83
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5250 +/- ##
==========================================
+ Coverage 68.17% 68.20% +0.02%
==========================================
Files 618 618
Lines 63146 63212 +66
==========================================
+ Hits 43049 43112 +63
- Misses 16884 16886 +2
- Partials 3213 3214 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Responding to the review-body findings (no inline anchor):
|
Addresses #5250 review comments: - MEDIUM pkg/auth/dcr/resolver.go (3221902433): include PublicClient in storage.DCRKey, flightKeyOf, and the Redis key serialisation so public and confidential registrations cannot share a cache entry under any Issuer/RedirectURI/ScopesHash collision. - MEDIUM pkg/auth/dcr/resolver.go (3221902453): apply validateUpstreamEndpointURL to explicit authorization_endpoint / token_endpoint overrides; propagate the error via dcrStepMetadata. - MEDIUM pkg/authserver/runner/dcr_adapter.go (3221902473): add a tripwire test that pins the two-call invariant — calling only consumeResolution must leave the built upstream.OAuth2Config's ClientSecret empty. - MEDIUM pkg/authserver/storage/types.go (body): rewrite DCRKey.Issuer doc to describe both consumer profiles (embedded authserver's own issuer vs CLI's upstream issuer) and the cache-key non-collision invariant. - MEDIUM pkg/auth/dcr/resolver.go (body): extend queryStrippingPattern / SanitizeErrorForLog to also match redis(s):// URLs; covers the persistent CredentialStore error-chain surface on the embedded-authserver path. - LOW pkg/auth/discovery/discovery.go (3221902485): log the store.Close() error at debug instead of dropping it; rationale in the deferred-Close block. - LOW pkg/authserver/runner/dcr_adapter_test.go (3221902504): add an env-var case to TestNewDCRRequest pinning the InitialAccessTokenEnvVar wiring end-to-end.
Large PR justification has been provided. Thank you!
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
Sub-issue 4b of #5145. The CLI OAuth flow at pkg/auth/discovery::PerformOAuthFlow used to call oauthproto.RegisterClientDynamically directly, so it did not inherit the review-property behaviours added during #5042 (S256 PKCE gating, RFC 7591 §3.2.1 expiry-driven refetch, bearer-token transport with redirect refusal, panic recovery, singleflight deduplication). This commit routes that call site through the shared pkg/auth/dcr resolver introduced in sub-issue 4a (PR #5198) and pins the invariant with a CI grep guard. Profile-neutral resolver input: pkg/auth/dcr now exposes a Request struct that carries exactly the fields the resolver reads (issuer, redirect URI, scopes, discovery URL or registration endpoint, optional explicit endpoint overrides, initial access token, client name, public-client flag). ResolveCredentials takes a Request and no longer imports authserver / upstream domain types. The embedded-authserver adapter helpers (needsDCR, consumeResolution, applyResolutionToOAuth2Config) move to pkg/authserver/runner/dcr_adapter.go where they belong by ownership. CLI persistence model: option (b) from the issue. The resolver runs against an in-memory dcr.CredentialStore scoped to one PerformOAuthFlow invocation. Cross-invocation persistence is handled outside the resolver by pkg/auth/remote/handler.go's existing CachedClientID / CachedClientSecretRef fields, which already preserved cross-invocation reuse and continue to do so unchanged. Wrapping the secretProvider into a CredentialStore adapter (option (a)) was rejected as out-of-scope churn — the existing remote-handler caching is sufficient. PublicClient flag: a new bool on dcr.Request tells the resolver to register as a public PKCE client (token_endpoint_auth_method=none). The S256 gate still fires — the CLI surfaces a clear resolver error rather than silently downgrading when upstream advertises only "plain". Invariant guard: Taskfile target check-dcr-isolation (wired into task lint) and a matching CI step in .github/workflows/lint.yml fail if oauthproto.RegisterClientDynamically is referenced anywhere outside pkg/auth/dcr or pkg/oauthproto. Tests added for the CLI's inherited properties (S256 gating, redirect refusal, singleflight deduplication) in pkg/auth/discovery/dcr_resolver_test.go. The fallback error message for upstreams that omit registration_endpoint is preserved verbatim and pinned by TestHandleDynamicRegistration_MissingRegistrationEndpoint. Closes #5145.
Fixed issues from code review: - MEDIUM: Rewrote TestResolveSecret / TestResolveSecretWithEnvVar doc comments so they no longer point at the deleted dcr-package twin; the runner-side resolveSecret is now described as the single authoritative implementation. - MEDIUM: Documented the redundant discovery fetch in resolveDCRCredentials as an acknowledged trade-off (the S256 PKCE gate needs code_challenge_methods_supported, which AuthServerInfo does not carry). Threading that field through AuthServerInfo is the natural follow-up. - MEDIUM: Rewrote the dcr.Request.Issuer field doc to spell out what each consumer puts there and why the cache key cannot collide between the embedded authserver and CLI consumers. Added a matching call-site comment in resolveDCRCredentials. - MEDIUM: Introduced dcr.CloseableCredentialStore (embeds CredentialStore + io.Closer). NewInMemoryStore now returns it, so the CLI's defer store.Close() is compile-time safe — no more anonymous-interface type-assertion that would silently no-op on a future refactor. - MEDIUM: Reconciled the CLI flow's "expiry refetch inheritance" wording with what option (b) actually delivers. The handleDynamicRegistration doc and the dcr_resolver_test.go file-level comment now spell out that cross-invocation expiry handling lives in the remote handler, not the resolver, and that option (a) would close the loop as a follow-up. - LOW: The unreachable "fall back to RegistrationEndpoint-direct path" comment is replaced by an accurate description of when each branch fires.
Fixed issues from second-iteration review:
- MEDIUM: Made dcr.inMemoryStore.Close() idempotent via sync.Once.
storage.MemoryStorage.Close() closes a channel and is NOT itself
idempotent — the previous wrapper inherited that defect through an
incorrect doc comment ("Safe to call multiple times"). The wrapper now
delivers what the comment claims: a second Close returns the captured
error from the first call rather than panicking on
"close of closed channel". Pinned by TestInMemoryStore_CloseIsIdempotent
and TestInMemoryStore_CloseIsIdempotentUnderRace (8 concurrent callers).
- MEDIUM: Dropped the embedded storageBackedStore from inMemoryStore so
the type holds a single *storage.MemoryStorage handle instead of two
parallel handles (one via embedded interface, one concrete) that
required a manual "keep these two in sync" invariant. Get and Put are
implemented directly on inMemoryStore, delegating to s.mem — three
lines each, structurally guaranteed to share a backend with Close.
Pinned by TestInMemoryStore_PutGetCloseShareBackend,
TestInMemoryStore_PutRejectsNilResolution, and
TestInMemoryStore_GetMissingKeyReturnsMissTuple.
The Taskfile check-dcr-isolation grep guard and its workflow counterpart added extra surface area to enforce an architectural invariant. Removing both per reviewer preference; the resolver boundary remains enforced by code review alone.
Addresses #5250 review comments: - MEDIUM pkg/auth/dcr/resolver.go (3221902433): include PublicClient in storage.DCRKey, flightKeyOf, and the Redis key serialisation so public and confidential registrations cannot share a cache entry under any Issuer/RedirectURI/ScopesHash collision. - MEDIUM pkg/auth/dcr/resolver.go (3221902453): apply validateUpstreamEndpointURL to explicit authorization_endpoint / token_endpoint overrides; propagate the error via dcrStepMetadata. - MEDIUM pkg/authserver/runner/dcr_adapter.go (3221902473): add a tripwire test that pins the two-call invariant — calling only consumeResolution must leave the built upstream.OAuth2Config's ClientSecret empty. - MEDIUM pkg/authserver/storage/types.go (body): rewrite DCRKey.Issuer doc to describe both consumer profiles (embedded authserver's own issuer vs CLI's upstream issuer) and the cache-key non-collision invariant. - MEDIUM pkg/auth/dcr/resolver.go (body): extend queryStrippingPattern / SanitizeErrorForLog to also match redis(s):// URLs; covers the persistent CredentialStore error-chain surface on the embedded-authserver path. - LOW pkg/auth/discovery/discovery.go (3221902485): log the store.Close() error at debug instead of dropping it; rationale in the deferred-Close block. - LOW pkg/authserver/runner/dcr_adapter_test.go (3221902504): add an env-var case to TestNewDCRRequest pinning the InitialAccessTokenEnvVar wiring end-to-end.
c805aa3 to
84e9a60
Compare
The doc comment for TestInMemoryStore_PutRejectsNilResolution used a possessive apostrophe where the plain noun is correct.
Summary
Sub-issue 4b of #5145. The CLI OAuth flow in
pkg/auth/discovery::PerformOAuthFlowused to calloauthproto.RegisterClientDynamicallydirectly, so it bypassed the review-property behaviours added during #5042 (S256 PKCE gating, RFC 7591 §3.2.1 expiry-driven refetch, bearer-token transport with redirect refusal, panic recovery, singleflight deduplication). This PR routes that call site through the sharedpkg/auth/dcrresolver introduced in sub-issue 4a (PR #5198) so both consumers — the embedded authserver and the CLI — share the same hardened DCR path.To make the resolver genuinely profile-agnostic,
pkg/auth/dcrnow exposes aRequeststruct carrying exactly the fields the resolver reads. The embedded-authserver adapter helpers move intopkg/authserver/runner/dcr_adapter.gowhere they belong by ownership, andResolveCredentialsno longer imports authserver or upstream domain types.Closes #5219.
Large PR Justification
The diff is 1903 insertions / 845 deletions across 18 files (872 / 414 excluding tests). Production-code changes that cannot be split:
ResolveCredentials's API is a single atomic step. Replacing*authserver.OAuth2UpstreamRunConfigwith the profile-neutraldcr.Requestenvelope and liftingNeedsDCR/ConsumeResolution/ApplyResolutionToOAuth2Configintopkg/authserver/runner/dcr_adapter.gomust land together — splitting them would leave the resolver importing authserver types between PRs, which is the architectural drift this sub-issue exists to close.pkg/auth/discovery::PerformOAuthFlowonto the shared resolver in the same PR is what proves the neutralisation is correct; a follow-up CLI migration would leave the API change unjustified and unexercised at merge time.Test churn accounts for the remainder.
pkg/auth/dcr/resolver_test.gowas rewritten to the newRequestAPI (the old test file constructedOAuth2UpstreamRunConfigfixtures throughout); the consumer-side tests for the lifted helpers moved topkg/authserver/runner/dcr_adapter_test.goto follow the production code. Newpkg/auth/discovery/dcr_resolver_test.gopins the CLI's inherited review properties (S256 PKCE gating, redirect refusal, singleflight dedup, fallback error verbatim) — these are the load-bearing acceptance criteria for sub-issue 4b and cannot be deferred to a follow-up.Type of change
Test plan
task test)task lint-fix)New tests in
pkg/auth/discovery/dcr_resolver_test.gocover the CLI's inherited properties:plain.(issuer, scopes, redirectURI)produce one/registercall.TestHandleDynamicRegistration_MissingRegistrationEndpointpins the verbatim "configure with--remote-auth-client-id/--remote-auth-client-secret" message.API Compatibility
v1beta1API, OR theapi-break-allowedlabel is applied and the migration guidance is described above.No operator API surface is touched. The
pkg/auth/dcrpackage was added in 4a (#5198) and has no external consumers; theResolveCredentialssignature change is internal.Changes
pkg/auth/dcr/request.goRequestinput type forResolveCredentials.pkg/auth/dcr/resolver.goResolveCredentialsnow takes*dcr.Request; no longer imports authserver / upstream.pkg/auth/dcr/store.goCloseableCredentialStoreinterface;NewInMemoryStorereturns it sodefer Close()is compile-time safe.pkg/auth/discovery/discovery.gohandleDynamicRegistrationbuilds adcr.Requestand callsdcr.ResolveCredentialsinstead ofoauthproto.RegisterClientDynamically. Loopback redirect URI (RFC 8252 §7.3) and CLI-flag fallback error preserved.pkg/auth/discovery/dcr_request.gopkg/auth/dcr.Request.pkg/auth/discovery/dcr_resolver_test.gopkg/authserver/runner/dcr_adapter.goneedsDCR,newDCRRequest,consumeResolution,applyResolutionToOAuth2Config) moved here frompkg/auth/dcr.pkg/authserver/runner/embeddedauthserver.go*dcr.Requestintodcr.ResolveCredentials.pkg/authserver/runner/dcr_store.gopkg/auth/dcr/store.go(4a follow-through).Does this introduce a user-facing change?
No. The CLI OAuth flow's observable behaviour is unchanged on the happy path. The new error paths (S256 gating, redirect refusal) are inherited security improvements that surface as clearer error messages when an upstream provider misbehaves.
Special notes for reviewers
Persistence model — option (b) from the issue. The resolver runs against an in-memory
dcr.CredentialStorescoped to onePerformOAuthFlowinvocation. Cross-invocation persistence is handled outside the resolver bypkg/auth/remote/handler.go's existingCachedClientID/CachedClientSecretReffields, which already preserved cross-invocation reuse and continue to do so unchanged. Option (a) — wrapping the secret provider as aCredentialStoreadapter — was rejected as out-of-scope churn; the trade-off is documented at the wiring site inhandleDynamicRegistration. A consequence worth flagging: cross-invocation expiry refetch lives in the remote handler, not in the resolver, on the CLI path. Closing that loop is a natural follow-up once option (a) is wired.PublicClient flag. A new bool on
dcr.Requesttells the resolver to register as a public PKCE client (token_endpoint_auth_method=none). The S256 gate still fires — the CLI surfaces a clear resolver error rather than silently downgrading when the upstream advertises onlyplainPKCE.Resolver invariant — code review only. The issue's acceptance criterion for a CI grep guard against direct
oauthproto.RegisterClientDynamicallycalls outsidepkg/auth/dcris intentionally out of scope for this PR. Atask check-dcr-isolationTaskfile target and matching workflow step were prototyped and then removed (commit fd66a1f) per reviewer preference. The resolver boundary is enforced by code review alone going forward.