admin: P1 foundation (auth, router, cluster, listener) — no writes yet#623
Lays down the read-only slice of the admin dashboard per
docs/design/2026_04_24_proposed_admin_dashboard.md. No write endpoints
yet — the 3.3.2 acceptance criteria still block those, and they ship
together with AdminForward in a follow-up.
internal/admin:
- Config validation: hard startup failure on missing signing key,
non-loopback without TLS, duplicate role assignments, wrong-length
HS256 keys.
- JWT signer/verifier: HS256, 1h TTL, accepts primary + previous key
for rotation, rejects expired / future-dated / unknown-role tokens.
- Router with strict prefix ordering so /admin/api/v1/* and
/admin/healthz are never shadowed by the SPA fallback.
- Middleware: 64 KiB BodyLimit + MaxBytesError translation,
SessionAuth, RequireWriteRole, CSRF double-submit, slog-backed
admin_audit logger for write paths.
- Login / logout: Content-Type and per-IP 5/min rate limit,
constant-time credential comparison, HttpOnly+Secure+SameSite=Strict
session cookie + separate SPA-readable CSRF cookie on success,
explicit 403 when a valid access key is not in the admin role table.
- Cluster + healthz handlers behind a small ClusterInfoSource
interface so the handler stays tested without main.go types.
- Server ties everything together; NewServer returns a hard error
on missing dependencies so misuse is caught at startup.
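The "primary + previous key" acceptance rule in the JWT bullet can be illustrated with a small sketch. This is not the PR's jwt.go: it signs and checks a bare HMAC-SHA256 tag rather than a full JWT, and the names signTag and verifyWithRotation are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// signTag returns the HMAC-SHA256 tag of payload under key.
func signTag(key, payload []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write(payload)
	return m.Sum(nil)
}

// verifyWithRotation reports whether tag is valid under any accepted key.
// Keys are tried in order (primary first, then previous); empty keys are
// skipped so an unset "previous" slot never matches anything.
func verifyWithRotation(payload, tag []byte, keys ...[]byte) bool {
	for _, k := range keys {
		if len(k) == 0 {
			continue
		}
		if hmac.Equal(signTag(k, payload), tag) {
			return true
		}
	}
	return false
}

func main() {
	primary := []byte("primary-key-0123456789abcdef0123")
	previous := []byte("previous-key-0123456789abcdef012")
	payload := []byte(`{"sub":"AKIAEXAMPLE","role":"read_only"}`)

	oldTag := signTag(previous, payload) // minted before the rotation

	fmt.Println(verifyWithRotation(payload, oldTag, primary, previous)) // true
	fmt.Println(verifyWithRotation(payload, oldTag, primary))           // false
}
```

Once the previous key is dropped from the accepted set, tokens minted under it stop verifying, which is the point at which a rotation is complete.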
main_admin.go + flags:
- New admin-* CLI flags mirroring the Config surface.
- startAdminServer validates config, builds the signer/verifier,
loads the existing S3 credential map as the admin credential
store, and attaches the server lifecycle to errgroup with a clean
shutdown path.
- ClusterInfoSource reads leader/state from raftGroupRuntime.
Tests: happy paths + every failure branch across config, JWT,
router, middleware, auth, cluster, and an in-process main_admin
integration test that boots the real listener over plaintext and
TLS. golangci-lint clean. go test ./... green.
Code Review
This pull request introduces a new admin HTTP service for the application, including authentication via JWTs, role-based access control, rate limiting, and an audit logging middleware. The implementation includes a router for managing API endpoints, static assets, and an SPA fallback. My review identified several security and reliability concerns: the cluster handler exposes raw internal errors, the credential comparison logic is vulnerable to timing attacks, the audit middleware fails to capture actors for login/logout events, the secret key trimming may cause authentication inconsistencies, and the rate limiter's cleanup logic does not sufficiently prevent potential memory exhaustion from unbounded map growth.
| } | ||
| info, err := h.source.Describe(r.Context()) | ||
| if err != nil { | ||
| writeJSONError(w, http.StatusInternalServerError, "cluster_describe_failed", err.Error()) |
| dummy := strings.Repeat("x", dummySecretLen) | ||
| _ = subtle.ConstantTimeCompare([]byte(req.SecretKey), []byte(dummy)) | ||
| writeJSONError(w, http.StatusUnauthorized, "invalid_credentials", | ||
| "access_key or secret_key is invalid") | ||
| return AuthPrincipal{}, false | ||
| } | ||
| if subtle.ConstantTimeCompare([]byte(req.SecretKey), []byte(expected)) != 1 { | ||
| writeJSONError(w, http.StatusUnauthorized, "invalid_credentials", | ||
| "access_key or secret_key is invalid") | ||
| return AuthPrincipal{}, false | ||
| } |
The credential comparison logic is susceptible to a timing attack that leaks the length of the secret key. subtle.ConstantTimeCompare returns 0 immediately if the lengths of the input slices differ, allowing an attacker to determine the length of the expected secret by measuring response times. To mitigate this, you should hash both the provided secret and the expected (or dummy) secret using a fixed-length cryptographic hash (e.g., SHA-256) before performing the constant-time comparison.
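A minimal sketch of the suggested mitigation, assuming hypothetical helper names (secretsEqual, checkCredential) and a plain map as the credential store: both sides are reduced to a 32-byte SHA-256 digest, so subtle.ConstantTimeCompare never sees slices of different lengths.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// secretsEqual hashes both inputs to a fixed 32-byte digest before the
// constant-time comparison, so the comparator always receives
// equal-length slices.
func secretsEqual(provided, expected string) bool {
	a := sha256.Sum256([]byte(provided))
	b := sha256.Sum256([]byte(expected))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

// checkCredential looks up accessKey and compares secrets. For an unknown
// access key it still performs one comparison against a placeholder
// (an assumption of this sketch) so both paths do similar work, then
// rejects unconditionally.
func checkCredential(creds map[string]string, accessKey, secret string) bool {
	expected, ok := creds[accessKey]
	if !ok {
		_ = secretsEqual(secret, "placeholder-secret")
		return false
	}
	return secretsEqual(secret, expected)
}

func main() {
	creds := map[string]string{"AKIAEXAMPLE": "correct horse battery staple"}
	fmt.Println(checkCredential(creds, "AKIAEXAMPLE", "correct horse battery staple")) // true
	fmt.Println(checkCredential(creds, "AKIAEXAMPLE", "short"))                        // false
	fmt.Println(checkCredential(creds, "unknown", "anything"))                         // false
}
```

The digest is used only to normalise input width for the comparison; it is not a password-storage hash.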
| principal, _ := PrincipalFromContext(r.Context()) | ||
| logger.LogAttrs(r.Context(), slog.LevelInfo, "admin_audit", | ||
| slog.String("actor", principal.AccessKey), | ||
| slog.String("role", string(principal.Role)), | ||
| slog.String("method", r.Method), | ||
| slog.String("path", r.URL.Path), | ||
| slog.Int("status", rec.status), | ||
| slog.String("remote", r.RemoteAddr), | ||
| slog.Duration("duration", time.Since(start)), | ||
| ) |
The Audit middleware does not capture the actor (AccessKey) for login and logout requests because these endpoints do not use the SessionAuth middleware to populate the principal in the request context. Consequently, the audit logs for these critical security events will have an empty actor field. Consider updating the login handler to set the principal in the context upon successful authentication (or even for failed attempts) so the middleware can record it, or implement specific auditing within the authentication handlers.
| req.AccessKey = strings.TrimSpace(req.AccessKey) | ||
| req.SecretKey = strings.TrimSpace(req.SecretKey) |
Trimming whitespace from the SecretKey is generally discouraged as secrets should be treated as opaque credentials. If the underlying S3 or DynamoDB adapters do not perform this trimming, it creates an inconsistency where a credential might be accepted by the admin service but rejected by other protocol adapters (or vice versa), leading to potential security discrepancies and difficult-to-debug authentication issues.
| if len(rl.entries) > rateLimiterMaxEntries { | ||
| for k, v := range rl.entries { | ||
| if now.Sub(v.windowStart) > rl.window { | ||
| delete(rl.entries, k) | ||
| } | ||
| } | ||
| } |
The rate limiter's cleanup logic only removes expired entries and does not enforce a hard limit on the total number of entries in the entries map. An attacker using a large number of distinct IP addresses could cause the map to grow indefinitely, potentially leading to memory exhaustion (OOM). Following the general rules for this repository, you should apply a fixed bound to collections that can grow from external requests and reject new requests (or drop the oldest entries) when the bound is reached.
References
- To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections that can grow from external requests, such as pending configuration changes. Reject new requests when the bound is reached.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 535d55c2db
| staticCreds, err := loadS3StaticCredentials(*s3CredsFile) | ||
| if err != nil { | ||
| return errors.Wrapf(err, "load static credentials for admin listener") |
Skip credential-file loading when admin is disabled
startAdminFromFlags always parses --s3CredentialsFile before it knows whether the admin listener is enabled. In run(), this function is called unconditionally, so a bad or missing credentials file now aborts process startup even when --adminEnabled=false (and even if S3 is disabled). This turns a disabled feature into a hard dependency and can break existing deployments that never intended to use admin auth.
| } | ||
| info, err := h.source.Describe(r.Context()) | ||
| if err != nil { | ||
| writeJSONError(w, http.StatusInternalServerError, "cluster_describe_failed", err.Error()) |
Avoid exposing backend error details in cluster responses
On Describe failures, the handler returns err.Error() directly to clients. That leaks internal implementation details (for example storage, raft, or node-local error text) through a public admin API response, and it contradicts the surrounding contract comment that says only an error code should be exposed. Return a generic message here and keep the detailed error in server logs.
Addresses the 7 findings on PR #623 from gemini-code-assist and chatgpt-codex:
- cluster_handler: stop leaking the raft/store error text to clients; log the full error via slog and return a generic "cluster_describe_failed" body. Test updated to assert the sentinel string does not appear in the response.
- auth_handler credential compare: hash both sides with SHA-256 before subtle.ConstantTimeCompare so length-based timing side channels cannot leak the secret size. Keep the unknown-access-key timing parity with a fixed placeholder digest.
- auth_handler input handling: stop trimming SecretKey. Trimming was accepting inputs the SigV4 data-plane adapter would reject, creating a cross-protocol inconsistency; AccessKey (identifier) still trims.
- auth_handler audit: login/logout now emit their own admin_audit slog entries. The generic Audit middleware cannot see the claimed actor at that point in the chain, so the Audit wrapper is dropped from publicAuth and replaced with explicit logging that captures actor, claimed_actor, remote, and status. Logout best-effort decodes the session cookie via the Verifier so the audit line can name the caller even when the session has not yet been revoked.
- ratelimit: enforce a hard cap on distinct tracked IPs. Once the cap is reached we sweep expired windows first; if that still leaves the map full we refuse new entries instead of silently evicting a legitimate user in-progress state. Adds dedicated tests for the cap and window-reclamation paths.
- main_admin: do not load --s3CredentialsFile when admin.enabled is false. A bad credentials file should not abort process startup when the admin listener is disabled.
Tests: new auth_audit_test.go covers actor recording on login success / failure and logout with and without a session cookie, plus a same-length/different-length timing smoke check. golangci-lint ./... --new-from-rev=origin/main --timeout=5m: 0 issues. go test -race ./internal/admin/... . green.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5006462384
| tlsConfigured := strings.TrimSpace(c.TLSCertFile) != "" && strings.TrimSpace(c.TLSKeyFile) != "" | ||
| if tlsConfigured || !addressRequiresTLS(strings.TrimSpace(c.Listen)) || c.AllowPlaintextNonLoopback { | ||
| return nil |
Reject partial admin TLS configuration
validateTLS only treats TLS as configured when both cert and key are non-empty, but it does not fail when exactly one is provided. In loopback mode (or with allow_plaintext_non_loopback=true), this path validates successfully and startup continues in plaintext, so a typo in either TLS flag silently downgrades transport security instead of failing fast. This is especially risky because operators may believe TLS is enabled when it is not.
- auth_handler secret compare: replace naked SHA-256 with HMAC-SHA256 keyed by a per-process random value. The construction is still only used to normalise inputs to a fixed-width constant-time comparator; using a keyed MAC satisfies CodeQL static analysis and communicates the intent more precisely (timing-safe comparator, not persisted password hash). Deliberately NOT bcrypt / argon2: nothing is persisted, the secret arrives in plaintext over TLS, and the rate limiter already bounds online guessing.
- config.validateTLS: fail startup when admin.tls.cert_file and admin.tls.key_file are not both set together. A typo that leaves one empty would previously treat TLS as disabled silently; we now surface the misconfiguration at startup. Added unit test covering the two partial-config cases.
@claude review
Pull request overview
Adds the initial (read-only) admin listener foundation to the elastickv main binary, including config validation, JWT session auth w/ key rotation, strict router + middleware chain, login/logout with rate limiting, and basic cluster/health endpoints—plus unit and in-process integration tests.
Changes:
- Wire an optional admin HTTP listener into main.go via new flags and lifecycle management.
- Introduce internal/admin/ server composition (router, middleware, auth, JWT, cluster + health endpoints).
- Add unit tests across admin subsystems plus in-process listener tests (plaintext + TLS).
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| main.go | Adds admin-related flags and starts admin listener during run(). |
| main_admin.go | Implements flag→config translation, config checks, admin server construction, and listener lifecycle wiring. |
| main_admin_test.go | Adds integration-style tests for starting the admin listener and serving /admin/healthz (HTTP + HTTPS). |
| internal/admin/server.go | Composes admin server dependencies and builds the API handler mux + router. |
| internal/admin/server_test.go | End-to-end handler tests for login/session + cluster access + routing behavior. |
| internal/admin/router.go | Implements strict-prefix admin router for API/health/assets/SPA fallback. |
| internal/admin/router_test.go | Tests router dispatch order and SPA/static behavior. |
| internal/admin/middleware.go | Adds body-limit, session auth, role gate, CSRF, and audit middleware. |
| internal/admin/middleware_test.go | Tests middleware behavior (limit, auth context, CSRF, audit, timing-safe compare). |
| internal/admin/auth_handler.go | Implements login/logout endpoints, cookie issuance, rate limiting, and audit logging. |
| internal/admin/auth_handler_test.go | Tests auth handler (happy paths, cookie hardening, rate limits, logout). |
| internal/admin/auth_audit_test.go | Validates login/logout audit logging behavior and claimed-actor logging. |
| internal/admin/jwt.go | Implements HS256 JWT signing/verifying with key rotation support. |
| internal/admin/jwt_test.go | Tests JWT signing/verifying, rotation acceptance, and malformed cases. |
| internal/admin/config.go | Adds admin listener config, validation, signing key decoding, and role indexing. |
| internal/admin/config_test.go | Tests config validation (TLS requirements, key decoding/length, role overlaps). |
| internal/admin/cluster_handler.go | Adds /admin/api/v1/cluster handler with safe error surfacing and logging. |
| internal/admin/cluster_handler_test.go | Tests cluster handler responses and error sanitization. |
| internal/admin/ratelimit.go | Implements fixed-window per-IP rate limiter with hard cap on distinct IPs. |
| internal/admin/ratelimit_test.go | Tests rate limiter cap behavior and expiry sweeping. |
| internal/admin/principal.go | Defines Role and AuthPrincipal for authorization context. |
| if !s.limiter.allow(clientIP(r)) { | ||
| w.Header().Set("Retry-After", "60") | ||
| writeJSONError(w, http.StatusTooManyRequests, "rate_limited", | ||
| "too many login attempts from this source; try again later") | ||
| return false |
Retry-After is hard-coded to 60, but the limiter window is configurable via AuthServiceOpts.LoginWindow. If the window is changed (even in tests), the header will be incorrect; consider deriving Retry-After from the configured window duration (seconds, clamped to at least 1).
| expires := time.Now().UTC().Add(s.sessionTTL) | ||
| http.SetCookie(w, s.buildCookie(sessionCookieName, token, true)) | ||
| http.SetCookie(w, s.buildCookie(csrfCookieName, csrf, false)) | ||
| w.Header().Set("Cache-Control", "no-store") | ||
| w.Header().Set("Content-Type", "application/json; charset=utf-8") | ||
| w.WriteHeader(http.StatusOK) | ||
| _ = json.NewEncoder(w).Encode(loginResponse{Role: principal.Role, ExpiresAt: expires}) | ||
| } |
issueSession uses time.Now() to compute expires_at, while the JWT exp is based on the signer's injected clock. This can make the response’s expires_at disagree with the actual token expiry when a custom clock is used (and is generally inconsistent with the injected time abstraction). Consider using the same clock source used for signing (or plumb the AuthService clock through) when computing expires_at.
| return errors.WithStack(errors.Newf( | ||
| "admin.listen %q is not loopback but TLS is not configured;"+ | ||
| " set admin.tls.cert_file + admin.tls.key_file, or explicitly pass"+ | ||
| " --admin-allow-plaintext-non-loopback (strongly discouraged)", |
The TLS validation error suggests passing --admin-allow-plaintext-non-loopback, but the actual flag wired in main.go is -adminAllowPlaintextNonLoopback. This mismatch makes the startup error harder to act on; update the message to reference the real flag name (or avoid flag names entirely and point at the config field).
| " --admin-allow-plaintext-non-loopback (strongly discouraged)", | |
| " -adminAllowPlaintextNonLoopback (strongly discouraged)", |
| // BodyLimit wraps the request body with http.MaxBytesReader and responds | ||
| // 413 when the client exceeds the cap. It also sets | ||
| // http.MaxBytesError-aware error translation so the handler does not need | ||
| // to distinguish ordinary IO failures from overflow. | ||
| func BodyLimit(limit int64) func(http.Handler) http.Handler { | ||
| if limit <= 0 { | ||
| limit = defaultBodyLimit | ||
| } | ||
| return func(next http.Handler) http.Handler { | ||
| return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
| if r.Body != nil { | ||
| r.Body = http.MaxBytesReader(w, r.Body, limit) | ||
| } | ||
| next.ServeHTTP(bodyLimitResponseWriter{ResponseWriter: w}, r) | ||
| }) | ||
| } | ||
| } | ||
|
|
||
| // bodyLimitResponseWriter is a minor adapter that lets a handler translate | ||
| // its own MaxBytesError into a consistent 413 without duplicating the | ||
| // plumbing. At the time of writing, each write handler can call | ||
| // r.ParseForm / json.Decode and on error call | ||
| // `if errors.As(err, &http.MaxBytesError{}) { ... }` manually; this | ||
| // wrapper just forces the header once per request. | ||
| type bodyLimitResponseWriter struct { | ||
| http.ResponseWriter | ||
| } | ||
|
|
||
| // WriteMaxBytesError is called by handlers that detected a MaxBytesError. | ||
| // It is a package-level helper rather than a method so the router error | ||
| // path keeps the same JSON shape as the rest. | ||
| func WriteMaxBytesError(w http.ResponseWriter) { | ||
| writeJSONError(w, http.StatusRequestEntityTooLarge, "payload_too_large", | ||
| "request body exceeds the 64 KiB admin limit") | ||
| } |
BodyLimit claims it "responds 413" / provides "error translation", but it currently only wraps the body with http.MaxBytesReader and leaves translation to individual handlers. Also, bodyLimitResponseWriter is just a pass-through wrapper with no behavior, so it doesn’t achieve the described goal. Either implement centralized translation (e.g., by intercepting http.MaxBytesError handling) or simplify the comments/types to match the current behavior.
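For reference, handler-side translation of an overflow into a 413 can be sketched as below. The function decodeStatus and the request path are hypothetical; the sketch only shows that json.Decoder surfaces the *http.MaxBytesError produced by http.MaxBytesReader (available since Go 1.19) so errors.As can distinguish it from ordinary decode failures.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// decodeStatus decodes a JSON body under a byte limit and returns the
// status code a handler might respond with.
func decodeStatus(body string, limit int64) int {
	r := httptest.NewRequest(http.MethodPost, "/admin/api/v1/example", strings.NewReader(body))
	w := httptest.NewRecorder()
	r.Body = http.MaxBytesReader(w, r.Body, limit)

	var payload map[string]any
	if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
		var tooLarge *http.MaxBytesError
		if errors.As(err, &tooLarge) {
			return http.StatusRequestEntityTooLarge // body exceeded the cap
		}
		return http.StatusBadRequest // malformed JSON or another read error
	}
	return http.StatusOK
}

func main() {
	fmt.Println(decodeStatus(`{"a":1}`, 64))                         // 200
	fmt.Println(decodeStatus(`{"a":"`+strings.Repeat("x", 64)+`"}`, 8)) // 413
	fmt.Println(decodeStatus(`{bad`, 64))                            // 400
}
```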
| // Constant-time comparison: the values are user-supplied | ||
| // and we do not want to leak length differences. | ||
| if !constantTimeEq(cookie.Value, header) { | ||
| writeJSONError(w, http.StatusForbidden, "csrf_mismatch", "CSRF token mismatch") | ||
| return | ||
| } | ||
| next.ServeHTTP(w, r) | ||
| }) | ||
| } | ||
| } | ||
|
|
||
| func constantTimeEq(a, b string) bool { | ||
| if len(a) != len(b) { | ||
| return false | ||
| } | ||
| var diff byte | ||
| for i := 0; i < len(a); i++ { | ||
| diff |= a[i] ^ b[i] | ||
| } | ||
| return diff == 0 |
The CSRF middleware comment says the comparison is constant-time and "does not leak length differences", but constantTimeEq returns early when lengths differ. Either adjust the comment (since token length isn’t sensitive) or make the comparison truly constant-time across lengths (e.g., compare fixed-length MACs or use subtle.ConstantTimeCompare on padded buffers).
- auth_handler: derive Retry-After from AuthService.loginWindow instead of hard-coding 60 seconds, so tests and deployments that tune LoginWindow get an accurate hint; clamp to >= 1.
- auth_handler: use the signer's injected clock for the login response's expires_at so it cannot drift from the JWT exp claim. AuthService now holds the same clock passed to the signer.
- config.validateTLS: the error message pointed at --admin-allow-plaintext-non-loopback, but the real flag is -adminAllowPlaintextNonLoopback. Fix to match the wired flag name so operators can act on the startup error.
- middleware.BodyLimit: drop the pass-through bodyLimitResponseWriter wrapper (it provided no translation) and rewrite the comment to say plainly that handlers are responsible for detecting overflow via IsMaxBytesError and calling WriteMaxBytesError. Centralising that in the middleware would either double-write or mask downstream errors depending on the decoder shape.
- middleware.CSRFDoubleSubmit: replace constantTimeEq with an explicit length check + subtle.ConstantTimeCompare on bytes, and update the comment to match the actual behaviour. The short-circuit on length mismatch is fine here because both tokens are server-minted at a fixed 32-byte width, so length divergence cannot leak secret state.
@claude review
| switch { | ||
| case strings.HasPrefix(p, pathPrefixAPIv1): | ||
| if rt.api == nil { | ||
| rt.notFind.ServeHTTP(w, r) | ||
| return | ||
| } | ||
| rt.api.ServeHTTP(w, r) | ||
| return | ||
| case p == pathHealthz: | ||
| rt.serveHealth(w, r) | ||
| return | ||
| case strings.HasPrefix(p, pathPrefixAssets): | ||
| rt.serveAsset(w, r) | ||
| return | ||
| case p == pathPrefixAdmin || strings.HasPrefix(p, pathPrefixAdmin+"/"): | ||
| rt.serveSPA(w, r) | ||
| return |
pathPrefixAPIv1 and pathPrefixAssets include a trailing slash and routing uses strings.HasPrefix. Requests to /admin/api/v1 (no trailing slash) or /admin/assets will fall through to the SPA branch and can incorrectly serve index.html instead of returning a JSON 404. Consider explicitly handling the no-trailing-slash variants (or define prefixes without the trailing slash and check segment boundaries) so API/asset roots never resolve to the SPA fallback.
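The "prefixes without the trailing slash plus a segment-boundary check" option could be sketched as follows. The constants and the underPrefix / classify helpers are hypothetical stand-ins for the router's real fields; returning "api" here only means the request is kept inside the API namespace, where an unknown route can answer with a JSON 404 instead of index.html.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical prefixes, defined without a trailing slash.
const (
	prefixAdmin  = "/admin"
	prefixAPIv1  = "/admin/api/v1"
	prefixAssets = "/admin/assets"
	pathHealthz  = "/admin/healthz"
)

// underPrefix reports whether p is exactly prefix or lies below it at a
// path-segment boundary, so "/admin/api/v1" matches but
// "/admin/api/v1x" does not.
func underPrefix(p, prefix string) bool {
	return p == prefix || strings.HasPrefix(p, prefix+"/")
}

// classify names the branch that should serve p. Order matters: the
// API, health, and asset checks run before the SPA fallback.
func classify(p string) string {
	switch {
	case underPrefix(p, prefixAPIv1):
		return "api"
	case p == pathHealthz:
		return "healthz"
	case underPrefix(p, prefixAssets):
		return "assets"
	case underPrefix(p, prefixAdmin):
		return "spa"
	default:
		return "not_found"
	}
}

func main() {
	for _, p := range []string{"/admin/api/v1", "/admin/api/v1/cluster", "/admin/assets", "/admin/tables", "/administrator"} {
		fmt.Printf("%-24s %s\n", p, classify(p))
	}
}
```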
| return admin.ClusterInfoFunc(func(_ context.Context) (admin.ClusterInfo, error) { | ||
| groups := make([]admin.GroupInfo, 0, len(runtimes)) | ||
| for _, rt := range runtimes { | ||
| if rt == nil || rt.engine == nil { | ||
| continue | ||
| } | ||
| status := rt.engine.Status() | ||
| groups = append(groups, admin.GroupInfo{ | ||
| GroupID: rt.spec.id, | ||
| LeaderID: status.Leader.ID, | ||
| IsLeader: strings.EqualFold(string(status.State), "leader"), |
newClusterInfoSource currently omits GroupInfo.Members entirely and also discards the provided context (func(_ context.Context)). Since admin.GroupInfo includes a Members field (and tests exercise it), the production /admin/api/v1/cluster response will always have empty/null membership even though the raft engine exposes configuration via Engine.Configuration(ctx). Consider using the passed ctx and populating Members from rt.engine.Configuration(ctx) (best-effort, e.g. leave Members empty if config fetch fails).
| return admin.ClusterInfoFunc(func(_ context.Context) (admin.ClusterInfo, error) { | |
| groups := make([]admin.GroupInfo, 0, len(runtimes)) | |
| for _, rt := range runtimes { | |
| if rt == nil || rt.engine == nil { | |
| continue | |
| } | |
| status := rt.engine.Status() | |
| groups = append(groups, admin.GroupInfo{ | |
| GroupID: rt.spec.id, | |
| LeaderID: status.Leader.ID, | |
| IsLeader: strings.EqualFold(string(status.State), "leader"), | |
| return admin.ClusterInfoFunc(func(ctx context.Context) (admin.ClusterInfo, error) { | |
| groups := make([]admin.GroupInfo, 0, len(runtimes)) | |
| for _, rt := range runtimes { | |
| if rt == nil || rt.engine == nil { | |
| continue | |
| } | |
| status := rt.engine.Status() | |
| var members []string | |
| if cfg, err := rt.engine.Configuration(ctx); err == nil { | |
| members = cfg.Members | |
| } | |
| groups = append(groups, admin.GroupInfo{ | |
| GroupID: rt.spec.id, | |
| LeaderID: status.Leader.ID, | |
| IsLeader: strings.EqualFold(string(status.State), "leader"), | |
| Members: members, |
| func (c *Config) validateTLS() error { | ||
| certSet := strings.TrimSpace(c.TLSCertFile) != "" | ||
| keySet := strings.TrimSpace(c.TLSKeyFile) != "" | ||
| if certSet != keySet { | ||
| // A lone cert or key almost always means a typo. Silently | ||
| // treating it as "TLS off" would downgrade transport | ||
| // security while the operator thinks TLS is enabled; fail | ||
| // fast so the misconfiguration is visible at startup. | ||
| return errors.New("admin.tls.cert_file and admin.tls.key_file must be set together;" + | ||
| " partial TLS configuration is not allowed") | ||
| } | ||
| tlsConfigured := certSet && keySet | ||
| if tlsConfigured || !addressRequiresTLS(strings.TrimSpace(c.Listen)) || c.AllowPlaintextNonLoopback { | ||
| return nil | ||
| } | ||
| return errors.WithStack(errors.Newf( | ||
| "admin.listen %q is not loopback but TLS is not configured;"+ | ||
| " set admin.tls.cert_file + admin.tls.key_file, or explicitly pass"+ | ||
| " -adminAllowPlaintextNonLoopback (strongly discouraged)", | ||
| c.Listen, | ||
| )) |
Several validation errors refer to dotted config keys like admin.tls.cert_file / admin.session_signing_key, but the user-facing configuration surface in this repo is CLI flags (e.g. -adminTLSCertFile, -adminSessionSigningKey). Mixing these can make startup failures harder to action. Consider updating the messages to mention the actual flag names (or include both forms) consistently, similar to how the TLS error already references -adminAllowPlaintextNonLoopback.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 08c9846381
| p := r.URL.Path | ||
|
|
||
| switch { | ||
| case strings.HasPrefix(p, pathPrefixAPIv1): |
Treat /admin/api/v1 as API namespace
The API dispatch only matches strings.HasPrefix(p, "/admin/api/v1/"), so GET /admin/api/v1 misses the API mux and falls through to the /admin/* SPA path. When static is configured, that request returns index.html (200) instead of a JSON 404, which violates the router’s strict-order contract that API paths should not be served by SPA fallback and can break clients/probes that omit the trailing slash.
@claude review
- router: handle bare /admin/api/v1 and /admin/assets (no trailing slash) explicitly so they return a JSON 404 instead of falling through to the SPA fallback and being answered with index.html. New test fixes the behaviour.
- main_admin.newClusterInfoSource: populate GroupInfo.Members from rt.engine.Configuration(ctx) and stop discarding the context. Failed Configuration fetches leave Members empty (best effort) so a transient raft state cannot poison the whole cluster snapshot.
- config: replace dotted config-key names (admin.listen, admin.tls.cert_file, admin.session_signing_key, etc.) in every validation error message with the actual CLI flag names (-adminListen, -adminTLSCertFile, -adminSessionSigningKey, ...). Tests updated to match.
Summary
First PR toward the admin dashboard designed in docs/design/2026_04_24_proposed_admin_dashboard.md (merged as #611). Introduces the read-only foundation: listener wiring, auth, router, middleware, cluster info / healthz endpoints. No write endpoints are included; per the design doc P1 DoD they ship together with AdminForward and the 3.3.2 acceptance criteria in a follow-up.
What is in scope
- internal/admin/ package: config validation, JWT (HS256 + 2-key rotation), strict-prefix router, middleware chain (body limit, session auth, role gate, CSRF double-submit, audit slog), login/logout with per-IP rate limiter, cluster + healthz handlers, Server facade.
- main_admin.go: flag wiring, config-to-admin.Config translation, TLS and loopback enforcement, errgroup lifecycle registration.
What is NOT in scope (deferred to follow-up PRs)
- AdminForward internal gRPC RPC and follower→leader forwarding (Section 3.3.2 acceptance criteria 1–6).
- AuthPrincipal (DynamoDB CreateTable / DeleteTable, S3 CreateBucket / DeleteBucket / PutBucketAcl).
- go:embed (design P3).
This keeps the first PR focused and reviewable; the DoD remains respected because no write endpoint ships without the acceptance criteria being green.
Security posture
- -adminEnabled=false), default bind is loopback.
- HttpOnly + Secure + SameSite=Strict + Path=/admin + Max-Age=3600. localStorage is never used.
- read_only_access_keys and full_access_keys is a hard startup failure (no silent last-writer-wins).
- http.MaxBytesReader.
- admin_audit slog attributes.
Test plan
- go test -race ./internal/admin/... . is green
- golangci-lint run ./... --timeout=5m reports 0 issues
- /admin/healthz returns ok
- /admin/healthz over HTTPS returns ok
- 429 + Retry-After: 60; different IPs are independent