feat(server): add runtime auth and namespace scoping#214
feat(server): add runtime auth and namespace scoping#214abhinav-galileo wants to merge 9 commits into
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
09cb289 to
19fa65c
Compare
ad586bb to
3a5b7e4
Compare
af54543 to
479ca86
Compare
Add explicit none, api_key, and jwt runtime auth modes, including a generic no-auth provider. Move controls, bindings, policies, agents, and evaluation storage lookups onto principal namespace scoping. Cover auth mode selection and principal namespace isolation with server tests.
…stream The default forward set (X-API-Key, Authorization, Cookie) only covers credential headers Agent Control itself reads. Deployments whose upstream authenticates against a different header name (e.g., a deployer-specific API-key header) had no way to surface that credential through HttpUpstreamAuthProvider — the inbound header reached AC but never crossed the upstream call. Add an extra_forward_headers config field on HttpUpstreamConfig (defaulting to the empty tuple) that operators populate via the new AGENT_CONTROL_AUTH_UPSTREAM_EXTRA_FORWARD_HEADERS env var (comma- separated). The provider's _forward_headers iterates over the union of the default set and the extras, deduplicating case-insensitively so a duplicate name (cross-set or within extras) does not produce two copies on the wire. Tests: - forwards a configured extra header alongside defaults - default forward set unchanged when extras are empty - extras dedupe against defaults case-insensitively - _parse_extra_forward_headers parametric: None / empty / single / multiple / whitespace / empty-entries / case-folded duplicates - configure_auth_from_env threads the parsed tuple onto the provider Lint clean, typecheck clean, full server suite (747) green.
8312b99 to
e75cbb7
Compare
dce333a to
69aaa49
Compare
| ``context_builder`` on the dependency surfaces ``target_type`` / | ||
| ``target_id``, the provider also enforces that they match the token's | ||
| binding — runtime endpoints get the request-target check for free. | ||
| binding - runtime endpoints get the request-target check for free. |
There was a problem hiding this comment.
P1:
Line 58-76:
Target-binding check is conditional on the request supplying a target, not on the token carrying one. Today a runtime token bound to (env, prod) will pass authorization for /api/v1/evaluation if the caller simply omits target_type/target_id from the body — evaluation_context returns {"target_type": None, "target_id": None}, and lines 61/69 short-circuit on the requested_target* is not None guard. EvaluationRequest.target_type is Optional ([models/evaluation.py:46], so Pydantic will not reject this. Net effect: a target-bound token can be used for a namespace-wide eval that ignores the binding.
Suggested fix — invert the check so the token's binding is the source of truth:
if claims.target_type is not None or claims.target_id is not None:
if context is None:
raise ForbiddenError(...)
if context.get("target_type") != claims.target_type:
raise ForbiddenError("target_type does not match")
if context.get("target_id") != claims.target_id:
raise ForbiddenError("target_id does not match")
Add a regression test in test_runtime_token_exchange_endpoint.py next to test_evaluation_rejects_runtime_jwt_for_wrong_target that omits target_type/target_id from the eval body and asserts 403.
| dependencies=[Depends(get_api_key_from_header)], | ||
| ) | ||
|
|
||
| # Evaluator discovery still uses the local credential dependency. |
There was a problem hiding this comment.
Two routers were left on the legacy Depends(require_api_key) gate — evaluator_router (with the new "Evaluator discovery still uses the local credential dependency." comment) and observability_router. That means AGENT_CONTROL_AUTH_MODE=none and AGENT_CONTROL_AUTH_MODE=http_upstream silently do not apply to either route. An operator running in upstream-auth mode will still find /api/v1/evaluators / /api/v1/observability/... requiring an X-API-Key, and a none-mode deployment will return 401 from these even though the rest of the API is open.
Either migrate them in this PR (each is a one-line require_operation swap with a new EVALUATORS_READ / OBSERVABILITY_WRITE Operation member), or land a follow-up issue link in the comment so the deferral is explicit, not implied.
There was a problem hiding this comment.
Same applies to server/endpoints/observability line 45
| ) | ||
|
|
||
|
|
||
| async def _evaluation_context(request: Request) -> dict[str, object]: |
There was a problem hiding this comment.
except Exception swallows every parse failure and returns {}, so a malformed body removes the target check from the runtime authorizer entirely (cf. Comment 1 above). FastAPI's body validator will eventually 422 the request, but the authorizer has already accepted the token by then — useful signal lost from logs, and harder to spot if the auth ordering ever changes.
Suggested: narrow the catch to (json.JSONDecodeError, UnicodeDecodeError), log at debug, and once Comment 1 lands the fail-open hole closes for free. Don't widen _evaluation_context to enforce — keep the policy in LocalJwtVerifyProvider, the context builder should just supply data.
| policy_id = int(policy.json()["policy_id"]) | ||
| attach_to_policy = ns_a.post(f"/api/v1/policies/{policy_id}/controls/{control_id}") | ||
| assert attach_to_policy.status_code == 200, attach_to_policy.text | ||
|
|
There was a problem hiding this comment.
The new test pins create / list / get / delete cross-namespace isolation, but skips the mutating verbs on controls (PATCH /controls/{id}, PUT /controls/{id}/data), the binding by-key paths (PUT /control-bindings/by-key, POST /control-bindings/by-key:delete), and the agent association endpoints (POST /agents/{name}/policies/{policy_id} etc.). Each of those threads principal.namespace_key independently and a future refactor that drops the kwarg on one of them won't be caught.
Suggested: parametrize a single test_namespace_isolation_for_writes over (method, url, payload) tuples covering PATCH /controls, PUT /controls/{id}/data, PUT /control-bindings/by-key, POST /agents/{name}/policies/{policy_id}, and POST /agents/{name}/controls/{control_id} — assert each rejects (404 or empty list) when called from ns-b for an ns-a resource. ~40 lines.
| self, | ||
| control_id: int, | ||
| *, | ||
| namespace_key: str | None = None, |
There was a problem hiding this comment.
The sibling get_active_control_or_404 was tightened to make namespace_key required in this PR ([line 163], but get_control_or_404 still accepts namespace_key: str | None = None. Both internal callers (lines 240, 279) pass it, so making it required is a no-op today but closes the door on a future caller forgetting it.
Suggested: drop the default and make the parameter required, same as get_active_control_or_404.
| timeout = float(os.environ.get(_UPSTREAM_TIMEOUT_ENV, "5.0")) | ||
| token = os.environ.get(_UPSTREAM_TOKEN_ENV) | ||
| token_header = os.environ.get(_UPSTREAM_TOKEN_HEADER_ENV, "X-Agent-Control-Service-Token") | ||
| extra_forward_headers = _parse_extra_forward_headers( |
There was a problem hiding this comment.
_build_default_provider accepts header as an alias for api_key (good, backwards-compatible), but the RuntimeError lists only 'none', 'api_key', or 'http_upstream' — operators reading the error after a typo will think header is invalid. Same nit for _resolve_runtime_mode. Add 'header' to the error string, or drop the alias from the accepted set if you intend to retire it.
Summary
none,api_key, andjwt.Stack
Testing
make prepushon the stacked branch in feat(sdk): add runtime token auth #215.