Webhook Middleware Phase 2: Validating webhook middleware#4314
Webhook Middleware Phase 2: Validating webhook middleware#4314Sanskarzz wants to merge 8 commits intostacklok:mainfrom
Conversation
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
656c138 to
315001a
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4314 +/- ##
==========================================
+ Coverage 68.45% 69.34% +0.88%
==========================================
Files 479 481 +2
Lines 48642 48639 -3
==========================================
+ Hits 33300 33728 +428
+ Misses 12373 12317 -56
+ Partials 2969 2594 -375 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
315001a to
9364118
Compare
PR size has been reduced below the XL threshold. Thank you for splitting this up!
|
✅ PR size has been reduced below the XL threshold. The size review has been dismissed and this PR can now proceed with normal review. Thank you for splitting this up! |
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
PR size has been reduced below the XL threshold. Thank you for splitting this up!
|
✅ PR size has been reduced below the XL threshold. The size review has been dismissed and this PR can now proceed with normal review. Thank you for splitting this up! |
|
@JAORMX This PR is ready for review. |
|
@Sanskarzz you might need to rebase, there is something off with the PR as it's showing it's 39 commits. |
JAORMX
left a comment
There was a problem hiding this comment.
Thanks for the work on this @Sanskarzz! The overall structure is clean and follows the existing middleware patterns well. Middleware chain placement (after MCP parser, before telemetry/authz) matches the RFC. The auth credential protection is also done right... using identity.GetPrincipalInfo() instead of the full Identity to keep credentials out of webhook payloads.
A few things to address before merging:
Blockers:
- Error responses leak internal webhook names and raw errors to clients (lines 136, 145). The details are already in the logs, so client-facing messages should be generic.
- Missing
yamltags onwebhook.Configfields will break YAML-based configuration. OnlyTimeoutgot the tag, the rest need it too.
Suggestions:
- Multi-webhook chain tests are missing. The issue explicitly requires them and they cover the core differentiating logic.
- The JSON-RPC error response is built by hand (
map[string]any) instead of using thejsonrpc2library like the authz middleware does. - Unrelated swagger type change in
pkg/auth/remote/config.goshould be a separate PR.
Also: As I mentioned in the PR comments, this needs a rebase to clean up the 39 commits.
pkg/webhook/validating/middleware.go
Outdated
|
|
||
| slog.Error("Validating webhook error caused request denial", | ||
| "webhook", whName, "error", err) | ||
| sendErrorResponse(w, http.StatusForbidden, "Forbidden", fmt.Sprintf("Webhook %q error: %v", whName, err)) |
There was a problem hiding this comment.
blocker: So... this is sending the webhook name and the raw error back to the client. That error can contain internal URLs, DNS names, connection details like dial tcp 10.0.0.5:8443: connection refused. Even though the user is authenticated at this point, we really shouldn't be leaking our internal topology to clients.
The good news is you're already logging the details on lines 134-135, which is the right place for them. The client response should just be generic.
| sendErrorResponse(w, http.StatusForbidden, "Forbidden", fmt.Sprintf("Webhook %q error: %v", whName, err)) | |
| sendErrorResponse(w, http.StatusForbidden, \"Forbidden\", \"Request denied by policy\") |
There was a problem hiding this comment.
Removed internal webhook names and raw errors from the client-facing responses. It now returns a clean JSON-RPC error structure with a generic message ("Request denied by policy").
pkg/webhook/validating/middleware.go
Outdated
|
|
||
| msg := resp.Message | ||
| if msg == "" { | ||
| msg = fmt.Sprintf("Webhook %q denied the request", whName) |
There was a problem hiding this comment.
blocker: Same thing here. The default deny message exposes the internal webhook name to the client. The name is already in the log on line 141, so the client doesn't need it.
| msg = fmt.Sprintf("Webhook %q denied the request", whName) | |
| msg = \"Request denied by policy\" |
pkg/webhook/validating/middleware.go
Outdated
| "jsonrpc": "2.0", | ||
| "id": nil, | ||
| "error": map[string]any{ | ||
| "code": statusCode, |
There was a problem hiding this comment.
suggestion: So, I went down a rabbit hole checking whether using HTTP status codes as JSON-RPC error codes is correct. The JSON-RPC 2.0 spec reserves -32768 to -32000 for pre-defined errors (with -32000 to -32099 for implementation-defined server errors), and says "the remainder of the space is available for application defined errors." So positive integers like 403 are technically in the "application defined" space and not forbidden by the spec.
The MCP spec (2025-11-25) also explicitly allows HTTP 403 with a JSON-RPC error response body with no id, which is what we're doing here.
And the existing authz middleware (pkg/authz/middleware.go:144) already does jsonrpc2.NewError(403, errorMsg), so this is an established pattern in the codebase.
One thing that is worth fixing: you're building the JSON-RPC response by hand with map[string]any instead of using the jsonrpc2 library like the authz middleware does. Using the library would be more consistent and less error-prone.
There was a problem hiding this comment.
Now I have used the library "golang.org/x/exp/jsonrpc2"
pkg/webhook/validating/middleware.go
Outdated
| return r.RemoteAddr | ||
| } | ||
|
|
||
| func sendErrorResponse(w http.ResponseWriter, statusCode int, _, message string) { |
There was a problem hiding this comment.
nitpick: The third parameter is explicitly discarded (_). If you don't need it, just drop it from the signature. Dead parameters are confusing for the next person reading this.
| func sendErrorResponse(w http.ResponseWriter, statusCode int, _, message string) { | |
| func sendErrorResponse(w http.ResponseWriter, statusCode int, message string) { |
(This means updating the three call sites to drop their third argument too.)
pkg/webhook/validating/middleware.go
Outdated
| msg = fmt.Sprintf("Webhook %q denied the request", whName) | ||
| } | ||
|
|
||
| code := resp.Code |
There was a problem hiding this comment.
question: This lets the webhook service control the HTTP status code we return to the client (any 4xx-5xx). Is that intentional? A misconfigured webhook could return code: 503 and make the proxy look like it's having service issues, or code: 401 and confuse auth flows.
The webhook's job is allow/deny... not to pick our HTTP semantics. Worth considering whether we should just always return 403 here and ignore the webhook's code preference.
There was a problem hiding this comment.
On denial, we now explicitly return an HTTP 403 status regardless of the webhook's preference, avoiding client-side confusion.
pkg/webhook/validating/middleware.go
Outdated
| w.Header().Set("Content-Type", "application/json") | ||
| w.WriteHeader(statusCode) | ||
|
|
||
| // Since we are intercepting an MCP request, we should really be returning a JSON-RPC error. |
There was a problem hiding this comment.
nitpick: This comment reads like you're still deciding what to do ("we should really be returning...", "here we'll follow..."). Since this is a deliberate choice, the comment should state the decision, not the deliberation. Something like:
Return a JSON-RPC 2.0 error so MCP clients can parse the denial. The HTTP status code signals the error at the transport level; the JSON-RPC body carries the detail.
pkg/webhook/types.go
Outdated
| Timeout time.Duration `json:"timeout"` | ||
| Timeout time.Duration `json:"timeout" yaml:"timeout" swaggertype:"primitive,integer"` | ||
| // FailurePolicy determines behavior when the webhook call fails. | ||
| FailurePolicy FailurePolicy `json:"failure_policy"` |
There was a problem hiding this comment.
blocker: So, Timeout got a yaml tag in this PR (line 69), but the other fields here (Name, URL, FailurePolicy, TLSConfig, HMACSecretRef) are still missing yaml tags. Since RunConfig.ValidatingWebhooks uses yaml:"validating_webhooks" for YAML deserialization, the nested Config fields need explicit yaml tags too.
Go's YAML library uses lowercased struct field names by default (e.g., failurepolicy), not the json tag values. So a YAML config with failure_policy: fail would silently fail to deserialize into FailurePolicy. That's a runtime bug that won't show up until someone actually tries to configure this via YAML.
All the fields need yaml tags matching the json ones. For example:
FailurePolicy FailurePolicy `json:"failure_policy" yaml:"failure_policy"`There was a problem hiding this comment.
Added the missing explicit yaml tags correctly for the webhook.Config fields so no runtime bugs happen with YAML definitions.
pkg/auth/remote/config.go
Outdated
| Scopes []string `json:"scopes,omitempty" yaml:"scopes,omitempty"` | ||
| SkipBrowser bool `json:"skip_browser,omitempty" yaml:"skip_browser,omitempty"` | ||
| Timeout time.Duration `json:"timeout,omitempty" yaml:"timeout,omitempty" swaggertype:"string" example:"5m"` | ||
| Timeout time.Duration `json:"timeout,omitempty" yaml:"timeout,omitempty" swaggertype:"primitive,integer"` |
There was a problem hiding this comment.
suggestion: This swagger type fix (string -> primitive,integer) is unrelated to the validating webhook feature and changes the OpenAPI spec contract for the auth timeout field. Could you split this into its own PR? Keeps this one focused on the webhook middleware.
There was a problem hiding this comment.
Reverted the pkg/auth/remote/config.go swagger tag changes; I'll submit those separately if needed.
There was a problem hiding this comment.
I'll fix the CI docs error after merging the PR.
| ) | ||
|
|
||
| //nolint:paralleltest // Shares a mock HTTP server and lastRequest state | ||
| func TestValidatingMiddleware(t *testing.T) { |
There was a problem hiding this comment.
suggestion: The issue (#3397) explicitly calls out testing multiple webhooks in a chain:
"When multiple validating webhooks are configured: Execute in configuration order. If ANY webhook returns
allowed: false, deny the request."
The current tests only exercise a single webhook. We need at least:
- First allows, second denies (verifies sequential evaluation)
- First denies (verifies short-circuit... the second webhook should never be called)
- Mixed failure policies (first with
fail, second withignore, or vice versa)
These are important because the sequential loop in createValidatingHandler is the core logic that differentiates this from a single-webhook setup. Without these tests, we can't be confident the chain actually works as specified.
Also a few more branching paths worth covering:
- When
resp.Messageis empty (exercises the default message on line 144-145) - When the webhook returns an out-of-range code like
0or200(exercises thecode < 400 || code > 599guard on line 149) - When
auth.IdentityFromContextreturnsok=false(verifiesPrincipalis nil in the payload)
There was a problem hiding this comment.
- Added
TestMultiWebhookChainto verify sequential evaluation, short-circuiting when the first webhook denies, and mixed failure policies (e.g., fail-open on connection error). - Added subtests for branching paths: out-of-range codes (e.g.,
0,200defaulting to403), emptyresp.Message, and requests without an authenticated identity.
d7a0b15 to
a1e96df
Compare
@JAORMX , I have fixed the commit issue. The extra commits were caused by rebasing against my fork's stale |
a1e96df to
6959846
Compare
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
9ff976d to
f6a1dee
Compare
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Overview
This PR implements Phase 2 of the Dynamic Webhook Middleware feature by introducing the Validating Webhook Middleware. Validating webhooks allow ToolHive to call external HTTP services (such as policy engines, bespoke approval workflows, or rate limiters) to strictly evaluate, approve, or deny MCP requests before they reach backend tools.
Fixes #3397
Key Changes
1.
pkg/webhook/validatingPackageconfig.go): AddedMiddlewareParamsstruct supporting a chain ofwebhook.Configelements. Includes setup validation requiring >0 webhooks to be explicitly declared.middleware.go):types.Middlewareinterface factory.MCPRequest, extracting User Principal attributes directly from theauth.Identitycontext, and recording the request Origin Context (SourceIP,Transport,ServerName).allowed: false.FailurePolicyFail(fail-closed, blocks request on network/server errors) andFailurePolicyIgnore(fail-open, logs a warning on exception but continues pipeline).middleware_test.go): Complete parallelized test-suite coveringAllowed=truepaths, denial paths, both failure policies, connection errors, and safe bypass for non-MCP calls. (Test Coverage sits above 88%).2. Runner Integration (
pkg/runner)middleware.go:validating.CreateMiddlewareinsideGetSupportedMiddlewareFactories.addValidatingWebhookMiddleware) securely positioning the validating evaluation block sequentially aftermcp-parserbut precisely before auditing (telemetry,authz). Thus blocking unverified telemetry pollution or unauthorized execution.config.go:RunConfigexposing theValidatingWebhooks []webhook.Configslice.Testing Performed
go test ./pkg/webhook/validating/... ./pkg/runner/...(All unit tests passing).task lint/task lint-fixagainst the overall project (clean).Type of change
Test plan
task test)task test-e2e)task lint-fix)