
Webhook Middleware Phase 2: Validating webhook middleware #4314

Open
Sanskarzz wants to merge 8 commits into stacklok:main from Sanskarzz:dynamicwebhook2

Conversation

Contributor

@Sanskarzz Sanskarzz commented Mar 22, 2026

Overview

This PR implements Phase 2 of the Dynamic Webhook Middleware feature by introducing the Validating Webhook Middleware. Validating webhooks allow ToolHive to call external HTTP services (such as policy engines, bespoke approval workflows, or rate limiters) to strictly evaluate, approve, or deny MCP requests before they reach backend tools.

Fixes #3397

Key Changes

1. pkg/webhook/validating Package

  • Configuration (config.go): Added a MiddlewareParams struct that holds a chain of webhook.Config elements. Setup validation requires at least one webhook to be declared.
  • Middleware Handler (middleware.go):
    • Implements the ToolHive types.Middleware factory interface.
    • Intercepts MCP POST requests after they have been parsed.
    • Composes the HTTP evaluation payload: it embeds the original raw JSON-RPC MCPRequest, extracts User Principal attributes from the auth.Identity in the context, and records the request Origin Context (SourceIP, Transport, ServerName).
    • Evaluation Engine: Invokes all configured webhooks sequentially. As soon as any webhook responds with allowed: false, it denies the entire request with HTTP 403 and an optional custom error message.
    • Failure Policies: Honors FailurePolicyFail (fail-closed: blocks the request on network or server errors) and FailurePolicyIgnore (fail-open: logs a warning on error and continues the pipeline).
  • Test Suite (middleware_test.go): Parallelized test suite covering allowed: true paths, denial paths, both failure policies, connection errors, and safe bypass for non-MCP calls. (Test coverage is above 88%.)
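
For readers who want the evaluation rules in one place, here is a minimal sketch of the sequential loop with both failure policies. It is not the code from this PR: the hook type, the evaluate function, the helper hooks, and the policy string values are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// FailurePolicy mirrors the two policies named in the PR description.
// The string values are assumptions made for this sketch.
type FailurePolicy string

const (
	FailurePolicyFail   FailurePolicy = "fail"   // fail-closed
	FailurePolicyIgnore FailurePolicy = "ignore" // fail-open
)

// hook is a hypothetical stand-in for one configured webhook.
// call returns (allowed, err); a non-nil err models a network or server failure.
type hook struct {
	name   string
	policy FailurePolicy
	call   func() (bool, error)
}

// evaluate runs the hooks in configuration order and reports whether the
// request may proceed. It stops at the first denial, and at the first error
// whose hook uses the fail-closed policy.
func evaluate(hooks []hook) bool {
	for _, h := range hooks {
		allowed, err := h.call()
		if err != nil {
			if h.policy == FailurePolicyIgnore {
				// Fail-open: record the problem and move on to the next hook.
				fmt.Printf("warning: webhook %q failed, continuing: %v\n", h.name, err)
				continue
			}
			return false // fail-closed
		}
		if !allowed {
			return false // any explicit denial ends the chain
		}
	}
	return true
}

// Hypothetical hooks used to exercise the loop.
func allowHook() (bool, error) { return true, nil }
func denyHook() (bool, error)  { return false, nil }
func downHook() (bool, error)  { return false, errors.New("connection refused") }

func main() {
	failOpen := []hook{
		{name: "a", policy: FailurePolicyIgnore, call: downHook},
		{name: "b", policy: FailurePolicyFail, call: allowHook},
	}
	failClosed := []hook{
		{name: "a", policy: FailurePolicyFail, call: downHook},
		{name: "b", policy: FailurePolicyFail, call: allowHook},
	}
	denied := []hook{
		{name: "a", policy: FailurePolicyFail, call: denyHook},
		{name: "b", policy: FailurePolicyFail, call: allowHook},
	}
	fmt.Println(evaluate(failOpen), evaluate(failClosed), evaluate(denied))
}
```

In the real middleware the denial path writes an HTTP 403 response rather than returning a boolean; the sketch only models the allow/deny decision.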

2. Runner Integration (pkg/runner)

  • middleware.go:
    • Registered validating.CreateMiddleware inside GetSupportedMiddlewareFactories.
    • Added dedicated configuration wiring (addValidatingWebhookMiddleware) that places the validating webhook after mcp-parser and before the telemetry and authz middleware, so that denied requests neither pollute telemetry nor reach execution.
  • config.go:
    • Extended RunConfig with a ValidatingWebhooks []webhook.Config field.
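
To illustrate why the position in the chain matters, the following sketch builds a toy net/http middleware chain in the order described above and sends one request through it. The stage names are labels only, and trace, deny, chain, and run are hypothetical helpers, not ToolHive APIs.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

type middleware func(http.Handler) http.Handler

// trace returns a middleware that records its name when a request passes through it.
func trace(name string, seen *[]string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*seen = append(*seen, name)
			next.ServeHTTP(w, r)
		})
	}
}

// deny models a validating webhook stage that rejects every request with 403
// and never calls the next handler.
func deny(name string, seen *[]string) middleware {
	return func(http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
			*seen = append(*seen, name)
			http.Error(w, "Request denied by policy", http.StatusForbidden)
		})
	}
}

// chain wraps h so that the first middleware in mws is outermost and runs first.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// run sends one POST through the illustrative chain and returns the HTTP
// status plus the stages the request actually visited.
func run() (int, []string) {
	var seen []string
	backend := http.HandlerFunc(func(http.ResponseWriter, *http.Request) {
		seen = append(seen, "backend")
	})
	h := chain(backend,
		trace("mcp-parser", &seen),
		deny("validating-webhook", &seen),
		trace("telemetry", &seen),
		trace("authz", &seen),
	)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/mcp", nil))
	return rec.Code, seen
}

func main() {
	code, seen := run()
	fmt.Println(code, seen)
}
```

Because the denying stage sits before telemetry and authz, neither of those stages nor the backend sees the rejected request in this toy setup.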

Testing Performed

  • Ran go test ./pkg/webhook/validating/... ./pkg/runner/... (all unit tests pass).
  • Ran task lint and task lint-fix against the whole project (clean).

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)
  • Manual testing (describe below)

@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Mar 22, 2026
Contributor

@github-actions github-actions bot left a comment

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@Sanskarzz Sanskarzz changed the title from "Dynamicwebhook2" to "Webhook Middleware Phase 2: Validating webhook middleware" Mar 22, 2026
codecov bot commented Mar 22, 2026

Codecov Report

❌ Patch coverage is 78.35052% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.34%. Comparing base (074326e) to head (3e3c182).

Files with missing lines               Patch %   Lines
pkg/runner/middleware.go               29.41%    11 Missing and 1 partial ⚠️
pkg/webhook/validating/middleware.go   87.67%    5 Missing and 4 partials ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4314      +/-   ##
==========================================
+ Coverage   68.45%   69.34%   +0.88%     
==========================================
  Files         479      481       +2     
  Lines       48642    48639       -3     
==========================================
+ Hits        33300    33728     +428     
+ Misses      12373    12317      -56     
+ Partials     2969     2594     -375     

@github-actions github-actions bot dismissed their stale review March 23, 2026 09:13

PR size has been reduced below the XL threshold. Thank you for splitting this up!

@github-actions
Contributor

✅ PR size has been reduced below the XL threshold. The size review has been dismissed and this PR can now proceed with normal review. Thank you for splitting this up!

Contributor

@github-actions github-actions bot left a comment

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed size/L Large PR: 600-999 lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Mar 23, 2026
@github-actions github-actions bot dismissed their stale review March 23, 2026 19:32

PR size has been reduced below the XL threshold. Thank you for splitting this up!

@github-actions
Contributor

✅ PR size has been reduced below the XL threshold. The size review has been dismissed and this PR can now proceed with normal review. Thank you for splitting this up!

@Sanskarzz Sanskarzz marked this pull request as ready for review March 23, 2026 19:51
@Sanskarzz Sanskarzz requested a review from jhrozek as a code owner March 23, 2026 21:01
@Sanskarzz Sanskarzz requested a review from yrobla as a code owner March 23, 2026 21:01
@Sanskarzz
Contributor Author

@JAORMX This PR is ready for review.

Collaborator

JAORMX commented Mar 24, 2026

@Sanskarzz you might need to rebase, there is something off with the PR as it's showing it's 39 commits.

Collaborator

@JAORMX JAORMX left a comment

Thanks for the work on this @Sanskarzz! The overall structure is clean and follows the existing middleware patterns well. Middleware chain placement (after MCP parser, before telemetry/authz) matches the RFC. The auth credential protection is also done right... using identity.GetPrincipalInfo() instead of the full Identity to keep credentials out of webhook payloads.

A few things to address before merging:

Blockers:

  • Error responses leak internal webhook names and raw errors to clients (lines 136, 145). The details are already in the logs, so client-facing messages should be generic.
  • Missing yaml tags on webhook.Config fields will break YAML-based configuration. Only Timeout got the tag, the rest need it too.

Suggestions:

  • Multi-webhook chain tests are missing. The issue explicitly requires them and they cover the core differentiating logic.
  • The JSON-RPC error response is built by hand (map[string]any) instead of using the jsonrpc2 library like the authz middleware does.
  • Unrelated swagger type change in pkg/auth/remote/config.go should be a separate PR.

Also: As I mentioned in the PR comments, this needs a rebase to clean up the 39 commits.


slog.Error("Validating webhook error caused request denial",
"webhook", whName, "error", err)
sendErrorResponse(w, http.StatusForbidden, "Forbidden", fmt.Sprintf("Webhook %q error: %v", whName, err))
Collaborator

blocker: So... this is sending the webhook name and the raw error back to the client. That error can contain internal URLs, DNS names, connection details like dial tcp 10.0.0.5:8443: connection refused. Even though the user is authenticated at this point, we really shouldn't be leaking our internal topology to clients.

The good news is you're already logging the details on lines 134-135, which is the right place for them. The client response should just be generic.

Suggested change
- sendErrorResponse(w, http.StatusForbidden, "Forbidden", fmt.Sprintf("Webhook %q error: %v", whName, err))
+ sendErrorResponse(w, http.StatusForbidden, "Forbidden", "Request denied by policy")

Contributor Author

Removed internal webhook names and raw errors from the client-facing responses. It now returns a clean JSON-RPC error structure with a generic message ("Request denied by policy").
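
A minimal sketch of the split discussed in this thread: the full error goes to the server log, and the client only sees a generic message. The function name denyOnWebhookError and the webhook name are hypothetical, and http.Error stands in for the JSON-RPC response writer used in the PR.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"net/http"
	"net/http/httptest"
	"strings"
)

// genericDenial is the only text the client sees, whatever went wrong upstream.
const genericDenial = "Request denied by policy"

// denyOnWebhookError logs the webhook name and raw error for operators and
// answers the client with a fixed message that reveals no internal topology.
func denyOnWebhookError(w http.ResponseWriter, whName string, err error) {
	slog.Error("Validating webhook error caused request denial",
		"webhook", whName, "error", err)
	http.Error(w, genericDenial, http.StatusForbidden)
}

// respond simulates a failing webhook call and returns what the client receives.
func respond() (int, string) {
	rec := httptest.NewRecorder()
	err := errors.New("dial tcp 10.0.0.5:8443: connection refused")
	denyOnWebhookError(rec, "policy-engine", err)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	code, body := respond()
	fmt.Println(code, body)
}
```

The internal address appears only in the slog output, never in the response body.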


msg := resp.Message
if msg == "" {
msg = fmt.Sprintf("Webhook %q denied the request", whName)
Collaborator

blocker: Same thing here. The default deny message exposes the internal webhook name to the client. The name is already in the log on line 141, so the client doesn't need it.

Suggested change
- msg = fmt.Sprintf("Webhook %q denied the request", whName)
+ msg = "Request denied by policy"

Contributor Author

Done

"jsonrpc": "2.0",
"id": nil,
"error": map[string]any{
"code": statusCode,
Collaborator

suggestion: So, I went down a rabbit hole checking whether using HTTP status codes as JSON-RPC error codes is correct. The JSON-RPC 2.0 spec reserves -32768 to -32000 for pre-defined errors (with -32000 to -32099 for implementation-defined server errors), and says "the remainder of the space is available for application defined errors." So positive integers like 403 are technically in the "application defined" space and not forbidden by the spec.

The MCP spec (2025-11-25) also explicitly allows HTTP 403 with a JSON-RPC error response body with no id, which is what we're doing here.

And the existing authz middleware (pkg/authz/middleware.go:144) already does jsonrpc2.NewError(403, errorMsg), so this is an established pattern in the codebase.

One thing that is worth fixing: you're building the JSON-RPC response by hand with map[string]any instead of using the jsonrpc2 library like the authz middleware does. Using the library would be more consistent and less error-prone.

Contributor Author

I now use the golang.org/x/exp/jsonrpc2 library to build the error response.
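
For reference, this is roughly the wire shape under discussion, written with typed structs from the standard library so that the sketch stays self-contained. It is a stand-in for, not a copy of, what the jsonrpc2 library produces; rpcError, rpcResponse, writeDenial, denialBody, and inReservedRange are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// rpcError and rpcResponse model a JSON-RPC 2.0 error response.
type rpcError struct {
	Code    int64  `json:"code"`
	Message string `json:"message"`
}

type rpcResponse struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Error   *rpcError       `json:"error"`
}

// inReservedRange reports whether code falls in the range that the JSON-RPC 2.0
// spec reserves for pre-defined errors (-32768 to -32000).
func inReservedRange(code int64) bool {
	return code >= -32768 && code <= -32000
}

// writeDenial sends HTTP 403 with a JSON-RPC error body whose id is null,
// mirroring the hand-built map in the reviewed snippet.
func writeDenial(w http.ResponseWriter, message string) error {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusForbidden)
	return json.NewEncoder(w).Encode(rpcResponse{
		JSONRPC: "2.0",
		ID:      json.RawMessage("null"),
		Error:   &rpcError{Code: http.StatusForbidden, Message: message},
	})
}

// denialBody returns the status code and body a client would receive.
func denialBody() (int, string) {
	rec := httptest.NewRecorder()
	if err := writeDenial(rec, "Request denied by policy"); err != nil {
		return 0, ""
	}
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := denialBody()
	fmt.Print(code, " ", body)
	fmt.Println("403 reserved:", inReservedRange(403))
}
```

A typed struct, like the library, keeps the field names and nesting in one place, which is the consistency argument made in the review.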

return r.RemoteAddr
}

func sendErrorResponse(w http.ResponseWriter, statusCode int, _, message string) {
Collaborator

nitpick: The third parameter is explicitly discarded (_). If you don't need it, just drop it from the signature. Dead parameters are confusing for the next person reading this.

Suggested change
- func sendErrorResponse(w http.ResponseWriter, statusCode int, _, message string) {
+ func sendErrorResponse(w http.ResponseWriter, statusCode int, message string) {

(This means updating the three call sites to drop their third argument too.)

Contributor Author

Done

msg = fmt.Sprintf("Webhook %q denied the request", whName)
}

code := resp.Code
Collaborator

question: This lets the webhook service control the HTTP status code we return to the client (any 4xx-5xx). Is that intentional? A misconfigured webhook could return code: 503 and make the proxy look like it's having service issues, or code: 401 and confuse auth flows.

The webhook's job is allow/deny... not to pick our HTTP semantics. Worth considering whether we should just always return 403 here and ignore the webhook's code preference.

Contributor Author

On denial, we now explicitly return an HTTP 403 status regardless of the webhook's preference, avoiding client-side confusion.

w.Header().Set("Content-Type", "application/json")
w.WriteHeader(statusCode)

// Since we are intercepting an MCP request, we should really be returning a JSON-RPC error.
Collaborator

nitpick: This comment reads like you're still deciding what to do ("we should really be returning...", "here we'll follow..."). Since this is a deliberate choice, the comment should state the decision, not the deliberation. Something like:

Return a JSON-RPC 2.0 error so MCP clients can parse the denial. The HTTP status code signals the error at the transport level; the JSON-RPC body carries the detail.

Contributor Author

Done

- Timeout time.Duration `json:"timeout"`
+ Timeout time.Duration `json:"timeout" yaml:"timeout" swaggertype:"primitive,integer"`
  // FailurePolicy determines behavior when the webhook call fails.
  FailurePolicy FailurePolicy `json:"failure_policy"`
Collaborator

blocker: So, Timeout got a yaml tag in this PR (line 69), but the other fields here (Name, URL, FailurePolicy, TLSConfig, HMACSecretRef) are still missing yaml tags. Since RunConfig.ValidatingWebhooks uses yaml:"validating_webhooks" for YAML deserialization, the nested Config fields need explicit yaml tags too.

Go's YAML library uses lowercased struct field names by default (e.g., failurepolicy), not the json tag values. So a YAML config with failure_policy: fail would silently fail to deserialize into FailurePolicy. That's a runtime bug that won't show up until someone actually tries to configure this via YAML.

All the fields need yaml tags matching the json ones. For example:

FailurePolicy FailurePolicy `json:"failure_policy" yaml:"failure_policy"`

Contributor Author

Added the missing yaml tags to the webhook.Config fields so that YAML-based configuration deserializes correctly.
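
One way to keep this class of bug from coming back is a small reflection-based guard that compares json and yaml tag names on a config struct. The sketch below uses only the standard library. The Config type here is a trimmed, hypothetical copy (the name and url tag values are assumptions), with one yaml tag left out on purpose to show what the guard reports.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Config is a trimmed, hypothetical copy of webhook.Config.
// FailurePolicy deliberately lacks a yaml tag to demonstrate the check.
type Config struct {
	Name          string `json:"name" yaml:"name"`
	URL           string `json:"url" yaml:"url"`
	FailurePolicy string `json:"failure_policy"`
}

// missingYAMLTags returns the names of struct fields whose yaml tag name is
// absent or differs from the json tag name. v must be a struct value.
func missingYAMLTags(v any) []string {
	var out []string
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		// Tag values look like "name,omitempty"; only the name part matters here.
		jsonName, _, _ := strings.Cut(f.Tag.Get("json"), ",")
		yamlName, _, _ := strings.Cut(f.Tag.Get("yaml"), ",")
		if jsonName != "" && jsonName != "-" && yamlName != jsonName {
			out = append(out, f.Name)
		}
	}
	return out
}

func main() {
	fmt.Println(missingYAMLTags(Config{}))
}
```

Called from a unit test against the real struct, a helper like this would fail the build as soon as a new field gets a json tag without a matching yaml tag.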

  Scopes []string `json:"scopes,omitempty" yaml:"scopes,omitempty"`
  SkipBrowser bool `json:"skip_browser,omitempty" yaml:"skip_browser,omitempty"`
- Timeout time.Duration `json:"timeout,omitempty" yaml:"timeout,omitempty" swaggertype:"string" example:"5m"`
+ Timeout time.Duration `json:"timeout,omitempty" yaml:"timeout,omitempty" swaggertype:"primitive,integer"`
Collaborator

suggestion: This swagger type fix (string -> primitive,integer) is unrelated to the validating webhook feature and changes the OpenAPI spec contract for the auth timeout field. Could you split this into its own PR? Keeps this one focused on the webhook middleware.

Contributor Author

Reverted the pkg/auth/remote/config.go swagger tag changes; I'll submit those separately if needed.

Contributor Author

I have created an issue and a PR with the fix for this.

The current PR now depends on that PR.

Contributor Author

@Sanskarzz Sanskarzz Mar 24, 2026

I'll fix the CI docs error after merging the PR.

)

//nolint:paralleltest // Shares a mock HTTP server and lastRequest state
func TestValidatingMiddleware(t *testing.T) {
Collaborator

suggestion: The issue (#3397) explicitly calls out testing multiple webhooks in a chain:

"When multiple validating webhooks are configured: Execute in configuration order. If ANY webhook returns allowed: false, deny the request."

The current tests only exercise a single webhook. We need at least:

  • First allows, second denies (verifies sequential evaluation)
  • First denies (verifies short-circuit... the second webhook should never be called)
  • Mixed failure policies (first with fail, second with ignore, or vice versa)

These are important because the sequential loop in createValidatingHandler is the core logic that differentiates this from a single-webhook setup. Without these tests, we can't be confident the chain actually works as specified.

Also a few more branching paths worth covering:

  • When resp.Message is empty (exercises the default message on lines 144-145)
  • When the webhook returns an out-of-range code like 0 or 200 (exercises the code < 400 || code > 599 guard on line 149)
  • When auth.IdentityFromContext returns ok=false (verifies Principal is nil in the payload)

Contributor Author

  • Added TestMultiWebhookChain to verify sequential evaluation, short-circuiting when the first webhook denies, and mixed failure policies (e.g., fail-open on connection error).
  • Added subtests for branching paths: out-of-range codes (e.g., 0, 200 defaulting to 403), empty resp.Message, and requests without an authenticated identity.
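
As an illustration of how short-circuiting can be checked, here is a sketch that stands up two counting httptest servers and walks them in order. It is not the TestMultiWebhookChain code from this PR; fakeWebhook, allowedByChain, and chainDemo are hypothetical helpers, the request payload is a placeholder, and every error is treated as a denial (fail-closed) to keep the example short.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// fakeWebhook starts a test server that answers every call with the given
// decision and counts how often it was called.
func fakeWebhook(allowed bool, calls *atomic.Int32) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		calls.Add(1)
		_ = json.NewEncoder(w).Encode(map[string]bool{"allowed": allowed})
	}))
}

// allowedByChain posts to each URL in order and stops at the first denial.
// Transport and decode errors count as denials in this sketch.
func allowedByChain(urls []string) bool {
	for _, u := range urls {
		resp, err := http.Post(u, "application/json", bytes.NewReader([]byte(`{}`)))
		if err != nil {
			return false
		}
		var body struct {
			Allowed bool `json:"allowed"`
		}
		err = json.NewDecoder(resp.Body).Decode(&body)
		resp.Body.Close()
		if err != nil || !body.Allowed {
			return false
		}
	}
	return true
}

// chainDemo runs a two-webhook chain and returns the decision together with
// the number of calls each webhook received.
func chainDemo(firstAllows, secondAllows bool) (bool, int32, int32) {
	var first, second atomic.Int32
	a := fakeWebhook(firstAllows, &first)
	defer a.Close()
	b := fakeWebhook(secondAllows, &second)
	defer b.Close()
	ok := allowedByChain([]string{a.URL, b.URL})
	return ok, first.Load(), second.Load()
}

func main() {
	fmt.Println(chainDemo(false, true)) // first denies
	fmt.Println(chainDemo(true, false)) // second denies
}
```

Counting calls per server is what distinguishes "second webhook denied" from "second webhook was never reached", which is the short-circuit property the review asked to verify.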

@Sanskarzz
Contributor Author

@Sanskarzz you might need to rebase, there is something off with the PR as it's showing it's 39 commits.

@JAORMX, I have fixed the commit issue. The extra commits were caused by rebasing against my fork's stale main instead of upstream/main. I've since cleaned that up, so the PR should now reflect only the relevant Phase 2 commits. I'll address all the review comments shortly.

@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 24, 2026
Contributor

@github-actions github-actions bot left a comment

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Labels

size/XL Extra large PR: 1000+ lines changed

2 participants