Add Envoy as an alternative network proxy backend#5652
Draft
ChrisJBurns wants to merge 3 commits into
Draft
Conversation
Introduces a networkProxy interface as the single enforcement point for proxy container setup (egress forward proxy, ingress reverse proxy). The existing Squid logic moves behind a squidProxy backend selected via TOOLHIVE_NETWORK_PROXY (default: squid). Fails loudly on unknown values. SetupProxies is now invoked before createMcpContainer so the returned env vars can be injected into the workload; port extraction for non-stdio transports is done before the call (stdio short-circuit preserved). Also fixes two pre-existing copy-before-mutate violations in addEgressEnvVars and generatePortBindings. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Introduces envoyProxy, which consolidates egress (forward proxy :3128) and ingress (reverse proxy) into a single container, reducing auxiliary container count from 3 (Squid: egress + ingress + dns) to 2 (Envoy combined + dns). Selected via TOOLHIVE_NETWORK_PROXY=envoy; Squid remains the default. The gateway block uses two layers: a gateway-deny filter carrying the resolved GatewayIP (L3) and Docker-internal hostnames (L7), prepended before the HTTP RBAC allowlist filter. Admin API binds to 127.0.0.1 loopback only. Bootstrap temp files are written at mode 0600. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Rewrites the Envoy bootstrap generation to produce valid protobuf-JSON: every typed_config field now carries the required @type URL. Adds the CONNECT route matcher so HTTPS tunnelling works, fixes ingress listener to bind 0.0.0.0 inside the container (Docker port forwarding targets the bridge IP, not the container loopback), switches gateway hostname deny to prefix matching so host.docker.internal:443 is caught in HTTPS CONNECT, and adds stdout access logging to both listeners. Also adds docs/arch/14-envoy-network-proxy.md describing the design, the comparison with Squid, and known limitations. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Network isolation today starts three auxiliary containers per workload: an egress Squid proxy, an ingress Squid proxy, and a dnsmasq DNS container. The two Squid containers are logically one gateway — splitting them into two processes is an implementation artifact. This PR introduces an Envoy-based backend that consolidates both into a single container, reducing the auxiliary count from 3 to 2.
Selected with
TOOLHIVE_NETWORK_PROXY=envoy. Squid remains the default; this is opt-in and experimental.What changed
pkg/container/docker/networkproxy.go— newnetworkProxyinterface withSetupProxies(ctx, proxySpec) (proxyResult, error). Extracts proxy concerns out ofdeployOpsso backends are swappable.newNetworkProxyreadsTOOLHIVE_NETWORK_PROXY; fails loudly on unknown values.pkg/container/docker/squid.go— existing Squid logic re-homed behind asquidProxybackend. No behaviour change on the default path.pkg/container/docker/envoy.go—envoyProxybackend. Generates a protobuf-JSON Envoy bootstrap with two listeners (egress forward proxy on:3128, ingress reverse proxy), writes it to a0600temp file, and starts a singleenvoyproxy/envoy-distrolesscontainer.pkg/container/docker/client.go— wiresnetworkProxyintoClient; callsSetupProxiesbeforecreateMcpContainerso env vars can be injected. Also fixes two pre-existing copy-before-mutate bugs inaddEgressEnvVarsandgeneratePortBindings.docs/arch/14-envoy-network-proxy.md— architecture doc explaining the design, Squid vs Envoy comparison, and known limitations.Envoy egress filter chain
CONNECT route matcher is present so HTTPS tunnels through correctly. Gateway deny uses both L3 CIDR (
destination_ip) for direct-IP connections and L7:authorityprefix matching for hostname-based connections — covering both plain HTTP and HTTPS CONNECT where the authority includes the port (e.g.host.docker.internal:443).Type of change
Test plan
task lint-fixpassestask testpasses forpkg/container/docker/...with-racetask buildpassesnetworkProxyinterface,newNetworkProxyfactory, egress RBAC generation, ingress listener, bootstrap file mode, admin loopback, and the mandatory empty-policy deny-all guardTOOLHIVE_NETWORK_PROXY=envoy thv run io.github.stacklok/fetch:https://example.com) — allowed ✓http://example.com) — allowed ✓host.docker.internal— denied ✓gateway.docker.internal— denied ✓Special notes for reviewers
Squid is unchanged and remains the default. Nothing changes for users who do not set
TOOLHIVE_NETWORK_PROXY=envoy.The Envoy config is hand-rolled protobuf-JSON (typed Go structs). Unit tests validate the Go-level struct shape; Envoy validates the full config at container start. The architecture doc (
docs/arch/14-envoy-network-proxy.md) covers known limitations including the tag-pinned image, CONNECT log timing, and the absence ofAllowPorttranslation.Generated with Claude Code