
Add Envoy as an alternative network proxy backend#5652

Draft
ChrisJBurns wants to merge 3 commits into main from cburns/envoy-network-proxy

@ChrisJBurns

Collaborator

Summary

Network isolation today starts three auxiliary containers per workload: an egress Squid proxy, an ingress Squid proxy, and a dnsmasq DNS container. The two Squid containers are logically one gateway — splitting them into two processes is an implementation artifact. This PR introduces an Envoy-based backend that consolidates both into a single container, reducing the auxiliary count from 3 to 2.

The Envoy backend is selected with TOOLHIVE_NETWORK_PROXY=envoy. Squid remains the default; Envoy is opt-in and experimental.

What changed
  • pkg/container/docker/networkproxy.go — new networkProxy interface with SetupProxies(ctx, proxySpec) (proxyResult, error). Extracts proxy concerns out of deployOps so backends are swappable. newNetworkProxy reads TOOLHIVE_NETWORK_PROXY; fails loudly on unknown values.
  • pkg/container/docker/squid.go — existing Squid logic re-homed behind a squidProxy backend. No behaviour change on the default path.
  • pkg/container/docker/envoy.go: envoyProxy backend. Generates a protobuf-JSON Envoy bootstrap with two listeners (egress forward proxy on :3128, ingress reverse proxy), writes it to a 0600 temp file, and starts a single envoyproxy/envoy-distroless container.
  • pkg/container/docker/client.go — wires networkProxy into Client; calls SetupProxies before createMcpContainer so env vars can be injected. Also fixes two pre-existing copy-before-mutate bugs in addEgressEnvVars and generatePortBindings.
  • docs/arch/14-envoy-network-proxy.md — architecture doc explaining the design, Squid vs Envoy comparison, and known limitations.
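As a rough illustration of the backend seam described in the list above, the following minimal sketch shows one way the interface and factory could fit together. Only the names networkProxy, SetupProxies, proxySpec, proxyResult, newNetworkProxy, squidProxy, envoyProxy and TOOLHIVE_NETWORK_PROXY come from this PR; the struct fields, the stub method bodies and the helper newNetworkProxyFrom are assumptions made for the example, not the actual code in pkg/container/docker.

```go
package main

import (
	"context"
	"fmt"
	"os"
)

// proxySpec and proxyResult are hypothetical stand-ins: the real field sets
// are defined in networkproxy.go and are not shown in the PR description.
type proxySpec struct {
	WorkloadName string
}

type proxyResult struct {
	// EnvVars are meant to be injected into the workload container.
	EnvVars map[string]string
}

// networkProxy follows the interface shape named in the PR description.
type networkProxy interface {
	SetupProxies(ctx context.Context, spec proxySpec) (proxyResult, error)
}

// Stub backends; the real ones start containers.
type squidProxy struct{}
type envoyProxy struct{}

func (squidProxy) SetupProxies(_ context.Context, _ proxySpec) (proxyResult, error) {
	return proxyResult{EnvVars: map[string]string{}}, nil
}

func (envoyProxy) SetupProxies(_ context.Context, _ proxySpec) (proxyResult, error) {
	return proxyResult{EnvVars: map[string]string{}}, nil
}

const networkProxyEnv = "TOOLHIVE_NETWORK_PROXY"

// newNetworkProxyFrom (hypothetical helper) picks a backend from a raw value.
// An empty value falls back to Squid; anything unrecognised is an error
// rather than a silent fallback.
func newNetworkProxyFrom(value string) (networkProxy, error) {
	switch value {
	case "", "squid":
		return squidProxy{}, nil
	case "envoy":
		return envoyProxy{}, nil
	default:
		return nil, fmt.Errorf("unsupported %s value %q (expected \"squid\" or \"envoy\")",
			networkProxyEnv, value)
	}
}

// newNetworkProxy reads the selection from the environment.
func newNetworkProxy() (networkProxy, error) {
	return newNetworkProxyFrom(os.Getenv(networkProxyEnv))
}

func main() {
	for _, v := range []string{"", "squid", "envoy", "haproxy"} {
		p, err := newNetworkProxyFrom(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %T\n", v, p)
	}
}
```

Splitting the environment lookup from the selection logic keeps the factory testable without mutating process state, which is one plausible way to unit-test the unknown-value path.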
Envoy egress filter chain
HTTP_PROXY → Envoy :3128
  HCM (CONNECT upgrade enabled)
  ├── RBAC DENY  — gateway IP (L3 CIDR) + gateway hostnames (L7 :authority prefix)
  ├── RBAC ALLOW — outbound allowlist from permission profile
  ├── dynamic_forward_proxy
  └── router

A CONNECT route matcher is present so that HTTPS tunnels through correctly. The gateway deny uses both an L3 CIDR match (destination_ip) for direct-IP connections and L7 :authority prefix matching for hostname-based connections. Together these cover plain HTTP and HTTPS CONNECT, where the authority includes the port (e.g. host.docker.internal:443).

Type of change

  • New feature
  • Refactoring

Test plan

  • task lint-fix passes
  • task test passes for pkg/container/docker/... with -race
  • task build passes
  • Unit tests for networkProxy interface, newNetworkProxy factory, egress RBAC generation, ingress listener, bootstrap file mode, admin loopback, and the mandatory empty-policy deny-all guard
  • Manually tested with TOOLHIVE_NETWORK_PROXY=envoy thv run io.github.stacklok/fetch:
    • External HTTPS (https://example.com) — allowed ✓
    • External HTTP (http://example.com) — allowed ✓
    • Docker bridge gateway IP — denied ✓
    • host.docker.internal — denied ✓
    • gateway.docker.internal — denied ✓

Special notes for reviewers

Squid is unchanged and remains the default. Nothing changes for users who do not set TOOLHIVE_NETWORK_PROXY=envoy.

The Envoy config is hand-rolled protobuf-JSON (typed Go structs). Unit tests validate the Go-level struct shape; Envoy validates the full config at container start. The architecture doc (docs/arch/14-envoy-network-proxy.md) covers known limitations including the tag-pinned image, CONNECT log timing, and the absence of AllowPort translation.

Generated with Claude Code

ChrisJBurns and others added 3 commits June 25, 2026 21:41
Introduces a networkProxy interface as the single enforcement point for
proxy container setup (egress forward proxy, ingress reverse proxy).
The existing Squid logic moves behind a squidProxy backend selected via
TOOLHIVE_NETWORK_PROXY (default: squid). Fails loudly on unknown values.

SetupProxies is now invoked before createMcpContainer so the returned
env vars can be injected into the workload; port extraction for non-stdio
transports is done before the call (stdio short-circuit preserved).

Also fixes two pre-existing copy-before-mutate violations in
addEgressEnvVars and generatePortBindings.
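For readers unfamiliar with the term, a copy-before-mutate fix generally means cloning a caller-owned map before writing to it. The sketch below shows the pattern in isolation; withEgressEnv is a hypothetical function, not the actual addEgressEnvVars, and setting HTTPS_PROXY alongside HTTP_PROXY is an assumption made for the example.

```go
package main

import (
	"fmt"
	"maps"
)

// withEgressEnv (hypothetical) returns a copy of env with proxy variables
// added. The caller's map is never modified, so a map shared between
// workloads cannot pick up another workload's proxy settings.
func withEgressEnv(env map[string]string, proxyURL string) map[string]string {
	out := maps.Clone(env) // shallow copy; returns nil for a nil input
	if out == nil {
		out = make(map[string]string)
	}
	out["HTTP_PROXY"] = proxyURL
	out["HTTPS_PROXY"] = proxyURL // assumption: HTTPS uses the same proxy
	return out
}

func main() {
	base := map[string]string{"LOG_LEVEL": "debug"}
	derived := withEgressEnv(base, "http://egress-proxy:3128")

	fmt.Println("base entries:   ", len(base))
	fmt.Println("derived entries:", len(derived))
}
```

maps.Clone requires Go 1.21 or later; on older toolchains a manual loop over the map achieves the same result.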

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Introduces envoyProxy, which consolidates egress (forward proxy :3128)
and ingress (reverse proxy) into a single container, reducing auxiliary
container count from 3 (Squid: egress + ingress + dns) to 2 (Envoy
combined + dns). Selected via TOOLHIVE_NETWORK_PROXY=envoy; Squid
remains the default.

The gateway block uses two layers: a gateway-deny filter carrying the
resolved GatewayIP (L3) and Docker-internal hostnames (L7), prepended
before the HTTP RBAC allowlist filter. Admin API binds to 127.0.0.1
loopback only. Bootstrap temp files are written at mode 0600.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Rewrites the Envoy bootstrap generation to produce valid protobuf-JSON:
every typed_config field now carries the required @type URL. Adds the
CONNECT route matcher so HTTPS tunnelling works, fixes ingress listener
to bind 0.0.0.0 inside the container (Docker port forwarding targets the
bridge IP, not the container loopback), switches gateway hostname deny to
prefix matching so host.docker.internal:443 is caught in HTTPS CONNECT,
and adds stdout access logging to both listeners.
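The reason for switching the hostname deny to prefix matching can be shown with a small model. This is plain Go string comparison standing in for Envoy's matcher, not Envoy itself, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// deniedExact models an exact-match deny list on the :authority value.
func deniedExact(authority string, hosts []string) bool {
	for _, h := range hosts {
		if authority == h {
			return true
		}
	}
	return false
}

// deniedPrefix models a prefix-match deny list on the :authority value.
func deniedPrefix(authority string, hosts []string) bool {
	for _, h := range hosts {
		if strings.HasPrefix(authority, h) {
			return true
		}
	}
	return false
}

func main() {
	hosts := []string{"host.docker.internal", "gateway.docker.internal"}
	authorities := []string{
		"host.docker.internal",     // plain HTTP, no port
		"host.docker.internal:443", // HTTPS CONNECT, port included
		"example.com:443",
	}
	for _, a := range authorities {
		fmt.Printf("%-26s exact=%-5t prefix=%t\n",
			a, deniedExact(a, hosts), deniedPrefix(a, hosts))
	}
}
```

An exact match misses the CONNECT form because the authority carries the port. A prefix match catches it, at the cost of also matching any longer name that merely begins with a denied hostname.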

Also adds docs/arch/14-envoy-network-proxy.md describing the design,
the comparison with Squid, and known limitations.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

@github-actions github-actions Bot left a comment

Contributor


Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label Jun 25, 2026
