210 changes: 210 additions & 0 deletions docs/arch/14-envoy-network-proxy.md
@@ -0,0 +1,210 @@
# Envoy Network Proxy

## Status

Experimental — selected with `TOOLHIVE_NETWORK_PROXY=envoy`. Squid remains the
default.

## Problem

When network isolation is enabled (`--isolate-network`), ToolHive currently
starts **three** auxiliary containers per workload:

| Container | Role |
|-----------|------|
| `<name>-egress` | Squid forward proxy — routes outbound traffic through an allowlist |
| `<name>-ingress` | Squid reverse proxy — receives traffic from the proxy runner |
| `<name>-dns` | dnsmasq — provides DNS to the internal network |

Three containers means three image pulls, three startup sequences, three sets of
resources, and three things that can fail or restart. The Squid egress and ingress
containers are logically a single gateway — splitting them into two processes is
an implementation artifact rather than a deliberate design.

## Solution

Replace the two Squid containers with a **single Envoy container** that handles
both egress and ingress as separate listeners inside the same process. The DNS
container (dnsmasq) is unchanged.

```
Before: <name>-egress (Squid) + <name>-ingress (Squid) + <name>-dns
After: <name>-egress (Envoy, two listeners) + <name>-dns
```

This reduces the auxiliary container count from three to two, simplifies the
startup sequence, and uses a single bootstrap configuration file to describe the
entire proxy behaviour.

## Why Envoy

### Consolidation

Envoy runs multiple listeners in a single process, each with its own
`HttpConnectionManager` filter chain. The egress forward proxy (`:3128`) and the
ingress reverse proxy share the same Envoy instance, the same access logs, and
the same lifecycle.

### L3 + L7 enforcement

Squid filters at L7. It matches destination hostnames via `dstdomain` ACLs, but
its `dst` ACL depends on DNS resolution of the requested host, so it cannot
match by IP address in a reliable, port-independent way.

Envoy's RBAC filter supports:
- **`destination_ip`** — CIDR match at L3/L4, applied before the request is
parsed as HTTP. This catches direct-IP connections that bypass DNS.
- **Header match on `:authority`** — L7 match on the CONNECT target or HTTP
Host header, equivalent to Squid's `dstdomain`.

ToolHive combines both layers: outbound traffic is blocked at L3 for known IP
ranges and at L7 for hostname patterns. The `Internal: true` Docker network
remains the fail-closed backstop for non-cooperative traffic that ignores the
proxy entirely.

### Proper dynamic forward proxy

Envoy's `dynamic_forward_proxy` cluster performs per-request DNS resolution and
handles HTTP CONNECT tunnelling natively. HTTPS flows through a CONNECT tunnel
exactly as a client would expect, with Envoy acting as a transparent TCP relay
after the CONNECT handshake — no TLS inspection, no certificate pinning, no CA
changes.
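
From the workload's side, this is ordinary explicit-proxy behaviour. The sketch below shows a Go client configured the way `HTTP_PROXY` / `HTTPS_PROXY` would configure it. The proxy address `127.0.0.1:3128` and the helper names `newProxiedClient` and `proxyHostFor` are illustrative assumptions, not ToolHive code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// newProxiedClient returns an HTTP client that sends every request through
// the forward proxy at proxyAddr. For https:// URLs, net/http issues a CONNECT
// to the proxy and then performs the TLS handshake end-to-end with the origin
// through the tunnel, so the proxy only relays encrypted bytes.
func newProxiedClient(proxyAddr string) (*http.Client, error) {
	proxyURL, err := url.Parse("http://" + proxyAddr)
	if err != nil {
		return nil, fmt.Errorf("parse proxy address: %w", err)
	}
	return &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
		Timeout:   10 * time.Second,
	}, nil
}

// proxyHostFor reports which proxy host the client would use for target,
// without sending any traffic.
func proxyHostFor(c *http.Client, target string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return "", err
	}
	tr, ok := c.Transport.(*http.Transport)
	if !ok || tr.Proxy == nil {
		return "", nil
	}
	u, err := tr.Proxy(req)
	if err != nil || u == nil {
		return "", err
	}
	return u.Host, nil
}

func main() {
	// Assumed address; inside a workload the value comes from HTTP_PROXY.
	c, err := newProxiedClient("127.0.0.1:3128")
	if err != nil {
		panic(err)
	}
	host, err := proxyHostFor(c, "https://example.com/")
	if err != nil {
		panic(err)
	}
	fmt.Println("requests tunnel via", host)
}
```

Because the TLS session terminates at the origin, the client needs no extra CA certificates to work through the proxy.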

### Access logging

Both listeners write structured access logs to stdout, visible via `docker logs`.
Squid logged differently for egress and ingress with no unified view.

### Configuration as code

Envoy reads a protobuf-JSON bootstrap file generated by ToolHive at workload
start. The configuration is typed Go structs serialised to JSON — unit-testable,
diffable, and reproducible. Squid required template-rendered text files.
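
A minimal sketch of what "typed structs serialised to JSON" can look like, using only the admin section of the bootstrap. The Go type and function names are hypothetical; the JSON field names follow Envoy's bootstrap schema. Because `encoding/json` emits struct fields in declaration order, the output is stable and can be compared against a golden string in a unit test.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical mirror types for a fragment of the Envoy bootstrap.
type socketAddress struct {
	Address   string `json:"address"`
	PortValue uint32 `json:"port_value"`
}

type address struct {
	SocketAddress socketAddress `json:"socket_address"`
}

type admin struct {
	Address address `json:"address"`
}

type bootstrap struct {
	Admin admin `json:"admin"`
}

// renderBootstrap serialises a bootstrap whose admin API listens on the
// container loopback at the given port.
func renderBootstrap(adminPort uint32) (string, error) {
	b := bootstrap{Admin: admin{Address: address{
		SocketAddress: socketAddress{Address: "127.0.0.1", PortValue: adminPort},
	}}}
	out, err := json.Marshal(b)
	if err != nil {
		return "", fmt.Errorf("marshal bootstrap: %w", err)
	}
	return string(out), nil
}

func main() {
	out, err := renderBootstrap(9901)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```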

### Future extensibility

Envoy's xDS API makes it possible to update listeners, clusters, and RBAC
policies at runtime without a container restart. This is not used today, but the
groundwork is there for dynamic policy updates. The transparent L3/L4
interception path (Phase 2, not yet implemented) requires iptables rules plus an
`original_dst` listener, which Envoy supports natively.

## What Envoy Does Not Do

- **Decrypt TLS.** Like Squid, Envoy filters HTTPS on the CONNECT target hostname
and then relays the encrypted stream as-is. No certificate inspection, no
man-in-the-middle.
- **Block non-cooperative traffic.** A workload that opens a raw TCP connection
ignoring `HTTP_PROXY` is contained by the `Internal: true` Docker network
blackhole, not by Envoy. Envoy only sees traffic that goes through the proxy.
- **Replace dnsmasq.** DNS for the workload's internal network is still served by
the dnsmasq container.
- **Run in Kubernetes.** Network isolation is a local-Docker feature only; the
Kubernetes operator has a separate egress gateway path.

## Architecture

### Egress listener (`:3128` — forward proxy)

```
HTTP_PROXY / HTTPS_PROXY → Envoy :3128
  └── HCM (upgrade: CONNECT)
        ├── [optional] RBAC DENY: docker gateway IP (L3) + hostnames (L7)
        ├── RBAC ALLOW: outbound allowlist (or allow-all)
        ├── dynamic_forward_proxy: per-request DNS + CONNECT tunnel
        └── router
```

The RBAC filters are evaluated top-to-bottom. The gateway DENY filter is present
unless `--allow-docker-gateway` is set; it blocks:
- The resolved Docker bridge gateway IP as a /32 CIDR (`destination_ip`)
- `host.docker.internal` and `gateway.docker.internal` as `:authority` prefix
matches (covers both plain HTTP and HTTPS CONNECT where authority includes the
port, e.g. `host.docker.internal:443`)
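
The gateway DENY rule set described above could be built roughly as follows. This is a sketch under assumptions: the Go types, the policy name `deny-docker-gateway`, and the sample gateway IP `172.17.0.1` are illustrative. The JSON field names follow Envoy's `envoy.config.rbac.v3` messages.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// Hypothetical mirror types; only the fields needed here are modelled.
type cidrRange struct {
	AddressPrefix string `json:"address_prefix"`
	PrefixLen     uint32 `json:"prefix_len"`
}

type stringMatcher struct {
	Prefix string `json:"prefix"`
}

type headerMatcher struct {
	Name        string        `json:"name"`
	StringMatch stringMatcher `json:"string_match"`
}

type permission struct {
	DestinationIP *cidrRange     `json:"destination_ip,omitempty"`
	Header        *headerMatcher `json:"header,omitempty"`
}

type principal struct {
	Any bool `json:"any"`
}

type policy struct {
	Permissions []permission `json:"permissions"`
	Principals  []principal  `json:"principals"`
}

type rbacRules struct {
	Action   string            `json:"action"`
	Policies map[string]policy `json:"policies"`
}

// gatewayDenyRules builds a DENY rule set for the Docker bridge gateway.
// A policy matches when any one of its permissions matches, so the /32 CIDR
// and the two hostname prefixes act as alternatives.
func gatewayDenyRules(gatewayIP string) (rbacRules, error) {
	ip := net.ParseIP(gatewayIP)
	if ip == nil || ip.To4() == nil {
		return rbacRules{}, fmt.Errorf("gateway %q is not an IPv4 address", gatewayIP)
	}
	perms := []permission{{DestinationIP: &cidrRange{AddressPrefix: ip.String(), PrefixLen: 32}}}
	for _, host := range []string{"host.docker.internal", "gateway.docker.internal"} {
		perms = append(perms, permission{
			Header: &headerMatcher{Name: ":authority", StringMatch: stringMatcher{Prefix: host}},
		})
	}
	return rbacRules{
		Action: "DENY",
		Policies: map[string]policy{
			"deny-docker-gateway": {Permissions: perms, Principals: []principal{{Any: true}}},
		},
	}, nil
}

func main() {
	rules, err := gatewayDenyRules("172.17.0.1") // sample IP, assumed
	if err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(rules, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The prefix matcher is what lets one entry cover both `host.docker.internal` and `host.docker.internal:443`.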

The ALLOW filter implements the permission profile's `Outbound` rules:
- `InsecureAllowAll: true` → single wildcard policy (`any: true`)
- `AllowHost: [...]` → per-host `:authority` exact match (or suffix match for
`*.`-prefixed wildcards)
- No outbound permissions configured → empty policy map → Envoy deny-all
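
The three cases above can be sketched as one translation function. The `outbound` struct is a hypothetical, trimmed-down stand-in for the permission profile's `Outbound` section, and the policy names are made up. Grouping all hosts into one policy and turning `*.example.com` into the suffix `.example.com` are assumptions about details the text does not specify.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// outbound models only the two fields discussed above (hypothetical type).
type outbound struct {
	InsecureAllowAll bool
	AllowHost        []string
}

type stringMatcher struct {
	Exact  string `json:"exact,omitempty"`
	Suffix string `json:"suffix,omitempty"`
}

type headerMatcher struct {
	Name        string        `json:"name"`
	StringMatch stringMatcher `json:"string_match"`
}

type permission struct {
	Any    bool           `json:"any,omitempty"`
	Header *headerMatcher `json:"header,omitempty"`
}

type principal struct {
	Any bool `json:"any"`
}

type policy struct {
	Permissions []permission `json:"permissions"`
	Principals  []principal  `json:"principals"`
}

type rbacRules struct {
	Action   string            `json:"action"`
	Policies map[string]policy `json:"policies"`
}

// allowRules translates outbound permissions into an ALLOW rule set.
// An ALLOW rule set with no policies matches nothing, which denies everything.
func allowRules(o *outbound) rbacRules {
	rules := rbacRules{Action: "ALLOW", Policies: map[string]policy{}}
	anyone := []principal{{Any: true}}
	switch {
	case o == nil:
		// No outbound permissions: leave the policy map empty.
	case o.InsecureAllowAll:
		rules.Policies["allow-all"] = policy{Permissions: []permission{{Any: true}}, Principals: anyone}
	case len(o.AllowHost) > 0:
		perms := make([]permission, 0, len(o.AllowHost))
		for _, host := range o.AllowHost {
			m := stringMatcher{Exact: host}
			if strings.HasPrefix(host, "*.") {
				// Assumed mapping: "*.example.com" becomes suffix ".example.com".
				m = stringMatcher{Suffix: strings.TrimPrefix(host, "*")}
			}
			perms = append(perms, permission{Header: &headerMatcher{Name: ":authority", StringMatch: m}})
		}
		rules.Policies["allow-hosts"] = policy{Permissions: perms, Principals: anyone}
	}
	return rules
}

func main() {
	rules := allowRules(&outbound{AllowHost: []string{"example.com", "*.github.com"}})
	out, err := json.MarshalIndent(rules, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```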

### Ingress listener (`0.0.0.0:<port>` — reverse proxy)

```
Proxy runner → host:127.0.0.1:<port> → Docker port binding → Envoy :<port>
  └── HCM
        ├── router
        └── route → STRICT_DNS cluster → <name>:<mcp-port>
```

The ingress listener binds to `0.0.0.0` inside the container so Docker's port
forwarding (which targets the container's bridge IP, not its loopback) can reach
it. The host-side port binding restricts to `127.0.0.1`, so the ingress is only
reachable from the local machine.

The upstream STRICT_DNS cluster resolves the MCP container's hostname inside the
internal Docker network and forwards HTTP traffic to the MCP server port.

The admin interface binds to `127.0.0.1` inside the Envoy container (container
loopback, not reachable via Docker port forwarding) as a precaution against the
admin API being accessible from other containers.

### Bootstrap lifecycle

1. ToolHive generates a protobuf-JSON bootstrap file in `os.TempDir()` with mode
`0600`.
2. The file is bind-mounted read-only into the Envoy container at
`/etc/envoy/envoy.json`.
3. Envoy reads it once at startup.
4. The file is cleaned up when ToolHive removes the workload.
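
The lifecycle steps above can be sketched with the standard library alone. The function names and the file name pattern are hypothetical. `os.CreateTemp` with an empty directory argument places the file under `os.TempDir()` and creates it with mode `0600`.

```go
package main

import (
	"fmt"
	"os"
)

// writeBootstrap writes the rendered bootstrap to a fresh temp file and
// returns its path plus a cleanup function to call when the workload is
// removed.
func writeBootstrap(data []byte) (string, func() error, error) {
	f, err := os.CreateTemp("", "envoy-bootstrap-*.json")
	if err != nil {
		return "", nil, fmt.Errorf("create bootstrap file: %w", err)
	}
	path := f.Name()
	cleanup := func() error { return os.Remove(path) }
	if _, err := f.Write(data); err != nil {
		_ = f.Close()
		_ = cleanup()
		return "", nil, fmt.Errorf("write bootstrap file: %w", err)
	}
	if err := f.Close(); err != nil {
		_ = cleanup()
		return "", nil, fmt.Errorf("close bootstrap file: %w", err)
	}
	return path, cleanup, nil
}

// bindMount returns a Docker-style read-only bind spec for the bootstrap.
func bindMount(hostPath string) string {
	return hostPath + ":/etc/envoy/envoy.json:ro"
}

// fileMode returns the permission bits of the file at path.
func fileMode(path string) (os.FileMode, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	path, cleanup, err := writeBootstrap([]byte(`{"admin":{}}`))
	if err != nil {
		panic(err)
	}
	defer func() { _ = cleanup() }()

	mode, err := fileMode(path)
	if err != nil {
		panic(err)
	}
	fmt.Println("mode:", mode)
	fmt.Println("mount:", bindMount(path))
}
```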

## Selection

```bash
TOOLHIVE_NETWORK_PROXY=envoy thv run --isolate-network <server>
```

`TOOLHIVE_NETWORK_PROXY` accepts:
- `""` or `"squid"` — Squid backend (default)
- `"envoy"` — Envoy backend

An unknown value causes `NewClient` to fail at startup with a descriptive error.
The env var is intentionally not exposed as a CLI flag or CRD field while the
backend is experimental; chart surface and `RunConfig` wiring come later once the
backend is stable.
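
The selection rules can be sketched as a small parser. The type and function names are hypothetical and this is not the actual `NewClient` code. Exact, case-sensitive matching is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

const proxyEnvVar = "TOOLHIVE_NETWORK_PROXY"

type proxyBackend string

const (
	backendSquid proxyBackend = "squid"
	backendEnvoy proxyBackend = "envoy"
)

// parseProxyBackend maps the env var value to a backend. An empty value
// selects the default (Squid); anything unrecognised is an error.
func parseProxyBackend(value string) (proxyBackend, error) {
	switch value {
	case "", "squid":
		return backendSquid, nil
	case "envoy":
		return backendEnvoy, nil
	default:
		return "", fmt.Errorf("%s: unknown value %q (want \"squid\" or \"envoy\")", proxyEnvVar, value)
	}
}

func main() {
	backend, err := parseProxyBackend(os.Getenv(proxyEnvVar))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("network proxy backend:", backend)
}
```

Failing fast on an unknown value avoids silently falling back to Squid when the variable is mistyped.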

## Comparison with Squid

| Aspect | Squid (current default) | Envoy |
|--------|------------------------|-------|
| Containers per workload | 3 (egress + ingress + dns) | 2 (combined + dns) |
| Forward proxy | ✓ | ✓ (dynamic_forward_proxy) |
| Reverse proxy (ingress) | ✓ (separate container) | ✓ (second listener, same container) |
| HTTPS CONNECT tunnelling | ✓ | ✓ |
| TLS inspection | ✗ | ✗ |
| L7 hostname deny | ✓ (`dstdomain`) | ✓ (`:authority` header match) |
| L3 IP CIDR deny | Partial (`dst` ACL — DNS-resolved) | ✓ (direct packet match) |
| Wildcard host allowlist | ✓ (dot-prefix) | ✓ (suffix match) |
| Per-request DNS resolution | Via Squid resolver | Via DFP cluster |
| Access logs | Per-container, text format | Unified stdout, structured |
| Config format | Text template | Typed Go structs → protobuf-JSON |
| Runtime config update | Restart required | xDS-capable (not yet used) |
| Container image | Stacklok-built | Upstream distroless (pinned tag) |

## Known Limitations

- **Tag-pinned image.** The Envoy image is pinned by tag (`v1.32.3`), not by
digest. A future PR should pin by digest. The `TOOLHIVE_ENVOY_IMAGE` override
already exists for deployments with supply-chain policy requirements.
- **Admin interface port.** The admin API on `:9901` (loopback-only inside the
container) is always enabled. A follow-up can disable it entirely or make it
conditional.
- **CONNECT access log timing.** Envoy logs a CONNECT tunnel entry when the
tunnel closes, not when it opens. Egress access logs are still visible in
`docker logs`, but with keep-alive HTTP clients an entry may appear minutes
after the request was made.
- **No transparent L3/L4.** Non-cooperative traffic (workloads that ignore
`HTTP_PROXY`) is contained by the `Internal: true` network, not Envoy. True
non-bypassable enforcement requires iptables TPROXY + an init container with
`CAP_NET_ADMIN` — this is Phase 2 and requires its own architecture doc.
- **No port-based allowlist.** `AllowPort` from the permission profile is not
yet translated into Envoy policy. Squid honours `AllowPort`; the Envoy backend
currently ignores it.