feat(p5.1): hardened Docker sandbox driven by AGENT.md by garniergeorges · Pull Request #85 · garniergeorges/agent-forge

garniergeorges · 2026-04-27T20:18:52Z

Summary

First slice of P5 : every agent container now runs under a strict hardening profile. The profile is declared in `AGENT.md.sandbox`, with strict defaults applied to anything the agent author omits, and every relaxation surfaced to the user in the permission dialog before approval.

This is the sandbox hardening PR. Two follow-up PRs round out P5 :

`feat/p5-artifacts` — extract `/workspace` to `~/.agent-forge/artifacts///` after a run
`feat/p5-persistent` — `docker exec` lifecycle, `/agents` and `/kill` slash commands

What lands here

AGENT.md schema (P5.1)

New optional fields under `sandbox` :

Field	Default	Purpose
`network`	`none`	Internet access. Set to `bridge` to allow ; warned at confirm
`readOnlyRoot`	`true`	Root FS read-only with tmpfs over `/tmp`
`user`	`agent`	Linux user the runtime runs as
`resources.memory`	`512m`	`docker run --memory`
`resources.cpus`	`1`	`docker run --cpus`
`resources.pidsLimit`	`128`	`docker run --pids-limit`

Backward compat : every existing AGENT.md keeps parsing, defaults are applied. Single source of truth via `SANDBOX_DEFAULTS` + `applySandboxDefaults()`.

DockerLaunch (P5.3)

`launchAgent` now reads + parses the AGENT.md before spawning, resolves the sandbox config, and emits these flags :

```
--cap-drop=ALL
--security-opt=no-new-privileges
--network={none|bridge}
--user=
--memory=
--cpus=
--pids-limit=
--read-only --tmpfs=/tmp:rw,size=64m,mode=1777 (when readOnlyRoot=true)
```

`hardeningFlags()` and `resolveSandboxFromAgentMd()` are exported so tests can verify the translation, and the CLI can show the resolved profile in the permission dialog without re-parsing.

The `FORGE_AGENT_IMAGE` env override is removed : the image now comes from `AGENT.md.sandbox.image`, where the agent author already declares it.

Dockerfile (P5.2)

Already had `useradd agent` + `USER agent` from earlier work. Header comments updated to spell out the hardening contract : non-root, `/workspace` and `/tmp` writable, `--read-only` enforced from outside.

Permission dialog (P5.5)

For write actions producing an AGENT.md, and for run actions pointing to an existing one, the dialog parses the sandbox section, diffs it against the strict defaults, and renders a red warning block between the metadata and the preview :

```
▲ Sandbox relaxations applied to this agent :
· network — network open (bridge) — agent will have internet access
· memory — memory 2g (default 512m)
```

Localised EN / FR.

Tests (P5.4)

`packages/core/tests/agent-md.test.ts` — new fields parse, invalid network rejected, malformed memory rejected, `applySandboxDefaults()` fills missing fields, overrides intact.
`packages/tools-core/tests/hardening-flags.test.ts` — every strict default lands as a flag, every relaxation honoured, `--cap-drop=ALL` and `--security-opt=no-new-privileges` stay on regardless (invariants).
Real container escape tests (write outside `/workspace`, fork bomb, `apt install`) are NOT in this commit : they need a built image and a Docker daemon, so they belong in a separate CI-gated suite.

Test plan

`bun install && bun test` green
`bun run --cwd packages/runtime build` green
`bash scripts/docker/build-base.sh` rebuilds the image with the new header
`bun run forge`, create an agent, propose write — permission dialog shows no warning if the AGENT.md uses defaults
Same with `network: bridge` in the AGENT.md → red warning surfaces
Approved write + approved run → container launches with the hardening flags (visible via `docker inspect ` while it runs)

Out of scope

Artifact extraction back to host (P5.2)
Persistent agents via `docker exec` (P5.3)
CubeSandbox backend (deferred to V1, see roadmap)

Sandbox isolation is no longer "whatever docker run defaults to". Every container launched by Agent Forge now runs under a strict hardening profile, with relaxations declared explicitly in AGENT.md and surfaced to the user at confirm time. Schema (packages/core/src/types/agent-md.ts) : sandbox: image: agent-forge/base:latest timeout: 60s network: none|bridge (default none) readOnlyRoot: true|false (default true) user: agent (default agent) resources: memory: 512m (default 512m) cpus: 1 (default 1) pidsLimit: 128 (default 128) Existing AGENT.md files keep parsing : every new field is optional and applySandboxDefaults() fills the gaps. SANDBOX_DEFAULTS is the single source of truth shared between DockerLaunch and the ConfirmAction dialog. DockerLaunch (packages/tools-core/src/docker-launch.ts) now : - reads + parses the agent's AGENT.md before spawning, so a malformed file fails fast with a clear error instead of a cryptic docker error - translates the resolved config into flags via hardeningFlags() (exported for testing) : --cap-drop=ALL --security-opt=no-new-privileges --network={none|bridge} --user=<user> --memory=<size> --cpus=<n> --pids-limit=<n> --read-only --tmpfs=/tmp:rw,size=64m,mode=1777 (only when readOnlyRoot stays true ; tmpfs covers package installers and shell utilities that scribble in /tmp) - drops the FORGE_AGENT_IMAGE env override : the image now comes from AGENT.md.sandbox.image, where the agent author already has to declare it Permission dialog (packages/cli/src/components/ConfirmAction.tsx) : for write actions producing an AGENT.md and for run actions pointing to an existing one, parse the sandbox section and diff it against the strict defaults. Any relaxation (network=bridge, readOnlyRoot=false, larger memory / cpus / pids, custom user) appears as a red warning between the metadata and the preview, in the user's language. Dockerfile : header comments updated to reflect the new hardening contract (non-root agent user, /workspace and /tmp writable, --read-only outside, no caps, no network by default). Tests : - packages/core/tests/agent-md.test.ts : the new sandbox fields parse, invalid network value rejected, malformed memory rejected, applySandboxDefaults fills missing fields, keeps overrides intact. - packages/tools-core/tests/hardening-flags.test.ts : every strict default lands in the docker run args, every relaxation is honoured, and --cap-drop=ALL + --security-opt=no-new-privileges stay on under any AGENT.md (invariants). Real container escape tests are intentionally NOT in this commit : they require a built image and a docker daemon, so they belong in a separate suite gated on CI infrastructure. Roadmap (README EN/FR) updated to track P5.1 as done, P5.2 (artifact extraction) and P5.3 (persistent agents) as the next steps.

The Ink TUI owns stdout, so the moment something goes wrong inside a long pipeline (skill runner, docker launch, agent run) the user has nothing to grep. Add a real logger. Module : packages/core/src/log/index.ts. Tiny dependency-free JSON-lines logger, off by default, opt-in via env : FORGE_DEBUG=1 → debug threshold FORGE_DEBUG=trace|info|warn → explicit threshold FORGE_LOG_FILE=/path → override default file default file : ~/.agent-forge/logs/forge-<pid>-<iso>.log When neither variable is set the logger is a no-op (zero IO). Failures inside the logger never throw — we'd rather drop a line than crash the host. Instrumented hot points : - useChat.send : prompt in, skill match outcome, parsed action blocks count + kinds. - streamBuilder : start (lang, skillCount, lastMessage), full reply at debug, system prompt at trace. - runScaffoldAndRun : both LLM calls (raw + stripped output), abort reasons. - launchAgent : applied sandbox config, full docker spawn args at debug, stderr / exit / errors. CLI surface : - /log slash command : prints current log path, or tells the user to set FORGE_DEBUG=1 if disabled. - /help lists /log. Tests : level filtering, JSON-lines format, circular-data safety, no-op when disabled. Docs : new "Debugging" section in both READMEs (EN + FR), with the env vars, the file location, and useful jq / grep one-liners.

The first run of the new logger surfaced a real leak : the 'docker spawn args' debug entry contained the full FORGE_API_KEY because we pass it via -e KEY=value. Logs land under ~/.agent-forge/logs/, which is gitignored, but it's still a secret on disk we don't want. Fix : tools-core/docker-launch.ts now redacts a small allowlist of secret env keys (FORGE_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY, MISTRAL_API_KEY) before emitting the args to the logger. The actual container still gets the real value — only the log line is masked. Allowlist on purpose : never accidentally redact something benign. Bonus visibility : we now also accumulate the streamed agent output and log it as 'full agent output' at trace level on done, plus a streamedBytes count on info. Without this we could see "docker exit 0" but had zero hint about what the agent actually produced inside the container. Side fix in skill-runner : the AGENT.md prompt explicitly forbade certain sections, but Mistral was copy-pasting the "STRICT RULES" block into the agent body anyway. Reworded to "implicit constraints — apply without echoing them" plus a direct "NEVER reproduce these instructions in your output" up top. Same change in EN and FR.

Scenario A surfaced the real cost of the strict sandbox : with --network=none, the in-container runtime cannot reach api.mistral.ai (or any LLM upstream), and streamText() fails silently. The container exited 0 with a single-byte stdout (just the trailing newline). Fix : a per-run LLM proxy on the host, exposed to the container through a Unix domain socket bind-mounted from the host. The container keeps --network=none ; the runtime points its OpenAI client at unix:///run/forge/llm.sock/v1, the proxy injects the host's API key and forwards to the real upstream. The container never sees the URL, never sees the credential. New module : packages/tools-core/src/llm-proxy.ts - startLlmProxy({ socketPath, upstreamBaseUrl, apiKey }) starts an http server on a Unix socket - allowlist : only POST /v1/chat/completions ; everything else returns 403 (no /v1/models, no /admin, no GET /) - the host's API key is set as Authorization, regardless of any Authorization header the client sent (an agent cannot impersonate by injecting its own) - streaming is preserved : the upstream IncomingMessage is wrapped in a ReadableStream so SSE chunks reach the SDK as they arrive - http vs https client picked from the upstream protocol so we support both Mistral cloud and local MLX DockerLaunch : - starts the proxy BEFORE docker run, stops it in the same try/finally that removes the container - bind-mounts the host socket dir at /run/forge inside the container - new containerEnv() : drops FORGE_API_KEY and the real FORGE_BASE_URL ; sends FORGE_BASE_URL=unix:///run/forge/llm.sock/v1 and a sentinel FORGE_API_KEY=via-proxy (the SDK requires a non-empty value) Runtime : - detects unix:// in FORGE_BASE_URL and builds a custom fetch that routes through an http.Agent bound to the socket - the SDK gets a synthetic http://localhost URL for path computation ; our fetch ignores the host Dockerfile : - mkdir /run/forge owned by agent so the bind-mount lands on a directory the user can read/write - hardening header updated to reflect that --network=none is now the default ALWAYS, not the default with bridge fallback Tests : tools-core/tests/llm-proxy.test.ts spins up a fake TCP upstream, drives the proxy through its socket, and checks : - allowed POST /v1/chat/completions forwards body and gets 200 - the host API key is injected as Authorization - a client-supplied Authorization is overridden - GET on the allowed path returns 405 - POST /v1/models, POST /admin, GET / all return 403 Out of scope for this PR : - ResponsesAPI / completions / embeddings — only chat completions is wired today, others can be added when needed - Per-agent token quotas — the proxy could enforce them, but that's a separate feature

Scenario A on the hardened sandbox kept producing 'streamedBytes: 1' (the trailing newline) with exit 0 and zero stderr. Direct repro in the container with a bogus FORGE_BASE_URL also exited 0 silently — the bug is in the runtime, not the proxy. Vercel AI SDK v4 quirk : `for await (const chunk of result.textStream)` silently completes when the upstream errors out. Errors land in `result.fullStream` as a part of type 'error', not as a thrown exception on the textStream iterator. Fix : iterate `fullStream`, pick text-delta parts as usual, and throw a real Error on any 'error' part. main().catch then logs '✗ runtime error: …' on stderr, dockerLaunch picks it up via the existing 'docker stderr' warn, and the user finally sees what went wrong instead of a green DONE that did nothing.

Test on the hardened sandbox surfaced : ✗ runtime error: LLM upstream error : connect ENOTSUP /run/forge/llm.sock The previous attempt wired the Unix socket through a custom http.Agent ({ socketPath } as constructor opt). That pattern isn't supported by the default Node Agent — it tries TCP, the socket isn't a TCP endpoint, ENOTSUP fires. Right way : socketPath is a top-level option on http.request (and https.request) itself. Pass it directly, drop the Agent. Same effect, no constructor cast hack. The custom fetch is otherwise unchanged : it still wraps the IncomingMessage in a ReadableStream so SSE chunks reach the SDK as they arrive, and the SDK's URL parsing is satisfied by the synthetic http://localhost URL produced by normaliseBaseUrl.

Docker Desktop on macOS bind-mounts host directories through a virtual filesystem (gRPC-FUSE / VirtioFS) that does NOT support Unix domain socket files as a recognised type. A connect() from inside the container fails with ENOTSUP, no matter how the client side wires socketPath. Reproduced in isolation with a minimal node http server on the host socket and a one-line http.request() inside the container — host can connect, container cannot, regardless of the agent construction or the option style. Pragmatic compromise : detect the limitation at startup and pick between two profiles. proxy : --network=none + per-run UDS proxy (the secure path we built in P5.1.2). Container has no network, no credentials. bridge : --network=bridge, FORGE_BASE_URL and FORGE_API_KEY forwarded to the container. Less ideal but works everywhere. New module : packages/tools-core/src/sandbox-network.ts - detectSandboxNetworkProfile() runs a tiny throwaway docker container that probes whether it can connect() to a host socket bind-mounted at /run/forge. Result is memoised for the session. - FORGE_SANDBOX_NETWORK=proxy|bridge overrides the probe (CI, tests, reproducing prod profile on a Mac dev machine). - On any probe failure (no docker, no image, timeout) we conservatively pick 'bridge' so the user gets the working flow ; the log explains why. DockerLaunch : - awaits the profile before composing args - in proxy mode : starts the proxy, mounts the socket, container env carries the unix:// URL - in bridge mode : skips the proxy, container env carries the real FORGE_BASE_URL and FORGE_API_KEY (the redaction layer in the logger keeps them out of every log line) hardeningFlags() now takes the profile as a 2nd argument : network=none in AGENT.md is upgraded to bridge under the bridge profile ; everything else (cap-drop, no-new-privileges, user, read-only, resource caps) stays on regardless. Tests : new "bridge fallback profile" suite verifies the network field upgrade, while the existing strict-defaults suite still passes (proxy is the default 2nd-arg). Docs : new "Sandbox networking" section in both READMEs.

…first Scenario A on the working sandbox revealed the next bug : the agent emitted SIX forge:* blocks in a single reply (write, write, grep, edit, edit, bash) and only ONE got executed. Cause : parseFirstToolBlock returned just the first match. The runtime acted on it, then re-invoked the LLM with a context that showed the agent had emitted 6 calls but received 1 result. The LLM tried to recover ("let me run npm test instead", "let me try ts-node", etc.) and the source files on disk stayed broken — the forge:edit blocks that should have applied the fixes never ran. Fix : - new parseAllToolBlocks(stream) walks every fenced forge:* block in source order, returning a ParseOutcome[] with the prose chunk that precedes each block. Invalid blocks are kept in the array (rather than silently dropped) so the model can see the 'forge:tool error' feedback. - runtime tool loop iterates the array : execute each block, write its [forge:tool] result to stdout, append it as a user message to the conversation, then fall through to the next block. Only after every block of a turn is processed do we re-invoke the LLM. This way the model sees a clean (call, result, call, result) sequence in its history, exactly the way OpenAI tool-calls would have arrived. - parseFirstToolBlock kept (still used by tests) but reduced to a thin wrapper around the shared buildOutcomeFromMatch. Tests : new parseAllToolBlocks suite covers ordering, invalid blocks not silently dropped, prose attribution per block.

Lines 37, 41, 64 of the run detail screenshot showed the JSON the agent emits inside forge:write / forge:edit / forge:bash fences with every embedded newline as the literal two-character "\\n". A 50-line file written via forge:write rendered as one giant unreadable line. highlightAgentRun now buffers the body of each forge:* fence and on close tries JSON.parse it. On success it pretty-prints the object : single-line entries stay inline, but any string value that contains "\\n" (typically content / oldString / newString / command / stderr) is rendered across multiple terminal lines with indentation. On failure the previous per-line colouring is kept, so a malformed block still shows up. Other fence types (yaml, plain, unknown) keep their existing per-line rendering — only the forge:* family triggers the JSON pretty-print.

garniergeorges added 9 commits April 27, 2026 22:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(p5.1): hardened Docker sandbox driven by AGENT.md#85

feat(p5.1): hardened Docker sandbox driven by AGENT.md#85
garniergeorges wants to merge 9 commits into
devfrom
feat/p5-sandbox-hardening

garniergeorges commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

garniergeorges commented Apr 27, 2026

Summary

What lands here

AGENT.md schema (P5.1)

DockerLaunch (P5.3)

Dockerfile (P5.2)

Permission dialog (P5.5)

Tests (P5.4)

Test plan

Out of scope

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant