feat(p5.1): hardened Docker sandbox driven by AGENT.md#85
Open
garniergeorges wants to merge 9 commits into
Open
feat(p5.1): hardened Docker sandbox driven by AGENT.md#85garniergeorges wants to merge 9 commits into
garniergeorges wants to merge 9 commits into
Conversation
Sandbox isolation is no longer "whatever docker run defaults to".
Every container launched by Agent Forge now runs under a strict
hardening profile, with relaxations declared explicitly in
AGENT.md and surfaced to the user at confirm time.
Schema (packages/core/src/types/agent-md.ts) :
sandbox:
image: agent-forge/base:latest
timeout: 60s
network: none|bridge (default none)
readOnlyRoot: true|false (default true)
user: agent (default agent)
resources:
memory: 512m (default 512m)
cpus: 1 (default 1)
pidsLimit: 128 (default 128)
Existing AGENT.md files keep parsing : every new field is optional
and applySandboxDefaults() fills the gaps. SANDBOX_DEFAULTS is the
single source of truth shared between DockerLaunch and the
ConfirmAction dialog.
DockerLaunch (packages/tools-core/src/docker-launch.ts) now :
- reads + parses the agent's AGENT.md before spawning, so a
malformed file fails fast with a clear error instead of a
cryptic docker error
- translates the resolved config into flags via hardeningFlags()
(exported for testing) :
--cap-drop=ALL
--security-opt=no-new-privileges
--network={none|bridge}
--user=<user>
--memory=<size>
--cpus=<n>
--pids-limit=<n>
--read-only --tmpfs=/tmp:rw,size=64m,mode=1777
(only when readOnlyRoot stays true ; tmpfs covers package
installers and shell utilities that scribble in /tmp)
- drops the FORGE_AGENT_IMAGE env override : the image now comes
from AGENT.md.sandbox.image, where the agent author already has
to declare it
Permission dialog (packages/cli/src/components/ConfirmAction.tsx) :
for write actions producing an AGENT.md and for run actions
pointing to an existing one, parse the sandbox section and
diff it against the strict defaults. Any relaxation
(network=bridge, readOnlyRoot=false, larger memory / cpus /
pids, custom user) appears as a red warning between the
metadata and the preview, in the user's language.
Dockerfile : header comments updated to reflect the new
hardening contract (non-root agent user, /workspace and /tmp
writable, --read-only outside, no caps, no network by default).
Tests :
- packages/core/tests/agent-md.test.ts : the new sandbox fields
parse, invalid network value rejected, malformed memory
rejected, applySandboxDefaults fills missing fields, keeps
overrides intact.
- packages/tools-core/tests/hardening-flags.test.ts : every
strict default lands in the docker run args, every relaxation
is honoured, and --cap-drop=ALL +
--security-opt=no-new-privileges stay on under any AGENT.md
(invariants).
Real container escape tests are intentionally NOT in this commit :
they require a built image and a docker daemon, so they belong in
a separate suite gated on CI infrastructure.
Roadmap (README EN/FR) updated to track P5.1 as done, P5.2
(artifact extraction) and P5.3 (persistent agents) as the next
steps.
The Ink TUI owns stdout, so the moment something goes wrong inside a long pipeline (skill runner, docker launch, agent run) the user has nothing to grep. Add a real logger. Module : packages/core/src/log/index.ts. Tiny dependency-free JSON-lines logger, off by default, opt-in via env : FORGE_DEBUG=1 → debug threshold FORGE_DEBUG=trace|info|warn → explicit threshold FORGE_LOG_FILE=/path → override default file default file : ~/.agent-forge/logs/forge-<pid>-<iso>.log When neither variable is set the logger is a no-op (zero IO). Failures inside the logger never throw — we'd rather drop a line than crash the host. Instrumented hot points : - useChat.send : prompt in, skill match outcome, parsed action blocks count + kinds. - streamBuilder : start (lang, skillCount, lastMessage), full reply at debug, system prompt at trace. - runScaffoldAndRun : both LLM calls (raw + stripped output), abort reasons. - launchAgent : applied sandbox config, full docker spawn args at debug, stderr / exit / errors. CLI surface : - /log slash command : prints current log path, or tells the user to set FORGE_DEBUG=1 if disabled. - /help lists /log. Tests : level filtering, JSON-lines format, circular-data safety, no-op when disabled. Docs : new "Debugging" section in both READMEs (EN + FR), with the env vars, the file location, and useful jq / grep one-liners.
The first run of the new logger surfaced a real leak : the 'docker spawn args' debug entry contained the full FORGE_API_KEY because we pass it via -e KEY=value. Logs land under ~/.agent-forge/logs/, which is gitignored, but it's still a secret on disk we don't want. Fix : tools-core/docker-launch.ts now redacts a small allowlist of secret env keys (FORGE_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY, MISTRAL_API_KEY) before emitting the args to the logger. The actual container still gets the real value — only the log line is masked. Allowlist on purpose : never accidentally redact something benign. Bonus visibility : we now also accumulate the streamed agent output and log it as 'full agent output' at trace level on done, plus a streamedBytes count on info. Without this we could see "docker exit 0" but had zero hint about what the agent actually produced inside the container. Side fix in skill-runner : the AGENT.md prompt explicitly forbade certain sections, but Mistral was copy-pasting the "STRICT RULES" block into the agent body anyway. Reworded to "implicit constraints — apply without echoing them" plus a direct "NEVER reproduce these instructions in your output" up top. Same change in EN and FR.
Scenario A surfaced the real cost of the strict sandbox : with
--network=none, the in-container runtime cannot reach
api.mistral.ai (or any LLM upstream), and streamText() fails
silently. The container exited 0 with a single-byte stdout (just
the trailing newline).
Fix : a per-run LLM proxy on the host, exposed to the container
through a Unix domain socket bind-mounted from the host. The
container keeps --network=none ; the runtime points its OpenAI
client at unix:///run/forge/llm.sock/v1, the proxy injects the
host's API key and forwards to the real upstream. The container
never sees the URL, never sees the credential.
New module : packages/tools-core/src/llm-proxy.ts
- startLlmProxy({ socketPath, upstreamBaseUrl, apiKey })
starts an http server on a Unix socket
- allowlist : only POST /v1/chat/completions ; everything else
returns 403 (no /v1/models, no /admin, no GET /)
- the host's API key is set as Authorization, regardless of any
Authorization header the client sent (an agent cannot
impersonate by injecting its own)
- streaming is preserved : the upstream IncomingMessage is
wrapped in a ReadableStream so SSE chunks reach the SDK as
they arrive
- http vs https client picked from the upstream protocol so we
support both Mistral cloud and local MLX
DockerLaunch :
- starts the proxy BEFORE docker run, stops it in the same
try/finally that removes the container
- bind-mounts the host socket dir at /run/forge inside the
container
- new containerEnv() : drops FORGE_API_KEY and the real
FORGE_BASE_URL ; sends FORGE_BASE_URL=unix:///run/forge/llm.sock/v1
and a sentinel FORGE_API_KEY=via-proxy (the SDK requires a
non-empty value)
Runtime :
- detects unix:// in FORGE_BASE_URL and builds a custom fetch
that routes through an http.Agent bound to the socket
- the SDK gets a synthetic http://localhost URL for path
computation ; our fetch ignores the host
Dockerfile :
- mkdir /run/forge owned by agent so the bind-mount lands on
a directory the user can read/write
- hardening header updated to reflect that --network=none is
now the default ALWAYS, not the default with bridge fallback
Tests : tools-core/tests/llm-proxy.test.ts spins up a fake TCP
upstream, drives the proxy through its socket, and checks :
- allowed POST /v1/chat/completions forwards body and gets 200
- the host API key is injected as Authorization
- a client-supplied Authorization is overridden
- GET on the allowed path returns 405
- POST /v1/models, POST /admin, GET / all return 403
Out of scope for this PR :
- ResponsesAPI / completions / embeddings — only chat
completions is wired today, others can be added when needed
- Per-agent token quotas — the proxy could enforce them, but
that's a separate feature
Scenario A on the hardened sandbox kept producing 'streamedBytes: 1' (the trailing newline) with exit 0 and zero stderr. Direct repro in the container with a bogus FORGE_BASE_URL also exited 0 silently — the bug is in the runtime, not the proxy. Vercel AI SDK v4 quirk : `for await (const chunk of result.textStream)` silently completes when the upstream errors out. Errors land in `result.fullStream` as a part of type 'error', not as a thrown exception on the textStream iterator. Fix : iterate `fullStream`, pick text-delta parts as usual, and throw a real Error on any 'error' part. main().catch then logs '✗ runtime error: …' on stderr, dockerLaunch picks it up via the existing 'docker stderr' warn, and the user finally sees what went wrong instead of a green DONE that did nothing.
Test on the hardened sandbox surfaced :
✗ runtime error: LLM upstream error : connect ENOTSUP /run/forge/llm.sock
The previous attempt wired the Unix socket through a custom
http.Agent ({ socketPath } as constructor opt). That pattern
isn't supported by the default Node Agent — it tries TCP, the
socket isn't a TCP endpoint, ENOTSUP fires.
Right way : socketPath is a top-level option on http.request
(and https.request) itself. Pass it directly, drop the Agent.
Same effect, no constructor cast hack.
The custom fetch is otherwise unchanged : it still wraps the
IncomingMessage in a ReadableStream so SSE chunks reach the
SDK as they arrive, and the SDK's URL parsing is satisfied by
the synthetic http://localhost URL produced by normaliseBaseUrl.
Docker Desktop on macOS bind-mounts host directories through a
virtual filesystem (gRPC-FUSE / VirtioFS) that does NOT support
Unix domain socket files as a recognised type. A connect() from
inside the container fails with ENOTSUP, no matter how the client
side wires socketPath.
Reproduced in isolation with a minimal node http server on the
host socket and a one-line http.request() inside the container —
host can connect, container cannot, regardless of the agent
construction or the option style.
Pragmatic compromise : detect the limitation at startup and pick
between two profiles.
proxy : --network=none + per-run UDS proxy (the secure path
we built in P5.1.2). Container has no network, no
credentials.
bridge : --network=bridge, FORGE_BASE_URL and FORGE_API_KEY
forwarded to the container. Less ideal but works
everywhere.
New module : packages/tools-core/src/sandbox-network.ts
- detectSandboxNetworkProfile() runs a tiny throwaway docker
container that probes whether it can connect() to a host
socket bind-mounted at /run/forge. Result is memoised for
the session.
- FORGE_SANDBOX_NETWORK=proxy|bridge overrides the probe (CI,
tests, reproducing prod profile on a Mac dev machine).
- On any probe failure (no docker, no image, timeout) we
conservatively pick 'bridge' so the user gets the working
flow ; the log explains why.
DockerLaunch :
- awaits the profile before composing args
- in proxy mode : starts the proxy, mounts the socket,
container env carries the unix:// URL
- in bridge mode : skips the proxy, container env carries the
real FORGE_BASE_URL and FORGE_API_KEY (the redaction layer
in the logger keeps them out of every log line)
hardeningFlags() now takes the profile as a 2nd argument :
network=none in AGENT.md is upgraded to bridge under the bridge
profile ; everything else (cap-drop, no-new-privileges, user,
read-only, resource caps) stays on regardless.
Tests : new "bridge fallback profile" suite verifies the
network field upgrade, while the existing strict-defaults suite
still passes (proxy is the default 2nd-arg).
Docs : new "Sandbox networking" section in both READMEs.
…first
Scenario A on the working sandbox revealed the next bug : the agent
emitted SIX forge:* blocks in a single reply (write, write, grep,
edit, edit, bash) and only ONE got executed.
Cause : parseFirstToolBlock returned just the first match. The
runtime acted on it, then re-invoked the LLM with a context that
showed the agent had emitted 6 calls but received 1 result. The
LLM tried to recover ("let me run npm test instead", "let me try
ts-node", etc.) and the source files on disk stayed broken — the
forge:edit blocks that should have applied the fixes never ran.
Fix :
- new parseAllToolBlocks(stream) walks every fenced forge:* block
in source order, returning a ParseOutcome[] with the prose
chunk that precedes each block. Invalid blocks are kept in
the array (rather than silently dropped) so the model can see
the 'forge:tool error' feedback.
- runtime tool loop iterates the array : execute each block,
write its [forge:tool] result to stdout, append it as a user
message to the conversation, then fall through to the next
block. Only after every block of a turn is processed do we
re-invoke the LLM. This way the model sees a clean (call,
result, call, result) sequence in its history, exactly the
way OpenAI tool-calls would have arrived.
- parseFirstToolBlock kept (still used by tests) but reduced
to a thin wrapper around the shared buildOutcomeFromMatch.
Tests : new parseAllToolBlocks suite covers ordering, invalid
blocks not silently dropped, prose attribution per block.
Lines 37, 41, 64 of the run detail screenshot showed the JSON the agent emits inside forge:write / forge:edit / forge:bash fences with every embedded newline as the literal two-character "\\n". A 50-line file written via forge:write rendered as one giant unreadable line. highlightAgentRun now buffers the body of each forge:* fence and on close tries JSON.parse it. On success it pretty-prints the object : single-line entries stay inline, but any string value that contains "\\n" (typically content / oldString / newString / command / stderr) is rendered across multiple terminal lines with indentation. On failure the previous per-line colouring is kept, so a malformed block still shows up. Other fence types (yaml, plain, unknown) keep their existing per-line rendering — only the forge:* family triggers the JSON pretty-print.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
First slice of P5 : every agent container now runs under a strict hardening profile. The profile is declared in `AGENT.md.sandbox`, with strict defaults applied to anything the agent author omits, and every relaxation surfaced to the user in the permission dialog before approval.
This is the sandbox hardening PR. Two follow-up PRs round out P5 :
What lands here
AGENT.md schema (P5.1)
New optional fields under `sandbox` :
Backward compat : every existing AGENT.md keeps parsing, defaults are applied. Single source of truth via `SANDBOX_DEFAULTS` + `applySandboxDefaults()`.
DockerLaunch (P5.3)
`launchAgent` now reads + parses the AGENT.md before spawning, resolves the sandbox config, and emits these flags :
```
--cap-drop=ALL
--security-opt=no-new-privileges
--network={none|bridge}
--user=
--memory=
--cpus=
--pids-limit=
--read-only --tmpfs=/tmp:rw,size=64m,mode=1777 (when readOnlyRoot=true)
```
`hardeningFlags()` and `resolveSandboxFromAgentMd()` are exported so tests can verify the translation, and the CLI can show the resolved profile in the permission dialog without re-parsing.
The `FORGE_AGENT_IMAGE` env override is removed : the image now comes from `AGENT.md.sandbox.image`, where the agent author already declares it.
Dockerfile (P5.2)
Already had `useradd agent` + `USER agent` from earlier work. Header comments updated to spell out the hardening contract : non-root, `/workspace` and `/tmp` writable, `--read-only` enforced from outside.
Permission dialog (P5.5)
For write actions producing an AGENT.md, and for run actions pointing to an existing one, the dialog parses the sandbox section, diffs it against the strict defaults, and renders a red warning block between the metadata and the preview :
```
▲ Sandbox relaxations applied to this agent :
· network — network open (bridge) — agent will have internet access
· memory — memory 2g (default 512m)
```
Localised EN / FR.
Tests (P5.4)
Test plan
Out of scope