Skip to content

feat(p5.1): hardened Docker sandbox driven by AGENT.md#85

Open
garniergeorges wants to merge 9 commits into
devfrom
feat/p5-sandbox-hardening
Open

feat(p5.1): hardened Docker sandbox driven by AGENT.md#85
garniergeorges wants to merge 9 commits into
devfrom
feat/p5-sandbox-hardening

Conversation

@garniergeorges
Copy link
Copy Markdown
Owner

Summary

First slice of P5 : every agent container now runs under a strict hardening profile. The profile is declared in `AGENT.md.sandbox`, with strict defaults applied to anything the agent author omits, and every relaxation surfaced to the user in the permission dialog before approval.

This is the sandbox hardening PR. Two follow-up PRs round out P5 :

  • `feat/p5-artifacts` — extract `/workspace` to `~/.agent-forge/artifacts///` after a run
  • `feat/p5-persistent` — `docker exec` lifecycle, `/agents` and `/kill` slash commands

What lands here

AGENT.md schema (P5.1)

New optional fields under `sandbox` :

Field Default Purpose
`network` `none` Internet access. Set to `bridge` to allow ; warned at confirm
`readOnlyRoot` `true` Root FS read-only with tmpfs over `/tmp`
`user` `agent` Linux user the runtime runs as
`resources.memory` `512m` `docker run --memory`
`resources.cpus` `1` `docker run --cpus`
`resources.pidsLimit` `128` `docker run --pids-limit`

Backward compat : every existing AGENT.md keeps parsing, defaults are applied. Single source of truth via `SANDBOX_DEFAULTS` + `applySandboxDefaults()`.

DockerLaunch (P5.3)

`launchAgent` now reads + parses the AGENT.md before spawning, resolves the sandbox config, and emits these flags :

```
--cap-drop=ALL
--security-opt=no-new-privileges
--network={none|bridge}
--user=
--memory=
--cpus=
--pids-limit=
--read-only --tmpfs=/tmp:rw,size=64m,mode=1777 (when readOnlyRoot=true)
```

`hardeningFlags()` and `resolveSandboxFromAgentMd()` are exported so tests can verify the translation, and the CLI can show the resolved profile in the permission dialog without re-parsing.

The `FORGE_AGENT_IMAGE` env override is removed : the image now comes from `AGENT.md.sandbox.image`, where the agent author already declares it.

Dockerfile (P5.2)

Already had `useradd agent` + `USER agent` from earlier work. Header comments updated to spell out the hardening contract : non-root, `/workspace` and `/tmp` writable, `--read-only` enforced from outside.

Permission dialog (P5.5)

For write actions producing an AGENT.md, and for run actions pointing to an existing one, the dialog parses the sandbox section, diffs it against the strict defaults, and renders a red warning block between the metadata and the preview :

```
▲ Sandbox relaxations applied to this agent :
· network — network open (bridge) — agent will have internet access
· memory — memory 2g (default 512m)
```

Localised EN / FR.

Tests (P5.4)

  • `packages/core/tests/agent-md.test.ts` — new fields parse, invalid network rejected, malformed memory rejected, `applySandboxDefaults()` fills missing fields, overrides intact.
  • `packages/tools-core/tests/hardening-flags.test.ts` — every strict default lands as a flag, every relaxation honoured, `--cap-drop=ALL` and `--security-opt=no-new-privileges` stay on regardless (invariants).
  • Real container escape tests (write outside `/workspace`, fork bomb, `apt install`) are NOT in this commit : they need a built image and a Docker daemon, so they belong in a separate CI-gated suite.

Test plan

  • `bun install && bun test` green
  • `bun run --cwd packages/runtime build` green
  • `bash scripts/docker/build-base.sh` rebuilds the image with the new header
  • `bun run forge`, create an agent, propose write — permission dialog shows no warning if the AGENT.md uses defaults
  • Same with `network: bridge` in the AGENT.md → red warning surfaces
  • Approved write + approved run → container launches with the hardening flags (visible via `docker inspect ` while it runs)

Out of scope

  • Artifact extraction back to host (P5.2)
  • Persistent agents via `docker exec` (P5.3)
  • CubeSandbox backend (deferred to V1, see roadmap)

Sandbox isolation is no longer "whatever docker run defaults to".
Every container launched by Agent Forge now runs under a strict
hardening profile, with relaxations declared explicitly in
AGENT.md and surfaced to the user at confirm time.

Schema (packages/core/src/types/agent-md.ts) :
  sandbox:
    image: agent-forge/base:latest
    timeout: 60s
    network: none|bridge       (default none)
    readOnlyRoot: true|false   (default true)
    user: agent                (default agent)
    resources:
      memory: 512m             (default 512m)
      cpus: 1                  (default 1)
      pidsLimit: 128           (default 128)

Existing AGENT.md files keep parsing : every new field is optional
and applySandboxDefaults() fills the gaps. SANDBOX_DEFAULTS is the
single source of truth shared between DockerLaunch and the
ConfirmAction dialog.

DockerLaunch (packages/tools-core/src/docker-launch.ts) now :
  - reads + parses the agent's AGENT.md before spawning, so a
    malformed file fails fast with a clear error instead of a
    cryptic docker error
  - translates the resolved config into flags via hardeningFlags()
    (exported for testing) :
      --cap-drop=ALL
      --security-opt=no-new-privileges
      --network={none|bridge}
      --user=<user>
      --memory=<size>
      --cpus=<n>
      --pids-limit=<n>
      --read-only --tmpfs=/tmp:rw,size=64m,mode=1777
        (only when readOnlyRoot stays true ; tmpfs covers package
         installers and shell utilities that scribble in /tmp)
  - drops the FORGE_AGENT_IMAGE env override : the image now comes
    from AGENT.md.sandbox.image, where the agent author already has
    to declare it

Permission dialog (packages/cli/src/components/ConfirmAction.tsx) :
  for write actions producing an AGENT.md and for run actions
  pointing to an existing one, parse the sandbox section and
  diff it against the strict defaults. Any relaxation
  (network=bridge, readOnlyRoot=false, larger memory / cpus /
  pids, custom user) appears as a red warning between the
  metadata and the preview, in the user's language.

Dockerfile : header comments updated to reflect the new
hardening contract (non-root agent user, /workspace and /tmp
writable, --read-only outside, no caps, no network by default).

Tests :
  - packages/core/tests/agent-md.test.ts : the new sandbox fields
    parse, invalid network value rejected, malformed memory
    rejected, applySandboxDefaults fills missing fields, keeps
    overrides intact.
  - packages/tools-core/tests/hardening-flags.test.ts : every
    strict default lands in the docker run args, every relaxation
    is honoured, and --cap-drop=ALL +
    --security-opt=no-new-privileges stay on under any AGENT.md
    (invariants).

Real container escape tests are intentionally NOT in this commit :
they require a built image and a docker daemon, so they belong in
a separate suite gated on CI infrastructure.

Roadmap (README EN/FR) updated to track P5.1 as done, P5.2
(artifact extraction) and P5.3 (persistent agents) as the next
steps.
The Ink TUI owns stdout, so the moment something goes wrong inside
a long pipeline (skill runner, docker launch, agent run) the user
has nothing to grep. Add a real logger.

Module : packages/core/src/log/index.ts. Tiny dependency-free
JSON-lines logger, off by default, opt-in via env :

  FORGE_DEBUG=1                 → debug threshold
  FORGE_DEBUG=trace|info|warn   → explicit threshold
  FORGE_LOG_FILE=/path          → override default file
  default file : ~/.agent-forge/logs/forge-<pid>-<iso>.log

When neither variable is set the logger is a no-op (zero IO).
Failures inside the logger never throw — we'd rather drop a line
than crash the host.

Instrumented hot points :
- useChat.send : prompt in, skill match outcome, parsed action
  blocks count + kinds.
- streamBuilder : start (lang, skillCount, lastMessage), full
  reply at debug, system prompt at trace.
- runScaffoldAndRun : both LLM calls (raw + stripped output),
  abort reasons.
- launchAgent : applied sandbox config, full docker spawn args at
  debug, stderr / exit / errors.

CLI surface :
- /log slash command : prints current log path, or tells the user
  to set FORGE_DEBUG=1 if disabled.
- /help lists /log.

Tests : level filtering, JSON-lines format, circular-data safety,
no-op when disabled.

Docs : new "Debugging" section in both READMEs (EN + FR), with
the env vars, the file location, and useful jq / grep one-liners.
The first run of the new logger surfaced a real leak : the
'docker spawn args' debug entry contained the full FORGE_API_KEY
because we pass it via -e KEY=value. Logs land under
~/.agent-forge/logs/, which is gitignored, but it's still a
secret on disk we don't want.

Fix : tools-core/docker-launch.ts now redacts a small allowlist
of secret env keys (FORGE_API_KEY, OPENAI_API_KEY,
ANTHROPIC_API_KEY, MISTRAL_API_KEY) before emitting the args
to the logger. The actual container still gets the real value —
only the log line is masked. Allowlist on purpose : never
accidentally redact something benign.

Bonus visibility : we now also accumulate the streamed agent
output and log it as 'full agent output' at trace level on done,
plus a streamedBytes count on info. Without this we could see
"docker exit 0" but had zero hint about what the agent actually
produced inside the container.

Side fix in skill-runner : the AGENT.md prompt explicitly
forbade certain sections, but Mistral was copy-pasting the
"STRICT RULES" block into the agent body anyway. Reworded to
"implicit constraints — apply without echoing them" plus a
direct "NEVER reproduce these instructions in your output" up
top. Same change in EN and FR.
Scenario A surfaced the real cost of the strict sandbox : with
--network=none, the in-container runtime cannot reach
api.mistral.ai (or any LLM upstream), and streamText() fails
silently. The container exited 0 with a single-byte stdout (just
the trailing newline).

Fix : a per-run LLM proxy on the host, exposed to the container
through a Unix domain socket bind-mounted from the host. The
container keeps --network=none ; the runtime points its OpenAI
client at unix:///run/forge/llm.sock/v1, the proxy injects the
host's API key and forwards to the real upstream. The container
never sees the URL, never sees the credential.

New module : packages/tools-core/src/llm-proxy.ts
  - startLlmProxy({ socketPath, upstreamBaseUrl, apiKey })
    starts an http server on a Unix socket
  - allowlist : only POST /v1/chat/completions ; everything else
    returns 403 (no /v1/models, no /admin, no GET /)
  - the host's API key is set as Authorization, regardless of any
    Authorization header the client sent (an agent cannot
    impersonate by injecting its own)
  - streaming is preserved : the upstream IncomingMessage is
    wrapped in a ReadableStream so SSE chunks reach the SDK as
    they arrive
  - http vs https client picked from the upstream protocol so we
    support both Mistral cloud and local MLX

DockerLaunch :
  - starts the proxy BEFORE docker run, stops it in the same
    try/finally that removes the container
  - bind-mounts the host socket dir at /run/forge inside the
    container
  - new containerEnv() : drops FORGE_API_KEY and the real
    FORGE_BASE_URL ; sends FORGE_BASE_URL=unix:///run/forge/llm.sock/v1
    and a sentinel FORGE_API_KEY=via-proxy (the SDK requires a
    non-empty value)

Runtime :
  - detects unix:// in FORGE_BASE_URL and builds a custom fetch
    that routes through an http.Agent bound to the socket
  - the SDK gets a synthetic http://localhost URL for path
    computation ; our fetch ignores the host

Dockerfile :
  - mkdir /run/forge owned by agent so the bind-mount lands on
    a directory the user can read/write
  - hardening header updated to reflect that --network=none is
    now the default ALWAYS, not the default with bridge fallback

Tests : tools-core/tests/llm-proxy.test.ts spins up a fake TCP
upstream, drives the proxy through its socket, and checks :
  - allowed POST /v1/chat/completions forwards body and gets 200
  - the host API key is injected as Authorization
  - a client-supplied Authorization is overridden
  - GET on the allowed path returns 405
  - POST /v1/models, POST /admin, GET / all return 403

Out of scope for this PR :
  - ResponsesAPI / completions / embeddings — only chat
    completions is wired today, others can be added when needed
  - Per-agent token quotas — the proxy could enforce them, but
    that's a separate feature
Scenario A on the hardened sandbox kept producing 'streamedBytes: 1'
(the trailing newline) with exit 0 and zero stderr. Direct repro
in the container with a bogus FORGE_BASE_URL also exited 0
silently — the bug is in the runtime, not the proxy.

Vercel AI SDK v4 quirk : `for await (const chunk of result.textStream)`
silently completes when the upstream errors out. Errors land in
`result.fullStream` as a part of type 'error', not as a thrown
exception on the textStream iterator.

Fix : iterate `fullStream`, pick text-delta parts as usual, and
throw a real Error on any 'error' part. main().catch then logs
'✗ runtime error: …' on stderr, dockerLaunch picks it up via
the existing 'docker stderr' warn, and the user finally sees what
went wrong instead of a green DONE that did nothing.
Test on the hardened sandbox surfaced :

  ✗ runtime error: LLM upstream error : connect ENOTSUP /run/forge/llm.sock

The previous attempt wired the Unix socket through a custom
http.Agent ({ socketPath } as constructor opt). That pattern
isn't supported by the default Node Agent — it tries TCP, the
socket isn't a TCP endpoint, ENOTSUP fires.

Right way : socketPath is a top-level option on http.request
(and https.request) itself. Pass it directly, drop the Agent.
Same effect, no constructor cast hack.

The custom fetch is otherwise unchanged : it still wraps the
IncomingMessage in a ReadableStream so SSE chunks reach the
SDK as they arrive, and the SDK's URL parsing is satisfied by
the synthetic http://localhost URL produced by normaliseBaseUrl.
Docker Desktop on macOS bind-mounts host directories through a
virtual filesystem (gRPC-FUSE / VirtioFS) that does NOT support
Unix domain socket files as a recognised type. A connect() from
inside the container fails with ENOTSUP, no matter how the client
side wires socketPath.

Reproduced in isolation with a minimal node http server on the
host socket and a one-line http.request() inside the container —
host can connect, container cannot, regardless of the agent
construction or the option style.

Pragmatic compromise : detect the limitation at startup and pick
between two profiles.

  proxy  : --network=none + per-run UDS proxy (the secure path
           we built in P5.1.2). Container has no network, no
           credentials.
  bridge : --network=bridge, FORGE_BASE_URL and FORGE_API_KEY
           forwarded to the container. Less ideal but works
           everywhere.

New module : packages/tools-core/src/sandbox-network.ts
  - detectSandboxNetworkProfile() runs a tiny throwaway docker
    container that probes whether it can connect() to a host
    socket bind-mounted at /run/forge. Result is memoised for
    the session.
  - FORGE_SANDBOX_NETWORK=proxy|bridge overrides the probe (CI,
    tests, reproducing prod profile on a Mac dev machine).
  - On any probe failure (no docker, no image, timeout) we
    conservatively pick 'bridge' so the user gets the working
    flow ; the log explains why.

DockerLaunch :
  - awaits the profile before composing args
  - in proxy mode : starts the proxy, mounts the socket,
    container env carries the unix:// URL
  - in bridge mode : skips the proxy, container env carries the
    real FORGE_BASE_URL and FORGE_API_KEY (the redaction layer
    in the logger keeps them out of every log line)

hardeningFlags() now takes the profile as a 2nd argument :
network=none in AGENT.md is upgraded to bridge under the bridge
profile ; everything else (cap-drop, no-new-privileges, user,
read-only, resource caps) stays on regardless.

Tests : new "bridge fallback profile" suite verifies the
network field upgrade, while the existing strict-defaults suite
still passes (proxy is the default 2nd-arg).

Docs : new "Sandbox networking" section in both READMEs.
…first

Scenario A on the working sandbox revealed the next bug : the agent
emitted SIX forge:* blocks in a single reply (write, write, grep,
edit, edit, bash) and only ONE got executed.

Cause : parseFirstToolBlock returned just the first match. The
runtime acted on it, then re-invoked the LLM with a context that
showed the agent had emitted 6 calls but received 1 result. The
LLM tried to recover ("let me run npm test instead", "let me try
ts-node", etc.) and the source files on disk stayed broken — the
forge:edit blocks that should have applied the fixes never ran.

Fix :

  - new parseAllToolBlocks(stream) walks every fenced forge:* block
    in source order, returning a ParseOutcome[] with the prose
    chunk that precedes each block. Invalid blocks are kept in
    the array (rather than silently dropped) so the model can see
    the 'forge:tool error' feedback.

  - runtime tool loop iterates the array : execute each block,
    write its [forge:tool] result to stdout, append it as a user
    message to the conversation, then fall through to the next
    block. Only after every block of a turn is processed do we
    re-invoke the LLM. This way the model sees a clean (call,
    result, call, result) sequence in its history, exactly the
    way OpenAI tool-calls would have arrived.

  - parseFirstToolBlock kept (still used by tests) but reduced
    to a thin wrapper around the shared buildOutcomeFromMatch.

Tests : new parseAllToolBlocks suite covers ordering, invalid
blocks not silently dropped, prose attribution per block.
Lines 37, 41, 64 of the run detail screenshot showed the JSON the
agent emits inside forge:write / forge:edit / forge:bash fences
with every embedded newline as the literal two-character "\\n".
A 50-line file written via forge:write rendered as one giant
unreadable line.

highlightAgentRun now buffers the body of each forge:* fence and
on close tries JSON.parse it. On success it pretty-prints the
object : single-line entries stay inline, but any string value
that contains "\\n" (typically content / oldString / newString /
command / stderr) is rendered across multiple terminal lines with
indentation. On failure the previous per-line colouring is kept,
so a malformed block still shows up.

Other fence types (yaml, plain, unknown) keep their existing
per-line rendering — only the forge:* family triggers the JSON
pretty-print.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant