Problem Statement
I want to run AI coding agents on many issues at once, without trashing my working tree, without supervising each one, and without writing the same orchestration glue in every project. Today I either:
- Run an agent live in my repo and babysit it (no parallelism, no isolation).
- Hand-roll a Docker setup per project, copy-paste prompt loops, and discover at 2am that one agent stomped another's branch.
- Ship the agent into a remote VM and lose the ability to ergonomically inspect commits on my host.
I need a primitive that takes "a prompt, an agent, a sandbox" and reliably returns "a commit on a branch" — so I can build review pipelines, parallel planners, and AFK loops on top.
Solution
A TypeScript library, sanddune, that orchestrates agents inside sandboxes. The user writes a prompt, calls sanddune.run(), and gets a commit on a branch. The library is provider-agnostic on both axes:
- Sandbox providers — Docker, Podman, Vercel Firecracker microVMs, no-sandbox, or a custom provider implementing a single interface. Two flavors: bind-mount (host directory mounted in) and isolated (sandbox has its own filesystem; sanddune syncs in/out).
- Agent providers — Claude Code, Codex, opencode, pi, or a custom provider.
A run goes through three phases: setup (worktree + sandbox + hooks), agent loop (invoke, stream, repeat up to maxIterations or until a completionSignal fires), and teardown (capture commits, tear down sandbox).
Four programmatic entry points expose increasing levels of control: run() (one-shot), createSandbox() (reusable sandbox on one branch), createWorktree() (worktree as independent lifecycle), interactive() (TUI session, sync). A CLI (sanddune init, sanddune docker build-image, etc.) handles scaffolding and image lifecycle.
User Stories
- As a developer, I want to install sanddune as a dev dependency and scaffold a sandbox config with one command, so that I can start running agents in the same afternoon I learned the library exists.
- As a developer, I want
sanddune init to refuse to overwrite an existing .sanddune/ directory, so that I never lose customizations to my Dockerfile or prompt.
- As a developer, I want to pick my agent (Claude Code / Codex / opencode / pi) and template (
blank, simple-loop, sequential-reviewer, parallel-planner, parallel-planner-with-review) interactively during init, so that I can scaffold the right starter without remembering flag syntax.
- As a developer, I want
sanddune init to scaffold main.mts when my package.json lacks "type": "module" and main.ts when it has it, so that the scaffold runs without ESM/CJS friction.
- As a developer, I want a
prompt.md file scaffolded into .sanddune/ as a convention (not a magic path sanddune reads on its own), so that I can rename it, ignore it, or reference it explicitly via promptFile.
- As a developer, I want to call
run() with agent, sandbox, and prompt and have it commit to my repo and tear the sandbox down, so that I can run an agent in three lines of code.
- As a developer, I want bind-mount providers to default to the head branch strategy, so that the fastest path during local development is also the default.
- As an automation engineer, I want merge-to-head as the default for isolated providers, so that AFK runs can never trash my working tree on failure.
- As a developer, I want to opt into the branch strategy with an explicit branch name, so that I can wire sanddune into a PR-creation pipeline.
- As a developer, I want head to be a compile-time error on isolated providers, so that I can't ship an impossible config.
- As a developer, I want head to be a compile-time error on
createWorktree(), so that the type system rules out worktree-less worktrees.
- As a developer, I want to import sandbox providers from explicit subpaths (e.g.
@missingstudio/sanddune/sandboxes/docker), so that my bundle doesn't pull in every provider's dependencies.
- As a developer, I want
noSandbox() to be accepted only by interactive() and wt.interactive(), so that I can't accidentally launch an unsupervised AFK run on the host.
- As a developer, I want
noSandbox() to not pass --dangerously-skip-permissions to the agent, so that Claude Code's normal permission prompts stay active during my interactive sessions.
- As a developer, I want to write a custom sandbox provider by implementing one factory (
createBindMountSandboxProvider or createIsolatedSandboxProvider), so that I can target any container runtime, VM, or sandbox service my org standardizes on.
- As a developer, I want every
exec call to return { stdout, stderr, exitCode }, so that custom providers have no ambiguity about their contract.
- As a developer, I want streaming output via an optional
onLine callback on exec, so that my custom provider can plug into sanddune's logging without buffering the entire stream.
- As an automation engineer, I want to call
run() once and get back result.iterations, result.commits, result.branch, and result.completionSignal, so that my downstream code can decide what to do without re-parsing logs.
- As a developer, I want a default
completionSignal of <promise>COMPLETE</promise>, so that my prompts and the library agree on a convention out of the box.
- As a developer, I want to override
completionSignal per run with a string or a list of strings, so that I can model multiple terminal states (e.g. TASK_COMPLETE vs TASK_ABORTED).
- As a developer, I want substring-based detection of completion signals against the agent's output stream, so that I don't need a structured protocol — any unique sentinel works.
- As a developer, I want
result.completionSignal to tell me which signal matched (or undefined if none), so that I can branch on the agent's terminal state.
- As a developer, I want sanddune to never inject the completion signal into my prompt, so that the convention stays a convention I write into my prompt deliberately.
- As an automation engineer, I want
maxIterations to bound any agent loop, so that runaway agents don't burn budget unbounded.
- As a developer, I want
idleTimeoutSeconds (default 600) to abort the run if the agent goes silent, so that hung subprocesses don't keep the sandbox alive forever.
- As a developer, I want
idleTimeoutSeconds to reset on every agent output event, so that long-but-active iterations aren't killed prematurely.
- As a developer, I want to pass an
AbortSignal to run() and interactive(), so that I can cancel from a CLI signal handler or a parent supervisor.
- As a developer, I want aborting to kill the agent subprocess and any in-flight hooks immediately, so that cancellation is responsive.
- As a developer, I want aborted runs to preserve the worktree on disk and reject with
signal.reason verbatim, so that I can inspect partial work and propagate the cancellation cause.
- As a developer, I want
createSandbox() to give me a reusable sandbox on a single branch, so that multi-step pipelines (implement → review → revise) don't pay container startup cost per step.
- As a developer, I want
createSandbox() to preserve installed dependencies and build artifacts between calls, so that an npm install in step 1 doesn't repeat in step 2.
- As a developer, I want
await using sandbox = await createSandbox(...) to auto-tear-down via Symbol.asyncDispose, so that I never leak containers when an exception escapes the block.
- As a developer, I want manual
sandbox.close() to return { preservedWorktreePath }, set only when the worktree was dirty, so that I know where to look on disk after a failure.
- As a developer, I want clean worktrees auto-removed and dirty worktrees preserved, so that successful runs don't leave clutter and failed runs don't lose work.
- As a developer, I want
createWorktree() to give me a worktree as a first-class lifecycle, so that I can run an interactive session, then a sandboxed AFK agent, against the same worktree.
- As a developer, I want
createWorktree() to reject head at compile time, so that "no worktree" can't sneak past type checks.
- As a developer, I want
wt.createSandbox() to be the conceptual primitive (with top-level createSandbox() as a bundled convenience), so that the underlying ownership model is explicit when I need it.
- As a developer, I want split ownership:
sandbox.close() from wt.createSandbox() tears down only the container; wt.close() cleans the worktree, so that I can keep the worktree alive across multiple sandbox instances.
- As a developer, I want top-level
createSandbox()'s sandbox.close() to tear down both container and worktree, so that the convenience case has matching convenience cleanup.
- As a developer, I want
interactive() to launch the agent's TUI inside a sandbox or on the host, so that I can drop into a fresh shell with my repo already in place.
- As a developer, I want
interactive() to return control synchronously when the user exits the TUI, so that orchestration code can resume cleanly.
- As a developer, I want top-level
interactive() to always use the provider's default branch strategy, so that the simple case is unambiguous; for non-default strategies I'll route through createWorktree() + wt.interactive().
- As a prompt author, I want
prompt: "..." to pass my string through to the agent literally, so that I never have an unexpected {{ or !` interpretation.
- As a prompt author, I want passing both
prompt and promptFile (or promptArgs with prompt) to error, so that the contract is unambiguous.
- As a prompt author, I want
promptFile: "./path.md" to enable substitution and shell expansion, so that I can write reusable templates with embedded context.
- As a prompt author, I want
{{KEY}} placeholders substituted from promptArgs before shell expansion, so that I can embed args inside !`gh issue view {{ISSUE_NUMBER}}`.
- As a prompt author, I want a missing
{{KEY}} in run() (AFK mode) to error with the missing key name, so that typos surface immediately.
- As a prompt author, I want a missing
{{KEY}} in interactive() to prompt me at the terminal, so that I can recover without restarting the session.
- As a prompt author, I want unused
promptArgs keys to log a warning, not fail, so that scripts that pass shared arg maps still work.
- As a prompt author, I want
{{SOURCE_BRANCH}} and {{TARGET_BRANCH}} injected automatically into every prompt, so that branching info is one substitution away in any template.
- As a prompt author, I want passing
SOURCE_BRANCH or TARGET_BRANCH in promptArgs to error, so that built-in arguments are unambiguously authoritative.
- As a prompt author, I want
!`command` expressions evaluated in parallel, so that fetching multiple bits of context (issues, commits, status) doesn't serialize.
- As a prompt author, I want shell expressions evaluated inside the sandbox after
sandbox.onSandboxReady hooks, so that they see the same repo state the agent will see.
- As a prompt author, I want a non-zero exit from any shell expression to fail the run immediately, so that I never hand a half-rendered prompt to the agent.
- As a prompt author, I want
!`...` patterns inside promptArgs values treated as inert text, so that I can pass user-authored content (issue titles, PR descriptions) without command-injection risk.
- As a Claude Code user, I want sanddune to capture the per-iteration session JSONL to my host at
~/.claude/projects/<encoded-path>/sessions/<id>.jsonl, so that claude --resume works natively after the run.
- As a Claude Code user, I want
cwd fields inside captured session files rewritten to my host repo root, so that resume reopens in the right directory.
- As a Claude Code user, I want
captureSessions: false to opt out of capture, so that one-off runs don't pollute my sessions directory.
- As a Claude Code user, I want capture failure to log a warning and leave
sessionFilePath undefined, but not fail the run, so that disk-full or permissions issues don't kill an otherwise-successful agent.
- As a Claude Code user, I want
resumeSession: "<id>" to validate the session file exists, transfer it into the sandbox with cwd rewritten, and pass --resume <id> on iteration 1 only, so that I can continue prior work without ceremony.
- As a Claude Code user, I want
resumeSession with maxIterations > 1 to throw before sandbox creation, so that the contract (resume only iteration 1) is enforced loudly.
- As a Claude Code user, I want
resumeSession rejected on sandbox.run(), so that long-lived sandboxes — which don't chain Claude session state across calls — can't pretend to.
- As a developer using a non-Claude agent, I want
captureSessions and resumeSession to be no-ops, so that my agent provider doesn't fail on Claude-specific options.
- As a developer, I want
host.onWorktreeReady (sequential) to run after copyToWorktree and before sandbox creation, so that I can prep files (e.g. cp .env.example .env) before the container starts.
- As a developer, I want
sandbox.onSandboxReady (parallel with host.onSandboxReady) to run after the sandbox is up, so that I can npm install inside the container without blocking host-side observability hooks.
- As a developer, I want host hooks to be
{ command, timeoutMs? } only — no sudo, no cwd — so that the surface is intentionally minimal (use cd or inline env in the command).
- As a developer, I want sandbox hooks to support
sudo: true, so that I can apt-get install build deps inside the container.
- As a developer, I want hooks to default to a 60s timeout and accept
timeoutMs per-hook, so that long installs (e.g. 300s for npm install) aren't cut short by the default.
- As a developer, I want any non-zero hook exit to fail setup immediately, so that broken setup never reaches the agent.
- As a developer, I want the run's
signal threaded into all hooks, so that aborting cancels in-flight installs.
- As a developer, I want both agent provider and sandbox provider to accept an optional
env: Record<string, string>, so that providers can declare their own credential needs at the type level.
- As a developer, I want overlap between agent-provider env and sandbox-provider env to throw at launch, so that ambiguous "who owns this key" cases surface loudly.
- As a developer, I want a 4-source env precedence (lowest → highest:
process.env, .sanddune/.env, provider env, RunOptions.env), so that call-site overrides always win and provider declarations always beat ambient env.
- As a developer, I want
RunOptions.env to overlap freely with provider env, so that I have a no-friction call-site escape hatch.
- As a developer, I want
cwd and promptFile to resolve relative paths against process.cwd() (caller's perspective), so that scripts moved between directories behave predictably.
- As a developer, I want
copyToWorktree to resolve relative paths against cwd (target repo's perspective), so that node_modules and .env.example are conceptually attached to the repo being worked on.
- As a developer, I want
copyToWorktree rejected with branchStrategy: { type: "head" }, so that I can't accidentally try to copy into a worktree that doesn't exist.
- As a developer, I want
name to be prefixed in log output, so that parallel runs don't visually braid in a shared log stream.
- As a developer, I want log-to-file mode as the default for programmatic use, so that
run() doesn't try to repaint a TTY my orchestrator doesn't have.
- As a developer, I want terminal mode opt-in via
logging: { type: "stdout" }, so that interactive use gets spinners and styled summaries.
- As a developer, I want sanddune to print a
tail -f command for the run log, so that I can follow output without guessing the path.
- As a developer, I want an
onAgentStreamEvent callback on the logging option to receive each text/toolCall event with iteration and timestamp, so that I can forward agent output to my observability system without re-implementing parsing.
- As a developer, I want errors thrown by
onAgentStreamEvent swallowed, so that a broken forwarder cannot kill my run.
- As a developer, I want
result.logFilePath populated only in file mode, so that my code can clearly check if (result.logFilePath) before referencing it.
- As a developer, I want
IterationResult.usage (input/output/cache tokens) parsed from captured Claude sessions, so that I can budget runs and surface cost without parsing JSONL myself.
- As a developer, I want
usage to be undefined when capture is off or the agent provider doesn't parse it, so that the absence is unambiguous.
- As a developer, I want
iterations.length to give me the iteration count, so that I don't need a separate counter.
- As a developer, I want commits returned as
{ sha }[], so that I can build a PR description, run a review pipeline, or stage further work.
- As a developer, I want
sandbox.run() to remain usable after an abort fires mid-iteration, so that I can call .run() again with a fresh signal or .close() to tear down — partial work is left for me to inspect with git status.
- As a developer, I want
sanddune docker build-image (and podman build-image) to rebuild from an existing .sanddune/, so that I can iterate on my Dockerfile without re-scaffolding.
- As a developer, I want a
--dockerfile / --containerfile flag to point at a custom file with build context = cwd, so that I can prototype image variants without touching .sanddune/.
- As a developer, I want
sanddune docker remove-image (and podman remove-image) to tear down the image cleanly, so that I can free disk space when done.
- As a developer, I want
sanddune init --image-name ..., --agent, --model, --template flags to skip interactive prompts, so that I can script init in CI or onboarding tooling.
- As a developer picking Podman during init, I want a
Containerfile written instead of Dockerfile and Podman-namespaced CLI commands, so that the scaffold matches the runtime I picked.
- As a developer, I want the default Dockerfile to install Node 22, git, curl, jq, GitHub CLI, Claude Code CLI, and a non-root
agent user, so that the basic loop works without further config.
- As a developer customizing the Dockerfile, I want explicit guidance to keep a non-root user, git, gh, and the Claude Code CLI on PATH, so that I don't accidentally break the contract.
- As a developer, I want Docker/Podman providers to accept
mounts (with absolute, ~, and cwd-relative paths), so that I can mount caches like ~/.npm read-only or share data/ directories.
- As a developer, I want Docker/Podman providers to accept
network (single name or array), so that my container can reach internal services on a private Docker network.
- As a developer, I want Podman support to handle SELinux labels correctly, so that bind mounts work on Fedora/RHEL hosts without manual
chcon.
- As a Vercel user, I want
vercel() to provision a Firecracker microVM via @vercel/sandbox, so that I can fan out cloud isolated runs without managing infra.
- As a developer, I want a documented
name field on every provider for telemetry/error messages, so that "the docker provider failed" reads cleanly in logs.
- As a custom-provider author, I want a reference implementation list (
docker.ts, podman.ts, vercel.ts, test-isolated.ts) called out in the README, so that I can copy the closest match.
- As a maintainer, I want isolated providers to live in the type system from day one (even if Vercel lands first), so that custom isolated providers compile against a stable shape.
- As a developer, I want
claudeCode(model, { effort }) to accept "low" | "medium" | "high" | "max", with "max" Opus-only, so that reasoning effort is a one-line config.
- As a developer, I want
codex(model, { effort }) to accept "low" | "medium" | "high" | "xhigh", mapped to model_reasoning_effort, so that Codex tuning matches its native API.
- As an automation engineer, I want
timeouts: { copyToWorktreeMs } (default 60s) to override built-in lifecycle timeouts, so that large repos don't fail on the copy step.
- As a developer running an interactive session, I want
cwd: "/path/to/other-repo" accepted on interactive(), so that I can drop into a TUI in a repo other than process.cwd().
- As a maintainer, I want the agent invoker to be an Effect
Context.Tag service that wraps the raw agent call, so that tests can substitute a recording or scripted fake without running a real agent.
- As a maintainer, I want one commit per iteration to be a convention rather than a hard limit (the agent may emit several, and sanddune captures all of them), so that callers reasoning per-iteration can rely on a stable shape.
- As a developer, I want sanddune to validate the prerequisites (git installed, sandbox provider available) on first use, so that misconfiguration produces an actionable error before the agent starts.
Implementation Decisions
Modules
The library is decomposed into the following modules; each has a clear in/out and is documented in CONTEXT.md vocabulary.
- Branch strategy resolver — Pure function:
(branchStrategy, providerType, hostBranch) → worktree plan. No I/O. Encodes the compatibility matrix (e.g. isolated + head is rejected).
- Worktree manager — Owns
.sanddune/worktrees/ lifecycle: create, lock, detect dirty state via git status --porcelain, preserve-or-remove on close, perform copyToWorktree. The only module that calls git worktree.
- Sandbox provider abstraction —
createBindMountSandboxProvider / createIsolatedSandboxProvider factories returning the sandbox handle contract: exec (with optional onLine streaming), copyFileIn (bind only) / copyIn (isolated), copyFileOut, close, worktreePath. Every exec returns { stdout, stderr, exitCode }.
- Agent provider abstraction —
claudeCode, codex, opencode, pi. Each declares its required env keys at the type level, builds the per-iteration command, parses streamed stdout into text / toolCall events, and (for Claude) extracts sessionId and usage from the session record.
- Agent invoker — Effect
Context.Tag that wraps the agent call for one iteration. This is the seam that tests replace with scripted fakes, so unit tests never reach a real agent subprocess.
- Iteration loop — Drives up to
maxIterations calls through the agent invoker, accumulating IterationResult[]. Substring-matches completionSignal (string or string[]) against the merged stream and exits early on first match. Threads idle timeout as a synthesized abort with a sanddune-defined reason; the same handle stays usable after timeout.
- Prompt pipeline — Three stages: (1) resolution (inline string vs
promptFile); (2) host-side {{KEY}} substitution against promptArgs ∪ built-ins; (3) sandbox-side !`command` expansion (parallel) inside the sandbox after sandbox.onSandboxReady. Inline prompts skip stages 2 and 3 entirely. promptArgs with an inline prompt errors. Built-ins (SOURCE_BRANCH, TARGET_BRANCH) cannot be overridden. Missing keys error in run() and prompt the user in interactive(). Unused keys warn.
- Hook runner — Runs
host.onWorktreeReady sequentially after copyToWorktree, then runs host.onSandboxReady ∥ sandbox.onSandboxReady in parallel after sandbox creation. Threads abort signal; non-zero exit fails fast; per-hook timeoutMs (default 60_000).
- Env var resolver — Layers four sources in order:
process.env → .sanddune/.env → agent provider env ∪ sandbox provider env (must be disjoint, throws on overlap) → RunOptions.env (free to overlap, last-write-wins).
- Session capture — Claude-only. After each iteration, transfers session JSONL from sandbox to host at
~/.claude/projects/<encoded-path>/sessions/<id>.jsonl, rewriting cwd fields to the host repo root. For resumeSession, the reverse: validates host file exists, transfers in with cwd rewritten to the sandbox path, passes --resume <id> on iteration 1 only. Failure is logged but does not fail the run; IterationResult.sessionFilePath is left undefined.
- Logging engine — Two display modes: log-to-file (default, writes to
.sanddune/logs/, prints tail -f) and terminal (spinners, styled summaries). Both invoke the onAgentStreamEvent callback on each agent stream event with { iteration, timestamp, ... }. The callback is sync and fire-and-forget, and its errors are swallowed.
- Public API surface —
run(), createSandbox(), createWorktree(), interactive(). Layered ownership: createSandboxFromWorktree is an internal helper shared between top-level createSandbox() (which owns worktree + sandbox) and wt.createSandbox() (sandbox only — worktree owned by parent Worktree). Both return the same Sandbox type; the ownership contract is documented, not type-encoded.
- init CLI — Interactive prompts for agent / backlog manager / template, performs template argument substitution on Dockerfile and scaffold .md files (e.g. {{BACKLOG_MANAGER_TOOLS}}), and builds the image. Refuses to run if .sanddune/ already exists.
- build-image / remove-image CLI — Provider-namespaced (sanddune docker build-image, sanddune podman build-image, etc.). --image-name defaults to sanddune:<repo-dir-name>.
Public API surface
| Entry point | Returns | Branch strategies allowed | Sandbox providers allowed |
| --- | --- | --- | --- |
| run() | RunResult | per provider default + explicit | bind-mount, isolated (no noSandbox()) |
| createSandbox() | Sandbox | implicit branch (single-branch by construction) | bind-mount, isolated (no noSandbox()) |
| createWorktree() | Worktree | branch, merge-to-head (no head) | n/a (sandbox passed to wt.run()) |
| interactive() | InteractiveResult | provider default only (no per-call override) | all three including noSandbox() |
Branch strategy compatibility matrix
| Strategy | Bind-mount | Isolated | No-sandbox |
| --- | --- | --- | --- |
| head | Default | Rejected | Default |
| merge-to-head | Allowed | Default | Allowed |
| branch | Allowed | Allowed | Allowed |
Rejection is at the type level where possible (isolated + head; noSandbox() + run()).
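The matrix is small enough to encode directly. The following is a minimal sketch under stated assumptions, not the library's resolver: it leaves out the hostBranch input and the worktree plan output, the function name resolveBranchStrategy is hypothetical, and the branch field on the explicit strategy is an assumed name. It shows only the runtime fallback for the one rejected cell; the type-level rejection is not modeled.

```typescript
type ProviderType = "bind-mount" | "isolated" | "no-sandbox";

type BranchStrategy =
  | { type: "head" }
  | { type: "merge-to-head" }
  | { type: "branch"; branch: string }; // `branch` field name is an assumption

// Per-provider defaults, copied from the "Default" cells of the matrix.
const DEFAULT_STRATEGY: Record<ProviderType, BranchStrategy> = {
  "bind-mount": { type: "head" },
  isolated: { type: "merge-to-head" },
  "no-sandbox": { type: "head" },
};

// Hypothetical helper: returns the effective strategy, or throws for the
// single "Rejected" cell (isolated + head).
function resolveBranchStrategy(
  providerType: ProviderType,
  requested?: BranchStrategy,
): BranchStrategy {
  const strategy = requested ?? DEFAULT_STRATEGY[providerType];
  if (providerType === "isolated" && strategy.type === "head") {
    throw new Error('branch strategy "head" is rejected on isolated providers');
  }
  return strategy;
}
```

Because the function does no I/O, each cell of the matrix can be one row in a table-driven test.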
Result contracts
RunResult — iterations: IterationResult[], completionSignal?: string, stdout: string, commits: { sha }[], branch: string, logFilePath?: string (file mode only).
IterationResult — sessionId?: string, sessionFilePath?: string, usage?: IterationUsage (inputTokens, cacheCreationInputTokens, cacheReadInputTokens, outputTokens).
CloseResult — preservedWorktreePath?: string (set only when worktree was dirty).
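Written as TypeScript, the contracts above might look like the sketch below. Field names and optionality follow the list; the concrete types marked "assumed" (numeric token counts, a string sha) are not stated in this document, and totalOutputTokens is a hypothetical helper added only to show how a caller could consume the shape.

```typescript
interface IterationUsage {
  inputTokens: number; // numeric token counts are assumed
  cacheCreationInputTokens: number;
  cacheReadInputTokens: number;
  outputTokens: number;
}

interface IterationResult {
  sessionId?: string;
  sessionFilePath?: string; // undefined when capture is off or failed
  usage?: IterationUsage; // undefined when capture is off or not parsed
}

interface RunResult {
  iterations: IterationResult[];
  completionSignal?: string; // which signal matched, if any
  stdout: string;
  commits: { sha: string }[]; // `sha` assumed to be a string
  branch: string;
  logFilePath?: string; // populated in file mode only
}

interface CloseResult {
  preservedWorktreePath?: string; // set only when the worktree was dirty
}

// Hypothetical helper: sum output tokens over iterations that reported usage.
function totalOutputTokens(result: RunResult): number {
  return result.iterations.reduce(
    (sum, iteration) => sum + (iteration.usage?.outputTokens ?? 0),
    0,
  );
}
```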
Path resolution rule
cwd, promptFile → resolve relative to process.cwd() (caller's perspective).
copyToWorktree items → resolve relative to cwd (target repo's perspective).
This split is non-obvious and is documented in CONTEXT.md.
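A short sketch may make the split concrete. It assumes a hypothetical helper resolveRunPaths whose callerCwd parameter stands in for process.cwd(); it uses the POSIX flavor of Node's path module only so the example behaves the same on every platform.

```typescript
import * as path from "node:path";

interface PathOptions {
  cwd?: string; // target repo, relative to the caller
  promptFile?: string; // relative to the caller
  copyToWorktree?: string[]; // relative to the target repo
}

// Hypothetical helper illustrating the two resolution bases.
function resolveRunPaths(callerCwd: string, options: PathOptions) {
  // cwd and promptFile resolve from the caller's perspective.
  const repoRoot = path.posix.resolve(callerCwd, options.cwd ?? ".");
  const promptFile =
    options.promptFile === undefined
      ? undefined
      : path.posix.resolve(callerCwd, options.promptFile);
  // copyToWorktree items resolve from the target repo's perspective.
  const copyToWorktree = (options.copyToWorktree ?? []).map((item) =>
    path.posix.resolve(repoRoot, item),
  );
  return { repoRoot, promptFile, copyToWorktree };
}
```

With callerCwd set to /home/me/scripts, cwd "../repo" and promptFile "./prompt.md", the prompt file stays next to the script while a copyToWorktree entry such as "node_modules" is looked up inside the repo.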
Aborted runs and reusability
When signal fires mid-iteration: the agent subprocess is killed, the call rejects with signal.reason verbatim, the worktree is left in whatever state the killed agent produced (no rollback), and the Sandbox handle remains usable. Idle timeout uses the same mechanism with a sanddune-defined reason.
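The "rejects with signal.reason verbatim" part can be sketched with the standard AbortSignal API. This is an illustration only: rejectOnAbort is a hypothetical helper, and killing the agent subprocess and leaving the worktree untouched are not modeled here.

```typescript
// Hypothetical helper: settle with `work`, unless `signal` aborts first,
// in which case reject with signal.reason exactly as the caller supplied it.
function rejectOnAbort<T>(work: Promise<T>, signal: AbortSignal): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    if (signal.aborted) {
      reject(signal.reason);
      return;
    }
    const onAbort = () => reject(signal.reason);
    signal.addEventListener("abort", onAbort, { once: true });
    work.then(
      (value) => {
        signal.removeEventListener("abort", onAbort);
        resolve(value);
      },
      (error) => {
        signal.removeEventListener("abort", onAbort);
        reject(error);
      },
    );
  });
}
```

A caller that aborts with controller.abort(new Error("user cancelled")) would then catch that same Error instance, which is what lets a supervisor tell its own cancellations apart from an idle timeout.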
Resume semantics
resumeSession is a top-level run() concern only — it's about starting a fresh sandbox from a prior session. Long-lived sandboxes don't chain Claude session state across sandbox.run() calls; sandbox.run() rejects resumeSession. Resume + maxIterations > 1 throws before sandbox creation.
Capture is best-effort
Capture failure logs a warning, leaves sessionFilePath undefined, and does not fail the run. Callers requiring a captured session must check sessionFilePath themselves.
Provider env disjoint rule
Agent and sandbox provider env maps must be disjoint — overlap throws at launch because neither provider has authority over a shared key. RunOptions.env is the call-site escape hatch and is allowed to overlap.
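The layering and the disjoint rule fit in a few lines. The sketch below is an assumption-laden illustration rather than the library's resolver: resolveEnv and the EnvSources field names are hypothetical, and dotEnv is taken to be the already-parsed contents of .sanddune/.env.

```typescript
type Env = Record<string, string>;

interface EnvSources {
  processEnv: Env; // lowest precedence
  dotEnv: Env; // parsed .sanddune/.env (parsing not shown)
  agentEnv: Env; // agent provider env
  sandboxEnv: Env; // sandbox provider env
  runEnv: Env; // RunOptions.env, highest precedence
}

// Hypothetical helper: enforce the disjoint rule, then layer low to high.
function resolveEnv(sources: EnvSources): Env {
  const sandboxKeys = new Set(Object.keys(sources.sandboxEnv));
  const overlap = Object.keys(sources.agentEnv).filter((key) =>
    sandboxKeys.has(key),
  );
  if (overlap.length > 0) {
    throw new Error(
      `agent and sandbox provider env overlap on: ${overlap.join(", ")}`,
    );
  }
  // Later spreads win, so RunOptions.env overrides everything else.
  return {
    ...sources.processEnv,
    ...sources.dotEnv,
    ...sources.agentEnv,
    ...sources.sandboxEnv,
    ...sources.runEnv,
  };
}
```

Since the two provider maps are guaranteed disjoint by the time they are merged, their relative spread order does not affect the result.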
Custom-provider DX
Custom sandbox providers implement one of two factories. The bind-mount path is the simpler one (sanddune handles worktrees and commit extraction). Isolated providers implement copyIn (file-or-dir) and copyFileOut. name is required for telemetry. Reference implementations: src/sandboxes/docker.ts, src/sandboxes/podman.ts, src/sandboxes/vercel.ts, src/sandboxes/test-isolated.ts.
CLI surface
| Command | Purpose |
| --- | --- |
| sanddune init | Scaffold .sanddune/, build image, refuse if dir exists |
| sanddune docker build-image / podman build-image | Rebuild image; supports --dockerfile / --containerfile |
| sanddune docker remove-image / podman remove-image | Remove image |
ADRs already in scope
The library inherits 12 ratified architectural decisions in docs/adr/: per-step timeouts (0001), cwd option (0002), reuse-worktree-by-default (0003), abort-signal on run()/interactive() (0004), remove chown UID alignment (0005a), usage as raw tokens (0005b), git worktree mounts on Windows (0006), worktree locking (0007), inline prompts skip processing (0008), branch-strategy per call (0009), layered sandbox creation (0010), sandbox-survives-abort (0011). Implementation must respect these.
Testing Decisions
What makes a good test
- Test external behavior, not implementation details. A test that breaks when an internal helper is renamed is over-fitted; a test that breaks when the public contract changes is exactly right.
- Prefer the agent invoker seam. Unit tests that exercise the iteration loop, completion-signal matching, idle timeout, and stream forwarding should swap the real agent provider for a scripted fake via the
Context.Tag seam — never spawn a real agent subprocess in unit tests.
- Prefer the isolated sandbox seam. Tests that exercise prompt pipelines, hooks, and capture should use the
test-isolated.ts provider (a temp-directory-backed isolated provider) so the same code path is exercised without Docker or Podman.
- Spawn real Docker/Podman only in integration tests. Mark these so that CI can skip them when the runtime is missing.
- Separate pure-logic tests from I/O tests. The branch strategy resolver, prompt pipeline (substitution stage), and env var resolver are pure functions and should have unit tests with table-driven cases.
- Test error paths as deliberately as success paths. Missing
{{KEY}}, overlapping provider env, head on isolated, resumeSession with maxIterations > 1, hook timeout, capture failure during a successful run — each is a distinct contract and gets a test.
Modules under unit test
All seven deep modules get focused unit tests:
- Branch strategy resolver — Table-driven cases: each (strategy × provider type × host branch) combination, including the rejected ones (isolated + head).
- Worktree manager — Real git in a temp repo: create/lock/dirty-detect/preserve-or-remove paths;
copyToWorktree with relative-to-cwd resolution; concurrent-creation lock contention (per ADR 0007).
- Prompt pipeline — Inline bypass,
{{KEY}} substitution (host-side, before expansion), built-in injection, missing-key error, unused-key warning, promptArgs + inline error, !` inside promptArgs value treated as inert text. Stage 3 (shell expansion) tested against a fake sandbox handle.
- Iteration loop — Scripted-stream fakes via the agent invoker: completion signal substring match (single + array, first-match-wins),
maxIterations bound, idle timeout fires after silence and resets on output, abort threading.
- Env var resolver — Layering precedence across the four sources; disjoint enforcement throws;
RunOptions.env overlap allowed.
- Session capture — Fake isolated sandbox + temp host directory: JSONL transfer +
cwd rewrite (host-side and sandbox-side), --resume validation (maxIterations > 1 throws, missing host file errors), best-effort failure (capture error → warning + run still succeeds).
- Hook runner — Sequential host hooks (ordering preserved), parallel host/sandbox
onSandboxReady (both started, neither blocks the other), per-hook timeout, signal threading cancels in-flight commands, non-zero exit fails fast.
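To make the prompt pipeline cases concrete, here is a minimal sketch of the host-side substitution stage as a pure function that such tests could target. It is not the library's implementation: substitute, SubstitutionResult, and the [A-Z0-9_]+ key pattern are assumptions. It also does not cover keeping !`...` inside argument values inert, which would need extra handling before shell expansion.

```typescript
const BUILT_INS = ["SOURCE_BRANCH", "TARGET_BRANCH"] as const;
type BuiltInKey = (typeof BUILT_INS)[number];

interface SubstitutionResult {
  text: string;
  unusedKeys: string[]; // the caller would log a warning for these
}

// Hypothetical helper for stage 2: replace {{KEY}} from promptArgs and built-ins.
function substitute(
  template: string,
  promptArgs: Record<string, string>,
  builtIns: Record<BuiltInKey, string>,
): SubstitutionResult {
  const argKeys = Object.keys(promptArgs);
  const reserved = argKeys.filter((key) =>
    (BUILT_INS as readonly string[]).includes(key),
  );
  if (reserved.length > 0) {
    throw new Error(`promptArgs may not override built-ins: ${reserved.join(", ")}`);
  }
  const values = new Map<string, string>([
    ...Object.entries(promptArgs),
    ...Object.entries(builtIns),
  ]);
  const used = new Set<string>();
  // Single pass: substituted values are not re-scanned for placeholders.
  const text = template.replace(/\{\{([A-Z0-9_]+)\}\}/g, (_match, key: string) => {
    const value = values.get(key);
    if (value === undefined) throw new Error(`missing prompt argument: ${key}`);
    used.add(key);
    return value;
  });
  return { text, unusedKeys: argKeys.filter((key) => !used.has(key)) };
}
```

The missing-key branch corresponds to the run() behavior; an interactive() variant would prompt at the terminal instead of throwing.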
Integration tests
- End-to-end against
test-isolated.ts — run(), createSandbox() reuse, createWorktree() + wt.run() + wt.createSandbox(), interactive() (with a scripted "TUI" fake). Validates the public API surface and the layered ownership contract for close().
- Real Docker — A small smoke suite that spawns the actual Docker provider on a fixture repo and confirms a commit lands; gated behind a CI flag so contributors without Docker can skip.
Prior art
This is a green-field repo, so there is no prior art inside it yet. Reference patterns from the broader ecosystem:
- Effect
Context.Tag seams — the canonical Effect pattern for swapping a service in tests; the agent invoker uses it.
- Vitest + temp dirs for git — standard Node test pattern; the worktree manager uses it.
- Scripted-stream fakes — generator-based fakes that yield text/toolCall events deterministically; the iteration loop uses these.
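As an illustration of the scripted-stream pattern, the sketch below drives a simplified loop from generator-based fakes and checks the completion signal by substring match. All names here (AgentStreamEvent, scriptedIteration, runScriptedLoop) are hypothetical, the real loop goes through the agent invoker rather than a plain function, and resolving "first-match-wins" by array order is an assumption of this sketch.

```typescript
// Illustrative sketch only: these types and functions are hypothetical and do
// not reflect sanddune's actual agent invoker or iteration loop.
type AgentStreamEvent =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string };

// A scripted fake: deterministically yields a fixed sequence of events for one iteration.
function* scriptedIteration(events: AgentStreamEvent[]): Generator<AgentStreamEvent> {
  for (const event of events) {
    yield event;
  }
}

// Substring match against the iteration's text. Assumption: when several
// signals are configured, the earliest entry in the array wins.
function matchCompletionSignal(text: string, signals: string | string[]): string | undefined {
  const candidates = Array.isArray(signals) ? signals : [signals];
  return candidates.find((signal) => text.includes(signal));
}

interface LoopOutcome {
  iterations: number;
  completionSignal?: string;
}

function runScriptedLoop(
  scripts: AgentStreamEvent[][],
  signals: string | string[],
  maxIterations: number,
): LoopOutcome {
  let iterations = 0;
  for (const script of scripts.slice(0, maxIterations)) {
    iterations += 1;
    let text = "";
    for (const event of scriptedIteration(script)) {
      if (event.type === "text") {
        text += event.text;
      }
    }
    const matched = matchCompletionSignal(text, signals);
    if (matched !== undefined) {
      // Exit early on the first iteration whose output contains a signal.
      return { iterations, completionSignal: matched };
    }
  }
  return { iterations };
}

// Three scripted iterations; the second one emits the default completion signal.
const scripts: AgentStreamEvent[][] = [
  [{ type: "text", text: "still working" }],
  [
    { type: "toolCall", name: "edit" },
    { type: "text", text: "done <promise>COMPLETE</promise>" },
  ],
  [{ type: "text", text: "never reached" }],
];
const outcome = runScriptedLoop(scripts, "<promise>COMPLETE</promise>", 5);
```

Because the fakes are plain data, each loop contract (early exit, the maxIterations bound, no match) becomes a one-line test case without spawning an agent subprocess.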
Out of Scope
- Implementing isolated sandbox providers beyond test-isolated.ts and vercel.ts. The type system supports them from day one, but other isolated providers (e.g. Fly.io, gVisor, Kata) are user-built or follow-on work.
- Bundle/patch sync for isolated providers. Mentioned in CONTEXT.md as a future option but not part of v1.
- Built-in observability backends. sanddune ships the onAgentStreamEvent hook; integrations with Datadog, OpenTelemetry, etc. are downstream.
- Built-in retry / failure-recovery policies. sandbox.run() is reusable after abort, but sanddune does not roll back partial edits or commits — callers retry from a clean slate themselves.
- Multi-agent coordination at a single iteration. One iteration = one agent invocation. Pipelines (implement → review → revise) compose at the run() / sandbox.run() level.
- Web UI / dashboard. sanddune is a library + CLI. Dashboards are downstream products.
- Non-git VCS. Mercurial, jj, etc. are out of scope. Worktrees, branches, commits, and git status --porcelain are assumed throughout.
- Windows support beyond what ADR 0006 specifies. WSL is the assumed Windows path; native Windows is best-effort.
- Built-in templates beyond the five listed. blank, simple-loop, sequential-reviewer, parallel-planner, parallel-planner-with-review ship; everything else is user-authored.
- Backlog managers beyond GitHub Issues and Beads. Other tracker integrations are out of scope for v1; the backlog manager abstraction is open for extension but only two implementations ship.
- Server-side / hosted sanddune. Library and CLI only; no daemon, no SaaS.
Further Notes
- Domain vocabulary is authoritative. All implementation, code review, docs, and PRs use the terms defined in CONTEXT.md (sanddune, sandbox, host, agent, sandbox provider, branch strategy, worktree, source/target branch, agent invoker, iteration, task, completion signal, prompt template, prompt argument, prompt expansion, shell expression, etc.). Avoid retired terms ("workspace", "worktree mode", "the tool").
- ADRs guard prior decisions. When implementing, re-read the ADRs in docs/adr/ rather than re-debating settled questions. New architectural choices that don't fit an existing ADR get a new ADR before implementation.
- Effect is in the stack. The agent invoker is an Effect Context.Tag. The codebase will follow Effect conventions for service registration, error channels, and resource management; introduce new services as Context.Tags where the swap-in-tests benefit applies.
- tsgo for builds, vitest for tests, npm run typecheck for type-checking. Per Claude.md. CI mirrors these.
- Changesets for user-facing changes. Pre-1.0, all changesets are patch. Use package.json#name as the changeset name.
- README and CONTEXT.md drift is a real risk. The brief, the README, and CONTEXT.md overlap heavily. When changing public-facing behavior, update both the README and CONTEXT.md in the same change; don't let one outpace the other.
- This PRD covers v1 surface area, not the full implementation order. The decomposition into deep modules is the seam for sequencing follow-on issues; expect this PRD to be sliced into many issues (one per module + integration tests + CLI commands) by the to-issues skill.