sandbox

Ephemeral Docker dev sandbox for personal-OSS work, with structural identity isolation from your work-machine credentials.

The sandbox auto-detects your GitHub login via gh api user and ties every volume, image tag, and container name to it — so this repo works for any fork without editing config.

Designed by a Karpathy-style multi-agent council. Decision trail lives in the commit history (search for "council Stage 6" in git log).

Why

You have two GitHub identities (work + personal). You want a structural wall between them, not one that depends on remembering direnv exec every time.
You want to apt install <thing> while exploring a repo without polluting your host OS.
You want learnings from each session to flow back into committable artifacts (dotfiles install scripts, manifest entries, skills) via the snapshot-diff autosave hook.

Zero-config LLM auth

If you already have Claude Code logged in on your host (macOS), the sandbox auto-pipes your Anthropic OAuth credentials into the container via a tmpfs path at up time. Inside the sandbox, claude "just works" without any login step. Credentials persist across docker rm in a per-login named volume; the tmpfs source is shredded after the entrypoint reads it.
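The read-then-shred handoff described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual entrypoint: the function name install_piped_credential and both paths in the usage comment are hypothetical, and it assumes the piped credential arrives as a single file.

```shell
# Hypothetical helper: copy a piped credential to its final path, then destroy the source.
install_piped_credential() {
  src=$1
  dest=$2
  [ -f "$src" ] || return 0                  # nothing was piped in; leave existing state alone
  mkdir -p "$(dirname "$dest")"
  ( umask 077; cat "$src" > "$dest" )        # new files are created owner-only
  chmod 600 "$dest"                          # also tighten a pre-existing file
  if command -v shred >/dev/null 2>&1; then
    shred -u "$src"                          # overwrite, then unlink (GNU coreutils)
  else
    rm -f "$src"                             # fallback where shred is unavailable
  fi
}

# Example call (both paths are placeholders, not the sandbox's real ones):
#   install_piped_credential /run/secrets/example-credential "$HOME/.example/credentials.json"
```

Returning early when the source is missing lets a credential that already lives in the named volume survive a restart where nothing new was piped.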

No keys, env vars, or login flows are required as long as claude auth status works on your host. Conversations started inside the sandbox live in the <login>-claude named volume, isolated from your host's ~/.claude/projects/ (which holds work conversations and is never exposed to the sandbox).

macOS-only for v1.0 (probe order: ~/.claude/.credentials.json → macOS keychain). Linux/WSL2 + Codex/Gemini auto-pipe planned for v1.1.

First-time setup (fresh machine walkthrough)

# 1. Host prereqs (macOS shown; on Linux use `apt`, `dnf`, or `brew`).
brew install gh docker direnv orbstack
gh auth login                          # personal GitHub account
                                       # (work-account login goes elsewhere)

# 2. Choose a workspace directory. The sandbox repo will sit INSIDE it;
#    the workspace is what gets bind-mounted into the container as /workspace/oss.
mkdir -p ~/oss && cd ~/oss

# 3. Clone the sandbox. Its location determines the workspace (the parent dir).
git clone https://github.com/<your-login>/sandbox.git
cd sandbox

# 4. Verify the auto-detection picked your identity:
bin/sandbox.sh doctor

# Expected output:
#   INFO github login:   <your-login>
#   INFO image:          <your-login>/sandbox:v1
#   INFO container name: <your-login>-sandbox
#   INFO volumes:        <your-login>-toolchains, <your-login>-gh
#   OK   workspace      /Users/<you>/oss
#   OK   sandbox $HOME  /Users/<you>/oss/.sandbox-home
#   OK   inbox          /Users/<you>/oss/learnings-inbox

# 5. First run — builds the image, drops you into a shell.
bin/sandbox.sh up

# Inside the container you have:
#   - The host workspace (cloned repos, edit-in-place) at /workspace/oss
#   - Persistent $HOME at /workspace/home (bind, host-inspectable)
#   - Toolchain caches at /workspace/home/.cache/toolchains (named volume)
#   - gh auth state at /workspace/home/.config/gh (named volume)
#   - HTTPS-only git remotes (SSH keys don't tunnel in)
#   - gpgsign off, refused-env guard for work-identity-shaped env vars

Layout (auto-derived)

host                                       container                               type      purpose
$SANDBOX_WORKSPACE/                    →   /workspace/oss                          bind      OSS source-of-truth
$SANDBOX_WORKSPACE/.sandbox-home/      →   /workspace/home                         bind      $HOME (gitignored runtime)
$SANDBOX_WORKSPACE/learnings-inbox/    →   /workspace/inbox                        bind      autosave dumps (gitignored)
<login>-toolchains volume              →   /workspace/home/.{nvm,rustup,cargo}     volume    toolchain caches (GB-scale)
<login>-gh volume                      →   /workspace/home/.config/gh              volume    gh oauth state

SANDBOX_WORKSPACE defaults to the directory CONTAINING this repo. Override via SANDBOX_WORKSPACE=/some/path bin/sandbox.sh up.
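The default-with-override rule can be sketched as below. The function name resolve_workspace is hypothetical, and how bin/sandbox.sh actually locates its own repo directory is not shown in this README, so the git rev-parse line is only one plausible way to obtain it.

```shell
# Resolve the workspace: an explicit SANDBOX_WORKSPACE wins, else the repo's parent directory.
resolve_workspace() {
  repo_dir=$1
  printf '%s\n' "${SANDBOX_WORKSPACE:-$(dirname "$repo_dir")}"
}

# From inside a checkout, the repo directory could be found with:
#   resolve_workspace "$(git rev-parse --show-toplevel)"
```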

Both named volumes above survive docker rm (toolchains stay; gh auth persists). Everything else lives on host bind mounts and is inspectable from your editor.

Identity isolation

HTTPS-only remotes inside the container. No SSH agent forwarding — that would tunnel your work SSH key into the sandbox.
GH_TOKEN piped via tmpfs /run/secrets/, never -e, never build args. Re-injected on every sandbox.sh up; shredded after the entrypoint reads it.
Entrypoint REFUSES to start if GITHUB_TOKEN (work-identity-shaped) or any *IDEOGRAM* / *ANTHROPIC_INTERNAL* env var is present. Override via SANDBOX_REFUSE_PATTERNS="" (don't).
Git identity AUTO-DERIVED from gh api user against the piped token — whoever owns the token gets credited; no hardcoded names.
gpg signing disabled inside the sandbox.
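A refuse-on-env guard of the kind listed above might look like this sketch. It is not the real entrypoint: the function name refused_env_vars is hypothetical, and treating SANDBOX_REFUSE_PATTERNS as a space-separated list of shell glob patterns is an assumption about its format.

```shell
# Hypothetical guard: print the names of environment variables that match a refuse pattern.
refused_env_vars() (
  # Assumed format: space-separated globs. An explicitly empty value disables the guard.
  patterns=${SANDBOX_REFUSE_PATTERNS-"GITHUB_TOKEN *IDEOGRAM* *ANTHROPIC_INTERNAL*"}
  set -f   # keep the patterns from expanding against files in the current directory
  # Only lines shaped like NAME=value are considered; a multi-line value whose
  # continuation line looks like NAME=... would be reported as a name too.
  env | sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' | while read -r name; do
    for pat in $patterns; do
      case $name in
        $pat) printf '%s\n' "$name"; break ;;
      esac
    done
  done
)

# An entrypoint could then refuse to start when anything is reported:
#   offenders=$(refused_env_vars)
#   [ -z "$offenders" ] || { echo "refusing to start: $offenders" >&2; exit 1; }
```

Matching on variable names only means the guard never has to read or log the values themselves.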

Workflow extraction

Snapshot-diff, not interception. On entry, the hook records dpkg --get-selections, pip freeze, npm ls -g, env, and ls $HOME/bin/. On exit (TERM/INT/EXIT trap), it diffs against that snapshot and dumps the result to $SANDBOX_INBOX_DIR/<iso-timestamp>/. SIGKILL cannot be trapped, so a periodic background autosave caps the loss at 5 minutes.
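The snapshot-then-diff idea can be illustrated with a small sketch. The function names take_snapshot and added_since, the file names, and the /tmp paths in the usage comment are all hypothetical; the real container-autosave.sh may be organized differently.

```shell
# Record the current state into a snapshot directory, one sorted list per file.
take_snapshot() {
  dir=$1
  mkdir -p "$dir"
  env | sort > "$dir/env.txt"
  ls "$HOME/bin" 2>/dev/null | sort > "$dir/home-bin.txt"
  # A Debian-based image would also record package lists here, for example:
  #   dpkg --get-selections | sort > "$dir/dpkg.txt"
}

# Print the lines present in the "after" snapshot but not in the "before" one.
# comm needs sorted input, which take_snapshot guarantees.
added_since() {
  before=$1
  after=$2
  list=$3
  comm -13 "$before/$list" "$after/$list"
}

# Possible wiring inside a session (paths are placeholders):
#   take_snapshot /tmp/snap/entry
#   trap 'take_snapshot /tmp/snap/exit; added_since /tmp/snap/entry /tmp/snap/exit home-bin.txt' EXIT
```

Comparing two plain sorted lists keeps the hook independent of how a package got installed, which is the point of diffing snapshots rather than intercepting commands.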

Secret-shape filter: env diffs strip values matching AWS (AKIA*), Google (AIza*), OpenAI (sk-*), GitHub (ghp_*, github_pat_*) so key shapes never land in inbox files.
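As a rough sketch of such a filter, the function below rewrites NAME=value lines whose value starts with one of the prefixes named above. The name redact_secret_shapes and the <redacted> placeholder are hypothetical, and matching only at the start of the value is a simplifying assumption.

```shell
# Replace values that start with a known key prefix; other lines pass through unchanged.
redact_secret_shapes() {
  sed -E 's/^([^=]*)=(AKIA|AIza|sk-|ghp_|github_pat_).*$/\1=<redacted>/'
}

# Typical use on an env dump before it is written anywhere:
#   env | sort | redact_secret_shapes
```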

You never get an auto-commit. Use your editor:

ls -lt $SANDBOX_INBOX_DIR/
$EDITOR $SANDBOX_INBOX_DIR/<latest>/

Cherry-pick what's worth promoting into the relevant dotfiles file by hand.

Tied to which login?

gh api user against the host's gh auth token (your personal account).
Override: SANDBOX_LOGIN=somename bin/sandbox.sh up.

Inside the container, gh api user against the piped token confirms the same login — both sides agree. If you forked this repo, the volumes auto-namespace to your login on first up.
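To make the namespacing concrete, here is a small sketch that prints the names shown in the doctor output for a given login. The function name sandbox_names and its key=value output format are hypothetical; gh api user --jq .login is the standard way to read the login with the GitHub CLI.

```shell
# Print the names the sandbox derives from a login, one key=value pair per line.
sandbox_names() {
  login=$1
  printf 'image=%s/sandbox:v1\n' "$login"
  printf 'container=%s-sandbox\n' "$login"
  printf 'volumes=%s-toolchains,%s-gh\n' "$login" "$login"
}

# On a host with gh authenticated, the login could be resolved like this:
#   login=${SANDBOX_LOGIN:-$(gh api user --jq .login)}
#   sandbox_names "$login"
```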

Workspace-local repos

Two bind mounts (see mounts.env):

Host                                                                     Container            Use
$SANDBOX_WORKSPACE (default: parent of this repo, e.g. ~/Documents/oss)  /workspace/oss       Personal-OSS repos (_worklog, dotfiles, …)
$SANDBOX_PROJECTS_DIR (default: sibling ~/Documents/projects)            /workspace/projects  Ideogram-internal repos (factory-brief, ui, …)

Edit and commit on the host with the matching tree identity (oss/.envrc vs projects/.envrc). Use the sandbox only to verify (e.g. npm test):

docker exec cheshirecode-sandbox bash -lc 'cd /workspace/projects/factory-brief && npm test'

Recreate the container after mount changes: bin/sandbox.sh down && bin/sandbox.sh up --no-attach.

Run source ~/Documents/oss/.envrc before up so the piped gh token belongs to the personal account (cheshirecode), not a work account.

Subcommands

bin/sandbox.sh up               build (if needed) + run + drop into shell
bin/sandbox.sh exec <cmd>       run <cmd> in the running container
bin/sandbox.sh run-headless <cmd> [args...]
                                non-TTY run with stdout/stderr/exit/meta artifacts
bin/sandbox.sh test-repo <name> clone + install + npm test (cheshirecode/*)
bin/sandbox.sh down             stop the container (autosave fires)
bin/sandbox.sh rebuild          force rebuild the image
bin/sandbox.sh doctor           check host preconditions + show layout
bin/sandbox.sh verify-llm-auth  in-container check: piped LLM creds work?
bin/sandbox.sh nuke [--all]     remove container + image + named volumes
                                (--all also removes runtime dirs)

For daemon or agent callers, prefer run-headless over exec:

bin/sandbox.sh up --no-attach
bin/sandbox.sh run-headless bash -lc 'pwd; git status --short'

Each invocation writes a host-inspectable artifact directory under learnings-inbox/headless-runs/<run-id>/ containing command.txt, stdout.log, stderr.log, exit_code, and meta.env. This is the intended wrapper for worklog-manager dry-runs: inspect full artifacts locally, then post only redacted summaries back to GitHub Issues.
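The artifact layout can be illustrated with a minimal wrapper. This is a sketch rather than the run-headless implementation: the function name run_with_artifacts is hypothetical, and the single started_at field in meta.env is an assumed example, since the README does not list that file's contents.

```shell
# Run a command and capture its output, exit status, and metadata under one directory.
run_with_artifacts() {
  out_dir=$1
  shift
  mkdir -p "$out_dir"
  printf '%s\n' "$*" > "$out_dir/command.txt"
  started=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  status=0
  "$@" > "$out_dir/stdout.log" 2> "$out_dir/stderr.log" || status=$?
  printf '%s\n' "$status" > "$out_dir/exit_code"
  printf 'started_at=%s\n' "$started" > "$out_dir/meta.env"   # assumed field
  return "$status"
}

# Example (the directory name is a placeholder):
#   run_with_artifacts ./headless-runs/example-run sh -c 'pwd; git status --short'
```

Writing the exit status to a file, in addition to returning it, lets a caller that only sees the artifact directory tell a failed run from a successful one.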

Inbox curation: just ls -lt $SANDBOX_INBOX_DIR/. Files are files.

Reproducibility (clear + repeat from scratch)

The whole setup is scriptable and idempotent. To verify on your own machine, or to onboard a fresh box (yours, a fork-owner's, or a CI runner):

# Fresh setup or first install
bin/setup-from-scratch.sh

# To force-rebuild image:
bin/setup-from-scratch.sh --rebuild

# To also verify your real LLM creds authenticate inside the container:
bin/setup-from-scratch.sh --verify-creds

# Nuke everything and prove the setup script reproduces it:
bin/sandbox.sh nuke --all
bin/setup-from-scratch.sh

# CI runs this same path on every push (job: fresh-machine-emulation),
# proving the "works on a vanilla Linux machine" promise.

The setup script's stages are visible at the top of bin/setup-from-scratch.sh — each prints a === N/6 === header so you can watch the pipeline.

Multiple instances from one repo (worktree + direnv pattern)

You can manage several concurrent or switchable sandboxes from this single repo by leaning on git worktree + direnv. No code changes — the existing SANDBOX_LOGIN env override already namespaces the container, image, and named volumes.

# Add a worktree per instance. Each worktree is its own working dir.
git worktree add ../sandbox-foo
git worktree add ../sandbox-bar

# Per worktree, set a distinct SANDBOX_LOGIN via direnv:
cd ../sandbox-foo && echo 'export SANDBOX_LOGIN=cheshirecode-foo' > .envrc && direnv allow
cd ../sandbox-bar && echo 'export SANDBOX_LOGIN=cheshirecode-bar' > .envrc && direnv allow

# Now each worktree spins up an isolated sandbox:
cd ../sandbox-foo && bin/sandbox.sh up   # container: cheshirecode-foo-sandbox
cd ../sandbox-bar && bin/sandbox.sh up   # container: cheshirecode-bar-sandbox

Each instance gets its own container, image tag, and named volumes (<login>-toolchains, <login>-gh, <login>-claude, <login>-codex). Workspace bind-mount is the worktree's parent dir, so projects don't collide.

To see what's running across all instances: docker ps -a. To list volumes: docker volume ls. bin/sandbox.sh nuke operates on the current $SANDBOX_LOGIN only, so one worktree's nuke doesn't touch the others.

Installing LLM CLIs inside the sandbox

The auto-pipe lands Anthropic + Codex credentials at the canonical paths inside the container, but the CLIs themselves are not in the image (image stays small; install-as-needed per the user-choice principle). After your first bin/sandbox.sh up, install them once:

# Inside the sandbox shell:
sudo apt-get install -y nodejs npm
npm install -g @anthropic-ai/claude-code @openai/codex
claude auth status      # should show your host's logged-in account
codex login status      # same

The <login>-toolchains named volume persists the npm cache, so re-installs after nuke (without --all) are fast.

To verify the auto-piped credentials actually authenticate the CLIs:

# From host:
bin/sandbox.sh verify-llm-auth

On Cursor support

Cursor is not covered by the sandbox's key-free auto-pipe today. cursor-agent typically logs in against an employer-tied account, and the sandbox's identity isolation explicitly refuses work credentials. If your cursor-agent status shows a personal-OSS account, this can be revisited. Otherwise, continue to use Cursor on the host, not inside the sandbox.

Not in v1

devcontainer Features registry — would inflate image / build time. Revisit when the v1 footprint stabilizes.
--cap-drop=ALL — needs install.sh to be apt-free at entrypoint first. Hardening backlog.
Token expiry auto-refuse — gh auth token has no TTL API for classic PATs. We warn (not refuse) when the response header is present.
Auto-rebuild on Dockerfile hash change — manual sandbox.sh rebuild is enough for one user. Reconsider with evidence.
Skill-dir RO bind-mount as a generic "drop tools into the sandbox" mechanism — ~/.claude/skills/ style. YAGNI until a real caller.
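For the token-expiry warning, one way to read an expiry from response headers is sketched below. It assumes the header in question is GitHub's github-authentication-token-expiration, which this README does not name, and the function name token_expiry_from_headers is hypothetical.

```shell
# Extract the token-expiry header value, if any, from raw HTTP response headers on stdin.
token_expiry_from_headers() {
  tr -d '\r' | awk '
    tolower($0) ~ /^github-authentication-token-expiration:/ {
      sub(/^[^:]*:[ \t]*/, "")   # drop the header name, keep the value
      print
    }'
}

# On a host with gh authenticated (gh api --include prints response headers):
#   expiry=$(gh api --include user | token_expiry_from_headers)
#   [ -n "$expiry" ] && echo "WARN token expires: $expiry" >&2
```

An empty result simply means no expiry was advertised, which is why a warning fits better than a refusal.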

OrbStack vs Docker Desktop

bin/sandbox.sh works on either. OrbStack is 2-3× faster on macOS (VirtioFS + lighter VM) and free for personal use:

brew install orbstack

bin/sandbox.sh doctor prints a tip if it detects Docker Desktop.

Migration verified (2026-06-07): sandbox lifecycle works end-to-end on OrbStack with no script changes — up --no-attach, exec, test-repo, down, and nuke behave identically. Named volumes (<login>-toolchains, -gh, -claude, -codex) survive a tar-stream copy between Docker contexts (docker --context=desktop-linux run ... tar -cf - → docker --context=orbstack run ... tar -xf -); the migrator's built-in orbctl docker migrate only copies volumes attached to running containers, so detached named volumes need this manual step.

Testing

./tests/run.sh static      # shellcheck + mounts↔devcontainer sync + JSON parse
./tests/run.sh build       # docker build + image-size budget
./tests/run.sh functional  # 9 image-based behavior tests (identity isolation,
                           # token wipe, HTTPS rewrite, secret-shape filter, etc.)
./tests/run.sh all

Tests use literal fake-token-... ASCII strings to exercise the entrypoint's read-and-shred path. No real credentials transit the test boundary.

License

Unlicense.
