test(e2e): verify opensandbox runtime with codex via CI sidecar #12
Open
zpzjzj wants to merge 4 commits into
Add an end-to-end test that exercises the claude_code engine against a real OpenSandbox runtime, and a CI job that self-hosts the OpenSandbox server on the runner so the opensandbox path is verified on every PR instead of always skipping for lack of an external service. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
astral-sh/setup-uv has no floating v8 major tag, so @v8 fails to resolve. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The opensandbox runtime bootstraps the claude CLI inside the sandbox, so skipIfClaudeUnavailable wrongly skipped the test when the runner lacked a host claude binary. The codex opensandbox test already omits this check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claude_code already has real-model coverage in the none-runtime e2e job, whereas codex is otherwise only exercised against a fake binary. Running codex here makes the opensandbox job the sole real codex coverage and covers one more agent overall. Also preserve the in-sandbox agent workspace as a CI artifact so failures inside the sandbox (agent bootstrap, model calls) are debuggable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds executed CI coverage for the OpenSandbox runtime. Until now the
opensandbox path never ran in CI — the existing codex opensandbox test
always skipped for lack of an external sandbox service. This PR adds a CI
job that self-hosts the OpenSandbox server on the runner, so the path is
exercised on every PR.
The opensandbox job runs the codex agent: claude_code already has
real-model coverage in the none-runtime e2e job, whereas codex is otherwise
only exercised against a fake binary — so this is its sole real coverage and
adds one more agent overall.
Changes
- `.github/workflows/ci.yml`: new `e2e-opensandbox` job. Installs `opensandbox-server` via `uv`, starts it as a background process (Docker runtime, host network, host `docker.sock`), health-checks it, then runs `TestAgent_Codex_OpenSandboxRuntime` against `http://127.0.0.1:8080`.
- `e2e/agent_test.go`: new `TestAgent_ClaudeCode_OpenSandboxRuntime` (mirrors the codex test; runnable locally / in future CI) and an `openSandboxE2EImage()` helper that reads `OPENSANDBOX_IMAGE`.
- The in-sandbox agent workspace is preserved as a CI artifact for post-mortem when execution fails inside the sandbox.
Test plan
- `make test` / `make verify` pass
- `go test -tags e2e -run OpenSandbox ./e2e` passes locally (skips cleanly without `OPENSANDBOX_API_KEY`)
- `e2e-opensandbox` CI job is green on this PR
Notes for reviewers
- `node:22`: skill-up bootstraps the agent CLI inside the sandbox (`nvm`/`node`/`npm` install), which needs `curl`/`git`/`node`. Bare `ubuntu:latest` lacks these. Override via `OPENSANDBOX_IMAGE`.
- No `OPENSANDBOX_*` secret is needed, but the agent still calls a real model, so the job gates on `DASHSCOPE_API_KEY` and skips on fork PRs (same pattern as the e2e job).
- `execd_image` is pinned to `opensandbox/execd:v1.0.16`; bump if the server log reports an execd compatibility error.
🤖 Generated with Claude Code