software factory: add skills for running the boxel software factory inside any AI coding agent #4843
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 29c7c40c24
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
"@cardstack/postgres": "workspace:*",
"@cardstack/runtime-common": "workspace:*",
"@glint/ember-tsc": "catalog:",
"@playwright/test": "catalog:",
Move Playwright to runtime deps or lazy-load the test command
`@playwright/test` is added under `devDependencies`, but the CLI eagerly imports `./commands/test` during startup and that module has a top-level `import { chromium } from '@playwright/test'`. In normal npm/global installs, dev dependencies are not installed, so even unrelated commands (for example `boxel --help` or `boxel profile list`) will fail at process start with a module-resolution error before argument parsing. This needs either a runtime dependency or a deferred import inside the test command path.
(Claude here, replying on behalf of @jurgenwerk.)
Good catch — fixed in 177b531. Moved the import { chromium } from the top of commands/test.ts into an async function loadChromium() invoked only inside the runner. boxel --help and every other subcommand no longer touches @playwright/test, so a published install without devDeps boots cleanly. The loader throws a clear error pointing at the install steps if boxel test is ever invoked without Playwright present.
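The deferred-import pattern described in this reply can be sketched as follows. This is a minimal illustration, not the code from 177b531: `loadOptional`, the `ModuleLoader` type, and the install-hint wording are hypothetical names introduced here, and only `loadChromium` and `@playwright/test` come from the thread.

```typescript
// Hypothetical helper; names and error text are illustrative assumptions.
type ModuleLoader = (specifier: string) => Promise<unknown>;

// A dynamic import is evaluated only when this function is called,
// so merely loading this file never touches the optional package.
const dynamicImport: ModuleLoader = (specifier) => import(specifier);

async function loadOptional<T>(
  specifier: string,
  installHint: string,
  load: ModuleLoader = dynamicImport,
): Promise<T> {
  try {
    return (await load(specifier)) as T;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // Replace the raw resolution failure with an actionable message.
    throw new Error(
      `Could not load optional dependency "${specifier}" (${reason}). ${installHint}`,
    );
  }
}

// Called from inside the test runner only, never at module top level.
async function loadChromium(): Promise<unknown> {
  const playwright = await loadOptional<{ chromium: unknown }>(
    '@playwright/test',
    'Install Playwright before running `boxel test`.', // assumed wording
  );
  return playwright.chromium;
}
```

Passing the loader as a parameter is a design choice made for this sketch so the failure path can be exercised without uninstalling anything.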
| Realm creation | `boxel run-command create-realm` (or `boxel realm create`) |
| Workspace pull / push | `boxel pull` / `boxel push` (realm-sync skill) |
| Federated search | `boxel search --realm <url> --query '<json>'` (boxel-api skill) |
| Card-type schema | `boxel run-command get-card-type-schema --realm <url> --input '{module,name}'` |
Replace invalid run-command shorthand in factory runbook
The documented invocation boxel run-command get-card-type-schema ... is not a valid run-command specifier for this CLI; run-command expects a full command module reference (for example @cardstack/boxel-host/commands/get-card-type-schema/default). Following the runbook as written will error immediately, which blocks bootstrap and per-issue validation flows that depend on these calls. Update the runbook/skills to use valid specifiers (or a supported alternative command).
(Claude here, replying on behalf of @jurgenwerk.)
Already addressed in earlier fix-up commits. The file was renamed to docs/runbook.md and every run-command get-card-type-schema reference was rewritten to the full @cardstack/boxel-host/commands/get-card-type-schema/default form with the codeRef wrapper. Grep confirms zero remaining shorthand uses across the runbook and the three skill files.
Host Test Results: 1 file ±0, 1 suite ±0, duration 1h 21m 7s (−11m 17s). Results for commit ca6abfa. ± Comparison against earlier commit 4f46d82.
Realm Server Test Results: 1 file ±0, 1 suite ±0, duration 8m 29s (−1s). Results for commit 4f46d82. ± Comparison against earlier commit f63fe93.
Pull request overview
This PR adds an interactive Claude Code-based software factory flow alongside the existing SDK orchestrator, with new Boxel CLI validator commands and rewritten factory skills/runbook for the agent-owned loop.
Changes:
- Adds top-level `boxel lint`, `boxel parse`, and `boxel test` commands for realm validation.
- Splits software-factory skills into SDK (`.agents/skills-sdk`) and interactive (`.agents/skills`) paths.
- Adds interactive runbook and updates Claude/factory documentation for the new workflow.
Reviewed changes
Copilot reviewed 16 out of 18 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `pnpm-lock.yaml` | Adds lockfile entries for new CLI validation dependencies. |
| `packages/boxel-cli/package.json` | Adds Glint and Playwright dependencies for parse/test commands. |
| `packages/boxel-cli/bin/boxel.js` | Makes ts-node fallback use the package tsconfig explicitly. |
| `packages/boxel-cli/scripts/build.ts` | Externals Playwright packages from the esbuild bundle. |
| `packages/boxel-cli/src/build-program.ts` | Registers the new top-level validation commands. |
| `packages/boxel-cli/src/lib/find-package-root.ts` | Adds helper for locating the boxel-cli package root. |
| `packages/boxel-cli/src/commands/lint.ts` | Adds realm-wide/single-file top-level lint command. |
| `packages/boxel-cli/src/commands/parse.ts` | Adds Glint/JSON parse validation command. |
| `packages/boxel-cli/src/commands/test.ts` | Adds browser-driven QUnit realm test command. |
| `packages/software-factory/src/factory-skill-loader.ts` | Points SDK orchestrator skill loading at `.agents/skills-sdk`. |
| `packages/software-factory/docs/runbook.md` | Adds the interactive factory runbook. |
| `packages/software-factory/.gitignore` | Ignores local factory test realm mirrors. |
| `packages/software-factory/.claude/CLAUDE.md` | Documents separate SDK and interactive skill paths. |
| `packages/software-factory/.agents/skills/software-factory-bootstrap/SKILL.md` | Rewrites bootstrap instructions for interactive flow. |
| `packages/software-factory/.agents/skills/software-factory-scheduling/SKILL.md` | Adds scheduling/status lifecycle skill for interactive flow. |
| `packages/software-factory/.agents/skills/software-factory-operations/SKILL.md` | Rewrites operations skill around CLI validators and validation cards. |
| `packages/software-factory/.agents/skills-sdk/software-factory-bootstrap/SKILL.md` | Adds SDK snapshot of bootstrap skill. |
| `packages/software-factory/.agents/skills-sdk/software-factory-operations/SKILL.md` | Adds SDK snapshot of operations skill. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
Six fixes from the bot reviewers (codex + Copilot) on PR #4843:
1. **Lazy-load Playwright in `boxel test`.** `commands/test.ts` had a top-level `import { chromium } from '@playwright/test'`, but `@playwright/test` is a devDependency and external in our esbuild bundle. A published-from-npm install would crash on every `boxel` invocation (including `boxel --help`) with MODULE_NOT_FOUND before Commander could parse args. Moved the chromium load behind an `async function loadChromium()` invoked only inside the test runner, with a clear error message pointing the user at the install step when it isn't available.
2. **Zero `*.test.gts` → validator failure, not pass.** The previous behavior silently returned `status: 'passed'` for a realm with no tests, which would let the agent mark an Issue done without ever writing one. Now returns `status: 'failed'` with an errorMessage spelling out the contract.
3. **Bounded-poll Spec discovery in `boxel parse`.** Search index settles asynchronously, so a `boxel realm push` immediately followed by `boxel parse` could see zero Specs in the index and silently skip the freshly-pushed `linkedExamples` JSON files, so `boxel parse` would report passed when it shouldn't have. Wrapped the Spec search in a 30s/250ms retryWithPoll matching the factory's own parse engine.
4. **Capability table fix.** The runbook listed `boxel run-command create-realm` as an option; no such host command exists. Realm creation is the native `boxel realm create <slug> "<display>"` subcommand only. Removed the bogus alternative.
5. **Verify ALL three subcommands explicitly in skill setup.** The previous `grep -qE '(lint|parse|test)'` would succeed if any one of them appeared in `boxel --help`, letting a partially-stale dist pass setup and then fail mid-run on the missing validator. Both skills now loop over `lint parse test` individually with an anchored grep, and report which one is missing.
6. Stale `run-command get-card-type-schema` shorthand from the earlier review comment had already been corrected during prior fix-up commits; grep confirms no remaining references.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
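The bounded poll from fix 3 could look roughly like the sketch below. The real `retryWithPoll` signature is not shown in this thread, so the parameter names, the `PollOptions` shape, and the choice to return the last result on timeout are all assumptions made for illustration; only the 30s budget and 250ms interval come from the commit message.

```typescript
// Hypothetical sketch; the factory's actual retryWithPoll may differ.
interface PollOptions {
  timeoutMs: number; // total budget, e.g. 30_000
  intervalMs: number; // delay between attempts, e.g. 250
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `attempt` until `isSettled(result)` is true or the budget runs out.
// Returns the last result either way, so the caller decides how to treat
// a timeout (for example, report a failure instead of silently passing).
async function retryWithPoll<T>(
  attempt: () => Promise<T>,
  isSettled: (result: T) => boolean,
  { timeoutMs, intervalMs }: PollOptions,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  let result = await attempt();
  while (!isSettled(result) && Date.now() + intervalMs <= deadline) {
    await sleep(intervalMs);
    result = await attempt();
  }
  return result;
}

// Example call shape (searchSpecs is a placeholder, not a real API):
//   const specs = await retryWithPoll(
//     () => searchSpecs(realmUrl),
//     (found) => found.length > 0,
//     { timeoutMs: 30_000, intervalMs: 250 },
//   );
```

The loop always makes at least one attempt, so an index that has already settled costs a single search and no sleep.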
4f46d82 to ca6abfa
…Code
Adds the vendor-neutral skills that drive the software factory from
inside an interactive Claude Code session, plus the runbook that
documents how to use them:
- `.agents/skills/software-factory-bootstrap` (rewritten)
- `.agents/skills/software-factory-operations` (rewritten)
- `.agents/skills/software-factory-scheduling` (new)
- `.agents/skills-sdk/{bootstrap,operations}` — verbatim pre-rewrite
snapshots for the existing SDK orchestrator path
- `docs/runbook.md` — single-prompt end-to-end flow
- `src/factory-skill-loader.ts` — load from `.agents/skills-sdk/` so
the orchestrator and interactive flow don't fight
- `.claude/CLAUDE.md` — documents the dual skill directories
- `.gitignore` — ignore the `factory-test-*/` mktemp workspaces
The boxel-cli validator commands (`lint`, `parse`, `test`) the skills
call are added in #4881. This PR depends on that one merging first.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ca6abfa to 9e8e868
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
packages/software-factory/.agents/skills/software-factory-operations/SKILL.md:351
- This `boxel run-command ... instantiate-card` example similarly omits `--json` and describes `passed`/`error`/`stackTrace` as if they are top-level fields. In practice `boxel run-command --json` emits `{ status, result, error }` and the command payload must be extracted by parsing the `result` string. Update the example/wording so the agent can programmatically map outputs into `InstantiateResult` validation cards.
boxel run-command @cardstack/boxel-host/commands/instantiate-card/default \
--realm "$REALM" \
--input "$(jq -nc --arg m "$MODULE" --arg n "$CARD_NAME" --arg r "$REALM" --arg d "$INSTANCE_DATA" \
'{moduleIdentifier:$m, cardName:$n, realmIdentifier:$r, instanceData:$d}')"
Result fields: passed (bool), and on failure error + stackTrace.
MODULE="http://localhost:4201/user/my-realm/sticky-note"
REALM="http://localhost:4201/user/my-realm/"
CARD_NAME="StickyNote"
INSTANCE_PATH="StickyNotes/note-1.json"
(Claude here, replying on behalf of @jurgenwerk.)
Fixed in 35a9a01 — example now uses StickyNote/note-1.json (singular, named exactly after the card type). Confirmed against the convention in packages/experiments-realm/ and the catalog realm (e.g. Account/, BlogPost/, Booking/, CocktailRecipe/). Added an inline comment to the example so future readers don't get tempted to pluralize.
```bash
MODULE="http://localhost:4201/user/my-realm/sticky-note"
REALM="http://localhost:4201/user/my-realm/"
boxel run-command @cardstack/boxel-host/commands/evaluate-module/default \
  --realm "$REALM" \
  --input "$(jq -nc --arg m "$MODULE" --arg r "$REALM" \
    '{moduleIdentifier:$m, realmIdentifier:$r}')"
```

Result fields: `passed` (bool), and on failure `error` + `stackTrace`.
(Claude here, replying on behalf of @jurgenwerk.)
Fixed in 35a9a01. The evaluate-module and instantiate-card examples now pass --json and the result-field paragraph correctly describes the wrapper: boxel run-command --json returns {status, result, error} where result is a JSON string, and the command's own passed / error / stackTrace live at data.attributes inside the parsed result. Added a jq one-liner showing how to reach the passed boolean (.result | fromjson | .data.attributes.passed).
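For an agent that post-processes the CLI output in code rather than with jq, the same two-step unwrap can be sketched as below. The `{status, result, error}` wrapper and the `data.attributes` location come from the reply above; the exact field types, the `unwrapRunCommand` name, and the fallback behaviour when `result` is empty are assumptions for illustration.

```typescript
// Wrapper shape as described in the thread; nullable types are assumed.
interface RunCommandEnvelope {
  status: string;
  result: string | null; // JSON string carrying the command's own payload
  error: string | null;
}

interface ValidationOutcome {
  passed: boolean;
  error?: string;
  stackTrace?: string;
}

// Hypothetical helper: parse the wrapper, then parse `result` a second time.
function unwrapRunCommand(stdout: string): ValidationOutcome {
  const envelope = JSON.parse(stdout) as RunCommandEnvelope;
  if (!envelope.result) {
    // No payload: surface the wrapper-level error instead of guessing.
    return {
      passed: false,
      error: envelope.error ?? `status: ${envelope.status}`,
    };
  }
  const payload = JSON.parse(envelope.result);
  const attrs = payload?.data?.attributes ?? {};
  return {
    passed: attrs.passed === true, // anything but a literal true is a failure
    error: attrs.error ?? undefined,
    stackTrace: attrs.stackTrace ?? undefined,
  };
}
```

Treating a missing or non-boolean `passed` as a failure keeps a malformed payload from being recorded as a passing validation.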
## Skill loading

The agent's instructions live in `.agents/skills/`. The factory loader
(`src/factory-skill-loader.ts`) walks three directories:

1. `packages/software-factory/.agents/skills/` — factory-specific skills
   (`software-factory-bootstrap`, `software-factory-operations`).
2. `packages/boxel-cli/plugin/skills/` — boxel-cli Claude Code plugin
   skills (`boxel-api`, `boxel-command`); same directory the plugin
   distributes to end users.
3. monorepo root `.agents/skills/` — general domain skills
   (`boxel-development`, `boxel-file-structure`, `ember-best-practices`).

`packages/software-factory/.claude/skills` is a symlink to
`.agents/skills/` so Claude Code and the factory loader read the same
files.
Two parallel skill paths exist, one per factory run mode:

- **SDK orchestrator** (`pnpm factory:go`): the loader at
  `src/factory-skill-loader.ts` reads from
  **`.agents/skills-sdk/`** first. Those skills describe the
(Claude here, replying on behalf of @jurgenwerk.)
Fixed in 35a9a01. packages/software-factory/AGENTS.md now documents both skill directories explicitly: .agents/skills-orchestrator/ for the SDK orchestrator (pnpm factory:go) and .agents/skills/ for interactive Claude Code (via the .claude/skills symlink). The factory-skill-loader.ts description was also updated to name skills-orchestrator/ as the primary directory.
(FYI we renamed the directory from skills-sdk/ → skills-orchestrator/ in commit 8e09be6 after this comment was written; "SDK" was too ambiguous.)
`skills-sdk` was too ambiguous: "SDK" doesn't convey what the directory holds. `skills-orchestrator` names the actual consumer (the `pnpm factory:go` Node orchestrator) and parallels `.agents/skills/` (consumed by interactive Claude Code).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Fix instance-folder example: `StickyNotes/` → `StickyNote/` (folders
named after the card type singular, matching catalog/experiments
realm convention).
- Show `--json` on `evaluate-module` and `instantiate-card` examples,
and document the actual wrapper shape: `boxel run-command --json`
returns `{status, result, error}` where `result` is a JSON string;
the command's own `passed`/`error`/`stackTrace` fields live at
`data.attributes` inside the parsed result, not at the top level.
- Update `packages/software-factory/AGENTS.md` to reflect the two
parallel skill directories (`.agents/skills-orchestrator/` for the
SDK orchestrator, `.agents/skills/` for interactive Claude Code).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 35a9a013e8
for cmd in lint parse test; do
  echo "$help_output" | grep -qE "^[[:space:]]+$cmd[[:space:]]" || {
Use real CLI commands in the bootstrap gate
In this interactive flow, the first bootstrap step runs this check before creating or pulling the target realm, but the checked-in boxel program does not expose top-level lint, parse, or test subcommands: buildBoxelProgram only registers file, realm, run-command, search, etc., and the existing lint command is registered as boxel file lint. With the repo CLI (and any global install built from it), boxel --help will not match these names, so the agent exits immediately and the new runbook cannot start. Either gate on commands that actually exist or add/register the documented top-level validator commands.
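The gate under discussion checks each validator name against the `--help` listing separately. A rough TypeScript equivalent of that anchored, per-command check is sketched here; `missingSubcommands` is a hypothetical name, and the help-text layout (indented command name followed by whitespace) is assumed from the grep pattern in the diff rather than from the CLI itself.

```typescript
// Returns the subcommands that do NOT appear as their own entry in the
// help output. Mirrors the anchored grep: leading whitespace on the line,
// the exact name, then at least one space or tab.
function missingSubcommands(helpOutput: string, required: string[]): string[] {
  return required.filter((name) => {
    // Escape regex metacharacters so names like "run-command" stay literal.
    const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const entry = new RegExp(`^[ \\t]+${escaped}[ \\t]`, 'm');
    return !entry.test(helpOutput);
  });
}
```

Because each name is tested on its own, a nested command such as `file lint` does not satisfy the check for a top-level `lint`, which is the situation the review comment describes.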
I tested this using Claude Code (with Opus 4.7) and also the opencode agent (Qwen 3.6 free), and the factory run worked OK!
…om-the-orchestrator
Why
Today's software factory runs through `pnpm factory:go`, a Node process driving the Claude Agent SDK. Starting June 15, 2026, Anthropic carves Agent SDK usage out of the regular Claude subscription into a separate, smaller monthly credit pool. Interactive Claude Code sessions stay uncapped.
This PR teaches the factory to run from inside an interactive Claude Code session. You paste one prompt; the agent does the whole run.
Both types of factory coexist for now
The SDK orchestrator (`pnpm factory:go`) is untouched and still works on this branch. The two paths get separate skill directories so they don't fight:
- `.agents/skills-orchestrator/`: pre-rewrite content (the orchestrator's factory-MCP tool surface). Read by `factory-skill-loader.ts`.
- `.agents/skills/`: rewritten for the interactive flow (the `boxel` CLI surface). Read by Claude Code via `.claude/skills`.
When the orchestrator is retired in a follow-up, both go together.
Depends on
This PR uses three new validator commands on `@cardstack/boxel-cli` (`lint`, `parse`, `test`), which are landing separately in #4881. Merge that one and republish boxel-cli to npm first; then this PR's setup is just a normal CLI install.
How to try it
One-time setup:
Run a factory:
In the Claude session, paste:
The agent creates a scratch workspace in `mktemp -d`, bootstraps the realm, works each implementation Issue, runs all five validators (with bail-out limits), persists audit-trail cards under `Validations/`, and marks the project completed. View the result in the Boxel host UI.
Richer briefs: `recipe`, `gradebook`, or any of the Wiki briefs.
What's in this PR
- Skills under `.agents/skills/`: `software-factory-bootstrap`, `software-factory-scheduling`, `software-factory-operations`. These drive the interactive flow.
- Runbook at `packages/software-factory/docs/runbook.md`.
- Pre-rewrite snapshot in `.agents/skills-orchestrator/` so the existing SDK orchestrator keeps working with its original tool surface.
- `factory-skill-loader.ts` reads `.agents/skills-orchestrator/` while interactive Claude Code reads `.agents/skills/`.
The agent owns the whole loop with bail-out limits (8 iterations / 3 identical failures / 5 attempts) so it can't spiral. Failed Issues end up `blocked` with a comment; the agent moves on. Project flips to `completed` only when every Issue is `done`.
Deliberately deferred
- `boxel parse` outside the monorepo. Today it runs glint locally with monorepo paths. The proper fix is a realm-server `_parse` endpoint mirroring `_lint`; tracked in a follow-up ticket.
- `boxel test` outside the monorepo. Today it drives a headless Chromium against the host app's compiled `dist/`. Same proper fix: a built-in realm-server QUnit harness so end users don't need Playwright + Chromium locally; tracked in a follow-up ticket.
- `/factory-run` slash command wrapping the prompt: follow-up.
Validated against
Single-prompt end-to-end runs (no manual intervention) against the `sticky-note` and `recipe` briefs. Both produce a populated target realm with the full set of expected artifacts, including `Validations/`. The SDK orchestrator path is unchanged (`pnpm --filter @cardstack/software-factory test` still passes).
How to review
The three skill markdown files under `packages/software-factory/.agents/skills/` are the substantive surface. Everything else is small and self-contained. `.agents/skills-orchestrator/` is a verbatim snapshot of main; skip it.
🤖 Generated with Claude Code
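To make the bail-out behaviour in the description concrete, here is a minimal sketch of a per-Issue loop with two of the stated limits. The description does not define exactly what "iterations", "identical failures", and "attempts" each count, so this reading (a cap on loop iterations plus a cap on consecutive identical failures) is an assumption, the third limit is left out, and every name below is hypothetical. In the PR itself the loop is driven by the skill instructions, not by code like this.

```typescript
// Assumed reading of two of the limits; the PR text does not define them.
interface BailOutLimits {
  maxIterations: number; // e.g. 8
  maxIdenticalFailures: number; // e.g. 3 identical failures in a row
}

type IssueStatus = 'done' | 'blocked';

// `runOnce` returns null on success, or a failure signature
// (for example the validator's error text) on failure.
function workIssue(
  runOnce: (iteration: number) => string | null,
  limits: BailOutLimits,
): IssueStatus {
  let lastFailure: string | null = null;
  let streak = 0;
  for (let i = 1; i <= limits.maxIterations; i++) {
    const failure = runOnce(i);
    if (failure === null) return 'done';
    // Count how many times in a row the same failure has been seen.
    streak = failure === lastFailure ? streak + 1 : 1;
    lastFailure = failure;
    if (streak >= limits.maxIdenticalFailures) return 'blocked';
  }
  return 'blocked'; // iteration budget exhausted
}

// The project completes only when every Issue is done.
function isProjectComplete(issues: IssueStatus[]): boolean {
  return issues.every((status) => status === 'done');
}
```

A blocked Issue does not stop the run; the caller would record it and move to the next Issue, and `isProjectComplete` then stays false.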