diff --git a/HACKATHON_SUBMISSION.md b/HACKATHON_SUBMISSION.md index 7c64695..2c35804 100644 --- a/HACKATHON_SUBMISSION.md +++ b/HACKATHON_SUBMISSION.md @@ -1,98 +1,189 @@ -# Agent Flight Recorder +# NullOS Mission Control ## Problem Discovered -NullWatch already provides the observability layer for the nullclaw ecosystem: -run summaries, spans, evals, OTLP ingest, cost, token usage, and failure context. -It also exports a NullHub-compatible manifest. NullHub already provides the -operator UI and orchestration pages, but it did not register NullWatch or expose -its tracing/eval data in the UI. +The nullclaw ecosystem already has the building blocks of a lightweight local +agent platform: NullHub for control, NullBoiler for orchestration, +NullTickets for tracker-backed work, and NullWatch for traces and evals. +What was missing was a memorable local demo that shows these ideas as one +operator experience. -## Chosen Solution +Without that vertical slice, a new contributor or hackathon judge has to infer +the platform story from separate repositories, APIs, and docs. 
-Add a local-first Observability cockpit to NullHub: +## Chosen Solution -- register `nullwatch` as a known component -- proxy `/api/observability/*` to a managed NullWatch instance -- add a Flight Recorder page for runs, spans, evals, cost, tokens, and errors -- document the local demo flow through NullHub's managed install path +Add a local-first Mission Control page to NullHub: + +- a deterministic backend mission API under `/api/mission-control` +- a versioned embedded replay fixture for scenario data +- a `/mission-control` control-room UI +- one cinematic workflow showing agent roles, checkpointing, test failure, + human intervention, recovered replay, review, and telemetry +- schema-versioned API responses and structured errors for invalid actions +- NullWatch-style trace references that map replay events to run ids, span ids, + operations, and eval keys +- a replay artifact export for sharing the current snapshot, source fixture, + and ecosystem mapping as JSON +- a local smoke test for the full mission lifecycle +- a judge-mode demo driver and macOS local video recorder +- screenshots and a written demo plan for PR review + +The demo is intentionally deterministic. It does not call hosted services, +require model keys, or depend on a running multi-repo stack. ## Why This Idea Was Chosen -This is stronger than a single CLI preflight because it connects multiple parts -of the ecosystem into a visible agent platform story: execution, orchestration, -task tracking, observability, and operations. It is still hackathon-sized because -it uses existing NullWatch APIs and NullHub UI patterns instead of changing core -agent runtime behavior. +This was chosen over a smaller CLI-only contribution because it creates a +stronger hackathon story: judges can see autonomy, orchestration, +observability, failure recovery, and human-in-the-loop control in under three +minutes. + +It belongs in NullHub because NullHub is already the control plane for the +ecosystem. 
The page can honestly present simulated NullTickets-style tasks, +NullBoiler-style checkpoints, and NullWatch-style telemetry while leaving a +clear future path to real cross-service wiring. ## What Was Implemented -- NullWatch component registration in the NullHub registry. -- Observability reverse proxy with optional bearer token forwarding. -- Sidebar entry and `/observability` UI page. -- API client methods for NullWatch summary, runs, spans, evals, and health. -- README documentation for the proxy and local demo setup. +- Added `src/api/mission_control.zig` with structured mission state, reset, + launch, recover, deterministic phase progression, telemetry, graph nodes, + graph edges, agent roles, failure details, and recovery details. +- Added `src/api/mission_control/code_red.v1.json` as the versioned replay + fixture for phase timing, graph, events, telemetry, and failure/recovery + metadata. +- Added `src/api/mission_control_replay.zig` to parse and validate replay + fixtures before serving mission state. +- Added validated trace references in mission events so the demo can deep-link + from Mission Control to `/observability?run_id=...` without requiring + NullWatch to be running for the local replay. +- Added explicit response metadata: `schema_version`, `mode`, `scenario_id`, + `scenario_version`, and `generated_at_ms`. +- Added `GET /api/mission-control/replay` to export the current snapshot, + source fixture, and NullTickets/NullBoiler/NullClaw/NullWatch mapping + metadata as a portable JSON artifact. +- Added transition guards so early recovery and duplicate launch return + actionable `409 Conflict` responses. +- Registered the Mission Control API in the NullHub server route table and API + metadata. +- Added typed frontend client methods for mission state and actions. +- Added a sidebar entry and `/mission-control` Svelte page with adaptive + polling, retry handling, trace chips, observability deep links, and responsive + mission panels. 
+- Added in-screen three-minute story beats and a failed-vs-recovered comparison + panel so the demo narrative remains visible during judging and PR review. +- Added a PR-ready plan file, README documentation, and screenshots. +- Added backend tests for mission path routing, idle state, failure state, + recovery state, action handlers, invalid transitions, and route semantics. +- Added replay fixture tests for duplicate ids, graph references, telemetry + references, trace references, ordering, required fields, and required phases. +- Added `tests/test_mission_control_smoke.sh` for live API validation. +- Added `scripts/mission_control_demo.sh` for a timed judge-mode mission run. +- Added `scripts/record_mission_control_demo.sh` and + `docs/demo/mission-control-local-demo.md` so the local demo can be recorded + as a review video artifact. +- Added `docs/demo/mission-control-replay-artifact.md` to document the export + schema and ecosystem mapping. +- Added `docs/demo/mission-control-pr-package.md` with the copy-ready PR title, + PR description, reviewer path, validation matrix, and three-minute hackathon + story. 
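To make the response contract above concrete, an invalid early recovery might produce a conflict body like the following sketch. The metadata fields (`schema_version`, `mode`, `scenario_id`, `scenario_version`, `generated_at_ms`) are the documented ones; the shape and wording of the `error` object are assumptions for illustration only.

```bash
# Hypothetical 409 Conflict body for recovering before the failure phase.
# Only the top-level metadata fields are documented; the error shape is assumed.
cat > /tmp/mission-conflict.json <<'EOF'
{
  "schema_version": 1,
  "mode": "deterministic_replay",
  "scenario_id": "code_red",
  "scenario_version": "v1",
  "generated_at_ms": 1710000000000,
  "error": {
    "status": 409,
    "reason": "recover is only valid after the test phase has failed"
  }
}
EOF
python3 -m json.tool /tmp/mission-conflict.json > /dev/null && echo "valid JSON"
```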
## Files Changed -- `src/installer/registry.zig` -- `src/api/observability.zig` -- `src/api/proxy.zig` -- `src/api/components.zig` +- `MISSION_CONTROL_PLAN.md` +- `src/api/mission_control.zig` +- `src/api/mission_control_replay.zig` +- `src/api/mission_control/code_red.v1.json` - `src/api/meta.zig` - `src/root.zig` - `src/server.zig` - `ui/src/lib/api/client.ts` - `ui/src/lib/components/Sidebar.svelte` - `ui/src/routes/observability/+page.svelte` +- `ui/src/routes/mission-control/+page.svelte` +- `tests/test_mission_control_smoke.sh` +- `scripts/mission_control_demo.sh` +- `scripts/record_mission_control_demo.sh` +- `docs/demo/.gitignore` +- `docs/demo/mission-control-local-demo.md` +- `docs/demo/mission-control-replay-artifact.md` +- `docs/demo/mission-control-pr-package.md` +- `docs/screenshots/nullhub-mission-control-live.png` +- `docs/screenshots/nullhub-mission-control-recovered.png` - `README.md` - `HACKATHON_SUBMISSION.md` ## How To Test Or Demo -Start NullHub: +Run the backend tests: ```bash -zig build run -- serve --no-open +zig build test -Dembed-ui=false --summary all ``` -Install NullWatch from NullHub: +Build the UI: -1. Open the web UI. -2. Go to `Install Component`. -3. Select `NullWatch`. -4. Keep or set the API port to `7710`. -5. Finish the wizard. The installer starts the NullWatch instance and NullHub - discovers it automatically. 
+```bash +npm --prefix ui run build +``` -Optional sample data can be ingested through the NullHub proxy: +Start NullHub locally: ```bash -curl -X POST http://127.0.0.1:19800/api/observability/v1/spans \ - -H 'Content-Type: application/json' \ - -d '{"run_id":"demo-run-1","trace_id":"trace-demo-1","span_id":"span-1","source":"nullclaw","operation":"tool.call","status":"error","started_at_ms":1710000000000,"ended_at_ms":1710000001500,"tool_name":"shell","error_message":"tool call failed: command timed out","attributes_json":"{\"exit_code\":124}"}' +zig build run -- serve --host 127.0.0.1 --port 19802 --no-open +``` -curl -X POST http://127.0.0.1:19800/api/observability/v1/evals \ - -H 'Content-Type: application/json' \ - -d '{"run_id":"demo-run-1","eval_key":"tool_success","scorer":"deterministic","score":0.0,"verdict":"fail","dataset":"demo","notes":"The tool call timed out."}' +Run the live smoke test: + +```bash +NULLHUB_URL=http://127.0.0.1:19802 ./tests/test_mission_control_smoke.sh ``` -Open `/observability` in NullHub and inspect the NullWatch runs. +Run the automated local demo: + +```bash +MISSION_CONTROL_OPEN_BROWSER=1 ./scripts/mission_control_demo.sh +``` + +Export the current replay artifact: + +```bash +curl -fsS http://127.0.0.1:19802/api/mission-control/replay \ + -o mission-control-replay.json +``` + +Record a local macOS video artifact: + +```bash +./scripts/record_mission_control_demo.sh +``` + +The generated `.mov` is ignored by git and can be uploaded directly to the PR +discussion or hackathon submission. + +Open `/mission-control`, then: -## Screenshots +1. Click `Launch Mission`. +2. Watch the workflow progress through research, patching, checkpointing, and + test execution. +3. When the test fails, click `Fork From Checkpoint`. +4. Use the trace chips or failed/recovered run links to jump into Flight + Recorder deep links. +5. Watch the recovered run pass and complete review. 
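The same lifecycle can also be driven from a shell against a started server. This is a sketch of the request sequence only; the action paths match the documented routes, and the helper degrades to a note when the server is absent or an action is rejected.

```bash
# Drive reset -> launch -> recover against the documented action routes.
# In a real run you would wait for the failure phase before calling recover.
BASE="${NULLHUB_URL:-http://127.0.0.1:19802}/api/mission-control"
post() {
  curl -fs -X POST "$BASE/$1" -o /dev/null \
    && echo "ok: $1" \
    || echo "skip: $1 (no server, or the action was rejected with 409)"
}
{ post reset; post launch; post recover; } | tee /tmp/mission-actions.log
```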
-Flight Recorder overview: +Live mission state: -![NullHub Observability overview](docs/screenshots/nullhub-observability-overview.png) +![NullHub Mission Control live workflow](docs/screenshots/nullhub-mission-control-live.png) -Failure detail with tool-call error context: +Recovered mission: -![NullHub Observability failure detail](docs/screenshots/nullhub-observability-failure.png) +![NullHub Mission Control recovered workflow](docs/screenshots/nullhub-mission-control-recovered.png) ## Limitations And Future Improvements -- `NULLWATCH_URL` remains useful for pointing NullHub at an external NullWatch - instance, but the default demo path uses a managed NullWatch install. -- The first UI version renders a compact timeline, not a full waterfall chart. -- Run correlation with NullBoiler orchestration pages can be added as a follow-up - when both systems share stable run ids. +- The MVP uses deterministic demo state instead of real cross-service execution. +- The mission replay maps to NullTickets, NullBoiler, and NullWatch concepts, + but does not yet write into those services. +- A future version could add durable replay storage, side-by-side replay + comparison, exportable replay bundles, real NullWatch span hydration, and a + judge-mode one-click replay. diff --git a/MISSION_CONTROL_PLAN.md b/MISSION_CONTROL_PLAN.md new file mode 100644 index 0000000..de872b1 --- /dev/null +++ b/MISSION_CONTROL_PLAN.md @@ -0,0 +1,390 @@ +# NullOS Mission Control Plan + +## Product Goal + +Build the most memorable local-first AI-agent platform demo on top of the +nullclaw ecosystem: a three-minute control-room experience showing autonomous +work, live orchestration, failure, human intervention, replay/fork recovery, +and observability. + +This is a hackathon product slice, not a generic platform rewrite. + +## Demo Narrative + +1. Launch a mission from NullHub. +2. A task appears in the agent backlog. +3. Role-based agents move through research, code, test, and review. +4. 
Live telemetry updates: spans, evals, errors, tokens, cost. +5. A test step fails and the UI highlights the failing tool call. +6. The human forks from a checkpoint and injects a fix instruction. +7. The recovered run passes and the final screen compares failed vs recovered + execution. + +## MVP Scope + +- One NullHub page: `/mission-control`. +- One deterministic local mission scenario. +- A local mission API in NullHub that can: + - reset the mission + - launch the mission + - advance deterministic phases + - expose current mission state + - recover/fork the failed run +- Visual panels: + - mission status and current phase + - agent role board + - workflow graph + - event timeline + - telemetry strip + - failure/recovery panel +- No external services, secrets, or real model calls required for the MVP. + +## Production-Grade Hackathon Bar + +Mission Control is production-ready for the hackathon when it is a durable, +reviewable demo mode rather than a throwaway mock: + +- The API has a stable schema version, scenario identity, explicit demo-mode + metadata, and predictable action semantics. +- Invalid actions return actionable errors instead of mutating state silently. +- The UI is typed, resilient to API errors, responsive, and honest about the + deterministic local replay boundary. +- Tests cover the mission state machine, action routing, invalid transitions, + and API response shape. +- Documentation explains how to run, demo, validate, and extend the feature. +- The implementation leaves a clear path to real NullTickets, NullBoiler, and + NullWatch integration without pretending those services are already being + mutated by the demo. + +## One-Week Delivery Plan + +Day 1 - Harden the local Mission Control product slice. + +- Status: DONE +- Stable API schema and scenario metadata. +- Structured invalid-transition errors. +- Typed frontend contract. +- Adaptive polling, retry state, screenshots, and smoke test. + +Day 2 - Make replay data maintainable. 
+ +- Status: DONE +- Moved scenario content into `src/api/mission_control/code_red.v1.json`. +- Added `src/api/mission_control_replay.zig` as the typed replay contract. +- Added fixture validation tests for schema version, duplicate ids, graph + references, telemetry references, ordering, required fields, and required + phases. + +Day 3 - Add observability affordances. + +- Status: DONE +- Link mission run ids and events to NullWatch-style trace/eval concepts. +- Add Flight Recorder deep links via `/observability?run_id=...`. +- Keep the UI useful without NullWatch running. + +Day 4 - Strengthen demo automation. + +- Status: DONE +- Add a judge-mode reset/launch/recover script or one-click replay action. +- Add a local presentation runbook and required local run-through before demo. +- Add a macOS video recording script for PR/hackathon review artifacts. +- Capture updated screenshots after the full flow. + +Day 5 - Add export/replay artifact. + +- Status: DONE +- Export current mission replay JSON for sharing and debugging. +- Document how the artifact maps to tasks, workflows, spans, evals, and + recovery. + +Day 6 - Polish the three-minute story. + +- Status: IN PROGRESS +- DONE: Added in-screen three-minute story beats so the demo has visible + presenter timing and narrative anchors. +- DONE: Added an explicit failed run vs recovered run comparison panel with + verdict, checkpoint, intervention, and trace links. +- Remaining: Test from a clean clone with only documented prerequisites. + +Day 7 - Stabilize for submission. + +- Status: DONE +- DONE: Run full validation. +- DONE: Freeze screenshots and demo script. +- DONE: Run the local demo end-to-end on the presentation machine. +- DONE: Record or refresh the Mission Control screenshot artifacts. +- DONE: Prepare PR title, PR description, reviewer path, validation matrix, and + hackathon narration in `docs/demo/mission-control-pr-package.md`. 
+- DONE: Record or refresh the optional `.mov` video artifact for upload outside + git. + +## Stretch Scope + +- Drive real NullTickets/NullBoiler/NullWatch APIs when configured. +- Side-by-side replay comparison. +- Animated graph edges and span waterfall. +- Judge mode: one button to reset and replay the full cinematic demo. +- Export mission replay bundle as JSON. + +## Iterations + +### Iteration 0 - Plan And Branch + +Status: DONE + +- Create a dedicated branch. +- Capture the plan in this file. +- Keep existing Flight Recorder PR work intact. + +### Iteration 1 - Mission State API + +Status: DONE + +- Add a small NullHub backend API under `/api/mission-control`. +- Use deterministic in-memory or file-backed demo state. +- Support: + - `GET /api/mission-control/state` + - `POST /api/mission-control/reset` + - `POST /api/mission-control/launch` + - `POST /api/mission-control/recover` +- Include enough structured state for UI: + - phases + - agents + - graph nodes/edges + - events + - telemetry + - failed run and recovered run summaries + +Definition of done: + +- Unit tests cover initial state, launch, phase advancement, reset, and recover. +- API routes are registered in NullHub without affecting existing routes. + +### Iteration 2 - Mission Control UI + +Status: DONE + +- Add `/mission-control`. +- Poll state every second while mission is active. +- Render: + - launch/recover/reset controls + - graph visualization + - role board + - mission timeline + - telemetry cards + - failure/recovery comparison + +Definition of done: + +- The page works without external services. +- A judge can understand the narrative by watching the screen. + +### Iteration 3 - Demo Flow Polish + +Status: DONE + +- Make the mission auto-progress after launch. +- Add clear failure moment. +- Add recovery moment after clicking recover. +- Ensure visual states are cinematic but still readable. + +Definition of done: + +- The whole demo can be completed in under three minutes. 
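The Iteration 1 actions and the guard behavior promised in the production-grade bar can be sketched as a tiny transition table. This is an assumed model for illustration; the real state machine lives in `src/api/mission_control.zig` and its state names may differ.

```bash
python3 - <<'EOF'
# Assumed mission states and actions; illustrates the guards, not the real code.
VALID = {
    ("idle", "launch"): "running",
    ("failed", "recover"): "recovering",
}

def act(state, action):
    nxt = VALID.get((state, action))
    if nxt is None:
        return state, 409  # structured conflict instead of silent mutation
    return nxt, 200

assert act("idle", "launch") == ("running", 200)
assert act("idle", "recover")[1] == 409    # early recovery is rejected
assert act("running", "launch")[1] == 409  # duplicate launch is rejected
print("transition guards ok")
EOF
```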
+ +### Iteration 4 - Ecosystem Integration Hooks + +Status: DONE + +- Shape mission events so they can map to Observability runs later. +- Show NullWatch-style run ids. +- Preserve future path to real NullTickets/NullBoiler/NullWatch integration. + +Definition of done: + +- The MVP is honest about what is simulated and what maps to real ecosystem + components. + +### Iteration 5 - Validation And Demo Assets + +Status: DONE + +- Run Zig tests. +- Run Svelte build. +- Capture screenshots. +- Update README or hackathon submission notes. + +Definition of done: + +- Local validation commands pass or blockers are documented. +- Demo script is written and screenshots are committed. + +### Iteration 6 - Production Hardening + +Status: DONE + +- Add explicit API schema/demo metadata. +- Reject invalid transitions with structured errors. +- Type Mission Control frontend state instead of using `any`. +- Make polling adaptive and UI states clearer. +- Add tests for invalid actions and response shape. + +Definition of done: + +- Mission Control behaves predictably under repeated clicks, refreshes, + invalid actions, and API failures. +- Validation commands still pass after the hardening pass. + +### Iteration 7 - Week-Scale Platform Path + +Status: DONE + +- Replaced hardcoded scenario data with a versioned replay fixture. +- Added validated NullWatch-style trace refs to mission timeline events. +- Added Flight Recorder deep links that work as local affordances and can point + at real NullWatch runs when configured. +- Added a local smoke-test script for the full demo sequence. +- Kept optional mission replay JSON export as the Day 5 follow-up instead of + mixing artifact export into the observability slice. + +Definition of done: + +- The hackathon demo remains local-first while becoming progressively closer to + real cross-service orchestration. 
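For orientation, here is an illustrative excerpt of a fixture of the shape Iteration 7 describes, together with the kind of checks the fixture tests perform. The field names are assumptions; this is not the real `code_red.v1.json`.

```bash
# Stand-in fixture excerpt with assumed field names.
cat > /tmp/fixture-excerpt.json <<'EOF'
{
  "schema_version": 1,
  "phases": ["research", "patch", "checkpoint", "test", "review"],
  "events": [
    {"id": "evt-1", "phase": "patch",
     "trace": {"run_id": "run-failed-1", "span_id": "span-patch"}},
    {"id": "evt-2", "phase": "test",
     "trace": {"run_id": "run-failed-1", "span_id": "span-test"}}
  ]
}
EOF
python3 - <<'EOF'
# Mirror two of the documented fixture checks: duplicate ids and phase refs.
import json
fx = json.load(open("/tmp/fixture-excerpt.json"))
ids = [e["id"] for e in fx["events"]]
assert len(ids) == len(set(ids)), "duplicate event ids"
assert all(e["phase"] in fx["phases"] for e in fx["events"]), "unknown phase"
print("fixture excerpt ok")
EOF
```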
+ +### Iteration 8 - Demo Automation And Recording + +Status: DONE + +- Added `scripts/mission_control_demo.sh` as a portable judge-mode driver. +- Added `scripts/record_mission_control_demo.sh` for local macOS video capture + via `screencapture`. +- Added `docs/demo/mission-control-local-demo.md` with the local run-through, + video recording steps, presenter script, and pre-demo quality gate. +- Kept generated `.mov` files ignored so large local review artifacts do not + pollute the source diff. + +Definition of done: + +- A reviewer can run the mission without manual timing. +- A presenter can record a local video artifact from the same deterministic + flow used by the smoke test. + +### Iteration 9 - Replay Artifact Export + +Status: DONE + +- Added `GET /api/mission-control/replay` as a read-only export endpoint. +- Export includes the current snapshot, source replay fixture, fixture path, + schema identity, and ecosystem mapping metadata. +- Added `Export Replay` in the Mission Control UI. +- Added smoke/demo validation for the replay artifact. +- Added `docs/demo/mission-control-replay-artifact.md`. + +Definition of done: + +- A reviewer can export a single JSON file that explains the current mission + state and how the local replay maps to NullTickets, NullBoiler, NullClaw, and + NullWatch concepts. + +### Iteration 10 - Three-Minute Story Polish + +Status: DONE + +- Added a compact story strip to `/mission-control` with six timed beats: + launch, checkpoint, failure, intervention, replay, and review. +- Added a `Failure Recovery` comparison panel that makes the failed run, + recovered run, checkpoint, human instruction, verdict transition, and + observability links visible without presenter narration. +- Kept the change frontend-only because the existing mission state API already + exposes the required evidence. + +Definition of done: + +- A judge can understand the failure/recovery arc by reading the screen during + the live replay. 
+- `npm --prefix ui run build` passes after the polish change. + +### Iteration 11 - PR Package And Submission Notes + +Status: DONE + +- Added `docs/demo/mission-control-pr-package.md` with a copy-ready PR title, + PR description, reviewer path, three-minute story, validation matrix, video + artifact instructions, scope boundaries, and future work. +- Linked the PR package from the README demo section and project tree. +- Updated Day 7 status so the remaining final-submission task is explicit: + upload the ignored local `.mov` artifact if the PR or hackathon submission + needs an attached video. + +Definition of done: + +- A reviewer can understand what to run, what changed, why it matters, and what + validation was performed from one file. +- The PR package is separate from the broader hackathon notes so it can be + pasted into GitHub without editing unrelated documentation. + +### Iteration 12 - Final Local Demo Recording + +Status: DONE + +- Ran the final local validation matrix. +- Started NullHub on `127.0.0.1:19802`. +- Ran the live smoke test, judge-mode demo driver, replay export check, and + macOS recording script. +- Refreshed the ignored local video artifact at + `docs/demo/nullhub-mission-control-demo.mov`. + +Definition of done: + +- The demo can be run and recorded locally from the documented commands. +- The video artifact is available for manual upload but remains excluded from + the source diff. + +## Three-Minute Script + +0:00 - Open `/mission-control`; click `Launch Mission`. + +0:30 - Research and coding phases light up. Timeline records task claim, model +planning, code patch, and checkpoint creation. + +1:00 - Test phase fails. The graph marks `test` red, telemetry increments errors, +and the failure panel shows `zig build test exited with status 1`. + +1:30 - Click `Fork From Checkpoint`. The UI shows the human instruction: +`apply missing validation guard`. + +2:00 - Recovered run replays from checkpoint and passes tests. 
+ +2:30 - Review phase passes. Final comparison shows failed run vs recovered run, +cost, duration, and eval verdict. + +## Technical Shape + +```mermaid +flowchart LR + H["NullHub /mission-control"] --> A["/api/mission-control/state"] + H --> C["/api/mission-control/actions"] + C --> S["Mission demo state"] + S --> T["NullTickets-like tasks/events"] + S --> B["NullBoiler-like workflow/checkpoints"] + S --> W["NullWatch-like spans/evals"] +``` + +## Risks + +- Real cross-service orchestration can consume the week. Control this by making + the MVP deterministic first and adding integration hooks later. +- Visual polish can expand without limit. Keep one page and one scenario. +- If backend state becomes complicated, switch to a static replay bundle with + action-controlled phase transitions. + +## Fallback + +If Mission Control slips, ship a focused Time-Travel Debugger: + +- failed run +- checkpoint +- forked recovered run +- state/span diff +- screenshots and demo script diff --git a/README.md b/README.md index f01ef27..db1c387 100644 --- a/README.md +++ b/README.md @@ -23,6 +23,7 @@ NullTickets, NullWatch). 
- **Managed instance admin API** -- instance-scoped status, config, models, cron, channels, and skills routes for managed NullClaw installs - **Orchestration UI** -- workflow editor, poll-based run monitoring, checkpoint forking, encoded workflow/run/store links, and key-value store browser (proxied to NullTickets through NullHub) - **Observability cockpit** -- local NullWatch run summaries, span timelines, eval results, token usage, cost, and error context through a NullHub proxy +- **Mission Control demo** -- local-first agent mission replay with orchestration, role-based agents, failure, checkpoint recovery, and live telemetry in one screen ## Quick Start @@ -151,6 +152,83 @@ Local NullWatch setup: -d '{"run_id":"demo-run-1","eval_key":"tool_success","scorer":"deterministic","score":0.0,"verdict":"fail","dataset":"demo","notes":"The tool call timed out."}' ``` +**Mission Control API** -- requests to `/api/mission-control/*` drive a +deterministic local demo scenario for the `/mission-control` page. The demo +does not require hosted infrastructure or model secrets; it shows how +NullTickets-style tasks, NullBoiler-style workflow checkpoints, and +NullWatch-style traces can fit into one operator view. Responses include a +schema version, scenario id, deterministic replay mode, controls, graph, +timeline, telemetry, NullWatch-style run/span/eval trace references, and +structured conflict errors for invalid actions. The scenario lives in a +versioned embedded replay fixture at +`src/api/mission_control/code_red.v1.json`; `zig build test` validates fixture +schema, references, ordering, required phases, graph links, and telemetry phase +coverage. Mission timeline trace links deep-link to `/observability?run_id=...` +so a real NullWatch instance can attach detailed spans and evals without making +the local demo depend on external infrastructure. 
`GET /api/mission-control/replay` +exports the current snapshot, source fixture, and ecosystem mapping metadata as +a portable JSON artifact for debugging and review. + +### Mission Control Demo + +Start NullHub locally and open `/mission-control`: + +```bash +zig build run -- serve --host 127.0.0.1 --port 19802 --no-open +``` + +The page provides `Reset`, `Launch Mission`, and `Fork From Checkpoint` +controls. Launching the mission advances a deterministic agent workflow through +research, patching, checkpointing, test failure, human intervention, recovered +test pass, and review completion. Timeline events include trace chips that map +the cinematic replay back to local NullWatch-style run ids, span ids, operations, +and eval keys. The page also includes timed story beats and a failed-vs-recovered +comparison panel so the three-minute demo can be followed directly from the +screen. + +Run the judge-mode demo driver against a started server: + +```bash +MISSION_CONTROL_OPEN_BROWSER=1 ./scripts/mission_control_demo.sh +``` + +Record a local macOS video artifact for PR or hackathon review: + +```bash +./scripts/record_mission_control_demo.sh +``` + +The generated video defaults to `docs/demo/nullhub-mission-control-demo.mov` and +is ignored by git so it can be uploaded directly to the PR discussion or +hackathon submission. See `docs/demo/mission-control-local-demo.md` for the +full presenter runbook and `docs/demo/mission-control-pr-package.md` for the +copy-ready PR title, description, validation matrix, and reviewer path. + +Export the current replay artifact: + +```bash +curl -fsS http://127.0.0.1:19802/api/mission-control/replay \ + -o mission-control-replay.json +``` + +The same export is available from the `Export Replay` button in Mission Control. +See `docs/demo/mission-control-replay-artifact.md` for the artifact shape and +ecosystem mapping. 
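For orientation, the exported artifact bundles the documented pieces: the current snapshot, the source fixture and its path, schema identity, and the ecosystem mapping. A hypothetical top-level shape, with assumed key names, looks like this:

```bash
# Illustrative artifact skeleton; real key names may differ.
cat > /tmp/replay-artifact.json <<'EOF'
{
  "schema_version": 1,
  "scenario_id": "code_red",
  "scenario_version": "v1",
  "fixture_path": "src/api/mission_control/code_red.v1.json",
  "snapshot": {},
  "fixture": {},
  "mapping": {
    "tasks": "NullTickets",
    "checkpoints": "NullBoiler",
    "runtime": "NullClaw",
    "traces": "NullWatch"
  }
}
EOF
# Print the sorted top-level keys of the sample artifact.
python3 -c 'import json; print(sorted(json.load(open("/tmp/replay-artifact.json"))))'
```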
+ +Run the live API smoke test against a started server: + +```bash +NULLHUB_URL=http://127.0.0.1:19802 ./tests/test_mission_control_smoke.sh +``` + +Live mission state: + +![NullHub Mission Control live workflow](docs/screenshots/nullhub-mission-control-live.png) + +Recovered mission: + +![NullHub Mission Control recovered workflow](docs/screenshots/nullhub-mission-control-recovered.png) + ### Observability Screenshots Flight Recorder overview: @@ -182,6 +260,8 @@ End-to-end: ```bash ./tests/test_e2e.sh +NULLHUB_URL=http://127.0.0.1:19802 ./tests/test_mission_control_smoke.sh +MISSION_CONTROL_OPEN_BROWSER=1 ./scripts/mission_control_demo.sh ``` `zig build test-integration` runs structured backend HTTP integration tests @@ -206,6 +286,9 @@ src/ api/ # REST endpoints (components, instances, wizard, ...) orchestration.zig # Reverse proxy to NullBoiler orchestration API observability.zig # Reverse proxy to NullWatch tracing/eval API + mission_control.zig # Local deterministic agent mission demo API + mission_control_replay.zig # Typed replay fixture parser and validator + mission_control/ # Embedded Mission Control replay fixtures core/ # Manifest parser, state, platform, paths installer/ # Download, build, UI module fetching supervisor/ # Process spawn, health checks, manager @@ -213,10 +296,18 @@ ui/src/ routes/ # SvelteKit pages orchestration/ # Orchestration pages (dashboard, workflows, runs, store) observability/ # NullWatch flight recorder page + mission-control/ # Local agent mission control room demo lib/components/ # Reusable Svelte components orchestration/ # GraphViewer, StateInspector, RunEventLog, InterruptPanel, # CheckpointTimeline, WorkflowJsonEditor, NodeCard, SendProgressBar lib/api/ # Typed API client tests/ test_e2e.sh # End-to-end test script +scripts/ + mission_control_demo.sh # Judge-mode local demo driver + record_mission_control_demo.sh # macOS local video recorder for the demo +docs/demo/ + mission-control-local-demo.md # Presenter runbook 
and recording instructions + mission-control-replay-artifact.md # Replay JSON artifact schema and mapping + mission-control-pr-package.md # Copy-ready PR body and reviewer checklist ``` diff --git a/docs/demo/.gitignore b/docs/demo/.gitignore new file mode 100644 index 0000000..4b9b9f0 --- /dev/null +++ b/docs/demo/.gitignore @@ -0,0 +1,3 @@ +*.mov +*.mp4 +*.webm diff --git a/docs/demo/mission-control-local-demo.md b/docs/demo/mission-control-local-demo.md new file mode 100644 index 0000000..fd8e80f --- /dev/null +++ b/docs/demo/mission-control-local-demo.md @@ -0,0 +1,86 @@ +# Mission Control Local Demo + +This is the local runbook for a live hackathon presentation and video capture. +It assumes NullHub is running locally and does not require hosted services, +model keys, or external infrastructure. + +## Start NullHub + +```bash +zig build run -- serve --host 127.0.0.1 --port 19802 --no-open +``` + +Open: + +```text +http://127.0.0.1:19802/mission-control +``` + +## Run The Judge-Mode Demo + +In a second terminal: + +```bash +MISSION_CONTROL_OPEN_BROWSER=1 ./scripts/mission_control_demo.sh +``` + +The script resets the mission, launches the local replay, waits for the +validation failure, forks from the checkpoint, waits for recovered completion, +verifies that the failed and recovered timeline events carry trace refs, and +checks that `/api/mission-control/replay` exports a completed artifact. + +## Export A Replay Artifact + +The current Mission Control state can be exported as JSON: + +```bash +curl -fsS http://127.0.0.1:19802/api/mission-control/replay \ + -o mission-control-replay.json +``` + +The same export is available from the `Export Replay` button in the UI. The +artifact contains the current snapshot, the source fixture, and the ecosystem +mapping used to explain the local replay. 
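Before attaching the export anywhere, it can be sanity-checked locally. The snippet falls back to a stand-in sample (with assumed keys) so the check itself runs even before the curl step above has produced a real export:

```bash
# Use the real export if present; otherwise create a minimal stand-in sample.
[ -f mission-control-replay.json ] || cat > mission-control-replay.json <<'EOF'
{"schema_version": 1, "scenario_id": "code_red", "snapshot": {}, "mapping": {}}
EOF
python3 -m json.tool mission-control-replay.json > /dev/null && echo "artifact parses"
```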
+ +## Record A Local Video + +On macOS: + +```bash +./scripts/record_mission_control_demo.sh +``` + +The script opens `/mission-control`, records the screen with +`screencapture`, and drives the mission automatically. The default output is: + +```text +docs/demo/nullhub-mission-control-demo.mov +``` + +The video file is intentionally ignored by git because it is a local review +artifact. Upload it directly to the hackathon submission or PR discussion. + +If macOS asks for Screen Recording permission, allow it in System Settings and +rerun the command. + +## Presenter Script + +1. Show that the demo is local-first: one NullHub server, no external services. +2. Launch the mission and call out the role board, workflow graph, and telemetry. +3. Pause at the failure: the test tool fails, errors increment, and recovery is + blocked until the failure phase. +4. Click or let the script trigger checkpoint recovery. +5. Show the recovered run, passing eval verdict, and trace links into Flight + Recorder via `/observability?run_id=...`. +6. Export the replay artifact to show the scenario can be reviewed after the + live demo. + +## Pre-Demo Quality Gate + +```bash +zig build test -Dembed-ui=false --summary all +npm --prefix ui run build +zig build test --summary all +NULLHUB_URL=http://127.0.0.1:19802 ./tests/test_mission_control_smoke.sh +MISSION_CONTROL_OPEN_BROWSER=1 ./scripts/mission_control_demo.sh +``` diff --git a/docs/demo/mission-control-pr-package.md b/docs/demo/mission-control-pr-package.md new file mode 100644 index 0000000..e4caba6 --- /dev/null +++ b/docs/demo/mission-control-pr-package.md @@ -0,0 +1,180 @@ +# Mission Control PR Package + +This file is the copy-ready review package for the NullOS Mission Control +hackathon contribution. 
+ +## Suggested PR Title + +Add NullOS Mission Control local agent recovery demo + +## Suggested PR Description + +Adds a local-first Mission Control demo to NullHub: a three-minute control-room +experience for lightweight agent infrastructure. + +The demo shows a deterministic agent mission from launch to failure, human +checkpoint recovery, recovered validation, review, trace links, telemetry, and +replay export. It is designed to run locally without hosted services, model +keys, or external infrastructure. + +What changed: + +- Added `/api/mission-control/*` for mission state, reset, launch, recovery, + and replay export. +- Added a versioned replay fixture at + `src/api/mission_control/code_red.v1.json`. +- Added replay fixture parsing and validation in + `src/api/mission_control_replay.zig`. +- Added `/mission-control` UI with mission controls, role board, workflow + graph, telemetry, timeline, trace links, story beats, and failed-vs-recovered + comparison. +- Added deep links from mission events to `/observability?run_id=...`. +- Added local smoke test, judge-mode demo driver, macOS video recorder, + screenshots, README docs, and hackathon submission notes. + +Why: + +NullHub already acts as the control plane for the nullclaw ecosystem, and the +surrounding repositories already sketch out runtime, orchestration, task state, +and observability. What was missing was a memorable local vertical slice that +lets reviewers see those concepts working as one operator experience. + +This PR keeps the demo deterministic and honest: it does not pretend to mutate +real NullTickets, NullBoiler, NullClaw, or NullWatch services. Instead it +provides a stable local replay with explicit ecosystem mapping and a future path +for real service hydration. 
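The structured errors mentioned above use a stable envelope. For reference, this is the body the handler returns when `launch` is called on an already-started mission (reproduced from the source; `recover` before the failure phase uses the same shape with code `mission_not_recoverable`):

```bash
# Static sample of the mission API's 409 error envelope, copied from the handler.
ERR='{"error":{"code":"mission_already_started","message":"Mission is already started. Reset before launching again."}}'
echo "$ERR"
```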
+ +Validation performed: + +```bash +zig build test -Dembed-ui=false --summary all +npm --prefix ui run build +zig build test --summary all +NULLHUB_URL=http://127.0.0.1:19802 ./tests/test_mission_control_smoke.sh +MISSION_CONTROL_OPEN_BROWSER=0 ./scripts/mission_control_demo.sh +git diff --check +``` + +Demo: + +```bash +zig build run -- serve --host 127.0.0.1 --port 19802 --no-open +MISSION_CONTROL_OPEN_BROWSER=1 ./scripts/mission_control_demo.sh +``` + +Open: + +```text +http://127.0.0.1:19802/mission-control +``` + +Screenshots: + +- `docs/screenshots/nullhub-mission-control-live.png` +- `docs/screenshots/nullhub-mission-control-recovered.png` + +## Reviewer Path + +1. Start NullHub: + + ```bash + zig build run -- serve --host 127.0.0.1 --port 19802 --no-open + ``` + +2. Open the UI: + + ```text + http://127.0.0.1:19802/mission-control + ``` + +3. Run the automated demo in another terminal: + + ```bash + MISSION_CONTROL_OPEN_BROWSER=1 ./scripts/mission_control_demo.sh + ``` + +4. Watch the page move through: + + - launch + - research + - patching + - checkpoint + - test failure + - human fork from checkpoint + - recovered validation + - review complete + +5. Open a trace link or export the replay artifact: + + ```bash + curl -fsS http://127.0.0.1:19802/api/mission-control/replay \ + -o mission-control-replay.json + ``` + +## Three-Minute Hackathon Story + +0:00 - Launch the mission from NullHub. + +0:30 - Agents light up on the role board and workflow graph. + +1:00 - Tests fail. The graph marks the tool step red, telemetry increments +errors, and the timeline points at the failed NullWatch-style eval. + +1:30 - The operator forks from the checkpoint with the instruction +`apply missing validation guard`. + +2:00 - The recovered run replays validation and passes. + +2:30 - The final screen compares failed and recovered runs, with trace links and +exportable replay evidence. 
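The trace links in the story are plain deep links into the existing observability page. A minimal sketch of how such a link can be built, assuming the local base URL used in this package (the `trace_link` helper itself is hypothetical):

```bash
# Hypothetical helper: build an observability deep link for a run id,
# URL-encoding it the same way the demo driver does (encodeURIComponent).
trace_link() {
  local encoded
  encoded="$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$1")"
  echo "http://127.0.0.1:19802/observability?run_id=$encoded"
}

trace_link "run-demo-recovered-fork"
# -> http://127.0.0.1:19802/observability?run_id=run-demo-recovered-fork
```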
+ +## Latest Local Validation + +Last run: 2026-05-10 + +| Command | Result | +| --- | --- | +| `npm --prefix ui run build` | pass | +| `zig build test -Dembed-ui=false --summary all` | pass | +| `zig build test --summary all` | pass | +| `NULLHUB_URL=http://127.0.0.1:19802 ./tests/test_mission_control_smoke.sh` | pass | +| `MISSION_CONTROL_OPEN_BROWSER=0 ./scripts/mission_control_demo.sh` | pass | +| `git diff --check` | pass | +| `./scripts/record_mission_control_demo.sh` | pass | + +## Video Artifact + +On macOS: + +```bash +./scripts/record_mission_control_demo.sh +``` + +The generated video defaults to: + +```text +docs/demo/nullhub-mission-control-demo.mov +``` + +The video is ignored by git and can be uploaded to PR discussion or the +hackathon submission. + +Latest local recording: 2026-05-10, `36M`. + +## Scope Boundaries + +This PR intentionally does not: + +- run real model calls; +- require hosted infrastructure; +- require NullTickets, NullBoiler, NullClaw, or NullWatch to be running; +- mutate real task or workflow state; +- replace the existing observability page. + +## Future Work + +- Hydrate replay trace panels from a running NullWatch instance when available. +- Connect real NullBoiler workflow run ids and checkpoint metadata. +- Compare failed and recovered replay artifacts side by side. +- Add durable mission replay storage. +- Add a one-click judge replay button in the UI. diff --git a/docs/demo/mission-control-replay-artifact.md b/docs/demo/mission-control-replay-artifact.md new file mode 100644 index 0000000..4cf6e78 --- /dev/null +++ b/docs/demo/mission-control-replay-artifact.md @@ -0,0 +1,75 @@ +# Mission Control Replay Artifact + +Mission Control exposes the current deterministic replay as JSON: + +```text +GET /api/mission-control/replay +``` + +The artifact is intended for local debugging, PR review, and hackathon +submission evidence. 
It does not mutate runtime state and does not require +NullTickets, NullBoiler, NullClaw, or NullWatch to be running. + +## Shape + +The exported JSON contains: + +- `artifact_schema_version` - version of the export wrapper. +- `artifact_kind` - `nullhub.mission_control.replay`. +- `generated_at_ms` - export timestamp. +- `replay_fixture_path` - repository path of the embedded scenario fixture. +- `scenario_id`, `scenario_version`, `mode` - replay identity. +- `snapshot` - the current rendered Mission Control state. +- `replay_fixture` - the source fixture used to derive the replay. +- `ecosystem_mapping` - how the fixture maps to nullclaw ecosystem concepts. + +## Ecosystem Mapping + +`ecosystem_mapping.nulltickets` points to tracker-style evidence: + +- `events[source=nulltickets]` +- `graph.nodes[kind=tracker]` + +`ecosystem_mapping.nullboiler` points to orchestration evidence: + +- phase timing and workflow graph edges +- `checkpoint_id` +- failed and recovered run ids +- human fork instruction + +`ecosystem_mapping.nullclaw` points to agent evidence: + +- role-based agents +- agent graph nodes +- NullClaw-style event source entries + +`ecosystem_mapping.nullwatch` points to observability evidence: + +- failed and recovered run ids +- `events[].trace` +- telemetry counters +- failure and recovery run panels + +## Local Export + +Start NullHub: + +```bash +zig build run -- serve --host 127.0.0.1 --port 19802 --no-open +``` + +Export the current replay: + +```bash +curl -fsS http://127.0.0.1:19802/api/mission-control/replay \ + -o mission-control-replay.json +``` + +Or use the UI button: + +```text +/mission-control -> Export Replay +``` + +The exported JSON can be attached to PR discussion or used as a compact record +of the local demo state at the moment it was captured. 
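An exported file can be sanity-checked offline against the shape above. The sample below is illustrative (its field values are placeholders, not a real export) and assumes only `python3` on PATH:

```bash
# Offline shape check using an illustrative sample artifact (not a real export).
cat > /tmp/replay-artifact-sample.json <<'JSON'
{
  "artifact_schema_version": 1,
  "artifact_kind": "nullhub.mission_control.replay",
  "generated_at_ms": 0,
  "replay_fixture_path": "src/api/mission_control/code_red.v1.json",
  "scenario_id": "sample-scenario",
  "scenario_version": "v1",
  "mode": "deterministic_local_replay",
  "snapshot": {},
  "replay_fixture": {},
  "ecosystem_mapping": {}
}
JSON

python3 - /tmp/replay-artifact-sample.json <<'PY'
import json, sys

artifact = json.load(open(sys.argv[1]))
# Every documented wrapper field should be present.
for key in ("artifact_schema_version", "artifact_kind", "generated_at_ms",
            "replay_fixture_path", "scenario_id", "scenario_version", "mode",
            "snapshot", "replay_fixture", "ecosystem_mapping"):
    assert key in artifact, f"missing {key}"
assert artifact["artifact_kind"] == "nullhub.mission_control.replay"
print("artifact wrapper ok")
PY
```

The same check works on a real `mission-control-replay.json` by swapping the path.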
diff --git a/docs/screenshots/nullhub-mission-control-live.png b/docs/screenshots/nullhub-mission-control-live.png new file mode 100644 index 0000000..160ee83 Binary files /dev/null and b/docs/screenshots/nullhub-mission-control-live.png differ diff --git a/docs/screenshots/nullhub-mission-control-recovered.png b/docs/screenshots/nullhub-mission-control-recovered.png new file mode 100644 index 0000000..059d5b0 Binary files /dev/null and b/docs/screenshots/nullhub-mission-control-recovered.png differ diff --git a/scripts/mission_control_demo.sh b/scripts/mission_control_demo.sh new file mode 100755 index 0000000..0ecc11d --- /dev/null +++ b/scripts/mission_control_demo.sh @@ -0,0 +1,148 @@ +#!/usr/bin/env bash +set -euo pipefail + +BASE_URL="${NULLHUB_URL:-http://127.0.0.1:19802}" +OPEN_BROWSER="${MISSION_CONTROL_OPEN_BROWSER:-0}" +PREROLL_MS="${MISSION_CONTROL_PREROLL_MS:-1200}" +FAILURE_HOLD_MS="${MISSION_CONTROL_FAILURE_HOLD_MS:-1800}" +COMPLETION_HOLD_MS="${MISSION_CONTROL_COMPLETION_HOLD_MS:-1200}" +POLL_MS="${MISSION_CONTROL_POLL_MS:-500}" +TIMEOUT_MS="${MISSION_CONTROL_TIMEOUT_MS:-45000}" + +node - "$BASE_URL" "$OPEN_BROWSER" "$PREROLL_MS" "$FAILURE_HOLD_MS" "$COMPLETION_HOLD_MS" "$POLL_MS" "$TIMEOUT_MS" <<'NODE' +const { spawn } = await import('node:child_process'); + +const base = process.argv[2].replace(/\/$/, ''); +const openBrowser = process.argv[3] === '1'; +const prerollMs = Number(process.argv[4]); +const failureHoldMs = Number(process.argv[5]); +const completionHoldMs = Number(process.argv[6]); +const pollMs = Number(process.argv[7]); +const timeoutMs = Number(process.argv[8]); +const missionUrl = `${base}/mission-control`; + +function sleep(ms) { + return new Promise((resolve) => setTimeout(resolve, ms)); +} + +function assert(condition, message) { + if (!condition) throw new Error(message); +} + +async function api(path, method = 'GET') { + let res; + try { + res = await fetch(`${base}${path}`, { method }); + } catch (error) { + throw new 
Error(`Cannot reach NullHub at ${base}: ${error.message}`); + } + + const text = await res.text(); + const body = text ? JSON.parse(text) : null; + return { status: res.status, body }; +} + +function openMissionPage() { + const command = + process.platform === 'darwin' + ? 'open' + : process.platform === 'win32' + ? 'cmd' + : 'xdg-open'; + const args = process.platform === 'win32' ? ['/c', 'start', '', missionUrl] : [missionUrl]; + const child = spawn(command, args, { detached: true, stdio: 'ignore' }); + child.unref(); +} + +function formatState(state) { + return [ + `phase=${state.phase}`, + `status=${state.status}`, + `progress=${state.progress}%`, + `run=${state.active_run_id || '-'}`, + `spans=${state.telemetry?.spans ?? 0}`, + `evals=${state.telemetry?.evals ?? 0}`, + `verdict=${state.telemetry?.verdict || '-'}`, + ].join(' '); +} + +function printStep(label, state) { + console.log(`${label.padEnd(12)} ${formatState(state)}`); +} + +async function expectOk(path, method) { + const response = await api(path, method); + assert(response.status === 200, `${method} ${path} returned HTTP ${response.status}`); + return response.body; +} + +async function waitFor(label, predicate) { + const started = Date.now(); + let lastPhase = ''; + let lastState = null; + + while (Date.now() - started < timeoutMs) { + const response = await api('/api/mission-control/state'); + assert(response.status === 200, `state returned HTTP ${response.status}`); + lastState = response.body; + + if (lastState.phase !== lastPhase) { + printStep(label, lastState); + lastPhase = lastState.phase; + } + + if (predicate(lastState)) return lastState; + await sleep(pollMs); + } + + throw new Error(`Timed out waiting for ${label}. Last state: ${lastState ? 
formatState(lastState) : 'none'}`); +} + +console.log('NullOS Mission Control judge demo'); +console.log(`Base URL: ${base}`); +console.log(`Open UI: ${missionUrl}`); + +let state = await expectOk('/api/mission-control/reset', 'POST'); +assert(state.schema_version === 1, 'unexpected mission schema version'); +assert(state.mode === 'deterministic_local_replay', 'unexpected mission mode'); +printStep('reset', state); + +if (openBrowser) { + openMissionPage(); + console.log('browser opened mission-control page'); +} + +await sleep(prerollMs); + +state = await expectOk('/api/mission-control/launch', 'POST'); +printStep('launch', state); + +state = await waitFor('primary', (candidate) => candidate.status === 'intervention_required' && candidate.controls?.can_recover === true); +const failedEvent = state.events?.find((event) => event.title === 'Validation failed'); +assert(failedEvent?.trace?.run_id === 'run-demo-failed-test', 'missing failed run trace reference'); +assert(failedEvent?.trace?.eval_key === 'tool_success', 'missing failed eval trace reference'); +console.log('failure human intervention point reached'); + +await sleep(failureHoldMs); + +state = await expectOk('/api/mission-control/recover', 'POST'); +assert(state.recovered_run_id === 'run-demo-recovered-fork', 'missing recovered run id'); +printStep('recover', state); + +state = await waitFor('recovery', (candidate) => candidate.status === 'completed' && candidate.telemetry?.verdict === 'pass'); +const recoveredEvent = state.events?.find((event) => event.title === 'Recovered tests passed'); +assert(recoveredEvent?.trace?.run_id === 'run-demo-recovered-fork', 'missing recovered run trace reference'); + +await sleep(completionHoldMs); + +const artifactResponse = await api('/api/mission-control/replay'); +assert(artifactResponse.status === 200, `replay export returned HTTP ${artifactResponse.status}`); +assert(artifactResponse.body?.artifact_kind === 'nullhub.mission_control.replay', 'unexpected replay 
artifact kind'); +assert(artifactResponse.body?.snapshot?.status === 'completed', 'replay export did not capture completed snapshot'); + +console.log('completed recovered mission passed'); +console.log(`failed run: ${state.failed_run_id}`); +console.log(`recovered: ${state.recovered_run_id}`); +console.log(`trace link: ${base}/observability?run_id=${encodeURIComponent(state.recovered_run_id)}`); +console.log(`export: ${base}/api/mission-control/replay`); +NODE diff --git a/scripts/record_mission_control_demo.sh b/scripts/record_mission_control_demo.sh new file mode 100755 index 0000000..22b81ed --- /dev/null +++ b/scripts/record_mission_control_demo.sh @@ -0,0 +1,69 @@ +#!/usr/bin/env bash +set -euo pipefail + +BASE_URL="${NULLHUB_URL:-http://127.0.0.1:19802}" +OUTPUT="${MISSION_CONTROL_VIDEO_OUT:-docs/demo/nullhub-mission-control-demo.mov}" +RECORD_SECONDS="${MISSION_CONTROL_RECORD_SECONDS:-36}" +DEMO_SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "$DEMO_SCRIPT_DIR/.." && pwd)" +OUTPUT_ABS="$REPO_ROOT/$OUTPUT" + +if [[ "$(uname -s)" != "Darwin" ]]; then + echo "Mission Control video recording currently uses macOS screencapture." >&2 + echo "Run scripts/mission_control_demo.sh for the portable live demo driver." >&2 + exit 2 +fi + +if ! command -v screencapture >/dev/null 2>&1; then + echo "screencapture is required for local video recording on macOS." >&2 + exit 2 +fi + +mkdir -p "$(dirname "$OUTPUT_ABS")" +rm -f "$OUTPUT_ABS" + +node - "$BASE_URL" <<'NODE' +const base = process.argv[2].replace(/\/$/, ''); +try { + const res = await fetch(`${base}/api/mission-control/state`); + if (!res.ok) throw new Error(`HTTP ${res.status}`); +} catch (error) { + console.error(`Cannot reach NullHub at ${base}: ${error.message}`); + process.exit(1); +} +NODE + +echo "Recording Mission Control demo to $OUTPUT_ABS" +echo "Open UI: $BASE_URL/mission-control" +echo "If macOS asks for Screen Recording permission, allow it and rerun this script." 
+ +open "$BASE_URL/mission-control" >/dev/null 2>&1 || true +sleep 1 + +screencapture -v -V "$RECORD_SECONDS" -k "$OUTPUT_ABS" & +RECORDER_PID=$! + +cleanup() { + if kill -0 "$RECORDER_PID" >/dev/null 2>&1; then + kill "$RECORDER_PID" >/dev/null 2>&1 || true + fi +} +trap cleanup INT TERM + +MISSION_CONTROL_OPEN_BROWSER=0 \ +MISSION_CONTROL_PREROLL_MS=2000 \ +MISSION_CONTROL_FAILURE_HOLD_MS=3200 \ +MISSION_CONTROL_COMPLETION_HOLD_MS=4000 \ +MISSION_CONTROL_TIMEOUT_MS=50000 \ +"$DEMO_SCRIPT_DIR/mission_control_demo.sh" + +wait "$RECORDER_PID" +trap - INT TERM + +if [[ ! -s "$OUTPUT_ABS" ]]; then + echo "Recording did not produce a video file. Check macOS Screen Recording permission and rerun." >&2 + exit 1 +fi + +ls -lh "$OUTPUT_ABS" +echo "Video ready: $OUTPUT_ABS" diff --git a/src/api/meta.zig b/src/api/meta.zig index 0313e21..a554199 100644 --- a/src/api/meta.zig +++ b/src/api/meta.zig @@ -1343,6 +1343,54 @@ const routes = [_]RouteSpec{ .body = "Forwarded as-is to NullWatch.", .response = "Forwarded upstream JSON response.", }, + .{ + .id = "mission-control.state", + .method = "GET", + .path_template = "/api/mission-control/state", + .category = "mission-control", + .summary = "Read the local deterministic NullOS Mission Control replay state.", + .auth_mode = "optional_bearer", + .response = "Schema-versioned mission state, controls, graph, timeline, telemetry, recovery metadata, and NullWatch-style trace references.", + }, + .{ + .id = "mission-control.replay", + .method = "GET", + .path_template = "/api/mission-control/replay", + .category = "mission-control", + .summary = "Export the current Mission Control replay artifact for local review and debugging.", + .auth_mode = "optional_bearer", + .response = "Replay artifact with current snapshot, source fixture, and NullTickets/NullBoiler/NullClaw/NullWatch mapping metadata.", + }, + .{ + .id = "mission-control.reset", + .method = "POST", + .path_template = "/api/mission-control/reset", + .category = 
"mission-control", + .summary = "Reset the local Mission Control replay to idle.", + .auth_mode = "optional_bearer", + .body = "No request body required.", + .response = "Reset mission state.", + }, + .{ + .id = "mission-control.launch", + .method = "POST", + .path_template = "/api/mission-control/launch", + .category = "mission-control", + .summary = "Launch the local Mission Control replay if it is idle.", + .auth_mode = "optional_bearer", + .body = "No request body required.", + .response = "Mission state after launch, or 409 when already started.", + }, + .{ + .id = "mission-control.recover", + .method = "POST", + .path_template = "/api/mission-control/recover", + .category = "mission-control", + .summary = "Fork the local Mission Control replay after the failure phase.", + .auth_mode = "optional_bearer", + .body = "No request body required.", + .response = "Mission state after checkpoint recovery, or 409 before recovery is allowed.", + }, }; pub fn allRoutes() []const RouteSpec { diff --git a/src/api/mission_control.zig b/src/api/mission_control.zig new file mode 100644 index 0000000..bf45c41 --- /dev/null +++ b/src/api/mission_control.zig @@ -0,0 +1,704 @@ +const std = @import("std"); +const std_compat = @import("compat"); +const helpers = @import("helpers.zig"); +const replay = @import("mission_control_replay.zig"); + +const ApiResponse = helpers.ApiResponse; + +const prefix = "/api/mission-control"; + +const RuntimeState = struct { + launched: bool = false, + started_at_ms: i64 = 0, + recovered: bool = false, + recovery_started_at_ms: i64 = 0, +}; + +const MissionControls = struct { + can_launch: bool, + can_recover: bool, + can_reset: bool, +}; + +const Agent = struct { + id: []const u8, + role: []const u8, + status: []const u8, + current_step: []const u8, +}; + +const GraphNode = struct { + id: []const u8, + label: []const u8, + kind: []const u8, + status: []const u8, +}; + +const GraphEdge = struct { + from: []const u8, + to: []const u8, + status: 
[]const u8, +}; + +const MissionGraph = struct { + nodes: []const GraphNode, + edges: []const GraphEdge, +}; + +const MissionEvent = struct { + at_ms: i64, + source: []const u8, + level: []const u8, + title: []const u8, + detail: []const u8, + status: []const u8, + trace: ?replay.EventTraceDef, +}; + +const MissionTelemetry = struct { + runs: usize, + spans: usize, + evals: usize, + errors: usize, + total_tokens: usize, + total_cost_usd: f64, + verdict: []const u8, +}; + +const FailurePanel = struct { + run_id: []const u8, + checkpoint_id: []const u8, + failed_step: []const u8, + error_message: []const u8, + suggested_intervention: []const u8, +}; + +const RecoveryPanel = struct { + run_id: []const u8, + forked_from: []const u8, + human_instruction: []const u8, + status: []const u8, +}; + +const MissionSnapshot = struct { + schema_version: u8, + mode: []const u8, + scenario_id: []const u8, + scenario_version: []const u8, + generated_at_ms: i64, + mission_id: []const u8, + title: []const u8, + status: []const u8, + phase: []const u8, + headline: []const u8, + elapsed_ms: i64, + progress: u8, + active_run_id: ?[]const u8, + failed_run_id: ?[]const u8, + recovered_run_id: ?[]const u8, + controls: MissionControls, + agents: []const Agent, + graph: MissionGraph, + events: []const MissionEvent, + telemetry: MissionTelemetry, + failure: ?FailurePanel, + recovery: ?RecoveryPanel, +}; + +const ComponentMapping = struct { + component: []const u8, + role: []const u8, + evidence: []const []const u8, +}; + +const WorkflowMapping = struct { + component: []const u8, + role: []const u8, + checkpoint_id: []const u8, + failed_run_id: []const u8, + recovered_run_id: []const u8, + human_instruction: []const u8, + evidence: []const []const u8, +}; + +const ObservabilityMapping = struct { + component: []const u8, + role: []const u8, + failed_run_id: []const u8, + recovered_run_id: []const u8, + trace_ref_source: []const u8, + evidence: []const []const u8, +}; + +const 
ReplayArtifactMapping = struct { + nulltickets: ComponentMapping, + nullboiler: WorkflowMapping, + nullclaw: ComponentMapping, + nullwatch: ObservabilityMapping, +}; + +const ReplayArtifact = struct { + artifact_schema_version: u8, + artifact_kind: []const u8, + generated_at_ms: i64, + replay_fixture_path: []const u8, + scenario_id: []const u8, + scenario_version: []const u8, + mode: []const u8, + snapshot: MissionSnapshot, + replay_fixture: replay.ReplayFixture, + ecosystem_mapping: ReplayArtifactMapping, +}; + +var mission_mutex: std_compat.sync.Mutex = .{}; +var mission_runtime = RuntimeState{}; + +pub fn isPath(target: []const u8) bool { + const path = stripQuery(target); + return std.mem.eql(u8, path, prefix) or std.mem.startsWith(u8, path, prefix ++ "/"); +} + +pub fn handle(allocator: std.mem.Allocator, method: []const u8, target: []const u8) ApiResponse { + const path = stripQuery(target); + if (!isPath(path)) return helpers.notFound(); + + const is_state = std.mem.eql(u8, path, prefix ++ "/state"); + const is_replay = std.mem.eql(u8, path, prefix ++ "/replay"); + const is_reset = std.mem.eql(u8, path, prefix ++ "/reset"); + const is_launch = std.mem.eql(u8, path, prefix ++ "/launch"); + const is_recover = std.mem.eql(u8, path, prefix ++ "/recover"); + if (!is_state and !is_replay and !is_reset and !is_launch and !is_recover) return helpers.notFound(); + + if (is_state or is_replay) { + if (!std.mem.eql(u8, method, "GET")) return helpers.methodNotAllowed(); + } else if (!std.mem.eql(u8, method, "POST")) { + return helpers.methodNotAllowed(); + } + + mission_mutex.lock(); + defer mission_mutex.unlock(); + + const now_ms = std_compat.time.milliTimestamp(); + + if (is_state) { + const body = buildStateJson(allocator, mission_runtime, now_ms) catch return helpers.serverError(); + return helpers.jsonOk(body); + } + + if (is_replay) { + const body = buildReplayArtifactJson(allocator, mission_runtime, now_ms) catch return helpers.serverError(); + return 
helpers.jsonOk(body); + } + + if (is_reset) { + mission_runtime = .{}; + const body = buildStateJson(allocator, mission_runtime, now_ms) catch return helpers.serverError(); + return helpers.jsonOk(body); + } + + var parsed = replay.parseValidated(allocator) catch return helpers.serverError(); + defer parsed.deinit(); + const elapsed_ms = elapsedSince(mission_runtime.started_at_ms, now_ms); + const recovery_elapsed_ms = elapsedSince(mission_runtime.recovery_started_at_ms, now_ms); + const phase = currentPhase(parsed.value, mission_runtime, elapsed_ms, recovery_elapsed_ms); + + if (is_launch) { + if (!canLaunch(mission_runtime)) { + return missionAlreadyStarted(); + } + mission_runtime = .{ + .launched = true, + .started_at_ms = now_ms, + }; + const body = buildStateJson(allocator, mission_runtime, now_ms) catch return helpers.serverError(); + return helpers.jsonOk(body); + } + + if (is_recover) { + if (!canRecover(parsed.value, mission_runtime, phase)) { + return missionNotRecoverable(); + } + mission_runtime.recovered = true; + mission_runtime.recovery_started_at_ms = now_ms; + const body = buildStateJson(allocator, mission_runtime, now_ms) catch return helpers.serverError(); + return helpers.jsonOk(body); + } + + return helpers.notFound(); +} + +fn missionAlreadyStarted() ApiResponse { + return .{ + .status = "409 Conflict", + .content_type = "application/json", + .body = "{\"error\":{\"code\":\"mission_already_started\",\"message\":\"Mission is already started. 
Reset before launching again.\"}}", + }; +} + +fn missionNotRecoverable() ApiResponse { + return .{ + .status = "409 Conflict", + .content_type = "application/json", + .body = "{\"error\":{\"code\":\"mission_not_recoverable\",\"message\":\"Mission can only be recovered after the validation failure phase.\"}}", + }; +} + +fn stripQuery(target: []const u8) []const u8 { + if (std.mem.indexOfScalar(u8, target, '?')) |idx| return target[0..idx]; + return target; +} + +fn buildStateJson(allocator: std.mem.Allocator, runtime: RuntimeState, now_ms: i64) ![]u8 { + var parsed = try replay.parseValidated(allocator); + defer parsed.deinit(); + const fixture = parsed.value; + + const elapsed_ms = elapsedSince(runtime.started_at_ms, now_ms); + const recovery_elapsed_ms = elapsedSince(runtime.recovery_started_at_ms, now_ms); + const phase = currentPhase(fixture, runtime, elapsed_ms, recovery_elapsed_ms); + const agents = try buildAgents(allocator, fixture, phase); + defer allocator.free(agents); + const nodes = try buildNodes(allocator, fixture, phase); + defer allocator.free(nodes); + const edges = try buildEdges(allocator, fixture, phase); + defer allocator.free(edges); + const events = try buildEvents(allocator, fixture, phase); + defer allocator.free(events); + const snapshot = buildSnapshot( + fixture, + runtime, + now_ms, + elapsed_ms, + phase, + agents, + nodes, + edges, + events, + ); + return std.json.Stringify.valueAlloc(allocator, snapshot, .{ .whitespace = .indent_2 }); +} + +fn buildReplayArtifactJson(allocator: std.mem.Allocator, runtime: RuntimeState, now_ms: i64) ![]u8 { + var parsed = try replay.parseValidated(allocator); + defer parsed.deinit(); + const fixture = parsed.value; + + const elapsed_ms = elapsedSince(runtime.started_at_ms, now_ms); + const recovery_elapsed_ms = elapsedSince(runtime.recovery_started_at_ms, now_ms); + const phase = currentPhase(fixture, runtime, elapsed_ms, recovery_elapsed_ms); + const agents = try buildAgents(allocator, fixture, 
phase); + defer allocator.free(agents); + const nodes = try buildNodes(allocator, fixture, phase); + defer allocator.free(nodes); + const edges = try buildEdges(allocator, fixture, phase); + defer allocator.free(edges); + const events = try buildEvents(allocator, fixture, phase); + defer allocator.free(events); + const snapshot = buildSnapshot( + fixture, + runtime, + now_ms, + elapsed_ms, + phase, + agents, + nodes, + edges, + events, + ); + const artifact = ReplayArtifact{ + .artifact_schema_version = 1, + .artifact_kind = "nullhub.mission_control.replay", + .generated_at_ms = now_ms, + .replay_fixture_path = "src/api/mission_control/code_red.v1.json", + .scenario_id = fixture.scenario_id, + .scenario_version = fixture.scenario_version, + .mode = fixture.mode, + .snapshot = snapshot, + .replay_fixture = fixture, + .ecosystem_mapping = replayArtifactMapping(fixture), + }; + return std.json.Stringify.valueAlloc(allocator, artifact, .{ .whitespace = .indent_2 }); +} + +fn replayArtifactMapping(fixture: replay.ReplayFixture) ReplayArtifactMapping { + return .{ + .nulltickets = .{ + .component = "nulltickets", + .role = "Tracker-style task source and terminal workflow status.", + .evidence = &.{ "events[source=nulltickets]", "graph.nodes[kind=tracker]" }, + }, + .nullboiler = .{ + .component = "nullboiler", + .role = "Workflow orchestration, checkpointing, dispatch, and fork recovery.", + .checkpoint_id = fixture.checkpoint_id, + .failed_run_id = fixture.run_ids.failed, + .recovered_run_id = fixture.run_ids.recovered, + .human_instruction = fixture.human_instruction, + .evidence = &.{ "phases", "graph.edges", "events[source=nullboiler]", "failure.checkpoint_id", "recovery.forked_from" }, + }, + .nullclaw = .{ + .component = "nullclaw", + .role = "Lightweight role agents that perform research, coding, testing, and review steps.", + .evidence = &.{ "agents", "events[source=nullclaw]", "graph.nodes[kind=agent]" }, + }, + .nullwatch = .{ + .component = "nullwatch", + 
.role = "Run, span, eval, token, cost, and failure telemetry references.", + .failed_run_id = fixture.run_ids.failed, + .recovered_run_id = fixture.run_ids.recovered, + .trace_ref_source = "events[].trace", + .evidence = &.{ "events[].trace", "telemetry", "failure.run_id", "recovery.run_id" }, + }, + }; +} + +fn buildSnapshot( + fixture: replay.ReplayFixture, + runtime: RuntimeState, + now_ms: i64, + elapsed_ms: i64, + phase: []const u8, + agents: []const Agent, + nodes: []const GraphNode, + edges: []const GraphEdge, + events: []const MissionEvent, +) MissionSnapshot { + const failed_visible = isAtOrAfter(fixture, phase, fixture.failure.visible_from_phase); + const recovered_visible = runtime.recovered; + const phase_def = replay.phaseById(fixture, phase).?; + + return .{ + .schema_version = fixture.schema_version, + .mode = fixture.mode, + .scenario_id = fixture.scenario_id, + .scenario_version = fixture.scenario_version, + .generated_at_ms = now_ms, + .mission_id = fixture.scenario_id, + .title = fixture.title, + .status = phase_def.status, + .phase = phase, + .headline = phase_def.headline, + .elapsed_ms = if (runtime.launched) elapsed_ms else 0, + .progress = phase_def.progress, + .active_run_id = activeRunId(fixture, phase), + .failed_run_id = if (failed_visible) fixture.failure.run_id else null, + .recovered_run_id = if (recovered_visible) fixture.recovery.run_id else null, + .controls = .{ + .can_launch = canLaunch(runtime), + .can_recover = canRecover(fixture, runtime, phase), + .can_reset = true, + }, + .agents = agents, + .graph = .{ + .nodes = nodes, + .edges = edges, + }, + .events = events, + .telemetry = telemetryForPhase(fixture, phase), + .failure = if (failed_visible) FailurePanel{ + .run_id = fixture.failure.run_id, + .checkpoint_id = fixture.failure.checkpoint_id, + .failed_step = fixture.failure.failed_step, + .error_message = fixture.failure.error_message, + .suggested_intervention = fixture.failure.suggested_intervention, + } else null, + 
.recovery = if (recovered_visible) RecoveryPanel{ + .run_id = fixture.recovery.run_id, + .forked_from = fixture.recovery.forked_from, + .human_instruction = fixture.recovery.human_instruction, + .status = if (std.mem.eql(u8, phase, "completed")) "passed" else "replaying", + } else null, + }; +} + +fn elapsedSince(start_ms: i64, now_ms: i64) i64 { + if (start_ms <= 0 or now_ms <= start_ms) return 0; + return now_ms - start_ms; +} + +fn currentPhase(fixture: replay.ReplayFixture, runtime: RuntimeState, elapsed_ms: i64, recovery_elapsed_ms: i64) []const u8 { + if (!runtime.launched) return "idle"; + return phaseForTrack(fixture, if (runtime.recovered) "recovery" else "primary", if (runtime.recovered) recovery_elapsed_ms else elapsed_ms); +} + +fn phaseForTrack(fixture: replay.ReplayFixture, track: []const u8, elapsed_ms: i64) []const u8 { + var selected: ?replay.PhaseDef = null; + for (fixture.phases) |phase| { + if (!std.mem.eql(u8, phase.track, track)) continue; + if (phase.starts_at_ms > elapsed_ms) continue; + if (selected == null or phase.starts_at_ms >= selected.?.starts_at_ms) { + selected = phase; + } + } + return if (selected) |phase| phase.id else "idle"; +} + +fn canLaunch(runtime: RuntimeState) bool { + return !runtime.launched; +} + +fn canRecover(fixture: replay.ReplayFixture, runtime: RuntimeState, phase: []const u8) bool { + return runtime.launched and !runtime.recovered and std.mem.eql(u8, phase, fixture.failure.visible_from_phase); +} + +fn activeRunId(fixture: replay.ReplayFixture, phase: []const u8) ?[]const u8 { + if (std.mem.eql(u8, phase, "idle")) return null; + if (isAtOrAfter(fixture, phase, fixture.recovery.visible_from_phase)) return fixture.recovery.run_id; + return fixture.failure.run_id; +} + +fn statusAfter(fixture: replay.ReplayFixture, phase: []const u8, own_phase: []const u8) []const u8 { + const current_rank = phaseRank(fixture, phase); + const own_rank = phaseRank(fixture, own_phase); + if (current_rank > own_rank) return "done"; + 
if (current_rank == own_rank) return "active"; + return "pending"; +} + +fn buildAgents(allocator: std.mem.Allocator, fixture: replay.ReplayFixture, phase: []const u8) ![]Agent { + const agents = try allocator.alloc(Agent, fixture.agents.len); + for (fixture.agents, 0..) |agent, index| { + agents[index] = .{ + .id = agent.id, + .role = agent.role, + .status = agentStatus(fixture, agent, phase), + .current_step = agentStep(agent, phase), + }; + } + return agents; +} + +fn agentStatus(fixture: replay.ReplayFixture, agent: replay.AgentDef, phase: []const u8) []const u8 { + for (agent.active_phases) |active_phase| { + if (std.mem.eql(u8, phase, active_phase)) return "active"; + } + if (agent.failed_phase) |failed_phase| { + if (std.mem.eql(u8, phase, failed_phase)) return "failed"; + } + if (agent.blocked_phase) |blocked_phase| { + if (std.mem.eql(u8, phase, blocked_phase)) return "blocked"; + } + if (phaseRank(fixture, phase) > phaseRank(fixture, agent.done_after_phase)) return "done"; + return "standby"; +} + +fn agentStep(agent: replay.AgentDef, phase: []const u8) []const u8 { + for (agent.steps) |step| { + if (std.mem.eql(u8, step.phase, phase)) return step.step; + } + return "waiting"; +} + +fn buildNodes(allocator: std.mem.Allocator, fixture: replay.ReplayFixture, phase: []const u8) ![]GraphNode { + const nodes = try allocator.alloc(GraphNode, fixture.graph.nodes.len); + for (fixture.graph.nodes, 0..) 
|node, index| { + nodes[index] = .{ + .id = node.id, + .label = node.label, + .kind = node.kind, + .status = nodeStatus(fixture, node, phase), + }; + } + return nodes; +} + +fn nodeStatus(fixture: replay.ReplayFixture, node: replay.GraphNodeDef, phase: []const u8) []const u8 { + if (node.error_phase) |error_phase| { + if (std.mem.eql(u8, phase, error_phase)) return "error"; + } + return statusAfter(fixture, phase, node.phase); +} + +fn buildEdges(allocator: std.mem.Allocator, fixture: replay.ReplayFixture, phase: []const u8) ![]GraphEdge { + const edges = try allocator.alloc(GraphEdge, fixture.graph.edges.len); + for (fixture.graph.edges, 0..) |edge, index| { + edges[index] = .{ + .from = edge.from, + .to = edge.to, + .status = edgeStatus(fixture, edge, phase), + }; + } + return edges; +} + +fn edgeStatus(fixture: replay.ReplayFixture, edge: replay.GraphEdgeDef, phase: []const u8) []const u8 { + if (edge.error_phase) |error_phase| { + if (std.mem.eql(u8, phase, error_phase)) return "error"; + } + return statusAfter(fixture, phase, edge.phase); +} + +fn buildEvents(allocator: std.mem.Allocator, fixture: replay.ReplayFixture, phase: []const u8) ![]MissionEvent { + const events = try allocator.alloc(MissionEvent, fixture.events.len); + for (fixture.events, 0..) 
|event, index| { + events[index] = .{ + .at_ms = event.at_ms, + .source = event.source, + .level = event.level, + .title = event.title, + .detail = event.detail, + .status = statusAfter(fixture, phase, event.phase), + .trace = event.trace, + }; + } + return events; +} + +fn telemetryForPhase(fixture: replay.ReplayFixture, phase: []const u8) MissionTelemetry { + const current_rank = phaseRank(fixture, phase); + var selected = fixture.telemetry[0]; + var selected_rank = phaseRank(fixture, selected.phase); + for (fixture.telemetry) |entry| { + const entry_rank = phaseRank(fixture, entry.phase); + if (entry_rank <= current_rank and entry_rank >= selected_rank) { + selected = entry; + selected_rank = entry_rank; + } + } + return .{ + .runs = selected.runs, + .spans = selected.spans, + .evals = selected.evals, + .errors = selected.errors, + .total_tokens = selected.total_tokens, + .total_cost_usd = selected.total_cost_usd, + .verdict = selected.verdict, + }; +} + +fn phaseRank(fixture: replay.ReplayFixture, phase: []const u8) u8 { + return replay.phaseRank(fixture, phase) orelse 0; +} + +fn isAtOrAfter(fixture: replay.ReplayFixture, phase: []const u8, threshold: []const u8) bool { + return phaseRank(fixture, phase) >= phaseRank(fixture, threshold); +} + +test "isPath matches mission-control namespace" { + try std.testing.expect(isPath("/api/mission-control/state")); + try std.testing.expect(isPath("/api/mission-control/replay")); + try std.testing.expect(isPath("/api/mission-control/reset")); + try std.testing.expect(isPath("/api/mission-control/state?poll=1")); + try std.testing.expect(!isPath("/api/observability/v1/runs")); +} + +test "buildStateJson returns idle mission before launch" { + const json = try buildStateJson(std.testing.allocator, .{}, 1_000); + defer std.testing.allocator.free(json); + + try std.testing.expect(std.mem.indexOf(u8, json, "\"schema_version\": 1") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"mode\": 
\"deterministic_local_replay\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"scenario_id\": \"mission-code-red\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"status\": \"idle\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"phase\": \"idle\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"can_launch\": true") != null); +} + +test "buildStateJson exposes failed mission and recover control" { + const json = try buildStateJson(std.testing.allocator, .{ + .launched = true, + .started_at_ms = 1_000, + }, 11_000); + defer std.testing.allocator.free(json); + + try std.testing.expect(std.mem.indexOf(u8, json, "\"status\": \"intervention_required\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"phase\": \"failed\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"can_recover\": true") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"trace_id\": \"trace-demo-code-red-primary\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"eval_key\": \"tool_success\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "zig build test exited with status 1") != null); +} + +test "buildStateJson exposes recovered completed mission" { + const json = try buildStateJson(std.testing.allocator, .{ + .launched = true, + .started_at_ms = 1_000, + .recovered = true, + .recovery_started_at_ms = 11_000, + }, 19_000); + defer std.testing.allocator.free(json); + + try std.testing.expect(std.mem.indexOf(u8, json, "\"status\": \"completed\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"phase\": \"completed\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"recovered_run_id\": \"run-demo-recovered-fork\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"verdict\": \"pass\"") != null); +} + +test "buildReplayArtifactJson exports fixture snapshot and ecosystem mapping" { + const json = try 
buildReplayArtifactJson(std.testing.allocator, .{ + .launched = true, + .started_at_ms = 1_000, + .recovered = true, + .recovery_started_at_ms = 11_000, + }, 19_000); + defer std.testing.allocator.free(json); + + try std.testing.expect(std.mem.indexOf(u8, json, "\"artifact_kind\": \"nullhub.mission_control.replay\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"replay_fixture_path\": \"src/api/mission_control/code_red.v1.json\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"snapshot\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"replay_fixture\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"ecosystem_mapping\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"nullwatch\"") != null); + try std.testing.expect(std.mem.indexOf(u8, json, "\"trace_ref_source\": \"events[].trace\"") != null); +} + +test "handle supports reset launch and recovery after failure" { + const reset = handle(std.testing.allocator, "POST", "/api/mission-control/reset"); + defer std.testing.allocator.free(reset.body); + try std.testing.expectEqualStrings("200 OK", reset.status); + + const launched = handle(std.testing.allocator, "POST", "/api/mission-control/launch"); + defer std.testing.allocator.free(launched.body); + try std.testing.expectEqualStrings("200 OK", launched.status); + try std.testing.expect(std.mem.indexOf(u8, launched.body, "\"status\": \"running\"") != null); + + mission_mutex.lock(); + mission_runtime = .{ + .launched = true, + .started_at_ms = std_compat.time.milliTimestamp() - 10_000, + }; + mission_mutex.unlock(); + + const recovered = handle(std.testing.allocator, "POST", "/api/mission-control/recover"); + defer std.testing.allocator.free(recovered.body); + try std.testing.expectEqualStrings("200 OK", recovered.status); + try std.testing.expect(std.mem.indexOf(u8, recovered.body, "\"recovered_run_id\": \"run-demo-recovered-fork\"") != null); +} + +test "handle rejects invalid 
mission transitions" { + const reset = handle(std.testing.allocator, "POST", "/api/mission-control/reset"); + defer std.testing.allocator.free(reset.body); + try std.testing.expectEqualStrings("200 OK", reset.status); + + const early_recover = handle(std.testing.allocator, "POST", "/api/mission-control/recover"); + try std.testing.expectEqualStrings("409 Conflict", early_recover.status); + try std.testing.expect(std.mem.indexOf(u8, early_recover.body, "mission_not_recoverable") != null); + + const launched = handle(std.testing.allocator, "POST", "/api/mission-control/launch"); + defer std.testing.allocator.free(launched.body); + try std.testing.expectEqualStrings("200 OK", launched.status); + + const duplicate_launch = handle(std.testing.allocator, "POST", "/api/mission-control/launch"); + try std.testing.expectEqualStrings("409 Conflict", duplicate_launch.status); + try std.testing.expect(std.mem.indexOf(u8, duplicate_launch.body, "mission_already_started") != null); +} + +test "handle returns clear status codes for unknown paths and methods" { + const unknown_get = handle(std.testing.allocator, "GET", "/api/mission-control/nope"); + try std.testing.expectEqualStrings("404 Not Found", unknown_get.status); + + const wrong_method = handle(std.testing.allocator, "GET", "/api/mission-control/launch"); + try std.testing.expectEqualStrings("405 Method Not Allowed", wrong_method.status); + + const wrong_replay_method = handle(std.testing.allocator, "POST", "/api/mission-control/replay"); + try std.testing.expectEqualStrings("405 Method Not Allowed", wrong_replay_method.status); +} + +test "handle returns replay artifact" { + const replay_resp = handle(std.testing.allocator, "GET", "/api/mission-control/replay"); + defer std.testing.allocator.free(replay_resp.body); + + try std.testing.expectEqualStrings("200 OK", replay_resp.status); + try std.testing.expect(std.mem.indexOf(u8, replay_resp.body, "\"artifact_schema_version\": 1") != null); + try 
std.testing.expect(std.mem.indexOf(u8, replay_resp.body, "\"artifact_kind\": \"nullhub.mission_control.replay\"") != null); +} diff --git a/src/api/mission_control/code_red.v1.json b/src/api/mission_control/code_red.v1.json new file mode 100644 index 0000000..d32e510 --- /dev/null +++ b/src/api/mission_control/code_red.v1.json @@ -0,0 +1,505 @@ +{ + "schema_version": 1, + "mode": "deterministic_local_replay", + "scenario_id": "mission-code-red", + "scenario_version": "2026-05-06", + "title": "Ship a bug fix through autonomous agents", + "run_ids": { + "failed": "run-demo-failed-test", + "recovered": "run-demo-recovered-fork" + }, + "checkpoint_id": "cp-demo-before-test", + "human_instruction": "apply missing validation guard", + "phases": [ + { + "id": "idle", + "track": "idle", + "rank": 0, + "starts_at_ms": 0, + "status": "idle", + "progress": 0, + "headline": "Mission ready. Launch the local agent stack." + }, + { + "id": "launching", + "track": "primary", + "rank": 1, + "starts_at_ms": 0, + "status": "running", + "progress": 8, + "headline": "Backlog item claimed and orchestration run started." + }, + { + "id": "research", + "track": "primary", + "rank": 2, + "starts_at_ms": 1500, + "status": "running", + "progress": 20, + "headline": "Research agent is isolating the defect." + }, + { + "id": "coding", + "track": "primary", + "rank": 3, + "starts_at_ms": 3500, + "status": "running", + "progress": 36, + "headline": "Coder agent is applying a targeted patch." + }, + { + "id": "checkpoint", + "track": "primary", + "rank": 4, + "starts_at_ms": 5500, + "status": "running", + "progress": 48, + "headline": "Checkpoint captured before validation." + }, + { + "id": "testing", + "track": "primary", + "rank": 5, + "starts_at_ms": 7000, + "status": "running", + "progress": 60, + "headline": "Test runner is validating the patch." 
+ }, + { + "id": "failed", + "track": "primary", + "rank": 6, + "starts_at_ms": 9000, + "status": "intervention_required", + "progress": 64, + "headline": "Validation failed. Human intervention is required." + }, + { + "id": "forking", + "track": "recovery", + "rank": 7, + "starts_at_ms": 0, + "status": "running", + "progress": 70, + "headline": "Forking from checkpoint with human guidance." + }, + { + "id": "patching", + "track": "recovery", + "rank": 8, + "starts_at_ms": 1500, + "status": "running", + "progress": 80, + "headline": "Recovered run is applying the injected fix." + }, + { + "id": "retesting", + "track": "recovery", + "rank": 9, + "starts_at_ms": 3500, + "status": "running", + "progress": 88, + "headline": "Recovered run is replaying tests." + }, + { + "id": "review", + "track": "recovery", + "rank": 10, + "starts_at_ms": 5500, + "status": "running", + "progress": 94, + "headline": "Reviewer agent is approving the recovered fix." + }, + { + "id": "completed", + "track": "recovery", + "rank": 11, + "starts_at_ms": 7000, + "status": "completed", + "progress": 100, + "headline": "Mission recovered and completed." 
+ } + ], + "agents": [ + { + "id": "agent-researcher", + "role": "researcher", + "active_phases": ["research"], + "done_after_phase": "research", + "blocked_phase": null, + "failed_phase": null, + "steps": [ + { + "phase": "research", + "step": "reading ticket and isolating scope" + } + ] + }, + { + "id": "agent-coder", + "role": "coder", + "active_phases": ["coding", "patching"], + "done_after_phase": "coding", + "blocked_phase": "failed", + "failed_phase": null, + "steps": [ + { + "phase": "coding", + "step": "editing implementation" + }, + { + "phase": "patching", + "step": "applying injected guard" + } + ] + }, + { + "id": "agent-test-runner", + "role": "test", + "active_phases": ["testing", "retesting"], + "done_after_phase": "testing", + "blocked_phase": null, + "failed_phase": "failed", + "steps": [ + { + "phase": "testing", + "step": "running failing validation" + }, + { + "phase": "retesting", + "step": "replaying validation" + } + ] + }, + { + "id": "agent-reviewer", + "role": "reviewer", + "active_phases": ["review"], + "done_after_phase": "review", + "blocked_phase": null, + "failed_phase": null, + "steps": [ + { + "phase": "review", + "step": "checking recovered patch" + } + ] + } + ], + "graph": { + "nodes": [ + { + "id": "ticket", + "label": "Ticket", + "kind": "tracker", + "phase": "launching", + "error_phase": null + }, + { + "id": "research", + "label": "Research", + "kind": "agent", + "phase": "research", + "error_phase": null + }, + { + "id": "code", + "label": "Patch", + "kind": "agent", + "phase": "coding", + "error_phase": null + }, + { + "id": "checkpoint", + "label": "Checkpoint", + "kind": "checkpoint", + "phase": "checkpoint", + "error_phase": null + }, + { + "id": "test", + "label": "Test", + "kind": "tool", + "phase": "testing", + "error_phase": "failed" + }, + { + "id": "recover", + "label": "Fork", + "kind": "human", + "phase": "forking", + "error_phase": null + }, + { + "id": "review", + "label": "Review", + "kind": "agent", + 
"phase": "review", + "error_phase": null + } + ], + "edges": [ + { + "from": "ticket", + "to": "research", + "phase": "research", + "error_phase": null + }, + { + "from": "research", + "to": "code", + "phase": "coding", + "error_phase": null + }, + { + "from": "code", + "to": "checkpoint", + "phase": "checkpoint", + "error_phase": null + }, + { + "from": "checkpoint", + "to": "test", + "phase": "testing", + "error_phase": "failed" + }, + { + "from": "checkpoint", + "to": "recover", + "phase": "forking", + "error_phase": null + }, + { + "from": "recover", + "to": "review", + "phase": "review", + "error_phase": null + } + ] + }, + "events": [ + { + "at_ms": 0, + "phase": "launching", + "source": "nulltickets", + "level": "info", + "title": "Task queued", + "detail": "Mission task entered the durable backlog.", + "trace": { + "kind": "span", + "run_id": "run-demo-failed-test", + "trace_id": "trace-demo-code-red-primary", + "span_id": "span-ticket-queued", + "operation": "ticket.claim" + } + }, + { + "at_ms": 1500, + "phase": "research", + "source": "nullboiler", + "level": "info", + "title": "Workflow run started", + "detail": "Graph runtime selected researcher -> coder -> test -> review.", + "trace": { + "kind": "span", + "run_id": "run-demo-failed-test", + "trace_id": "trace-demo-code-red-primary", + "span_id": "span-workflow-started", + "operation": "workflow.dispatch" + } + }, + { + "at_ms": 3500, + "phase": "coding", + "source": "nullclaw", + "level": "info", + "title": "Patch drafted", + "detail": "Coder agent produced a minimal implementation patch.", + "trace": { + "kind": "span", + "run_id": "run-demo-failed-test", + "trace_id": "trace-demo-code-red-primary", + "span_id": "span-agent-coder", + "operation": "agent.coder.patch" + } + }, + { + "at_ms": 5500, + "phase": "checkpoint", + "source": "nullboiler", + "level": "info", + "title": "Checkpoint created", + "detail": "State saved as cp-demo-before-test.", + "trace": { + "kind": "span", + "run_id": 
"run-demo-failed-test", + "trace_id": "trace-demo-code-red-primary", + "span_id": "span-checkpoint-created", + "operation": "workflow.checkpoint" + } + }, + { + "at_ms": 7000, + "phase": "failed", + "source": "nullwatch", + "level": "error", + "title": "Validation failed", + "detail": "tool.call shell returned exit code 1.", + "trace": { + "kind": "eval", + "run_id": "run-demo-failed-test", + "trace_id": "trace-demo-code-red-primary", + "span_id": "span-test-failure", + "eval_key": "tool_success", + "operation": "tool.shell.zig_build_test" + } + }, + { + "at_ms": 9000, + "phase": "failed", + "source": "human", + "level": "warning", + "title": "Intervention requested", + "detail": "Fork from checkpoint and inject validation guard.", + "trace": { + "kind": "span", + "run_id": "run-demo-failed-test", + "trace_id": "trace-demo-code-red-primary", + "span_id": "span-human-intervention", + "operation": "human.checkpoint_fork" + } + }, + { + "at_ms": 10500, + "phase": "forking", + "source": "nullboiler", + "level": "info", + "title": "Recovered fork started", + "detail": "run-demo-recovered-fork replayed from cp-demo-before-test.", + "trace": { + "kind": "span", + "run_id": "run-demo-recovered-fork", + "trace_id": "trace-demo-code-red-recovered", + "span_id": "span-recovered-fork", + "operation": "workflow.fork" + } + }, + { + "at_ms": 13000, + "phase": "retesting", + "source": "nullwatch", + "level": "info", + "title": "Recovered tests passed", + "detail": "tool_success eval changed from fail to pass.", + "trace": { + "kind": "eval", + "run_id": "run-demo-recovered-fork", + "trace_id": "trace-demo-code-red-recovered", + "span_id": "span-recovered-tests", + "eval_key": "tool_success", + "operation": "tool.shell.zig_build_test" + } + }, + { + "at_ms": 15000, + "phase": "completed", + "source": "nulltickets", + "level": "success", + "title": "Mission done", + "detail": "Ticket moved to review-approved terminal stage.", + "trace": { + "kind": "span", + "run_id": 
"run-demo-recovered-fork", + "trace_id": "trace-demo-code-red-recovered", + "span_id": "span-ticket-completed", + "operation": "ticket.complete" + } + } + ], + "telemetry": [ + { + "phase": "idle", + "runs": 0, + "spans": 0, + "evals": 0, + "errors": 0, + "total_tokens": 0, + "total_cost_usd": 0.0, + "verdict": "not_started" + }, + { + "phase": "research", + "runs": 1, + "spans": 2, + "evals": 0, + "errors": 0, + "total_tokens": 740, + "total_cost_usd": 0.006, + "verdict": "running" + }, + { + "phase": "checkpoint", + "runs": 1, + "spans": 5, + "evals": 1, + "errors": 0, + "total_tokens": 1810, + "total_cost_usd": 0.017, + "verdict": "running" + }, + { + "phase": "testing", + "runs": 1, + "spans": 7, + "evals": 1, + "errors": 0, + "total_tokens": 2320, + "total_cost_usd": 0.021, + "verdict": "running" + }, + { + "phase": "failed", + "runs": 1, + "spans": 8, + "evals": 2, + "errors": 2, + "total_tokens": 2760, + "total_cost_usd": 0.022, + "verdict": "fail" + }, + { + "phase": "retesting", + "runs": 2, + "spans": 13, + "evals": 3, + "errors": 2, + "total_tokens": 3910, + "total_cost_usd": 0.033, + "verdict": "recovering" + }, + { + "phase": "completed", + "runs": 2, + "spans": 16, + "evals": 4, + "errors": 2, + "total_tokens": 4280, + "total_cost_usd": 0.036, + "verdict": "pass" + } + ], + "failure": { + "visible_from_phase": "failed", + "run_id": "run-demo-failed-test", + "checkpoint_id": "cp-demo-before-test", + "failed_step": "test", + "error_message": "zig build test exited with status 1", + "suggested_intervention": "Fork from checkpoint and inject the missing validation guard." 
+ }, + "recovery": { + "visible_from_phase": "forking", + "run_id": "run-demo-recovered-fork", + "forked_from": "cp-demo-before-test", + "human_instruction": "apply missing validation guard" + } +} diff --git a/src/api/mission_control_replay.zig b/src/api/mission_control_replay.zig new file mode 100644 index 0000000..5f918f0 --- /dev/null +++ b/src/api/mission_control_replay.zig @@ -0,0 +1,501 @@ +const std = @import("std"); + +pub const expected_schema_version: u8 = 1; +pub const embedded_json = @embedFile("mission_control/code_red.v1.json"); + +pub const ReplayFixture = struct { + schema_version: u8, + mode: []const u8, + scenario_id: []const u8, + scenario_version: []const u8, + title: []const u8, + run_ids: RunIds, + checkpoint_id: []const u8, + human_instruction: []const u8, + phases: []const PhaseDef, + agents: []const AgentDef, + graph: GraphDef, + events: []const EventDef, + telemetry: []const TelemetryDef, + failure: FailureDef, + recovery: RecoveryDef, +}; + +pub const RunIds = struct { + failed: []const u8, + recovered: []const u8, +}; + +pub const PhaseDef = struct { + id: []const u8, + track: []const u8, + rank: u8, + starts_at_ms: i64, + status: []const u8, + progress: u8, + headline: []const u8, +}; + +pub const AgentDef = struct { + id: []const u8, + role: []const u8, + active_phases: []const []const u8, + done_after_phase: []const u8, + blocked_phase: ?[]const u8, + failed_phase: ?[]const u8, + steps: []const AgentStepDef, +}; + +pub const AgentStepDef = struct { + phase: []const u8, + step: []const u8, +}; + +pub const GraphDef = struct { + nodes: []const GraphNodeDef, + edges: []const GraphEdgeDef, +}; + +pub const GraphNodeDef = struct { + id: []const u8, + label: []const u8, + kind: []const u8, + phase: []const u8, + error_phase: ?[]const u8, +}; + +pub const GraphEdgeDef = struct { + from: []const u8, + to: []const u8, + phase: []const u8, + error_phase: ?[]const u8, +}; + +pub const EventDef = struct { + at_ms: i64, + phase: []const u8, + 
source: []const u8, + level: []const u8, + title: []const u8, + detail: []const u8, + trace: ?EventTraceDef = null, +}; + +pub const EventTraceDef = struct { + kind: []const u8, + run_id: ?[]const u8 = null, + trace_id: ?[]const u8 = null, + span_id: ?[]const u8 = null, + eval_key: ?[]const u8 = null, + operation: []const u8, +}; + +pub const TelemetryDef = struct { + phase: []const u8, + runs: usize, + spans: usize, + evals: usize, + errors: usize, + total_tokens: usize, + total_cost_usd: f64, + verdict: []const u8, +}; + +pub const FailureDef = struct { + visible_from_phase: []const u8, + run_id: []const u8, + checkpoint_id: []const u8, + failed_step: []const u8, + error_message: []const u8, + suggested_intervention: []const u8, +}; + +pub const RecoveryDef = struct { + visible_from_phase: []const u8, + run_id: []const u8, + forked_from: []const u8, + human_instruction: []const u8, +}; + +pub const ValidationError = error{ + UnsupportedReplaySchema, + InvalidReplayFixture, + DuplicateReplayId, + UnknownReplayReference, + UnsortedReplayFixture, +}; + +pub fn parse(allocator: std.mem.Allocator) !std.json.Parsed(ReplayFixture) { + return parseBytes(allocator, embedded_json); +} + +pub fn parseBytes(allocator: std.mem.Allocator, bytes: []const u8) !std.json.Parsed(ReplayFixture) { + return std.json.parseFromSlice(ReplayFixture, allocator, bytes, .{ + .allocate = .alloc_always, + .ignore_unknown_fields = false, + }); +} + +pub fn parseValidated(allocator: std.mem.Allocator) !std.json.Parsed(ReplayFixture) { + var parsed = try parse(allocator); + errdefer parsed.deinit(); + try validate(parsed.value); + return parsed; +} + +pub fn validate(fixture: ReplayFixture) ValidationError!void { + if (fixture.schema_version != expected_schema_version) return error.UnsupportedReplaySchema; + try requireNonEmpty(fixture.mode); + try requireNonEmpty(fixture.scenario_id); + try requireNonEmpty(fixture.scenario_version); + try requireNonEmpty(fixture.title); + try 
requireNonEmpty(fixture.run_ids.failed); + try requireNonEmpty(fixture.run_ids.recovered); + try requireNonEmpty(fixture.checkpoint_id); + try requireNonEmpty(fixture.human_instruction); + + try validatePhases(fixture); + try requirePhase(fixture, "idle"); + try requirePhase(fixture, "failed"); + try requirePhase(fixture, "completed"); + try validateAgents(fixture); + try validateGraph(fixture); + try validateEvents(fixture); + try validateTelemetry(fixture); + try validateFailure(fixture.failure, fixture); + try validateRecovery(fixture.recovery, fixture); +} + +pub fn phaseById(fixture: ReplayFixture, id: []const u8) ?PhaseDef { + for (fixture.phases) |phase| { + if (std.mem.eql(u8, phase.id, id)) return phase; + } + return null; +} + +pub fn phaseRank(fixture: ReplayFixture, id: []const u8) ?u8 { + return if (phaseById(fixture, id)) |phase| phase.rank else null; +} + +fn validatePhases(fixture: ReplayFixture) ValidationError!void { + if (fixture.phases.len == 0) return error.InvalidReplayFixture; + + for (fixture.phases, 0..) 
|phase, index| { + try requireNonEmpty(phase.id); + try requireNonEmpty(phase.track); + try requireNonEmpty(phase.status); + try requireNonEmpty(phase.headline); + if (!isKnownTrack(phase.track)) return error.InvalidReplayFixture; + if (phase.progress > 100) return error.InvalidReplayFixture; + if (phase.starts_at_ms < 0) return error.InvalidReplayFixture; + + for (fixture.phases[0..index]) |previous| { + if (std.mem.eql(u8, previous.id, phase.id)) return error.DuplicateReplayId; + if (previous.rank == phase.rank) return error.DuplicateReplayId; + } + } + + try validateTrackOrdering(fixture, "primary"); + try validateTrackOrdering(fixture, "recovery"); +} + +fn validateTrackOrdering(fixture: ReplayFixture, track: []const u8) ValidationError!void { + var seen = false; + var last_start: i64 = 0; + for (fixture.phases) |phase| { + if (!std.mem.eql(u8, phase.track, track)) continue; + if (seen and phase.starts_at_ms < last_start) return error.UnsortedReplayFixture; + seen = true; + last_start = phase.starts_at_ms; + } + if (!seen) return error.InvalidReplayFixture; +} + +fn validateAgents(fixture: ReplayFixture) ValidationError!void { + if (fixture.agents.len == 0) return error.InvalidReplayFixture; + + for (fixture.agents, 0..) 
|agent, index| { + try requireNonEmpty(agent.id); + try requireNonEmpty(agent.role); + try requirePhase(fixture, agent.done_after_phase); + if (agent.active_phases.len == 0) return error.InvalidReplayFixture; + for (agent.active_phases) |phase| try requirePhase(fixture, phase); + if (agent.blocked_phase) |phase| try requirePhase(fixture, phase); + if (agent.failed_phase) |phase| try requirePhase(fixture, phase); + for (agent.steps) |step| { + try requirePhase(fixture, step.phase); + try requireNonEmpty(step.step); + } + + for (fixture.agents[0..index]) |previous| { + if (std.mem.eql(u8, previous.id, agent.id)) return error.DuplicateReplayId; + } + } +} + +fn validateGraph(fixture: ReplayFixture) ValidationError!void { + if (fixture.graph.nodes.len == 0) return error.InvalidReplayFixture; + + for (fixture.graph.nodes, 0..) |node, index| { + try requireNonEmpty(node.id); + try requireNonEmpty(node.label); + try requireNonEmpty(node.kind); + try requirePhase(fixture, node.phase); + if (node.error_phase) |phase| try requirePhase(fixture, phase); + + for (fixture.graph.nodes[0..index]) |previous| { + if (std.mem.eql(u8, previous.id, node.id)) return error.DuplicateReplayId; + } + } + + for (fixture.graph.edges) |edge| { + try requireNode(fixture, edge.from); + try requireNode(fixture, edge.to); + try requirePhase(fixture, edge.phase); + if (edge.error_phase) |phase| try requirePhase(fixture, phase); + } +} + +fn validateEvents(fixture: ReplayFixture) ValidationError!void { + if (fixture.events.len == 0) return error.InvalidReplayFixture; + var last_at_ms: i64 = 0; + for (fixture.events, 0..) 
|event, index| { + if (event.at_ms < 0) return error.InvalidReplayFixture; + if (index > 0 and event.at_ms < last_at_ms) return error.UnsortedReplayFixture; + last_at_ms = event.at_ms; + try requirePhase(fixture, event.phase); + try requireNonEmpty(event.source); + try requireNonEmpty(event.level); + try requireNonEmpty(event.title); + try requireNonEmpty(event.detail); + if (event.trace) |trace| try validateEventTrace(trace, fixture); + } +} + +fn validateEventTrace(trace: EventTraceDef, fixture: ReplayFixture) ValidationError!void { + try requireNonEmpty(trace.kind); + try requireNonEmpty(trace.operation); + if (!std.mem.eql(u8, trace.kind, "span") and !std.mem.eql(u8, trace.kind, "eval")) { + return error.InvalidReplayFixture; + } + + if (trace.run_id) |run_id| { + try requireNonEmpty(run_id); + try requireRun(fixture, run_id); + } + if (trace.trace_id) |trace_id| try requireNonEmpty(trace_id); + if (trace.span_id) |span_id| try requireNonEmpty(span_id); + if (trace.eval_key) |eval_key| try requireNonEmpty(eval_key); + + if (std.mem.eql(u8, trace.kind, "span") and trace.span_id == null) return error.InvalidReplayFixture; + if (std.mem.eql(u8, trace.kind, "eval") and trace.eval_key == null) return error.InvalidReplayFixture; +} + +fn validateTelemetry(fixture: ReplayFixture) ValidationError!void { + if (fixture.telemetry.len == 0) return error.InvalidReplayFixture; + for (fixture.telemetry) |entry| { + try requirePhase(fixture, entry.phase); + try requireNonEmpty(entry.verdict); + if (entry.errors > entry.spans) return error.InvalidReplayFixture; + if (entry.evals > entry.spans) return error.InvalidReplayFixture; + if (entry.total_cost_usd < 0) return error.InvalidReplayFixture; + } +} + +fn validateFailure(failure: FailureDef, fixture: ReplayFixture) ValidationError!void { + try requirePhase(fixture, failure.visible_from_phase); + try requireNonEmpty(failure.run_id); + try requireNonEmpty(failure.checkpoint_id); + try requireNonEmpty(failure.failed_step); + try 
requireNode(fixture, failure.failed_step); + try requireNonEmpty(failure.error_message); + try requireNonEmpty(failure.suggested_intervention); +} + +fn validateRecovery(recovery: RecoveryDef, fixture: ReplayFixture) ValidationError!void { + try requirePhase(fixture, recovery.visible_from_phase); + try requireNonEmpty(recovery.run_id); + try requireNonEmpty(recovery.forked_from); + try requireNonEmpty(recovery.human_instruction); +} + +fn requirePhase(fixture: ReplayFixture, id: []const u8) ValidationError!void { + if (phaseById(fixture, id) == null) return error.UnknownReplayReference; +} + +fn requireNode(fixture: ReplayFixture, id: []const u8) ValidationError!void { + for (fixture.graph.nodes) |node| { + if (std.mem.eql(u8, node.id, id)) return; + } + return error.UnknownReplayReference; +} + +fn requireRun(fixture: ReplayFixture, id: []const u8) ValidationError!void { + if (std.mem.eql(u8, id, fixture.run_ids.failed)) return; + if (std.mem.eql(u8, id, fixture.run_ids.recovered)) return; + return error.UnknownReplayReference; +} + +fn requireNonEmpty(value: []const u8) ValidationError!void { + if (value.len == 0) return error.InvalidReplayFixture; +} + +fn isKnownTrack(track: []const u8) bool { + return std.mem.eql(u8, track, "idle") or + std.mem.eql(u8, track, "primary") or + std.mem.eql(u8, track, "recovery"); +} + +test "embedded mission replay fixture validates" { + var parsed = try parseValidated(std.testing.allocator); + defer parsed.deinit(); + + try std.testing.expectEqual(@as(u8, expected_schema_version), parsed.value.schema_version); + try std.testing.expectEqualStrings("mission-code-red", parsed.value.scenario_id); + try std.testing.expect(phaseById(parsed.value, "completed") != null); +} + +test "validate rejects duplicate phase ids" { + const phases = [_]PhaseDef{ + .{ .id = "idle", .track = "idle", .rank = 0, .starts_at_ms = 0, .status = "idle", .progress = 0, .headline = "idle" }, + .{ .id = "failed", .track = "primary", .rank = 1, .starts_at_ms = 
0, .status = "intervention_required", .progress = 60, .headline = "failed" }, + .{ .id = "failed", .track = "recovery", .rank = 2, .starts_at_ms = 0, .status = "completed", .progress = 100, .headline = "done" }, + }; + const fixture = minimalFixture(phases[0..], test_nodes[0..], test_edges[0..], test_events[0..], test_telemetry[0..]); + try std.testing.expectError(error.DuplicateReplayId, validate(fixture)); +} + +test "validate rejects graph edges pointing at unknown nodes" { + const edges = [_]GraphEdgeDef{ + .{ .from = "ticket", .to = "missing", .phase = "failed", .error_phase = null }, + }; + const fixture = minimalFixture(test_phases[0..], test_nodes[0..], edges[0..], test_events[0..], test_telemetry[0..]); + try std.testing.expectError(error.UnknownReplayReference, validate(fixture)); +} + +test "validate rejects telemetry for unknown phases" { + const telemetry = [_]TelemetryDef{ + .{ .phase = "missing", .runs = 1, .spans = 1, .evals = 0, .errors = 0, .total_tokens = 1, .total_cost_usd = 0.001, .verdict = "running" }, + }; + const fixture = minimalFixture(test_phases[0..], test_nodes[0..], test_edges[0..], test_events[0..], telemetry[0..]); + try std.testing.expectError(error.UnknownReplayReference, validate(fixture)); +} + +test "validate rejects trace refs for unknown run ids" { + const events = [_]EventDef{ + .{ + .at_ms = 0, + .phase = "launching", + .source = "nullwatch", + .level = "info", + .title = "event", + .detail = "detail", + .trace = .{ + .kind = "span", + .run_id = "missing-run", + .trace_id = "trace", + .span_id = "span", + .operation = "agent.step", + }, + }, + }; + const fixture = minimalFixture(test_phases[0..], test_nodes[0..], test_edges[0..], events[0..], test_telemetry[0..]); + try std.testing.expectError(error.UnknownReplayReference, validate(fixture)); +} + +test "validate rejects eval trace refs without eval keys" { + const events = [_]EventDef{ + .{ + .at_ms = 0, + .phase = "launching", + .source = "nullwatch", + .level = "info", + 
.title = "event", + .detail = "detail", + .trace = .{ + .kind = "eval", + .run_id = "failed-run", + .trace_id = "trace", + .span_id = "span", + .operation = "eval.tool_success", + }, + }, + }; + const fixture = minimalFixture(test_phases[0..], test_nodes[0..], test_edges[0..], events[0..], test_telemetry[0..]); + try std.testing.expectError(error.InvalidReplayFixture, validate(fixture)); +} + +fn minimalFixture( + phases: []const PhaseDef, + nodes: []const GraphNodeDef, + edges: []const GraphEdgeDef, + events: []const EventDef, + telemetry: []const TelemetryDef, +) ReplayFixture { + return .{ + .schema_version = expected_schema_version, + .mode = "deterministic_local_replay", + .scenario_id = "test", + .scenario_version = "v1", + .title = "test", + .run_ids = .{ .failed = "failed-run", .recovered = "recovered-run" }, + .checkpoint_id = "checkpoint", + .human_instruction = "fix", + .phases = phases, + .agents = test_agents[0..], + .graph = .{ .nodes = nodes, .edges = edges }, + .events = events, + .telemetry = telemetry, + .failure = .{ + .visible_from_phase = "failed", + .run_id = "failed-run", + .checkpoint_id = "checkpoint", + .failed_step = "ticket", + .error_message = "failed", + .suggested_intervention = "recover", + }, + .recovery = .{ + .visible_from_phase = "completed", + .run_id = "recovered-run", + .forked_from = "checkpoint", + .human_instruction = "fix", + }, + }; +} + +const test_phases = [_]PhaseDef{ + .{ .id = "idle", .track = "idle", .rank = 0, .starts_at_ms = 0, .status = "idle", .progress = 0, .headline = "idle" }, + .{ .id = "launching", .track = "primary", .rank = 1, .starts_at_ms = 0, .status = "running", .progress = 10, .headline = "launch" }, + .{ .id = "failed", .track = "primary", .rank = 2, .starts_at_ms = 100, .status = "intervention_required", .progress = 60, .headline = "failed" }, + .{ .id = "completed", .track = "recovery", .rank = 3, .starts_at_ms = 0, .status = "completed", .progress = 100, .headline = "done" }, +}; + +const 
test_agent_steps = [_]AgentStepDef{ + .{ .phase = "failed", .step = "work" }, +}; + +const test_agents = [_]AgentDef{ + .{ + .id = "agent", + .role = "coder", + .active_phases = &.{"failed"}, + .done_after_phase = "failed", + .blocked_phase = null, + .failed_phase = null, + .steps = test_agent_steps[0..], + }, +}; + +const test_nodes = [_]GraphNodeDef{ + .{ .id = "ticket", .label = "Ticket", .kind = "tracker", .phase = "launching", .error_phase = null }, +}; + +const test_edges = [_]GraphEdgeDef{}; + +const test_events = [_]EventDef{ + .{ .at_ms = 0, .phase = "launching", .source = "test", .level = "info", .title = "event", .detail = "detail" }, +}; + +const test_telemetry = [_]TelemetryDef{ + .{ .phase = "idle", .runs = 0, .spans = 0, .evals = 0, .errors = 0, .total_tokens = 0, .total_cost_usd = 0, .verdict = "idle" }, +}; diff --git a/src/root.zig b/src/root.zig index aeb7966..33af021 100644 --- a/src/root.zig +++ b/src/root.zig @@ -17,6 +17,8 @@ pub const manager = @import("supervisor/manager.zig"); pub const managed_skills = @import("managed_skills.zig"); pub const meta_api = @import("api/meta.zig"); pub const mdns = @import("mdns.zig"); +pub const mission_control_api = @import("api/mission_control.zig"); +pub const mission_control_replay = @import("api/mission_control_replay.zig"); pub const observability_api = @import("api/observability.zig"); pub const orchestrator = @import("installer/orchestrator.zig"); pub const manifest = @import("core/manifest.zig"); @@ -65,6 +67,8 @@ test { _ = managed_skills; _ = meta_api; _ = mdns; + _ = mission_control_api; + _ = mission_control_replay; _ = observability_api; _ = orchestrator; _ = manifest; diff --git a/src/server.zig b/src/server.zig index 1d7bec5..d764196 100644 --- a/src/server.zig +++ b/src/server.zig @@ -27,6 +27,7 @@ const usage_api = @import("api/usage.zig"); const report_api = @import("api/report.zig"); const orchestration_api = @import("api/orchestration.zig"); const observability_api = 
@import("api/observability.zig"); +const mission_control_api = @import("api/mission_control.zig"); const launch_args_mod = @import("core/launch_args.zig"); const ui_modules = @import("installer/ui_modules.zig"); const orchestrator = @import("installer/orchestrator.zig"); @@ -790,10 +791,16 @@ pub const Server = struct { instances_api.isTicketsActionPath(target) or logs_api.isLogsPath(target) or orchestration_api.isProxyPath(target) or - observability_api.isProxyPath(target); + observability_api.isProxyPath(target) or + mission_control_api.isPath(target); } fn route(self: *Server, allocator: std.mem.Allocator, method: []const u8, target: []const u8, body: []const u8) Response { + if (mission_control_api.isPath(target)) { + const resp = mission_control_api.handle(allocator, method, target); + return .{ .status = resp.status, .content_type = resp.content_type, .body = resp.body }; + } + if (std.mem.eql(u8, method, "GET")) { if (std.mem.eql(u8, target, "/health")) { return .{ @@ -2191,6 +2198,7 @@ test "routeWithoutServerMutex keeps orchestration proxy requests off global lock try std.testing.expect(Server.routeWithoutServerMutex("/api/orchestration/runs")); try std.testing.expect(Server.routeWithoutServerMutex("/api/orchestration/store/search")); try std.testing.expect(Server.routeWithoutServerMutex("/api/observability/v1/runs")); + try std.testing.expect(Server.routeWithoutServerMutex("/api/mission-control/state")); try std.testing.expect(Server.routeWithoutServerMutex("/api/instances/nullclaw/demo/logs")); try std.testing.expect(Server.routeWithoutServerMutex("/api/instances/nulltickets/tracker-a/tickets")); try std.testing.expect(!Server.routeWithoutServerMutex("/api/components")); diff --git a/tests/test_mission_control_smoke.sh b/tests/test_mission_control_smoke.sh new file mode 100755 index 0000000..dac1475 --- /dev/null +++ b/tests/test_mission_control_smoke.sh @@ -0,0 +1,72 @@ +#!/usr/bin/env bash +set -euo pipefail + 
+BASE_URL="${NULLHUB_URL:-http://127.0.0.1:19802}"
+
+# --input-type=module: the script below uses top-level await, which is only
+# valid in ES modules; STDIN input is CommonJS by default on older Node.
+node --input-type=module - "$BASE_URL" <<'NODE'
+const base = process.argv[2];
+
+async function api(path, method = 'GET') {
+  const res = await fetch(base + path, { method });
+  const text = await res.text();
+  const body = text ? JSON.parse(text) : null;
+  return { status: res.status, body };
+}
+
+function assert(condition, message) {
+  if (!condition) throw new Error(message);
+}
+
+function sleep(ms) {
+  return new Promise((resolve) => setTimeout(resolve, ms));
+}
+
+let response = await api('/api/mission-control/reset', 'POST');
+assert(response.status === 200, `reset returned ${response.status}`);
+assert(response.body.schema_version === 1, 'missing schema_version');
+assert(response.body.mode === 'deterministic_local_replay', 'unexpected mission mode');
+assert(response.body.status === 'idle', `expected idle, got ${response.body.status}`);
+
+response = await api('/api/mission-control/recover', 'POST');
+assert(response.status === 409, `early recover returned ${response.status}`);
+assert(response.body.error?.code === 'mission_not_recoverable', 'missing recover conflict code');
+
+response = await api('/api/mission-control/launch', 'POST');
+assert(response.status === 200, `launch returned ${response.status}`);
+assert(response.body.status === 'running', `expected running, got ${response.body.status}`);
+
+response = await api('/api/mission-control/launch', 'POST');
+assert(response.status === 409, `duplicate launch returned ${response.status}`);
+assert(response.body.error?.code === 'mission_already_started', 'missing launch conflict code');
+
+await sleep(10_500);
+response = await api('/api/mission-control/state');
+assert(response.status === 200, `state returned ${response.status}`);
+assert(response.body.status === 'intervention_required', `expected intervention_required, got ${response.body.status}`);
+assert(response.body.controls.can_recover === true, 'expected recover control');
+const failedEvent = 
response.body.events.find((event) => event.title === 'Validation failed'); +assert(failedEvent?.trace?.run_id === 'run-demo-failed-test', 'missing failed run trace ref'); +assert(failedEvent?.trace?.eval_key === 'tool_success', 'missing failed eval trace ref'); + +response = await api('/api/mission-control/recover', 'POST'); +assert(response.status === 200, `recover returned ${response.status}`); +assert(response.body.recovered_run_id === 'run-demo-recovered-fork', 'missing recovered run id'); + +await sleep(12_000); +response = await api('/api/mission-control/state'); +assert(response.status === 200, `final state returned ${response.status}`); +assert(response.body.status === 'completed', `expected completed, got ${response.body.status}`); +assert(response.body.telemetry.verdict === 'pass', `expected pass verdict, got ${response.body.telemetry.verdict}`); +const recoveredEvent = response.body.events.find((event) => event.title === 'Recovered tests passed'); +assert(recoveredEvent?.trace?.run_id === 'run-demo-recovered-fork', 'missing recovered run trace ref'); +const finalState = response.body; + +response = await api('/api/mission-control/replay'); +assert(response.status === 200, `replay export returned ${response.status}`); +assert(response.body.artifact_kind === 'nullhub.mission_control.replay', 'unexpected replay artifact kind'); +assert(response.body.snapshot?.status === 'completed', 'replay export missing completed snapshot'); +assert(response.body.replay_fixture?.scenario_id === 'mission-code-red', 'replay export missing source fixture'); +assert(response.body.ecosystem_mapping?.nullwatch?.trace_ref_source === 'events[].trace', 'replay export missing nullwatch mapping'); + +console.log(`mission-control smoke ok: ${finalState.status}, ${finalState.telemetry.spans} spans, ${finalState.telemetry.evals} evals`); +NODE diff --git a/ui/src/lib/api/client.ts b/ui/src/lib/api/client.ts index 6ed8d3d..c3fe7fc 100644 --- a/ui/src/lib/api/client.ts +++ 
b/ui/src/lib/api/client.ts @@ -20,6 +20,141 @@ export type LogSource = 'instance' | 'nullhub'; export type ReportOption = { value: string; label: string }; export type ReportTypeOption = ReportOption & { labels: string[] }; export type ReportRepoOption = ReportOption & { repo: string }; +export type MissionControlStatus = 'idle' | 'running' | 'intervention_required' | 'completed'; +export type MissionControlPhase = + | 'idle' + | 'launching' + | 'research' + | 'coding' + | 'checkpoint' + | 'testing' + | 'failed' + | 'forking' + | 'patching' + | 'retesting' + | 'review' + | 'completed'; +export type MissionControlControls = { + can_launch: boolean; + can_recover: boolean; + can_reset: boolean; +}; +export type MissionControlAgent = { + id: string; + role: string; + status: string; + current_step: string; +}; +export type MissionControlGraphNode = { + id: string; + label: string; + kind: string; + status: string; +}; +export type MissionControlGraphEdge = { + from: string; + to: string; + status: string; +}; +export type MissionControlTraceRef = { + kind: 'span' | 'eval'; + run_id: string | null; + trace_id: string | null; + span_id: string | null; + eval_key: string | null; + operation: string; +}; +export type MissionControlEvent = { + at_ms: number; + source: string; + level: string; + title: string; + detail: string; + status: string; + trace: MissionControlTraceRef | null; +}; +export type MissionControlTelemetry = { + runs: number; + spans: number; + evals: number; + errors: number; + total_tokens: number; + total_cost_usd: number; + verdict: string; +}; +export type MissionControlFailure = { + run_id: string; + checkpoint_id: string; + failed_step: string; + error_message: string; + suggested_intervention: string; +}; +export type MissionControlRecovery = { + run_id: string; + forked_from: string; + human_instruction: string; + status: string; +}; +export type MissionControlState = { + schema_version: number; + mode: string; + scenario_id: string; + 
scenario_version: string; + generated_at_ms: number; + mission_id: string; + title: string; + status: MissionControlStatus; + phase: MissionControlPhase; + headline: string; + elapsed_ms: number; + progress: number; + active_run_id: string | null; + failed_run_id: string | null; + recovered_run_id: string | null; + controls: MissionControlControls; + agents: MissionControlAgent[]; + graph: { + nodes: MissionControlGraphNode[]; + edges: MissionControlGraphEdge[]; + }; + events: MissionControlEvent[]; + telemetry: MissionControlTelemetry; + failure: MissionControlFailure | null; + recovery: MissionControlRecovery | null; +}; +export type MissionControlComponentMapping = { + component: string; + role: string; + evidence: string[]; +}; +export type MissionControlWorkflowMapping = MissionControlComponentMapping & { + checkpoint_id: string; + failed_run_id: string; + recovered_run_id: string; + human_instruction: string; +}; +export type MissionControlObservabilityMapping = MissionControlComponentMapping & { + failed_run_id: string; + recovered_run_id: string; + trace_ref_source: string; +}; +export type MissionControlReplayArtifact = { + artifact_schema_version: number; + artifact_kind: string; + generated_at_ms: number; + replay_fixture_path: string; + scenario_id: string; + scenario_version: string; + mode: string; + snapshot: MissionControlState; + replay_fixture: unknown; + ecosystem_mapping: { + nulltickets: MissionControlComponentMapping; + nullboiler: MissionControlWorkflowMapping; + nullclaw: MissionControlComponentMapping; + nullwatch: MissionControlObservabilityMapping; + }; +}; type InstanceStartOptions = { launch_mode?: string; verbose?: boolean; @@ -238,6 +373,12 @@ export const api = { }), ), + getMissionControlState: () => request('/mission-control/state'), + getMissionControlReplay: () => request('/mission-control/replay'), + launchMissionControl: () => request('/mission-control/launch', { method: 'POST' }), + resetMissionControl: () => 
request('/mission-control/reset', { method: 'POST' }), + recoverMissionControl: () => request('/mission-control/recover', { method: 'POST' }), + applyUpdate: (c: string, n: string) => request(`/instances/${c}/${n}/update`, { method: 'POST' }), diff --git a/ui/src/lib/components/Sidebar.svelte b/ui/src/lib/components/Sidebar.svelte index 0bf74a9..88c3b55 100644 --- a/ui/src/lib/components/Sidebar.svelte +++ b/ui/src/lib/components/Sidebar.svelte @@ -125,6 +125,7 @@