fix(update-check): auto-install patch bumps without SPAWN_AUTO_UPDATE#3296
louisgv
left a comment
Security Review
Verdict: APPROVED
Commit: 36b888c
Summary
This PR changes the auto-update policy to auto-install patch bumps (e.g., 1.0.6 → 1.0.7) without requiring SPAWN_AUTO_UPDATE=1, while keeping minor/major bumps as opt-in only.
Security Analysis
No security issues found. The change:
- Respects semver boundaries: per the semver spec, patch versions carry backward-compatible bug fixes only, with no new features or breaking changes
- Provides explicit opt-out: `SPAWN_NO_AUTO_UPDATE=1` suppresses patch auto-install for users who need fully pinned CLI versions (CI environments)
- No new injection vectors: uses existing `performAutoUpdate()`, which safely calls `execFileSync` with array args (not shell interpolation)
- Comprehensive test coverage: 5 new tests verify the policy matrix (patch auto-install, minor/major opt-in, opt-out)
Tests
- bun test: PASS (2048 tests pass, 0 fail)
- bash -n: N/A (no shell script changes)
Code Quality
- Version bump follows CLI versioning rules (patch bump for policy change)
- Clear inline documentation explaining the policy rationale
- Tests lock in the expected behavior to prevent regressions
-- security/pr-reviewer
* feat(cli): hermes web dashboard tunnel support
Hermes Agent v0.9.0 ships a local web dashboard (hermes dashboard, default
127.0.0.1:9119) for config / session / skill / gateway management. This wires
Hermes into spawn's existing SSH-tunnel infrastructure so `spawn run hermes`
auto-exposes the dashboard to the user's local browser.
- agent-setup.ts: new startHermesDashboard() helper — session-scoped
background launch via setsid/nohup with a port-ready wait loop. No systemd
(unlike OpenClaw's gateway) because the dashboard only needs to live for
the duration of the spawn session. Falls back gracefully if hermes isn't
in PATH or the dashboard fails to come up.
- Wire preLaunch, preLaunchMsg, and tunnel { remotePort: 9119 } into the
hermes AgentConfig. Mirrors the OpenClaw tunnel pattern at
orchestrate.ts:628 — startSshTunnel + openBrowser happen automatically.
- manifest.json: update hermes notes to mention the dashboard.
- hermes-dashboard.test.ts: 7 new unit tests verifying the deploy script
calls `hermes dashboard --port 9119 --host 127.0.0.1 --no-open`, checks
all three port-probe fallbacks (ss / /dev/tcp / nc), uses setsid+nohup,
waits for the port, and does NOT install a systemd unit.
- Bump cli version 1.0.6 -> 1.0.7.
Closes #3293
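The launch-and-wait behaviour described above could look roughly like the sketch below. It is not the actual `startHermesDashboard()` code: the function name `buildHermesDashboardScript`, the log path, and the 30-second timeout are assumptions made for illustration. Only the `hermes dashboard` command line, the setsid/nohup launch, and the three probe fallbacks come from the description.

```typescript
// Hypothetical sketch: builds a remote shell script that launches the
// dashboard in the background and waits for the port to come up.
// Function name, log path, and timeout are assumptions, not spawn's code.
function buildHermesDashboardScript(port = 9119, timeoutSec = 30): string {
  return [
    "#!/usr/bin/env bash",
    "# Skip quietly if hermes is not in PATH.",
    "command -v hermes >/dev/null 2>&1 || exit 0",
    "",
    "# Port probe with three fallbacks: ss, bash /dev/tcp, nc.",
    "port_open() {",
    `  ss -ltn 2>/dev/null | grep -q ":${port} " && return 0`,
    `  (exec 3<>/dev/tcp/127.0.0.1/${port}) 2>/dev/null && return 0`,
    `  nc -z 127.0.0.1 ${port} 2>/dev/null`,
    "}",
    "",
    "# Session-scoped background launch, no service manager involved.",
    `setsid nohup hermes dashboard --port ${port} --host 127.0.0.1 --no-open >/tmp/hermes-dashboard.log 2>&1 &`,
    "",
    "# Wait for the port, then give up gracefully.",
    `for _ in $(seq 1 ${timeoutSec}); do`,
    "  port_open && exit 0",
    "  sleep 1",
    "done",
    'echo "hermes dashboard did not come up; continuing without it" >&2',
    "exit 0",
  ].join("\n");
}

console.log(buildHermesDashboardScript());
```

The script always exits 0 so that a missing binary or a slow dashboard does not abort the rest of the session.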
* chore: bump cli to 1.0.8 to leave 1.0.7 for #3296
---------
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
auto-install to same-major.minor bumps. The intent was "give users control
over feature updates" but the effect was "nobody installs security patches"
because the default became notice-only for everything.
This decouples the two ideas and aligns the policy with semver intent:
- PATCH bumps (1.0.5 -> 1.0.7, same major.minor): auto-install always,
no opt-in needed. Patches are reserved for bug fixes and security
hardening. Blast radius is bounded by semver: no behavior changes,
no new features, no breaking changes.
- MINOR / MAJOR bumps (1.0.x -> 1.1.0, 1.x.x -> 2.0.0): respect
SPAWN_AUTO_UPDATE=1 as opt-in. These can contain behavior changes
and users should decide when to move to them.
- SPAWN_NO_AUTO_UPDATE=1: new explicit opt-out for CI environments
or pinned installs that need a fully static CLI.
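The policy matrix above can be sketched as a small pure function. This is an illustration only: `parseVersion`, `classifyBump`, and the function signature are hypothetical names, not the actual `update-check.ts` API. The final boolean expression follows the one quoted in the review (`!explicitOptOut && (patchOnly || explicitOptIn)`).

```typescript
// Hypothetical sketch of the update policy; names are illustrative.
type Bump = "none" | "patch" | "minor" | "major";
type Env = Record<string, string | undefined>;

function parseVersion(v: string): [number, number, number] {
  const [maj = 0, min = 0, pat = 0] = v.replace(/^v/, "").split(".").map(Number);
  return [maj, min, pat];
}

// Classify how far `latest` is ahead of `current`; "none" if it is not ahead.
function classifyBump(current: string, latest: string): Bump {
  const [cMaj, cMin, cPat] = parseVersion(current);
  const [lMaj, lMin, lPat] = parseVersion(latest);
  if (lMaj !== cMaj) return lMaj > cMaj ? "major" : "none";
  if (lMin !== cMin) return lMin > cMin ? "minor" : "none";
  return lPat > cPat ? "patch" : "none";
}

function shouldAutoInstall(current: string, latest: string, env: Env): boolean {
  const bump = classifyBump(current, latest);
  if (bump === "none") return false;
  const explicitOptOut = env.SPAWN_NO_AUTO_UPDATE === "1";
  const explicitOptIn = env.SPAWN_AUTO_UPDATE === "1";
  const patchOnly = bump === "patch";
  return !explicitOptOut && (patchOnly || explicitOptIn);
}

console.log(shouldAutoInstall("1.0.5", "1.0.7", {}));
console.log(shouldAutoInstall("1.0.7", "1.1.0", {}));
```

Passing the environment as a parameter rather than reading `process.env` directly keeps the function easy to test across the whole matrix.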
Caveat — the one-time hurdle: users currently on 1.0.6 won't get 1.0.7
automatically, because they're still running 1.0.6's update-check.ts
which honors the old opt-in gate. Once they reach 1.0.7 via spawn update
(or by setting SPAWN_AUTO_UPDATE=1), every future patch will propagate
automatically and the fleet becomes self-healing on security.
Tests:
- 5 new tests lock in the policy (patch auto without env, minor notice
without env, minor auto with env, major notice without env, explicit
opt-out suppresses patch)
- All 21 update-check tests pass (16 existing + 5 new)
- 2109/2109 total suite
Bumps 1.0.6 -> 1.0.7.
36b888c to 8364102
Rebased onto main to resolve the -- refactor/pr-maintainer
louisgv
left a comment
Security Review
Verdict: APPROVED
Commit: 8364102
Summary
This PR changes the auto-update policy to auto-install patch bumps (e.g., 1.0.6 → 1.0.9) without requiring SPAWN_AUTO_UPDATE=1, while keeping minor/major bumps as opt-in only. The PR was rebased onto main to resolve a version conflict and now bumps to v1.0.9.
Security Analysis
No security issues found. The change:
- Respects semver boundaries: per the semver spec, patch versions carry backward-compatible bug fixes only, with no new features or breaking changes
- Provides explicit opt-out: `SPAWN_NO_AUTO_UPDATE=1` suppresses patch auto-install for users who need fully pinned CLI versions (CI environments)
- No new injection vectors: uses existing `performAutoUpdate()`, which safely:
  - Fetches install script via `execFileSync("curl", [args])` with array args (no shell interpolation)
  - Writes script to temp file and executes via `execFileSync("bash", [tmpFile])` (no shell interpolation)
  - All arguments passed as array elements, not concatenated strings
- Comprehensive test coverage: 5 new tests verify the policy matrix (patch auto-install, minor/major opt-in, opt-out)
- Version bump follows policy: v1.0.9 is a patch bump for this policy change
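To illustrate why array arguments matter, here is a minimal sketch. It is not taken from `performAutoUpdate()`: the helper `runLiteral` is hypothetical, and it assumes a POSIX system with `printf` on the PATH. Because `execFileSync` starts the program directly without a shell, metacharacters in an argument are never interpreted.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper: passes `arg` to printf as one literal argument.
// No shell is involved, so `;`, `$()`, and backticks are inert.
function runLiteral(arg: string): string {
  return execFileSync("printf", ["%s", arg], { encoding: "utf8" });
}

// This would run a second command if it were interpolated into a shell
// string; here it is printed back unchanged.
console.log(runLiteral("hello; echo INJECTED"));
```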
Tests
- bun test: PASS (2055 tests pass, 0 fail)
- bash -n: N/A (no shell script changes)
- curl|bash safety: N/A (no shell script changes)
- macOS compat: N/A (no shell script changes)
Code Quality
- Clear inline documentation explaining the policy rationale (lines 405-418)
- Tests lock in the expected behavior to prevent regressions
- Logic is straightforward: `shouldAutoInstall = !explicitOptOut && (patchOnly || explicitOptIn)`
Approved and auto-merging.
-- security/pr-reviewer
Adds low-volume, high-signal product events on top of the existing
errors/warnings telemetry (shared/telemetry.ts). Answers "where do users
bail before reaching a running agent" at the fleet level.
Funnel events (in orchestrate.ts, both fast and sequential paths):
  funnel_started              pipeline begins
  funnel_cloud_authed         cloud.authenticate() ok
  funnel_credentials_ready    OR key + preProvision resolved
  funnel_vm_ready             VM booted and SSH-reachable
  funnel_install_completed    agent install succeeded (tarball or live)
  funnel_configure_completed  agent.configure() ran
  funnel_prelaunch_completed  gateway / dashboard / preLaunch hooks done
  funnel_handoff              about to launch TUI (final step)
Every event carries elapsed_ms since funnel_started, plus agent and cloud
via telemetry context. Per-step counts reveal the drop-off funnel in
PostHog without touching any PII.
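One way to attach elapsed_ms to every funnel event is sketched below. The names `createFunnel`, `emit`, and `now` are hypothetical and are not the orchestrate.ts implementation. The clock is injectable so the timing can be tested deterministically.

```typescript
type FunnelEvent = { name: string; elapsed_ms: number };

// Hypothetical helper: `emit` stands in for the real telemetry call.
// The first step recorded becomes time zero for every later step.
function createFunnel(
  emit: (e: FunnelEvent) => void,
  now: () => number = Date.now,
): (step: string) => void {
  let startedAt: number | null = null;
  return (step: string): void => {
    const t = now();
    if (startedAt === null) startedAt = t;
    emit({ name: `funnel_${step}`, elapsed_ms: t - startedAt });
  };
}

// Usage: collect events in memory instead of sending them anywhere.
const seen: FunnelEvent[] = [];
const track = createFunnel((e) => seen.push(e));
track("started");
track("cloud_authed");
console.log(seen.map((e) => e.name).join(" -> "));
```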
Lifecycle events (new shared/lifecycle-telemetry.ts):
spawn_connected { spawn_id, agent, cloud, connect_count, date }
fired from list.ts when the user reconnects via the interactive picker.
Increments connection.metadata.connect_count and writes last_connected_at
so subsequent events and the eventual spawn_deleted have the total.
spawn_deleted { spawn_id, agent, cloud, lifetime_hours, connect_count, date }
fired from delete.ts (both interactive confirmAndDelete and headless
cmdDelete loop) after a successful cloud destroy. lifetime_hours is
computed from SpawnRecord.timestamp to now. Clamped at 0 for corrupt
clocks. connect_count is read from metadata.
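The lifetime_hours calculation with its clamp might look like the sketch below. The function name and signature are hypothetical. Treating an unparseable timestamp as 0 is an assumption added here, not something the commit message states.

```typescript
const MS_PER_HOUR = 3_600_000;

// Hypothetical sketch: hours between the record's creation timestamp and now,
// never negative (a clock that ran backwards yields 0).
function lifetimeHours(createdAt: string, now: Date = new Date()): number {
  const created = Date.parse(createdAt);
  if (Number.isNaN(created)) return 0; // assumption: bad timestamps count as 0
  return Math.max(0, (now.getTime() - created) / MS_PER_HOUR);
}

console.log(lifetimeHours("2024-01-01T00:00:00Z", new Date("2024-01-01T12:00:00Z")));
```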
New captureEvent(name, properties) helper in telemetry.ts:
- Respects SPAWN_TELEMETRY=0 opt-out (no new flag)
- Runs every string property through the existing scrubber (API keys,
GitHub tokens, bearer, emails, IPs, base64 blobs, home paths)
- Non-string values pass through untouched
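The three rules above can be sketched as follows. This is a simplified stand-in, not the telemetry.ts code: `buildEvent` and `scrub` are hypothetical names, and the scrubber here only handles emails and bearer tokens, whereas the existing one covers more patterns.

```typescript
type Props = Record<string, unknown>;
type Env = Record<string, string | undefined>;

// Simplified stand-in for the real scrubber (emails and bearer tokens only).
function scrub(value: string): string {
  return value
    .replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "[email]")
    .replace(/Bearer\s+[\w.~+\/=-]+/gi, "Bearer [redacted]");
}

// Returns the event that would be sent, or null when the user opted out.
function buildEvent(
  name: string,
  properties: Props,
  env: Env,
): { name: string; properties: Props } | null {
  if (env.SPAWN_TELEMETRY === "0") return null;
  const clean: Props = {};
  for (const [key, value] of Object.entries(properties)) {
    // Strings are scrubbed; numbers, booleans, etc. pass through untouched.
    clean[key] = typeof value === "string" ? scrub(value) : value;
  }
  return { name, properties: clean };
}

console.log(buildEvent("spawn_connected", { note: "mail a@b.co", connect_count: 3 }, {}));
```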
Tests: 20 new (15 lifecycle-telemetry + 2 captureEvent + 3 assertion
additions to disabled-telemetry). Full suite: 2129/2129 pass.
Bumps 1.0.10 -> 1.0.11. Patch bump — auto-propagates under OpenRouterTeam#3296 policy.
…3305)
* feat(telemetry): funnel + lifecycle events for onboarding drop-off
* fix(test): replace mock.module with spyOn in lifecycle-telemetry tests
mock.module contaminates the global module registry when running under
--coverage, causing telemetry.test.ts and history-cov.test.ts to receive
mocked implementations instead of the real modules. Switch to spyOn with
mockRestore in afterEach so the real modules are preserved across files.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs from the OpenRouterTeam#3305 rollout:
1. Test pollution: orchestrate.test.ts imports runOrchestration directly
   and never calls initTelemetry, but _enabled defaulted to true in the
   module so captureEvent happily fired real events at PostHog tagged
   agent=testagent. The onboarding funnel filled up with CI fixture data.
2. Funnel started too late: funnel_* events fired inside runOrchestration,
   which is only called AFTER the interactive picker completes. Users who
   bail at the agent/cloud/setup-options/name prompts were invisible, yet
   that's exactly where real drop-off happens.
Fix 1 (telemetry.ts):
- Default _enabled = false. Nothing fires until initTelemetry is
  explicitly called. Production (index.ts) calls it; tests that need
  telemetry (telemetry.test.ts) call it with BUN_ENV/NODE_ENV cleared.
- Belt-and-suspenders: initTelemetry now short-circuits when
  BUN_ENV === "test" || NODE_ENV === "test", so even if future code calls
  it from a test context, events stay local.
Fix 2 (picker instrumentation):
New events fired before runOrchestration in every entry path:
  spawn_launched { mode: interactive | agent_interactive | direct | headless }
  menu_shown / menu_selected / menu_cancelled (only when user has prior spawns)
  agent_picker_shown
  agent_selected { agent } (also sets telemetry context)
  cloud_picker_shown
  cloud_selected { cloud } (also sets telemetry context)
  preflight_passed
  setup_options_shown
  setup_options_selected { step_count }
  name_prompt_shown
  name_entered
  picker_completed
Wired into:
  commands/interactive.ts cmdInteractive + cmdAgentInteractive
  commands/run.ts cmdRun (direct `spawn <agent> <cloud>`)
  cmdRunHeadless (only spawn_launched)
runOrchestration's existing funnel_* events continue to fire unchanged.
The final funnel in PostHog:
  spawn_launched → agent_selected → cloud_selected → preflight_passed →
  setup_options_selected → name_entered → picker_completed →
  funnel_started → funnel_cloud_authed → funnel_credentials_ready →
  funnel_vm_ready → funnel_install_completed →
  funnel_configure_completed → funnel_prelaunch_completed → funnel_handoff
Tests:
- telemetry.test.ts: 2 new env-guard tests (BUN_ENV, NODE_ENV), plus
  updated beforeEach to clear both env vars so existing tests still
  exercise initTelemetry.
- Full suite: 2131/2131 pass, biome 0 errors.
Bumps 1.0.12 -> 1.0.13 (patch; auto-propagates under OpenRouterTeam#3296 policy).
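The default-off flag and the test-environment guard from Fix 1 can be sketched as below. This is an illustration under assumptions: the real module keeps `_enabled` at module scope, whereas this sketch uses a hypothetical `createTelemetry` factory so that each instance has isolated state, and `send` stands in for the PostHog call.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical factory; `send` stands in for the real network call.
function createTelemetry(send: (name: string) => void) {
  let enabled = false; // default off: nothing fires before init()
  return {
    init(env: Env): void {
      // Guard: never enable from a test runner, even if init is called.
      if (env.BUN_ENV === "test" || env.NODE_ENV === "test") return;
      // Existing user opt-out.
      if (env.SPAWN_TELEMETRY === "0") return;
      enabled = true;
    },
    capture(name: string): void {
      if (enabled) send(name);
    },
  };
}

// Usage: an uninitialised instance drops events silently.
const dropped: string[] = [];
createTelemetry((n) => dropped.push(n)).capture("funnel_started");
console.log(dropped.length);
```

With the flag defaulting to false, a test file that imports an instrumented module without initialising telemetry cannot emit anything.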
Summary
Fixes a regression from #3254. That PR flipped auto-update to opt-in AND locked it to patch-only. Intent was "give users control"; effect was "nobody gets security patches." This decouples the two ideas and aligns the policy with semver intent.
New policy
Rationale: patches are for bugs and security hardening. Their blast radius is bounded by semver — no behavior changes, no new features, no breaking changes. Users benefit from getting them without having to know a CLI env var exists. Feature releases (minor/major) still respect opt-in, so #3254's original UX goal is preserved.
New `SPAWN_NO_AUTO_UPDATE=1` explicit opt-out is added for CI environments or pinned installs that need a fully static CLI.
Why this needs to ship
Today's work includes two PRs: security hardening (#3294) and a new feature (#3295). Under the current policy, neither will reach most of the user base automatically because the default since 2026-04-10 has been notice-only. The fleet is frozen on v1.0.6 (or earlier) until users proactively run `spawn update`, which most won't.
This PR fixes the long-term propagation — from v1.0.7 forward, every future patch auto-installs without user intervention.
The one-time hurdle (known limitation)
Users currently on v1.0.6 are running v1.0.6's `update-check.ts`, which still honors the old opt-in gate. They won't get v1.0.7 automatically. Once they do reach v1.0.7 (via `spawn update` or `SPAWN_AUTO_UPDATE=1`), every subsequent patch propagates automatically and they're self-healing forever.
For the one-time catch-up, we need out-of-band notification: Slack announcement, email, whatever channel reaches existing users. The CLI cannot reach users who aren't running it.
Changes
Test plan
Coordination note
This PR bumps to v1.0.7. PR #3295 (Hermes dashboard tunnel) also bumps to v1.0.7. Merging this one first is recommended — I'll force-push #3295 to bump to v1.0.8 on top, so the version numbers line up cleanly.