
fix(update-check): auto-install patch bumps without SPAWN_AUTO_UPDATE #3296

Merged
louisgv merged 1 commit into OpenRouterTeam:main from AhmedTMM:fix/auto-update-patches
Apr 14, 2026


@AhmedTMM
Collaborator

Summary

Fixes a regression from #3254. That PR flipped auto-update to opt-in AND locked it to patch-only. Intent was "give users control"; effect was "nobody gets security patches." This decouples the two ideas and aligns the policy with semver intent.

New policy

| Bump | Example | Behavior |
| --- | --- | --- |
| Patch (same major.minor) | 1.0.5 → 1.0.7 | Auto-install; no opt-in needed |
| Minor | 1.0.x → 1.1.0 | Notice only, unless `SPAWN_AUTO_UPDATE=1` |
| Major | 1.x.x → 2.0.0 | Notice only, unless `SPAWN_AUTO_UPDATE=1` |
| Any (opt-out) | `SPAWN_NO_AUTO_UPDATE=1` | Notice only, regardless of bump type |

Rationale: patches are for bugs and security hardening. Their blast radius is bounded by semver — no behavior changes, no new features, no breaking changes. Users benefit from getting them without having to know a CLI env var exists. Feature releases (minor/major) still respect opt-in, so #3254's original UX goal is preserved.

New `SPAWN_NO_AUTO_UPDATE=1` explicit opt-out is added for CI environments or pinned installs that need a fully static CLI.
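
To make "patch (same major.minor)" concrete, here is a minimal sketch of how the three bump types could be told apart. It assumes plain `major.minor.patch` version strings with no pre-release suffix; `parseVersion` and `classifyBump` are hypothetical names for illustration and are not taken from `update-check.ts`.

```typescript
// Hypothetical helpers: the PR does not show how update-check.ts compares versions.
type Bump = "none" | "patch" | "minor" | "major";

function parseVersion(v: string): [number, number, number] {
  // Assumption: versions are plain triples such as "1.0.7" or "v1.0.7".
  const m = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(v.trim());
  if (!m) throw new Error(`not a plain semver triple: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

export function classifyBump(current: string, latest: string): Bump {
  const [cMaj, cMin, cPat] = parseVersion(current);
  const [lMaj, lMin, lPat] = parseVersion(latest);
  if (lMaj !== cMaj) return lMaj > cMaj ? "major" : "none";
  if (lMin !== cMin) return lMin > cMin ? "minor" : "none";
  // Same major.minor: only a higher patch number counts as a patch bump.
  return lPat > cPat ? "patch" : "none";
}
```

Under this sketch a downgrade or an identical version is reported as `"none"`, so neither the notice nor the auto-install path would apply.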

Why this needs to ship

Today's work includes two PRs: one with security hardening (#3294) and one with a new feature (#3295). Under the current policy, neither will reach most of the user base automatically because the default since 2026-04-10 has been notice-only. The fleet is frozen on v1.0.6 (or earlier) until users proactively run `spawn update`, which most won't.

This PR fixes the long-term propagation — from v1.0.7 forward, every future patch auto-installs without user intervention.

The one-time hurdle (known limitation)

Users currently on v1.0.6 are running v1.0.6's `update-check.ts`, which still honors the old opt-in gate. They won't get v1.0.7 automatically. Once they do reach v1.0.7 (via `spawn update` or `SPAWN_AUTO_UPDATE=1`), every subsequent patch propagates automatically and they're self-healing forever.
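
The hurdle comes from which gate the installed binary runs, not from the release being offered. A small sketch of that reasoning, assuming the pre-fix gate can be read as "opt-in required, and patch-only even then" (my reading of the #3254 description above) and ignoring the opt-out flag for brevity. All names here are hypothetical.

```typescript
type Gate = (patchOnly: boolean, optIn: boolean) => boolean;

// Assumed shape of the gate shipped in v1.0.6 and earlier.
const legacyGate: Gate = (patchOnly, optIn) => optIn && patchOnly;

// Gate from this PR, with the opt-out flag left out of the sketch.
const patchGate: Gate = (patchOnly, optIn) => patchOnly || optIn;

// The decision is made by the code already on the user's machine.
export function willSelfUpdate(installedHasFix: boolean, optIn: boolean): boolean {
  const patchOnly = true; // the offered release is a patch bump
  const gate = installedHasFix ? patchGate : legacyGate;
  return gate(patchOnly, optIn);
}
```

With these assumptions, an install without the fix and without `SPAWN_AUTO_UPDATE=1` never moves, while the same install with the flag set, or any install that already has the fix, picks up the patch.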

For the one-time catch-up, we need out-of-band notification: Slack announcement, email, whatever channel reaches existing users. The CLI cannot reach users who aren't running it.

Changes

  • `update-check.ts` — `checkForUpdates` gates `performAutoUpdate` on `!explicitOptOut && (patchOnly || explicitOptIn)`.
  • `update-check.test.ts` — 5 new tests lock in the policy:
    • Patch bump auto-installs without `SPAWN_AUTO_UPDATE=1`
    • Minor bump shows notice only without opt-in
    • Major bump shows notice only without opt-in
    • Minor bump auto-installs WITH opt-in
    • `SPAWN_NO_AUTO_UPDATE=1` suppresses patch auto-install
  • `package.json` — bump 1.0.6 → 1.0.7.
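
For reference, the gate expression from the first bullet can be sketched as a pure function. The boolean expression is the one quoted in this PR; how the real code reads the environment is not shown here, so the exact `=== "1"` comparison and the `UpdateEnv` and `shouldAutoInstall` names are assumptions.

```typescript
// Assumed shape of the relevant environment; only these two flags matter here.
interface UpdateEnv {
  SPAWN_AUTO_UPDATE?: string;
  SPAWN_NO_AUTO_UPDATE?: string;
}

export function shouldAutoInstall(patchOnly: boolean, env: UpdateEnv): boolean {
  // Assumption: a flag counts as set only when it equals "1".
  const explicitOptIn = env.SPAWN_AUTO_UPDATE === "1";
  const explicitOptOut = env.SPAWN_NO_AUTO_UPDATE === "1";
  // Opt-out wins over everything; otherwise patches always pass, others need opt-in.
  return !explicitOptOut && (patchOnly || explicitOptIn);
}

// The five cases the new tests are described as covering.
export const policyMatrix: Array<[string, boolean, UpdateEnv, boolean]> = [
  ["patch, no env", true, {}, true],
  ["minor, no env", false, {}, false],
  ["major, no env", false, {}, false],
  ["minor, opt-in", false, { SPAWN_AUTO_UPDATE: "1" }, true],
  ["patch, opt-out", true, { SPAWN_NO_AUTO_UPDATE: "1" }, false],
];
```

Each row is `[label, patchOnly, env, expected]`; minor and major bumps both map to `patchOnly = false`, which is why they share a code path.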

Test plan

  • `bunx biome check src/` — 0 errors on 184 files
  • `bun test src/tests/update-check.test.ts` — 21/21 pass (16 existing + 5 new)
  • `bun test` — 2109/2109 pass full suite
  • Manual: on a v1.0.6 install with `SPAWN_AUTO_UPDATE=1`, run spawn and verify it auto-updates to 1.0.7
  • Manual: on fresh v1.0.7 (post-merge), verify a contrived v1.0.99 endpoint auto-installs
  • Manual: with `SPAWN_NO_AUTO_UPDATE=1` set, verify even patches just show the notice

Coordination note

This PR bumps to v1.0.7. PR #3295 (Hermes dashboard tunnel) also bumps to v1.0.7. Merging this one first is recommended — I'll force-push #3295 to bump to v1.0.8 on top, so the version numbers line up cleanly.

AhmedTMM added a commit to AhmedTMM/spawn that referenced this pull request Apr 14, 2026
louisgv
louisgv previously approved these changes Apr 14, 2026
Member

@louisgv louisgv left a comment


Security Review

Verdict: APPROVED

Commit: 36b888c

Summary

This PR changes the auto-update policy to auto-install patch bumps (e.g., 1.0.6 → 1.0.7) without requiring SPAWN_AUTO_UPDATE=1, while keeping minor/major bumps as opt-in only.

Security Analysis

No security issues found. The change:

  1. Respects semver boundaries — patch versions are defined as bug fixes only, no behavior changes or breaking changes per semver spec
  2. Provides explicit opt-out: SPAWN_NO_AUTO_UPDATE=1 suppresses patch auto-install for users who need fully pinned CLI versions (CI environments)
  3. No new injection vectors — uses existing performAutoUpdate() which safely calls execFileSync with array args (not shell interpolation)
  4. Comprehensive test coverage — 5 new tests verify the policy matrix (patch auto-install, minor/major opt-in, opt-out)

Tests

  • bun test: PASS (2048 tests pass, 0 fail)
  • bash -n: N/A (no shell script changes)

Code Quality

  • Version bump follows CLI versioning rules (patch bump for policy change)
  • Clear inline documentation explaining the policy rationale
  • Tests lock in the expected behavior to prevent regressions

-- security/pr-reviewer

@louisgv louisgv added the security-approved Security review approved label Apr 14, 2026
louisgv added a commit that referenced this pull request Apr 14, 2026
* feat(cli): hermes web dashboard tunnel support

Hermes Agent v0.9.0 ships a local web dashboard (hermes dashboard, default
127.0.0.1:9119) for config / session / skill / gateway management. This wires
Hermes into spawn's existing SSH-tunnel infrastructure so `spawn run hermes`
auto-exposes the dashboard to the user's local browser.

- agent-setup.ts: new startHermesDashboard() helper — session-scoped
  background launch via setsid/nohup with a port-ready wait loop. No systemd
  (unlike OpenClaw's gateway) because the dashboard only needs to live for
  the duration of the spawn session. Falls back gracefully if hermes isn't
  in PATH or the dashboard fails to come up.
- Wire preLaunch, preLaunchMsg, and tunnel { remotePort: 9119 } into the
  hermes AgentConfig. Mirrors the OpenClaw tunnel pattern at
  orchestrate.ts:628 — startSshTunnel + openBrowser happen automatically.
- manifest.json: update hermes notes to mention the dashboard.
- hermes-dashboard.test.ts: 7 new unit tests verifying the deploy script
  calls `hermes dashboard --port 9119 --host 127.0.0.1 --no-open`, checks
  all three port-probe fallbacks (ss / /dev/tcp / nc), uses setsid+nohup,
  waits for the port, and does NOT install a systemd unit.
- Bump cli version 1.0.6 -> 1.0.7.
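
A rough sketch of what a session-scoped launch script along these lines might look like, generated from TypeScript. Only the `hermes dashboard` flags, the use of setsid/nohup, the three port probes, and the absence of systemd come from the commit message above; the function name, the 30-second wait, and the exact shell are assumptions, not the contents of agent-setup.ts.

```typescript
// Hypothetical builder; the real startHermesDashboard() is not shown in this PR.
export function buildDashboardLaunchScript(
  port = 9119,
  host = "127.0.0.1",
  waitSeconds = 30, // assumed timeout
): string {
  return `
if ! command -v hermes >/dev/null 2>&1; then
  echo "hermes not in PATH; skipping dashboard"
  exit 0
fi

# Probe order: ss, then bash /dev/tcp, then nc.
port_open() {
  if command -v ss >/dev/null 2>&1; then
    ss -ltn 2>/dev/null | grep -q ":${port} "
  elif (exec 3<>/dev/tcp/${host}/${port}) 2>/dev/null; then
    return 0
  elif command -v nc >/dev/null 2>&1; then
    nc -z ${host} ${port}
  else
    return 1
  fi
}

# Detached background launch that is not managed by a service unit.
setsid nohup hermes dashboard --port ${port} --host ${host} --no-open >/dev/null 2>&1 &

i=0
while [ "$i" -lt ${waitSeconds} ]; do
  port_open && exit 0
  sleep 1
  i=$((i + 1))
done
echo "dashboard did not come up on port ${port}; continuing without it"
exit 0
`;
}
```

The script exits 0 on every path so that a missing binary or a slow dashboard does not abort the spawn session, matching the "falls back gracefully" note above.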

Closes #3293

* chore: bump cli to 1.0.8 to leave 1.0.7 for #3296

---------

Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
auto-install to same-major.minor bumps. The intent was "give users control
over feature updates" but the effect was "nobody installs security patches"
because the default became notice-only for everything.

This decouples the two ideas and aligns the policy with semver intent:

  - PATCH bumps (1.0.5 -> 1.0.7, same major.minor): auto-install always,
    no opt-in needed. Patches are reserved for bug fixes and security
    hardening. Blast radius is bounded by semver: no behavior changes,
    no new features, no breaking changes.

  - MINOR / MAJOR bumps (1.0.x -> 1.1.0, 1.x.x -> 2.0.0): respect
    SPAWN_AUTO_UPDATE=1 as opt-in. These can contain behavior changes
    and users should decide when to move to them.

  - SPAWN_NO_AUTO_UPDATE=1: new explicit opt-out for CI environments
    or pinned installs that need a fully static CLI.

Caveat — the one-time hurdle: users currently on 1.0.6 won't get 1.0.7
automatically, because they're still running 1.0.6's update-check.ts
which honors the old opt-in gate. Once they reach 1.0.7 via spawn update
(or by setting SPAWN_AUTO_UPDATE=1), every future patch will propagate
automatically and the fleet becomes self-healing on security.

Tests:
- 5 new tests lock in the policy (patch auto without env, minor notice
  without env, minor auto with env, major notice without env, explicit
  opt-out suppresses patch)
- All 21 update-check tests pass (16 existing + 5 new)
- 2109/2109 total suite

Bumps 1.0.6 -> 1.0.7.
@la14-1
Member

la14-1 commented Apr 14, 2026

Rebased onto main to resolve the package.json version conflict (v1.0.7 vs v1.0.8 on main). Bumped version to v1.0.9. All CI checks pass. Needs re-approval since the force-push invalidated the previous review.

-- refactor/pr-maintainer

Member

@louisgv louisgv left a comment


Security Review

Verdict: APPROVED

Commit: 8364102

Summary

This PR changes the auto-update policy to auto-install patch bumps (e.g., 1.0.6 → 1.0.9) without requiring SPAWN_AUTO_UPDATE=1, while keeping minor/major bumps as opt-in only. The PR rebased onto main to resolve a version conflict and bumped to v1.0.9.

Security Analysis

No security issues found. The change:

  1. Respects semver boundaries — patch versions are defined as bug fixes only, no behavior changes or breaking changes per semver spec
  2. Provides explicit opt-out: SPAWN_NO_AUTO_UPDATE=1 suppresses patch auto-install for users who need fully pinned CLI versions (CI environments)
  3. No new injection vectors — uses existing performAutoUpdate() which safely:
    • Fetches install script via execFileSync("curl", [args]) with array args (no shell interpolation)
    • Writes script to temp file and executes via execFileSync("bash", [tmpFile]) (no shell interpolation)
    • All arguments passed as array elements, not concatenated strings
  4. Comprehensive test coverage — 5 new tests verify the policy matrix (patch auto-install, minor/major opt-in, opt-out)
  5. Version bump follows policy — v1.0.9 is a patch bump for this policy change
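
To illustrate point 3, here is a minimal sketch of the fetch-then-execute pattern with array arguments. It is not the body of performAutoUpdate(); the `installFrom` name, the injectable runner, the curl flags, and the temp-file layout are assumptions chosen so the pattern can be exercised without a network call.

```typescript
import { execFileSync } from "node:child_process";
import { mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Runner narrowed to what this sketch needs, so a fake can stand in during tests.
type Run = (file: string, args: string[]) => string;

const defaultRun: Run = (file, args) =>
  execFileSync(file, args, { encoding: "utf8" });

export function installFrom(scriptUrl: string, run: Run = defaultRun): void {
  // The URL is a single argv element for curl; no shell ever parses it.
  const script = run("curl", ["-fsSL", scriptUrl]);

  // Write the script to a private temp directory, then hand bash the path only.
  const dir = mkdtempSync(join(tmpdir(), "spawn-update-"));
  const tmpFile = join(dir, "install.sh");
  try {
    writeFileSync(tmpFile, script, { mode: 0o700 });
    run("bash", [tmpFile]);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}
```

Because each argument is passed as its own array element, a URL containing `;` or `$(...)` reaches curl as literal text rather than being interpreted by a shell.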

Tests

  • bun test: PASS (2055 tests pass, 0 fail)
  • bash -n: N/A (no shell script changes)
  • curl|bash safety: N/A (no shell script changes)
  • macOS compat: N/A (no shell script changes)

Code Quality

  • Clear inline documentation explaining the policy rationale (lines 405-418)
  • Tests lock in the expected behavior to prevent regressions
  • Logic is straightforward: shouldAutoInstall = !explicitOptOut && (patchOnly || explicitOptIn)

Approved and auto-merging.


-- security/pr-reviewer

@louisgv louisgv merged commit 655a909 into OpenRouterTeam:main Apr 14, 2026
5 checks passed
@AhmedTMM AhmedTMM deleted the fix/auto-update-patches branch April 14, 2026 23:41
AhmedTMM added a commit to AhmedTMM/spawn that referenced this pull request Apr 14, 2026
Adds low-volume, high-signal product events on top of the existing
errors/warnings telemetry (shared/telemetry.ts). Answers "where do users
bail before reaching a running agent" at the fleet level.

Funnel events (in orchestrate.ts, both fast and sequential paths):

  funnel_started              pipeline begins
  funnel_cloud_authed         cloud.authenticate() ok
  funnel_credentials_ready    OR key + preProvision resolved
  funnel_vm_ready             VM booted and SSH-reachable
  funnel_install_completed    agent install succeeded (tarball or live)
  funnel_configure_completed  agent.configure() ran
  funnel_prelaunch_completed  gateway / dashboard / preLaunch hooks done
  funnel_handoff              about to launch TUI (final step)

Every event carries elapsed_ms since funnel_started, plus agent and cloud
via telemetry context. Per-step counts reveal the drop-off funnel in
PostHog without touching any PII.
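
A minimal sketch of how each step could carry elapsed_ms measured from the first funnel event. The `createFunnel` name, the injected `capture` callback, and the injectable clock are assumptions for illustration; the wiring in orchestrate.ts is not shown here.

```typescript
type Props = Record<string, unknown>;
type Capture = (name: string, props: Props) => void;

// Returns a step recorder that stamps every event with time since the first step.
export function createFunnel(capture: Capture, now: () => number = Date.now) {
  let startedAt: number | undefined;
  return (step: string): void => {
    const t = now();
    if (startedAt === undefined) startedAt = t; // the first step starts the clock
    capture(step, { elapsed_ms: t - startedAt });
  };
}
```

Usage would be `const step = createFunnel(captureEvent); step("funnel_started"); ... step("funnel_handoff");`, so the first event always reports an elapsed_ms of 0.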

Lifecycle events (new shared/lifecycle-telemetry.ts):

  spawn_connected  { spawn_id, agent, cloud, connect_count, date }
    fired from list.ts when the user reconnects via the interactive picker.
    Increments connection.metadata.connect_count and writes last_connected_at
    so subsequent events and the eventual spawn_deleted have the total.

  spawn_deleted    { spawn_id, agent, cloud, lifetime_hours, connect_count, date }
    fired from delete.ts (both interactive confirmAndDelete and headless
    cmdDelete loop) after a successful cloud destroy. lifetime_hours is
    computed from SpawnRecord.timestamp to now. Clamped at 0 for corrupt
    clocks. connect_count is read from metadata.
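
The lifetime_hours computation could look roughly like this, assuming SpawnRecord.timestamp is an ISO-8601 string. Treating an unparseable timestamp as 0 is an extra assumption of this sketch, and `lifetimeHours` is a hypothetical name.

```typescript
const MS_PER_HOUR = 3_600_000;

export function lifetimeHours(createdAt: string, now: Date = new Date()): number {
  const created = Date.parse(createdAt);
  // Assumption: an unparseable timestamp is reported as a zero lifetime.
  if (Number.isNaN(created)) return 0;
  const hours = (now.getTime() - created) / MS_PER_HOUR;
  // Clamp at 0 so a clock that ran ahead never yields a negative lifetime.
  return Math.max(0, hours);
}
```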

New captureEvent(name, properties) helper in telemetry.ts:
- Respects SPAWN_TELEMETRY=0 opt-out (no new flag)
- Runs every string property through the existing scrubber (API keys,
  GitHub tokens, bearer, emails, IPs, base64 blobs, home paths)
- Non-string values pass through untouched
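
A simplified sketch of that behaviour, for illustration only. The stand-in `scrub` below covers just two of the listed patterns (bearer tokens and emails) with regexes of my own, and `buildEvent` is a hypothetical name that returns the payload instead of sending it; the real scrubber and captureEvent in telemetry.ts are not reproduced here.

```typescript
type Props = Record<string, unknown>;

// Stand-in scrubber: two illustrative patterns, not the project's real list.
function scrub(s: string): string {
  return s
    .replace(/Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, "Bearer [REDACTED]")
    .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]");
}

export function buildEvent(
  name: string,
  props: Props,
  env: { SPAWN_TELEMETRY?: string } = {},
): { name: string; properties: Props } | null {
  // Opt-out: nothing is produced when SPAWN_TELEMETRY=0.
  if (env.SPAWN_TELEMETRY === "0") return null;

  const properties: Props = {};
  for (const [key, value] of Object.entries(props)) {
    // Strings are scrubbed; numbers, booleans, and the rest pass through untouched.
    properties[key] = typeof value === "string" ? scrub(value) : value;
  }
  return { name, properties };
}
```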

Tests: 20 new (15 lifecycle-telemetry + 2 captureEvent + 3 assertion
additions to disabled-telemetry). Full suite: 2129/2129 pass.

Bumps 1.0.10 -> 1.0.11. Patch bump — auto-propagates under OpenRouterTeam#3296 policy.
louisgv added a commit that referenced this pull request Apr 15, 2026
…3305)

* feat(telemetry): funnel + lifecycle events for onboarding drop-off


* fix(test): replace mock.module with spyOn in lifecycle-telemetry tests

mock.module contaminates the global module registry when running under
--coverage, causing telemetry.test.ts and history-cov.test.ts to receive
mocked implementations instead of the real modules. Switch to spyOn with
mockRestore in afterEach so the real modules are preserved across files.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
AhmedTMM added a commit to AhmedTMM/spawn that referenced this pull request Apr 15, 2026
Two bugs from the OpenRouterTeam#3305 rollout:

1. Test pollution: orchestrate.test.ts imports runOrchestration directly
   and never calls initTelemetry, but _enabled defaulted to true in the
   module so captureEvent happily fired real events at PostHog tagged
   agent=testagent. The onboarding funnel filled up with CI fixture data.

2. Funnel started too late: funnel_* events fired inside runOrchestration,
   which is only called AFTER the interactive picker completes. Users who
   bail at the agent/cloud/setup-options/name prompts were invisible —
   yet that's exactly where real drop-off happens.

Fix 1 — telemetry.ts:
  - Default _enabled = false. Nothing fires until initTelemetry is
    explicitly called. Production (index.ts) calls it; tests that need
    telemetry (telemetry.test.ts) call it with BUN_ENV/NODE_ENV cleared.
  - Belt-and-suspenders: initTelemetry now short-circuits when
    BUN_ENV === "test" || NODE_ENV === "test", so even if future code
    calls it from a test context, events stay local.
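
  A minimal sketch of the two guards described in Fix 1, written as a factory so it can be exercised in isolation. The `createTelemetry` name, the injected `send` callback, and the `SPAWN_TELEMETRY !== "0"` check inside init are assumptions; the real module keeps `_enabled` as module state.

```typescript
type Env = Record<string, string | undefined>;

export function createTelemetry(send: (name: string) => void) {
  let enabled = false; // default off: nothing fires until init() is called

  return {
    init(env: Env): void {
      // Short-circuit in test environments so events stay local.
      if (env.BUN_ENV === "test" || env.NODE_ENV === "test") return;
      enabled = env.SPAWN_TELEMETRY !== "0";
    },
    capture(name: string): void {
      if (!enabled) return;
      send(name);
    },
  };
}
```

  With this shape, a test file that imports an instrumented function and never calls `init` cannot emit anything, which is the pollution case from bug 1.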

Fix 2 — picker instrumentation:
  New events fired before runOrchestration in every entry path:

    spawn_launched         { mode: interactive | agent_interactive | direct | headless }
    menu_shown / menu_selected / menu_cancelled   (only when user has prior spawns)
    agent_picker_shown
    agent_selected         { agent }     — also sets telemetry context
    cloud_picker_shown
    cloud_selected         { cloud }     — also sets telemetry context
    preflight_passed
    setup_options_shown
    setup_options_selected { step_count }
    name_prompt_shown
    name_entered
    picker_completed

  Wired into:
    commands/interactive.ts  cmdInteractive + cmdAgentInteractive
    commands/run.ts          cmdRun (direct `spawn <agent> <cloud>`)
                             cmdRunHeadless (only spawn_launched)

  runOrchestration's existing funnel_* events continue to fire unchanged.
  The final funnel in PostHog:
    spawn_launched → agent_selected → cloud_selected → preflight_passed
    → setup_options_selected → name_entered → picker_completed
    → funnel_started → funnel_cloud_authed → funnel_credentials_ready
    → funnel_vm_ready → funnel_install_completed → funnel_configure_completed
    → funnel_prelaunch_completed → funnel_handoff

Tests:
- telemetry.test.ts: 2 new env-guard tests (BUN_ENV, NODE_ENV), plus
  updated beforeEach to clear both env vars so existing tests still
  exercise initTelemetry.
- Full suite: 2131/2131 pass, biome 0 errors.

Bumps 1.0.12 -> 1.0.13 (patch — auto-propagates under OpenRouterTeam#3296 policy).
louisgv pushed a commit that referenced this pull request Apr 15, 2026
AhmedTMM added a commit to AhmedTMM/spawn that referenced this pull request Apr 16, 2026

Labels

security-approved Security review approved


3 participants