feat(telemetry): funnel + lifecycle events for onboarding drop-off by AhmedTMM · Pull Request #3305 · OpenRouterTeam/spawn

AhmedTMM · 2026-04-14T23:52:36Z

Summary

Adds low-volume, high-signal product events on top of the existing errors/warnings telemetry so we can answer "where do users bail before reaching a running agent" at the fleet level, plus track spawn lifetime and login patterns.

Respects existing `SPAWN_TELEMETRY=0` opt-out — no new flags.

Funnel events (in `orchestrate.ts`, both fast and sequential paths)

| Event | Fires when |
| --- | --- |
| `funnel_started` | Pipeline begins |
| `funnel_cloud_authed` | `cloud.authenticate()` ok |
| `funnel_credentials_ready` | OR key + preProvision resolved |
| `funnel_vm_ready` | VM booted and SSH-reachable |
| `funnel_install_completed` | Agent install succeeded (tarball or live) |
| `funnel_configure_completed` | `agent.configure()` ran |
| `funnel_prelaunch_completed` | Gateway / dashboard / preLaunch hooks done |
| `funnel_handoff` | About to launch TUI (final step) |

Every event carries `elapsed_ms` since `funnel_started`, plus `agent` and `cloud` via telemetry context. Per-step counts in PostHog reveal the exact drop-off funnel without PII.
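A minimal sketch of how each step could carry `elapsed_ms` relative to `funnel_started`. It is not the code in `orchestrate.ts`: `createFunnel`, the `Capture` type, and the injectable clock are hypothetical names introduced here, and the start time is held in a closure for testability (the review below notes the real code uses a module-scoped timestamp).

```typescript
// Hypothetical types: only captureEvent(name, properties) is named in this PR.
type Props = Record<string, string | number | boolean>;
type Capture = (name: string, properties: Props) => void;

// Step names are taken from the table above.
const FUNNEL_STEPS = [
  "funnel_started",
  "funnel_cloud_authed",
  "funnel_credentials_ready",
  "funnel_vm_ready",
  "funnel_install_completed",
  "funnel_configure_completed",
  "funnel_prelaunch_completed",
  "funnel_handoff",
] as const;
type FunnelStep = (typeof FUNNEL_STEPS)[number];

// Returns a tracker that stamps every step with the time since funnel_started.
function createFunnel(capture: Capture, now: () => number = Date.now) {
  let start: number | null = null;
  return function track(step: FunnelStep): void {
    const t = now();
    // Assumption: a step seen before funnel_started resets the clock,
    // so elapsed_ms is never negative or undefined.
    if (step === "funnel_started" || start === null) start = t;
    capture(step, { elapsed_ms: t - start });
  };
}
```

Injecting `now` keeps the timing deterministic in tests, which is the only reason the sketch takes a clock parameter.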

Lifecycle events (new `shared/lifecycle-telemetry.ts`)

`spawn_connected` — fired from `list.ts` when the user reconnects via the interactive picker. Properties: `spawn_id`, `agent`, `cloud`, `connect_count`, `date`. Increments `connection.metadata.connect_count` and writes `last_connected_at` so subsequent events (and the eventual `spawn_deleted`) have the running total.

`spawn_deleted` — fired from `delete.ts` (both interactive `confirmAndDelete` and headless `cmdDelete` loop) after a successful cloud destroy. Properties: `spawn_id`, `agent`, `cloud`, `lifetime_hours`, `connect_count`, `date`. `lifetime_hours` is computed from `SpawnRecord.timestamp` to now and clamped at 0 for corrupt clocks.

Answers: how long does a typical spawn live, how many times do users reconnect to it, which agents/clouds get the most re-use.
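The `lifetime_hours` computation described above can be sketched as follows. This assumes the record timestamp is an ISO-8601 string, which the PR does not state; `lifetimeHours` is a hypothetical name, and treating an unparseable timestamp as 0 is an extra assumption on top of the documented clamp.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;

// Hours between the record's creation time and "now", never negative.
function lifetimeHours(createdAt: string, nowMs: number = Date.now()): number {
  const createdMs = Date.parse(createdAt);
  // Assumption: an unparseable timestamp is reported as 0 rather than NaN.
  if (Number.isNaN(createdMs)) return 0;
  // A creation time in the future (corrupt clock) clamps to 0.
  return Math.max(0, (nowMs - createdMs) / MS_PER_HOUR);
}
```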

Privacy + scrubbing

New `captureEvent(name, properties)` helper in `telemetry.ts`:

Gates on `SPAWN_TELEMETRY=0` (no new flag)
Runs every string property through the existing scrubber (API keys, GitHub tokens, bearer, emails, IPs, base64 blobs, home paths)
Non-string values (numbers, booleans, `spawn_id` UUIDs) pass through untouched

Nothing in the funnel events is user-typed — they're all known-at-compile-time agent/cloud names plus timing integers.
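A rough sketch of the gate-then-scrub behaviour listed above. The two regexes are illustrative stand-ins, not the patterns in `telemetry.ts`, and `makeCaptureEvent`, `scrubProps`, and the `send` callback are hypothetical names; the environment is passed in explicitly so the sketch does not depend on a particular runtime.

```typescript
type EventProps = Record<string, unknown>;
type Env = Record<string, string | undefined>;

// Illustrative patterns only: the real scrubber covers more cases
// (API keys, GitHub tokens, bearer tokens, base64 blobs, home paths).
const PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "[email]"],
  [/\b(?:\d{1,3}\.){3}\d{1,3}\b/g, "[ip]"],
];

function scrub(value: string): string {
  return PATTERNS.reduce((acc, [re, replacement]) => acc.replace(re, replacement), value);
}

// Strings are scrubbed; numbers, booleans and other values pass through.
function scrubProps(props: EventProps): EventProps {
  const out: EventProps = {};
  for (const [key, value] of Object.entries(props)) {
    out[key] = typeof value === "string" ? scrub(value) : value;
  }
  return out;
}

// SPAWN_TELEMETRY=0 disables sending entirely; nothing is scrubbed or sent.
function makeCaptureEvent(env: Env, send: (name: string, props: EventProps) => void) {
  const enabled = env.SPAWN_TELEMETRY !== "0";
  return (name: string, props: EventProps = {}): void => {
    if (!enabled) return;
    send(name, scrubProps(props));
  };
}
```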

Persistence model for `connect_count`

Stored inside `SpawnRecord.connection.metadata` as a stringified integer (the existing metadata schema is `Record<string, string>`). `saveMetadata` merges — no risk of clobbering other keys like `tunnel_remote_port`.
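As a sketch of the stringified-counter approach, assuming a plain `Record<string, string>` and an ISO-8601 format for `last_connected_at` (the PR does not state the format). `readConnectCount` and `bumpConnectCount` are hypothetical helpers, not the functions in `lifecycle-telemetry.ts`.

```typescript
type Metadata = Record<string, string>;

// Parses the stored counter, treating missing or malformed values as 0.
function readConnectCount(metadata: Metadata | undefined): number {
  const parsed = Number.parseInt(metadata?.connect_count ?? "", 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : 0;
}

// Returns merged metadata: other keys survive, the counter is incremented.
function bumpConnectCount(metadata: Metadata | undefined, now: Date = new Date()): Metadata {
  return {
    ...metadata,
    connect_count: String(readConnectCount(metadata) + 1),
    last_connected_at: now.toISOString(),
  };
}
```

Spreading the existing metadata first is what keeps unrelated keys such as `tunnel_remote_port` intact.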

Tests

`lifecycle-telemetry.test.ts` (15 new tests) — locks in the connect-count math, lifetime computation, no-op for missing records, event payload shape, and tolerance for malformed metadata.
`telemetry.test.ts` (+2 tests for `captureEvent`, +1 assertion in disabled-telemetry) — verifies the new helper emits batched events with the right shape, respects opt-out, and scrubs string values but passes non-strings through.
Full suite: 2129/2129 pass, biome 187 files 0 errors.

Not doing in this PR

Failure events (e.g. `funnel_provision_failed`) — existing `captureError` already handles errors with stack traces. Funnel drop-off is inferable from the absence of the next step (e.g. `funnel_credentials_ready` count − `funnel_vm_ready` count = VM provisioning drop-off).
Retry tracking — each retryOrQuit loop already fires `captureError` for the underlying failure. A separate retry-counter event would add noise for marginal signal.
Post-handoff tracking — once the TUI takes over, we're out of the CLI. In-agent session tracking is out of scope; that's the agent's responsibility.
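The "absence of the next step" arithmetic can be sketched as a subtraction over per-step counts. `dropOffBetweenSteps` and `StepLoss` are hypothetical names, and any counts passed in would come from a PostHog export; none are supplied by this PR.

```typescript
interface StepLoss {
  from: string;
  to: string;
  lost: number; // events that reached `from` but not `to`
  rate: number; // lost / count(from), or 0 when `from` has no events
}

// Walks the ordered step list and compares each step with the next one.
function dropOffBetweenSteps(
  steps: readonly string[],
  counts: Readonly<Record<string, number>>,
): StepLoss[] {
  const losses: StepLoss[] = [];
  let prev: string | undefined;
  for (const step of steps) {
    if (prev !== undefined) {
      const fromCount = counts[prev] ?? 0;
      const toCount = counts[step] ?? 0;
      const lost = Math.max(0, fromCount - toCount);
      losses.push({ from: prev, to: step, lost, rate: fromCount > 0 ? lost / fromCount : 0 });
    }
    prev = step;
  }
  return losses;
}
```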

Version

Bumps 1.0.10 → 1.0.11. Patch bump — auto-propagates under #3296's new policy, so the telemetry will start flowing to users on their next spawn run without any manual update.

Adds low-volume, high-signal product events on top of the existing errors/warnings telemetry (shared/telemetry.ts). Answers "where do users bail before reaching a running agent" at the fleet level. Funnel events (in orchestrate.ts, both fast and sequential paths): funnel_started pipeline begins funnel_cloud_authed cloud.authenticate() ok funnel_credentials_ready OR key + preProvision resolved funnel_vm_ready VM booted and SSH-reachable funnel_install_completed agent install succeeded (tarball or live) funnel_configure_completed agent.configure() ran funnel_prelaunch_completed gateway / dashboard / preLaunch hooks done funnel_handoff about to launch TUI (final step) Every event carries elapsed_ms since funnel_started, plus agent and cloud via telemetry context. Per-step counts reveal the drop-off funnel in PostHog without touching any PII. Lifecycle events (new shared/lifecycle-telemetry.ts): spawn_connected { spawn_id, agent, cloud, connect_count, date } fired from list.ts when the user reconnects via the interactive picker. Increments connection.metadata.connect_count and writes last_connected_at so subsequent events and the eventual spawn_deleted have the total. spawn_deleted { spawn_id, agent, cloud, lifetime_hours, connect_count, date } fired from delete.ts (both interactive confirmAndDelete and headless cmdDelete loop) after a successful cloud destroy. lifetime_hours is computed from SpawnRecord.timestamp to now. Clamped at 0 for corrupt clocks. connect_count is read from metadata. New captureEvent(name, properties) helper in telemetry.ts: - Respects SPAWN_TELEMETRY=0 opt-out (no new flag) - Runs every string property through the existing scrubber (API keys, GitHub tokens, bearer, emails, IPs, base64 blobs, home paths) - Non-string values pass through untouched Tests: 20 new (15 lifecycle-telemetry + 2 captureEvent + 3 assertion additions to disabled-telemetry). Full suite: 2129/2129 pass. Bumps 1.0.10 -> 1.0.11. 
Patch bump — auto-propagates under OpenRouterTeam#3296 policy.

louisgv

Security Review

Verdict: APPROVED
Commit: f14e502

Summary

This PR adds funnel telemetry and lifecycle event tracking for onboarding analytics. All changes respect the existing SPAWN_TELEMETRY=0 opt-out mechanism.

Security Analysis

Telemetry Infrastructure

✅ PII scrubbing: All string values in captureEvent() are passed through the same scrubber as errors/warnings (lines 196-205 in telemetry.ts)
✅ Sensitive pattern redaction: API keys, GitHub tokens, emails, IPs, file paths are all redacted before upload (lines 14-58 in telemetry.ts)
✅ Opt-out respected: All new events use captureEvent() which checks _enabled flag (controlled by SPAWN_TELEMETRY=0)
✅ No command args: Only aggregated metrics (elapsed_ms, connect_count, lifetime_hours) are sent - no user input, file paths, or command arguments

Lifecycle Tracking

✅ Safe metadata storage: connect_count and last_connected_at stored in existing SpawnRecord.connection.metadata as strings
✅ No credential leakage: Only spawn_id (random UUID), agent/cloud names, and numeric metrics are sent
✅ Proper event timing: trackSpawnDeleted() called AFTER successful deletion, not before (prevents false positives)

Funnel Tracking

✅ Context isolation: Agent/cloud set via setTelemetryContext() and attached to all events automatically
✅ No session tracking: Only pipeline step completion events - no keystroke tracking or prompt content
✅ Safe timing calculation: Uses module-scoped _funnelStart timestamp - no external state manipulation

Tests

✅ bash -n: N/A (no shell scripts modified)
✅ bun test: PASS (2068 tests, 0 failures)
✅ biome lint: PASS (187 files checked, no issues)
✅ Test coverage: New test file lifecycle-telemetry.test.ts with comprehensive coverage of both tracking functions

Findings

None. Code is secure.

-- security/pr-reviewer

mock.module contaminates the global module registry when running under --coverage, causing telemetry.test.ts and history-cov.test.ts to receive mocked implementations instead of the real modules. Switch to spyOn with mockRestore in afterEach so the real modules are preserved across files.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

la14-1 · 2026-04-15T04:28:31Z

Pushed a fix for the 5 failing Mock Tests: mock.module in lifecycle-telemetry.test.ts was contaminating the global module registry when running under --coverage, causing telemetry.test.ts (2 failures) and history-cov.test.ts (3 failures) to receive mocked implementations instead of the real modules.

Replaced mock.module with spyOn + mockRestore in afterEach, which scopes the mocks to each test without polluting other files. Full suite passes: 2129/2129, biome clean.
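To illustrate why a restorable spy stays scoped while a registry-level mock does not, here is a hand-rolled spy. It deliberately does not use Bun's `spyOn` or `mock.module`; `spyOnMethod` and the `telemetry` object are hypothetical stand-ins for the idea of "replace a method, record calls, put the original back".

```typescript
type AnyFn = (...args: any[]) => any;

// Replaces target[key] with a recording wrapper and returns a restore handle.
function spyOnMethod(target: Record<string, AnyFn>, key: string, replacement: AnyFn) {
  const original = target[key];
  if (original === undefined) throw new Error(`no method named ${key}`);
  const calls: unknown[][] = [];
  target[key] = (...args: unknown[]) => {
    calls.push(args);
    return replacement(...args);
  };
  return {
    calls,
    // Restoring puts the real implementation back, so later tests (or other
    // test files sharing the module) see the original again.
    restore: (): void => {
      target[key] = original;
    },
  };
}

// Stand-in for a shared module whose method a test wants to stub.
const telemetry: Record<string, AnyFn> = {
  captureEvent: (name: string): string => `sent:${name}`,
};
```

Calling `restore()` in an after-each hook is the hand-rolled equivalent of the `mockRestore` step described above.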

-- refactor/pr-maintainer

louisgv

Security Review

Verdict: APPROVED
Commit: 42f4426

Summary

This PR adds funnel telemetry and lifecycle event tracking for onboarding analytics. All changes respect the existing SPAWN_TELEMETRY=0 opt-out mechanism.

Security Analysis

Telemetry Infrastructure

✅ PII scrubbing: All string values in captureEvent() are passed through the same scrubber as errors/warnings (lines 196-205 in telemetry.ts)
✅ Sensitive pattern redaction: API keys, GitHub tokens, emails, IPs, file paths are all redacted before upload (lines 14-58 in telemetry.ts)
✅ Opt-out respected: All new events use captureEvent() which checks _enabled flag (controlled by SPAWN_TELEMETRY=0)
✅ No command args: Only aggregated metrics (elapsed_ms, connect_count, lifetime_hours) are sent - no user input, file paths, or command arguments

Lifecycle Tracking

✅ Safe metadata storage: connect_count and last_connected_at stored in existing SpawnRecord.connection.metadata as strings
✅ No credential leakage: Only spawn_id (random UUID), agent/cloud names, and numeric metrics are sent
✅ Proper event timing: trackSpawnDeleted() called AFTER successful deletion (delete.ts:264, 454), not before (prevents false positives)
✅ Correct reconnect placement: trackSpawnConnected() called BEFORE SSH handoff (list.ts:714) - correct as SSH session never returns

Funnel Tracking

✅ Context isolation: Agent/cloud set via setTelemetryContext() and attached to all events automatically
✅ No session tracking: Only pipeline step completion events - no keystroke tracking or prompt content
✅ Safe timing calculation: Uses module-scoped _funnelStart timestamp - no external state manipulation

Test Coverage Fix

✅ Proper mock isolation: Replaced mock.module with spyOn + mockRestore in lifecycle-telemetry.test.ts to prevent cross-file contamination
✅ Full suite passes: 2068/2068 tests pass after the fix

Tests

✅ bash -n: N/A (no shell scripts modified)
✅ bun test: PASS (2068 tests, 0 failures)
✅ biome lint: PASS (187 files checked, no issues)

Findings

None. Code is secure.

-- security/pr-reviewer

Two bugs from the OpenRouterTeam#3305 rollout:

1. Test pollution: orchestrate.test.ts imports runOrchestration directly and never calls initTelemetry, but _enabled defaulted to true in the module so captureEvent happily fired real events at PostHog tagged agent=testagent. The onboarding funnel filled up with CI fixture data.
2. Funnel started too late: funnel_* events fired inside runOrchestration, which is only called AFTER the interactive picker completes. Users who bail at the agent/cloud/setup-options/name prompts were invisible, yet that's exactly where real drop-off happens.

Fix 1: telemetry.ts
- Default _enabled = false. Nothing fires until initTelemetry is explicitly called. Production (index.ts) calls it; tests that need telemetry (telemetry.test.ts) call it with BUN_ENV/NODE_ENV cleared.
- Belt-and-suspenders: initTelemetry now short-circuits when BUN_ENV === "test" || NODE_ENV === "test", so even if future code calls it from a test context, events stay local.

Fix 2: picker instrumentation

New events fired before runOrchestration in every entry path:

  spawn_launched { mode: interactive | agent_interactive | direct | headless }
  menu_shown / menu_selected / menu_cancelled (only when user has prior spawns)
  agent_picker_shown
  agent_selected { agent } (also sets telemetry context)
  cloud_picker_shown
  cloud_selected { cloud } (also sets telemetry context)
  preflight_passed
  setup_options_shown
  setup_options_selected { step_count }
  name_prompt_shown
  name_entered
  picker_completed

Wired into:

  commands/interactive.ts  cmdInteractive + cmdAgentInteractive
  commands/run.ts          cmdRun (direct `spawn <agent> <cloud>`)
                           cmdRunHeadless (only spawn_launched)

runOrchestration's existing funnel_* events continue to fire unchanged.

The final funnel in PostHog:

spawn_launched → agent_selected → cloud_selected → preflight_passed → setup_options_selected → name_entered → picker_completed → funnel_started → funnel_cloud_authed → funnel_credentials_ready → funnel_vm_ready → funnel_install_completed → funnel_configure_completed → funnel_prelaunch_completed → funnel_handoff

Tests:
- telemetry.test.ts: 2 new env-guard tests (BUN_ENV, NODE_ENV), plus updated beforeEach to clear both env vars so existing tests still exercise initTelemetry.
- Full suite: 2131/2131 pass, biome 0 errors.

Bumps 1.0.12 -> 1.0.13 (patch, auto-propagates under OpenRouterTeam#3296 policy).
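The opt-in default and test-environment guard from Fix 1 can be sketched as below. This is an approximation under stated assumptions, not the contents of `telemetry.ts`: `createTelemetry` is a hypothetical factory, the environment is passed in explicitly instead of being read from the process, and checking `SPAWN_TELEMETRY` inside `initTelemetry` is assumed from the opt-out described earlier.

```typescript
type Env = Record<string, string | undefined>;

function createTelemetry(send: (name: string) => void) {
  // Off by default: importing the module alone must never emit events.
  let enabled = false;
  return {
    initTelemetry(env: Env): void {
      // Test environments stay disabled even if init is called by accident.
      if (env.BUN_ENV === "test" || env.NODE_ENV === "test") return;
      // Assumed placement of the existing user opt-out.
      if (env.SPAWN_TELEMETRY === "0") return;
      enabled = true;
    },
    captureEvent(name: string): void {
      if (enabled) send(name);
    },
  };
}
```

With this shape, a test file that imports the orchestration code and never calls `initTelemetry` sends nothing, which is the failure mode described under "Test pollution".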


louisgv previously approved these changes Apr 15, 2026


louisgv added the security-approved Security review approved label Apr 15, 2026

louisgv and others added 2 commits April 15, 2026 08:39

Merge branch 'main' into feat/funnel-telemetry

452cad6

la14-1 dismissed louisgv’s stale review via 42f4426 April 15, 2026 04:28

louisgv approved these changes Apr 15, 2026


louisgv merged commit 1e64d34 into OpenRouterTeam:main Apr 15, 2026
5 checks passed

AhmedTMM mentioned this pull request Apr 15, 2026

fix(telemetry): opt-in default + picker funnel events #3308

Merged

louisgv merged 3 commits into OpenRouterTeam:main from AhmedTMM:feat/funnel-telemetry
