feat(telemetry): funnel + lifecycle events for onboarding drop-off #3305
louisgv merged 3 commits into OpenRouterTeam:main
Adds low-volume, high-signal product events on top of the existing
errors/warnings telemetry (shared/telemetry.ts). Answers "where do users
bail before reaching a running agent" at the fleet level.
Funnel events (in orchestrate.ts, both fast and sequential paths):
  funnel_started               pipeline begins
  funnel_cloud_authed          cloud.authenticate() ok
  funnel_credentials_ready     OR key + preProvision resolved
  funnel_vm_ready              VM booted and SSH-reachable
  funnel_install_completed     agent install succeeded (tarball or live)
  funnel_configure_completed   agent.configure() ran
  funnel_prelaunch_completed   gateway / dashboard / preLaunch hooks done
  funnel_handoff               about to launch TUI (final step)
Every event carries elapsed_ms since funnel_started, plus agent and cloud
via telemetry context. Per-step counts reveal the drop-off funnel in
PostHog without touching any PII.
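
A minimal sketch of how `elapsed_ms` could be derived from a module-scoped start time. The helper name `trackFunnel`, the `Capture` callback type, and the injectable clock are assumptions made for illustration; only the event names and the `elapsed_ms` property come from the description above.

```typescript
// Hypothetical sketch: trackFunnel, Capture and the injectable clock are
// illustrative names, not the actual orchestrate.ts code.
type EventProps = Record<string, string | number | boolean>;
type Capture = (name: string, props: EventProps) => void;

// Module-scoped start time; reset each time funnel_started fires.
let funnelStart: number | null = null;

function trackFunnel(
  step: string,
  capture: Capture,
  now: () => number = Date.now,
): void {
  const t = now();
  // funnel_started always restarts the clock. Any other step measures
  // against the stored start, or against itself if none was recorded.
  const start = step === "funnel_started" ? t : (funnelStart ?? t);
  funnelStart = start;
  capture(step, { elapsed_ms: t - start });
}
```

Passing the clock as a parameter keeps the sketch deterministic under test; a production helper would more likely call `Date.now()` directly.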
Lifecycle events (new shared/lifecycle-telemetry.ts):
  spawn_connected { spawn_id, agent, cloud, connect_count, date }
    Fired from list.ts when the user reconnects via the interactive picker.
    Increments connection.metadata.connect_count and writes last_connected_at
    so subsequent events and the eventual spawn_deleted have the total.
  spawn_deleted { spawn_id, agent, cloud, lifetime_hours, connect_count, date }
    Fired from delete.ts (both interactive confirmAndDelete and headless
    cmdDelete loop) after a successful cloud destroy. lifetime_hours is
    computed from SpawnRecord.timestamp to now, clamped at 0 for corrupt
    clocks. connect_count is read from metadata.
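
The `lifetime_hours` computation described above can be sketched roughly as follows. This assumes `SpawnRecord.timestamp` is an ISO-8601 string, which the text does not state; the function name and the treatment of unparseable timestamps are also assumptions.

```typescript
// Hypothetical sketch: assumes the record timestamp is an ISO-8601 string.
const MS_PER_HOUR = 60 * 60 * 1000;

function lifetimeHours(createdAt: string, nowMs: number): number {
  const createdMs = Date.parse(createdAt);
  // Assumption: an unparseable timestamp is reported as 0 rather than NaN.
  if (isNaN(createdMs)) return 0;
  // Clamp at 0 so a creation time in the future (corrupt or skewed clock)
  // never produces a negative lifetime.
  return Math.max(0, (nowMs - createdMs) / MS_PER_HOUR);
}
```

Clamping rather than dropping the event keeps the `spawn_deleted` count accurate even when the duration itself cannot be trusted.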
New captureEvent(name, properties) helper in telemetry.ts:
- Respects SPAWN_TELEMETRY=0 opt-out (no new flag)
- Runs every string property through the existing scrubber (API keys,
GitHub tokens, bearer, emails, IPs, base64 blobs, home paths)
- Non-string values pass through untouched
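
The three rules above could look roughly like this. It is a sketch under stated assumptions: the `scrub` function is a two-pattern stand-in for the existing scrubber (which covers many more patterns), and the `enabled` flag and `send` callback are passed in explicitly here only to keep the example self-contained.

```typescript
// Hypothetical sketch: scrub() is a stand-in, not the real scrubber.
type Props = Record<string, unknown>;
type Send = (name: string, props: Props) => void;

function scrub(value: string): string {
  return value
    .replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "[email]")
    .replace(/Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, "Bearer [redacted]");
}

function scrubProperties(props: Props): Props {
  const out: Props = {};
  for (const key of Object.keys(props)) {
    const value = props[key];
    // Only strings are scrubbed; numbers and booleans pass through untouched.
    out[key] = typeof value === "string" ? scrub(value) : value;
  }
  return out;
}

function captureEvent(name: string, props: Props, enabled: boolean, send: Send): void {
  if (!enabled) return; // opt-out: nothing leaves the process
  send(name, scrubProperties(props));
}
```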
Tests: 20 new (15 lifecycle-telemetry + 2 captureEvent + 3 assertion
additions to disabled-telemetry). Full suite: 2129/2129 pass.
Bumps 1.0.10 -> 1.0.11. Patch bump — auto-propagates under OpenRouterTeam#3296 policy.
louisgv
left a comment
Security Review
Verdict: APPROVED
Commit: f14e502
Summary
This PR adds funnel telemetry and lifecycle event tracking for onboarding analytics. All changes respect the existing SPAWN_TELEMETRY=0 opt-out mechanism.
Security Analysis
Telemetry Infrastructure
- ✅ PII scrubbing: All string values in `captureEvent()` are passed through the same scrubber as errors/warnings (lines 196-205 in telemetry.ts)
- ✅ Sensitive pattern redaction: API keys, GitHub tokens, emails, IPs, file paths are all redacted before upload (lines 14-58 in telemetry.ts)
- ✅ Opt-out respected: All new events use `captureEvent()`, which checks the `_enabled` flag (controlled by `SPAWN_TELEMETRY=0`)
- ✅ No command args: Only aggregated metrics (elapsed_ms, connect_count, lifetime_hours) are sent - no user input, file paths, or command arguments
Lifecycle Tracking
- ✅ Safe metadata storage: `connect_count` and `last_connected_at` stored in existing `SpawnRecord.connection.metadata` as strings
- ✅ No credential leakage: Only spawn_id (random UUID), agent/cloud names, and numeric metrics are sent
- ✅ Proper event timing: `trackSpawnDeleted()` called AFTER successful deletion, not before (prevents false positives)
Funnel Tracking
- ✅ Context isolation: Agent/cloud set via `setTelemetryContext()` and attached to all events automatically
- ✅ No session tracking: Only pipeline step completion events - no keystroke tracking or prompt content
- ✅ Safe timing calculation: Uses module-scoped `_funnelStart` timestamp - no external state manipulation
Tests
- ✅ bash -n: N/A (no shell scripts modified)
- ✅ bun test: PASS (2068 tests, 0 failures)
- ✅ biome lint: PASS (187 files checked, no issues)
- ✅ Test coverage: New test file `lifecycle-telemetry.test.ts` with comprehensive coverage of both tracking functions
Findings
None. Code is secure.
-- security/pr-reviewer
mock.module contaminates the global module registry when running under --coverage, causing telemetry.test.ts and history-cov.test.ts to receive mocked implementations instead of the real modules. Switch to spyOn with mockRestore in afterEach so the real modules are preserved across files.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pushed a fix for the 5 failing Mock Tests.
-- refactor/pr-maintainer
louisgv
left a comment
Security Review
Verdict: APPROVED
Commit: 42f4426
Summary
This PR adds funnel telemetry and lifecycle event tracking for onboarding analytics. All changes respect the existing SPAWN_TELEMETRY=0 opt-out mechanism.
Security Analysis
Telemetry Infrastructure
- ✅ PII scrubbing: All string values in `captureEvent()` are passed through the same scrubber as errors/warnings (lines 196-205 in telemetry.ts)
- ✅ Sensitive pattern redaction: API keys, GitHub tokens, emails, IPs, file paths are all redacted before upload (lines 14-58 in telemetry.ts)
- ✅ Opt-out respected: All new events use `captureEvent()`, which checks the `_enabled` flag (controlled by `SPAWN_TELEMETRY=0`)
- ✅ No command args: Only aggregated metrics (elapsed_ms, connect_count, lifetime_hours) are sent - no user input, file paths, or command arguments
Lifecycle Tracking
- ✅ Safe metadata storage: `connect_count` and `last_connected_at` stored in existing `SpawnRecord.connection.metadata` as strings
- ✅ No credential leakage: Only spawn_id (random UUID), agent/cloud names, and numeric metrics are sent
- ✅ Proper event timing: `trackSpawnDeleted()` called AFTER successful deletion (delete.ts:264, 454), not before (prevents false positives)
- ✅ Correct reconnect placement: `trackSpawnConnected()` called BEFORE SSH handoff (list.ts:714) - correct as SSH session never returns
Funnel Tracking
- ✅ Context isolation: Agent/cloud set via `setTelemetryContext()` and attached to all events automatically
- ✅ No session tracking: Only pipeline step completion events - no keystroke tracking or prompt content
- ✅ Safe timing calculation: Uses module-scoped `_funnelStart` timestamp - no external state manipulation
Test Coverage Fix
- ✅ Proper mock isolation: Replaced `mock.module` with `spyOn` + `mockRestore` in lifecycle-telemetry.test.ts to prevent cross-file contamination
- ✅ Full suite passes: 2068/2068 tests pass after the fix
Tests
- ✅ bash -n: N/A (no shell scripts modified)
- ✅ bun test: PASS (2068 tests, 0 failures)
- ✅ biome lint: PASS (187 files checked, no issues)
Findings
None. Code is secure.
-- security/pr-reviewer
Two bugs from the OpenRouterTeam#3305 rollout:

1. Test pollution: orchestrate.test.ts imports runOrchestration directly and
   never calls initTelemetry, but _enabled defaulted to true in the module, so
   captureEvent fired real events at PostHog tagged agent=testagent. The
   onboarding funnel filled up with CI fixture data.
2. Funnel started too late: funnel_* events fired inside runOrchestration,
   which is only called AFTER the interactive picker completes. Users who
   bail at the agent/cloud/setup-options/name prompts were invisible, yet
   that is exactly where real drop-off happens.

Fix 1 (telemetry.ts):
- Default _enabled = false. Nothing fires until initTelemetry is explicitly
  called. Production (index.ts) calls it; tests that need telemetry
  (telemetry.test.ts) call it with BUN_ENV/NODE_ENV cleared.
- Belt-and-suspenders: initTelemetry now short-circuits when
  BUN_ENV === "test" || NODE_ENV === "test", so even if future code calls it
  from a test context, events stay local.

Fix 2 (picker instrumentation):
New events fired before runOrchestration in every entry path:
  spawn_launched { mode: interactive | agent_interactive | direct | headless }
  menu_shown / menu_selected / menu_cancelled (only when user has prior spawns)
  agent_picker_shown
  agent_selected { agent } (also sets telemetry context)
  cloud_picker_shown
  cloud_selected { cloud } (also sets telemetry context)
  preflight_passed
  setup_options_shown
  setup_options_selected { step_count }
  name_prompt_shown
  name_entered
  picker_completed

Wired into:
  commands/interactive.ts   cmdInteractive + cmdAgentInteractive
  commands/run.ts           cmdRun (direct `spawn <agent> <cloud>`)
                            cmdRunHeadless (only spawn_launched)

runOrchestration's existing funnel_* events continue to fire unchanged.

The final funnel in PostHog:
  spawn_launched → agent_selected → cloud_selected → preflight_passed →
  setup_options_selected → name_entered → picker_completed →
  funnel_started → funnel_cloud_authed → funnel_credentials_ready →
  funnel_vm_ready → funnel_install_completed → funnel_configure_completed →
  funnel_prelaunch_completed → funnel_handoff

Tests:
- telemetry.test.ts: 2 new env-guard tests (BUN_ENV, NODE_ENV), plus updated
  beforeEach to clear both env vars so existing tests still exercise
  initTelemetry.
- Full suite: 2131/2131 pass, biome 0 errors.

Bumps 1.0.12 -> 1.0.13 (patch; auto-propagates under OpenRouterTeam#3296 policy).
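
The default-off flag and test-environment guard from Fix 1 might be sketched like this. The environment is passed in as a plain object rather than read from `process.env` so the example stays self-contained; the `isEnabled` accessor and the exact order of the checks are assumptions.

```typescript
// Hypothetical sketch of the default-off flag and env guard.
interface TelemetryEnv {
  BUN_ENV?: string;
  NODE_ENV?: string;
  SPAWN_TELEMETRY?: string;
}

// Off by default: importing the module alone never enables telemetry.
let _enabled = false;

function initTelemetry(env: TelemetryEnv): void {
  // Guard: never enable from a test runner, even if called by mistake.
  if (env.BUN_ENV === "test" || env.NODE_ENV === "test") {
    _enabled = false;
    return;
  }
  // Existing user opt-out.
  if (env.SPAWN_TELEMETRY === "0") {
    _enabled = false;
    return;
  }
  _enabled = true;
}

function isEnabled(): boolean {
  return _enabled;
}
```

With this shape, a test that imports an orchestration function directly and never calls `initTelemetry` sends nothing, which is the failure mode described in bug 1.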
Summary
Adds low-volume, high-signal product events on top of the existing errors/warnings telemetry so we can answer "where do users bail before reaching a running agent" at the fleet level, plus track spawn lifetime and login patterns.
Respects existing `SPAWN_TELEMETRY=0` opt-out — no new flags.
Funnel events (in `orchestrate.ts`, both fast and sequential paths)
Every event carries `elapsed_ms` since `funnel_started`, plus `agent` and `cloud` via telemetry context. Per-step counts in PostHog reveal the exact drop-off funnel without PII.
Lifecycle events (new `shared/lifecycle-telemetry.ts`)
`spawn_connected` — fired from `list.ts` when the user reconnects via the interactive picker. Properties: `spawn_id`, `agent`, `cloud`, `connect_count`, `date`. Increments `connection.metadata.connect_count` and writes `last_connected_at` so subsequent events (and the eventual `spawn_deleted`) have the running total.
`spawn_deleted` — fired from `delete.ts` (both interactive `confirmAndDelete` and headless `cmdDelete` loop) after a successful cloud destroy. Properties: `spawn_id`, `agent`, `cloud`, `lifetime_hours`, `connect_count`, `date`. `lifetime_hours` is computed from `SpawnRecord.timestamp` to now and clamped at 0 for corrupt clocks.
Answers: how long does a typical spawn live, how many times do users reconnect to it, which agents/clouds get the most re-use.
Privacy + scrubbing
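New `captureEvent(name, properties)` helper in `telemetry.ts`:
- Respects the `SPAWN_TELEMETRY=0` opt-out (no new flag)
- Runs every string property through the existing scrubber (API keys, GitHub tokens, bearer tokens, emails, IPs, base64 blobs, home paths)
- Non-string values pass through untouched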
Nothing in the funnel events is user-typed — they're all known-at-compile-time agent/cloud names plus timing integers.
Persistence model for `connect_count`
Stored inside `SpawnRecord.connection.metadata` as a stringified integer (the existing metadata schema is `Record<string, string>`). `saveMetadata` merges — no risk of clobbering other keys like `tunnel_remote_port`.
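
As an illustration of the persistence model, here is a sketch of incrementing a stringified counter inside a `Record<string, string>` with merge semantics. The function names are hypothetical, `mergeMetadata` only stands in for what `saveMetadata` is described as doing, and treating a missing or unparseable counter as 0 is an assumption.

```typescript
// Hypothetical sketch: mergeMetadata mimics the described merge behaviour;
// it is not the real saveMetadata.
type Metadata = Record<string, string>;

function nextConnectCount(metadata: Metadata): number {
  const prev = parseInt(metadata["connect_count"] || "0", 10);
  // Assumption: a missing, negative, or unparseable value restarts at 0.
  return (isFinite(prev) && prev >= 0 ? prev : 0) + 1;
}

function mergeMetadata(existing: Metadata, patch: Metadata): Metadata {
  // Shallow merge: keys absent from the patch are preserved.
  return { ...existing, ...patch };
}

function recordConnect(metadata: Metadata, nowIso: string): Metadata {
  return mergeMetadata(metadata, {
    connect_count: String(nextConnectCount(metadata)),
    last_connected_at: nowIso,
  });
}
```

Because the update is a merge rather than a replacement, an unrelated key such as `tunnel_remote_port` survives each reconnect.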
Tests
Not doing in this PR
Version
Bumps 1.0.10 → 1.0.11. Patch bump — auto-propagates under #3296's new policy, so the telemetry will start flowing to users on their next spawn run without any manual update.