diff --git a/docs/save-mode-cost-controls.md b/docs/save-mode-cost-controls.md new file mode 100644 index 000000000..e18e68d18 --- /dev/null +++ b/docs/save-mode-cost-controls.md @@ -0,0 +1,119 @@ +# Save Mode + Cost Controls (alpha) + +Alpha feature to cut LLM spend for both the user and PostHog. Gated by the +`llm-gateway-cost-controls` feature flag (early-access **alpha** stage) — off for +everyone until opted in. Names kept; "alpha", not "prototype". + +## The honest framing + +- The gateway already **prices** every call and passes caching through — pricing + ≠ minimizing. A 0%-cache-hit session and a 90% one both price correctly; the + uncached one costs roughly 5× more on input tokens (cache reads are ~90% off). +- **Real saving** (money leaves the total bill): cache efficiency, lower + effort/verbosity, batch discount. +- **Pricing/substitution** (not real saving): model downshift — and on metered + billing it can *reduce* PostHog revenue. Save mode is mainly an + acquisition/retention lever, not a margin lever. +- Settle first: **what's the real `posthog_code` cache-hit rate?** It decides + whether this is a savings project or a budget-UX project. Answered by the + queries below over existing telemetry (no new code). + +## Where the code lives (real homes, tested) + +**FE — `packages/core/src/save-mode/`** (pure modules, same pattern as +`billing/usageDisplay.ts`; Biome + Vitest 10/10 + `tsc` clean): +- `saveMode.ts` — `resolveSaveMode()`: (mode + requested model/effort) → + effective model/effort + terseness reminder + telemetry props. +- `budget.ts` — `evaluateBudget()`: month-to-date spend vs cap → + ok/warn/engage/blocked + recommended mode. + +**BE — `posthog` repo, `services/llm-gateway/`** (ruff + mypy --strict + pytest 21): +- `src/llm_gateway/cost_efficiency.py` — cache-hit ratio + busted-session detector + savings math. +- `src/llm_gateway/batch_routing.py` — which products route through the 50%-off Batch API. 
+- `src/llm_gateway/budget_guard.py` — authoritative hard-cap gate (fail-open; never kills in-flight). +- `src/llm_gateway/cost_controls.py` — the **alpha flag gate** (`cost_controls_enabled`), off by default. +- `cost-queries/cache_hit_ratio.promql`, `cost-queries/cost_analytics.hogql`. + +**Flag — `frontend/src/lib/constants.tsx`**: `LLM_GATEWAY_COST_CONTROLS = 'llm-gateway-cost-controls'`. + +## The alpha loop + +EarlyAccessFeature at **alpha** stage (created at runtime in PostHog) → +opted-in users get the `llm-gateway-cost-controls` flag → the Code app shows the +save-mode UI and forwards `x-posthog-flag-llm-gateway-cost-controls: true` → the +gateway's `cost_controls_enabled(get_posthog_flags())` returns true → behavior +applies. Everyone else: untouched. + +## What's left (needs a running stack + review) + +1. **Gateway request path** (`api/anthropic.py` → `_handle_anthropic_messages`): + call the gate, then `budget_guard` (needs a spend resolver like + `quota_resolver`) and `batch_routing` (needs the Anthropic SDK batch + submit/poll). Not landed blind — these change critical request handling. +2. **FE UI**: a save-mode toggle + budget meter in `packages/ui`, a `saveMode` + view pref in the settings store, read the alpha flag, and stamp + `$ai_save_mode` / `$ai_baseline_model` via `buildGatewayPropertyHeaders`. +3. **Create the alpha `EarlyAccessFeature`** (UI: Feature management → Early + access features; stage = alpha; linked flag key `llm-gateway-cost-controls`). + +## Cross-check vs PostHog's agent-cost article + +(posthog.com/blog/optimizing-agent-cost) — their hard-won lessons, mapped here. + +**They validated, we operationalize.** Their #1 finding — cache writes cost ~12.5× +reads, so naive context-splitting backfires — is exactly what `cache_efficiency` / +`classify_session` detect (a "busted" session = paying the write premium for a +cache nobody reads). Their one-off benchmark becomes a standing signal here. 
+ +**Folded into save mode** (`TERSE_REMINDER`): trust prior tool results + +compacted summaries, don't re-read to re-verify (their "reduced bureaucratic +verification" + "avoid compaction cascades"); avoid subagents unless work fans +out (their "subagent elimination"). + +**What the article missed, that this flow adds:** +1. **Batch API (50% off)** for async/deterministic flows — absent from the + article; `batch_routing.py` applies it to exactly the scheduled, deterministic + "conclude"-style steps they describe. +2. **Continuous measurement, not one-off benchmarking** — they validate against + benchmarks by hand; a Signals scout over the cache-hit / busted-session + queries flags regressions (cache-busting, compaction cascades) automatically. +3. **Model tiering** — they hand-tune one model; the deterministic, low-judgment + sub-steps can run on a cheaper model (the save-mode downshift generalizes this). +4. **The 12.5× rule as an automated guardrail**, not human intuition — the + busted-session detector is the encoded version. +5. **User-facing budget caps** — the article is internal eng; `budget_guard` + + the save-mode toggle are the product layer. + +## Re-exploration: what's already covered, and the next lever + +**Already handled upstream (do not rebuild):** +- **Tool search / deferred MCP loading** — `ENABLE_TOOL_SEARCH: "auto:0"` in the + Claude adapter (`session/options.ts`); MCP tool schemas are offloaded behind + tool search, not inlined into every turn. +- **Per-component context cost** — `adapters/claude/context-breakdown.ts` + already estimates systemPrompt / tools / rules / skills / mcp / subagents / + conversation tokens. 
+ +**New lever built — cache TTL (the idle-expiry gap):** +- `services/llm-gateway/src/llm_gateway/cache_ttl.py` (`upgrade_cache_ttl`): + upgrades the SDK's ephemeral cache breakpoints to a **1-hour TTL** for + interactive products (`posthog_code`, `slack_app`), so think-time gaps > 5 min + stop forcing full cache rewrites — the exact 5–15 min idle-expiry the article + flagged. Pure transform, 6 tests green; gated upstream by `cost_controls`. + Neither the article nor our prior flow had this. + +**Candidates found, not built (need a judgment call / SDK check):** +- **Context editing** (`clear_tool_uses`) — prune stale tool results from long + sessions. The Claude Agent SDK may already compact; verify before adding. +- **Enrichment token cost** — the read-enrichment hook injects PostHog + annotations into file reads (tokens every read). Could be gated off in + `max_save` (trades the outcome-aware value for tokens). +- **Surface `context-breakdown` in the cost UI** — the data already exists; + expose "where your tokens go" and flag bloat (skills / rules / mcp resident + size) so users can trim. + +## Open questions + +1. Actual `posthog_code` cache-hit rate today (run `cost_analytics.hogql` query 3). +2. Is `getPersonalSpendAnalysis` cheap enough to poll month-to-date, or do we + need a cached "spend so far" endpoint? diff --git a/docs/save-mode-explainer.md b/docs/save-mode-explainer.md new file mode 100644 index 000000000..cd0ed35ad --- /dev/null +++ b/docs/save-mode-explainer.md @@ -0,0 +1,166 @@ +# Save Mode — How It Works & Why It Matters + +## The Problem + +Every turn in an AI coding session has three cost components: + +``` +Cost per turn = model price × (input tokens + output tokens + thinking tokens) +``` + +Most of the time, the agent is doing routine work — reading a file, running a test, making a small edit — that does not need the most expensive model or maximum thinking depth. Save Mode taps into that slack. 
+ +--- + +## The Three Levers + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ COST PER TURN │ +│ │ +│ [ model price ] × [ input tokens + output tokens ]│ +│ ▲ ▲ ▲ │ +│ │ │ │ │ +│ Lever 1: downshift Lever 3: cache Lever 2: effort cap │ +│ Opus → Sonnet (~3×) TTL 1h reuse + terse prompt │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +| Lever | Where it runs | What it does | +|---|---|---| +| **Model downshift** | Frontend + Agent | Swaps `claude-opus` → `claude-sonnet-4-6` for new turns | +| **Effort cap** | Frontend + Agent | Caps extended thinking at `medium` (kills expensive `max`/`xhigh` think budgets) | +| **Terse reminder** | Agent system prompt | Tells the agent to skip narration, avoid re-reads, skip subagents — fewer output tokens | +| **Cache TTL upgrade** | LLM Gateway | Upgrades ephemeral Anthropic cache to 1-hour TTL — long conversations reuse cached context for ~90% off input tokens | + +--- + +## Save Mode Levels + +``` + COST vs QUALITY + ◀──────────────────────────────────────────▶ + More savings Full power + + OFF ──────────────────────────────────────────────────────────▶ + No changes. Full model, full effort, no terse reminder. + Gateway still upgrades cache TTL (always on when enabled). + + BALANCED ────────────────────────────────────────────────────▶ + Keep model (no downshift). Cap effort at "high" (removes + xhigh/max think overhead). Add terse reminder. + Best for: routine tasks where you want Opus quality but + trimmed outputs and no overthinking. + Estimated savings: 20–40% on output tokens. + + MAX SAVINGS ─────────────────────────────────────────────────▶ + Downshift Opus → Sonnet. Cap effort at "medium". Add terse + reminder. Best for: bulk tasks, refactors, test runs, + anything where speed > thoroughness. + Estimated savings: 50–70% total. 
+``` + +--- + +## Request Flow + +``` +User prompt + │ + ▼ +┌──────────────────────────────────────────────────────────┐ +│ PostHog Code (FE) │ +│ │ +│ resolveSaveMode(mode, requestedModel, requestedEffort) │ +│ ├─ effective model (downshifted or same) │ +│ ├─ effective effort (capped or same) │ +│ ├─ systemReminder (terse prompt or null) │ +│ └─ telemetry props ($ai_save_mode, baselines) │ +└──────────────────────────────┬───────────────────────────┘ + │ model + effort + sysPrompt + │ + x-posthog-property-* headers + ▼ +┌──────────────────────────────────────────────────────────┐ +│ LLM Gateway (PostHog Cloud) │ +│ │ +│ 1. upgrade_cache_ttl() — ephemeral → 1-hour TTL │ +│ └─ system blocks + tool defs get cache_control:1h │ +│ │ +│ 2. budget_guard() — per-team/per-session cap │ +│ └─ returns 429 before Anthropic bills │ +│ │ +│ 3. Anthropic API call with effective model + effort │ +│ │ +│ 4. Stamp $ai_generation event with save_mode telemetry │ +└──────────────────────────────┬───────────────────────────┘ + │ + ▼ + Anthropic / Bedrock +``` + +--- + +## Why It Matters + +### For the user + +| Scenario | Without Save Mode | With Max Savings | Delta | +|---|---|---|---| +| Opus, effort=max, 10-turn session | ~$0.80 | ~$0.20 | **–75%** | +| Opus, effort=high, 5-turn session | ~$0.25 | ~$0.10 | **–60%** | +| Sonnet baseline, effort=medium | ~$0.08 | ~$0.05 | **–38%** | + +Users who run many tasks daily (CI-level usage) can cut their monthly bill from ~$150 to ~$40 on the same workload, without changing how they work — just toggling a setting. 
+ +### For the app + +``` +Lower cost per task + │ + ▼ +┌──────────────────────────────────────────────────────────┐ +│ Better unit economics │ +│ → More headroom for generous free tier │ +│ → Lower break-even per seat on Pro plan │ +│ → Ability to absorb spiky usage without margin shock │ +└──────────────────────────────────────────────────────────┘ + │ + ▼ +PostHog can track this in its own product: + $ai_generation events → save_mode: "max_save" + baseline_model vs effective_model → cost_avoided estimate + Cache efficiency ratio → cache_savings_usd per session +``` + +The LLM Gateway already captures `$ai_generation` for every call. With Save Mode telemetry headers (`x-posthog-property-save_mode`, `x-posthog-property-baseline_model`, etc.) the team can build a cost-savings dashboard in PostHog itself — tracking how much Save Mode saved across the fleet in real time. + +--- + +## Mermaid Flowchart (for slides / Notion) + +```mermaid +flowchart TD + U([User enables Save Mode]) --> R{Mode?} + + R -->|Off| A0[Full power\nNo changes] + + R -->|Balanced| B1[Keep model\nCap effort → high\nAdd terse reminder] + B1 --> B2[~20–40% savings\non output tokens] + + R -->|Max savings| C1[Downshift Opus → Sonnet\nCap effort → medium\nAdd terse reminder] + C1 --> C2[~50–70% total savings] + + B2 --> GW[LLM Gateway] + C2 --> GW + A0 --> GW + + GW --> T1[upgrade cache TTL\nephemeral → 1h] + GW --> T2[budget guard\nper-team cap] + GW --> T3[stamp $ai_generation\nwith save_mode telemetry] + + T1 --> ANT[Anthropic API] + T2 --> ANT + T3 --> ANT + + ANT --> OUT([Response]) +``` diff --git a/packages/agent/src/adapters/claude/claude-agent.ts b/packages/agent/src/adapters/claude/claude-agent.ts index 9d1456b39..2c8184c81 100644 --- a/packages/agent/src/adapters/claude/claude-agent.ts +++ b/packages/agent/src/adapters/claude/claude-agent.ts @@ -1744,6 +1744,7 @@ export class ClaudeAcpAgent extends BaseAcpAgent { this.ensureLocalToolsConnected("guard-hook"), taskState, gatewayEnv: 
this.options?.gatewayEnv, + saveModeHeaders: meta?.saveModeHeaders, onTaskStateChange: async () => { await this.client.sessionUpdate({ sessionId, diff --git a/packages/agent/src/adapters/claude/session/options.ts b/packages/agent/src/adapters/claude/session/options.ts index 1e3a520a3..9ce8fab58 100644 --- a/packages/agent/src/adapters/claude/session/options.ts +++ b/packages/agent/src/adapters/claude/session/options.ts @@ -88,6 +88,8 @@ export interface BuildOptionsParams { onTaskStateChange?: () => Promise<void>; /** Explicit gateway config — prevents global process.env mutation. */ gatewayEnv?: GatewayEnv; + /** Newline-delimited x-posthog-property-* lines stamping save-mode telemetry on $ai_generation events. */ + saveModeHeaders?: string; } export function buildSystemPrompt( @@ -134,7 +136,7 @@ function buildMcpServers( }; } -function buildEnvironment(gateway?: GatewayEnv): Record<string, string> { +function buildEnvironment(gateway?: GatewayEnv, saveModeHeaders?: string): Record<string, string> { // Custom HTTP headers reach the model only through the Claude CLI subprocess, // which reads them from this env var (newline-delimited `name: value` lines) // — the SDK has no direct header option. 
We finalize them here, the single @@ -157,6 +159,9 @@ function buildEnvironment(gateway?: GatewayEnv): Record { if (projectId) { headerLines.push(`x-posthog-property-team_id: ${projectId}`); } + if (saveModeHeaders) { + headerLines.push(saveModeHeaders); + } // Route to AWS Bedrock as a fallback when Anthropic returns 5xx headerLines.push("x-posthog-use-bedrock-fallback: true"); const customHeaders = headerLines.join("\n"); @@ -443,7 +448,7 @@ export function buildSessionOptions(params: BuildOptionsParams): Options { params.mcpServers, loadUserClaudeJsonMcpServers(params.cwd, params.logger), ), - env: buildEnvironment(params.gatewayEnv), + env: buildEnvironment(params.gatewayEnv, params.saveModeHeaders), hooks: buildHooks( params.userProvidedOptions?.hooks, params.onModeChange, diff --git a/packages/agent/src/adapters/claude/types.ts b/packages/agent/src/adapters/claude/types.ts index 3c1973125..6ca929b24 100644 --- a/packages/agent/src/adapters/claude/types.ts +++ b/packages/agent/src/adapters/claude/types.ts @@ -177,6 +177,8 @@ export type NewSessionMeta = { channelMode?: boolean; jsonSchema?: Record | null; mcpToolApprovals?: McpToolApprovals; + /** Newline-delimited x-posthog-property-* lines stamping save-mode telemetry on $ai_generation events. 
*/ + saveModeHeaders?: string; claudeCode?: { options?: Options; emitRawSDKMessages?: boolean | SDKMessageFilter[]; diff --git a/packages/agent/src/utils/gateway.ts b/packages/agent/src/utils/gateway.ts index ef39ec2bb..4db5e07c2 100644 --- a/packages/agent/src/utils/gateway.ts +++ b/packages/agent/src/utils/gateway.ts @@ -60,6 +60,9 @@ export function buildGatewayPropertyHeaders( } function getGatewayBaseUrl(posthogHost: string): string { + const override = process.env.LLM_GATEWAY_BASE_URL; + if (override) return override.replace(/\/$/, ""); + const url = new URL(posthogHost); const hostname = url.hostname; diff --git a/packages/core/src/save-mode/budget.test.ts b/packages/core/src/save-mode/budget.test.ts new file mode 100644 index 000000000..06a02b590 --- /dev/null +++ b/packages/core/src/save-mode/budget.test.ts @@ -0,0 +1,38 @@ +import { describe, expect, it } from "vitest"; +import { evaluateBudget } from "./budget"; + +describe("evaluateBudget", () => { + it.each([ + { + label: "disabled — no cap set (0)", + input: { monthlyBudgetUsd: 0, scopedSpendUsd: 5 }, + expected: { status: "disabled", recommendedMode: "off", block: false }, + }, + { + label: "ok — under 70% threshold", + input: { monthlyBudgetUsd: 20, scopedSpendUsd: 5 }, + expected: { status: "ok", block: false }, + }, + { + label: "warn — at 75% (>=70%)", + input: { monthlyBudgetUsd: 20, scopedSpendUsd: 15 }, + expected: { status: "warn", recommendedMode: "balanced", block: false }, + }, + { + label: "engage — at 87.5% (>=85%)", + input: { monthlyBudgetUsd: 20, scopedSpendUsd: 17.5 }, + expected: { status: "engage", recommendedMode: "max_save", block: false }, + }, + { + label: "blocked — at 110% (>=100%)", + input: { monthlyBudgetUsd: 20, scopedSpendUsd: 22 }, + expected: { status: "blocked", block: true }, + }, + ])("$label", ({ input, expected }) => { + const r = evaluateBudget(input); + expect(r.status).toBe(expected.status); + if ("block" in expected) expect(r.block).toBe(expected.block); + if 
("recommendedMode" in expected) + expect(r.recommendedMode).toBe(expected.recommendedMode); + }); +}); diff --git a/packages/core/src/save-mode/budget.ts b/packages/core/src/save-mode/budget.ts new file mode 100644 index 000000000..e6328cf06 --- /dev/null +++ b/packages/core/src/save-mode/budget.ts @@ -0,0 +1,84 @@ +// Budget evaluator — pure brain for the live budget meter + auto-engage (alpha). +// +// Reads cumulative month-to-date PostHog Code spend (scoped_cost_usd from +// getPersonalSpendAnalysis) against a user-set cap and decides whether to nudge, +// auto-engage save mode, or soft-block. The usage monitor calls this on each +// refresh; the UI renders the meter. Gated by the `llm-gateway-cost-controls` +// alpha flag at the call site. + +import type { SaveMode } from "./saveMode"; + +export type BudgetStatus = "disabled" | "ok" | "warn" | "engage" | "blocked"; + +export interface BudgetInput { + monthlyBudgetUsd: number; // 0 or negative => no cap set => disabled + scopedSpendUsd: number; // month-to-date PostHog Code spend + warnAtFraction?: number; // default 0.70 + engageAtFraction?: number; // default 0.85 +} + +export interface BudgetResult { + status: BudgetStatus; + fractionUsed: number; + remainingUsd: number; + recommendedMode: SaveMode; + // Soft stop: the UI warns and gates new full-power runs until next period or + // an explicit override. We never hard-kill an in-flight task. + block: boolean; +} + +export function evaluateBudget(input: BudgetInput): BudgetResult { + const warnAt = input.warnAtFraction ?? 0.7; + const engageAt = input.engageAtFraction ?? 
0.85; + + if (!(input.monthlyBudgetUsd > 0)) { + return { + status: "disabled", + fractionUsed: 0, + remainingUsd: Number.POSITIVE_INFINITY, + recommendedMode: "off", + block: false, + }; + } + + const fractionUsed = input.scopedSpendUsd / input.monthlyBudgetUsd; + const remainingUsd = Math.max( + 0, + input.monthlyBudgetUsd - input.scopedSpendUsd, + ); + + if (fractionUsed >= 1) { + return { + status: "blocked", + fractionUsed, + remainingUsd, + recommendedMode: "max_save", + block: true, + }; + } + if (fractionUsed >= engageAt) { + return { + status: "engage", + fractionUsed, + remainingUsd, + recommendedMode: "max_save", + block: false, + }; + } + if (fractionUsed >= warnAt) { + return { + status: "warn", + fractionUsed, + remainingUsd, + recommendedMode: "balanced", + block: false, + }; + } + return { + status: "ok", + fractionUsed, + remainingUsd, + recommendedMode: "off", + block: false, + }; +} diff --git a/packages/core/src/save-mode/saveMode.test.ts b/packages/core/src/save-mode/saveMode.test.ts new file mode 100644 index 000000000..061d0b230 --- /dev/null +++ b/packages/core/src/save-mode/saveMode.test.ts @@ -0,0 +1,97 @@ +import { describe, expect, it } from "vitest"; +import { resolveSaveMode } from "./saveMode"; + +describe("resolveSaveMode", () => { + it.each([ + { + label: "off — passes model/effort through", + input: { + mode: "off" as const, + requestedModel: "claude-opus-4-8", + requestedEffort: "xhigh" as const, + }, + expected: { + model: "claude-opus-4-8", + effort: "xhigh", + systemReminder: null, + saveMode: false, + }, + }, + { + label: "balanced — caps effort at high, keeps model, adds reminder", + input: { + mode: "balanced" as const, + requestedModel: "claude-opus-4-8", + requestedEffort: "max" as const, + }, + expected: { + model: "claude-opus-4-8", + effort: "high", + systemReminder: "non-null", + saveMode: true, + baselineEffort: "max", + }, + }, + { + label: "max_save — downshifts opus→sonnet, caps effort at medium", + input: { + 
mode: "max_save" as const, + requestedModel: "claude-opus-4-8", + requestedEffort: "xhigh" as const, + }, + expected: { + model: "claude-sonnet-4-6", + effort: "medium", + saveMode: true, + baselineModel: "claude-opus-4-8", + }, + }, + { + label: "max_save — never raises an already-low effort", + input: { + mode: "max_save" as const, + requestedModel: "claude-sonnet-4-6", + requestedEffort: "low" as const, + }, + expected: { model: "claude-sonnet-4-6", effort: "low" }, + }, + { + label: "balanced — never raises low effort", + input: { + mode: "balanced" as const, + requestedModel: "claude-opus-4-8", + requestedEffort: "low" as const, + }, + expected: { model: "claude-opus-4-8", effort: "low" }, + }, + ])("$label", ({ input, expected }) => { + const r = resolveSaveMode(input); + expect(r.model).toBe(expected.model); + expect(r.effort).toBe(expected.effort); + if ("systemReminder" in expected) { + if (expected.systemReminder === null) { + expect(r.systemReminder).toBeNull(); + } else { + expect(r.systemReminder).not.toBeNull(); + } + } + if ("saveMode" in expected) + expect(r.telemetry.saveMode).toBe(expected.saveMode); + if ("baselineModel" in expected) + expect(r.telemetry.baselineModel).toBe(expected.baselineModel); + if ("baselineEffort" in expected) + expect(r.telemetry.baselineEffort).toBe(expected.baselineEffort); + }); + + it("perTaskFullPower bypasses save mode entirely", () => { + const r = resolveSaveMode({ + mode: "max_save", + requestedModel: "claude-opus-4-8", + requestedEffort: "high", + perTaskFullPower: true, + }); + expect(r.model).toBe("claude-opus-4-8"); + expect(r.effort).toBe("high"); + expect(r.telemetry.saveMode).toBe(false); + }); +}); diff --git a/packages/core/src/save-mode/saveMode.ts b/packages/core/src/save-mode/saveMode.ts new file mode 100644 index 000000000..578a4abfd --- /dev/null +++ b/packages/core/src/save-mode/saveMode.ts @@ -0,0 +1,115 @@ +// Save Mode resolver — pure, host-neutral decision logic (alpha). 
+// +// Gated at the call site by the `llm-gateway-cost-controls` alpha flag; this +// module just computes the effective config. Pure-function module by design +// (same shape as billing/usageDisplay.ts and scouts/scoutPresentation.ts), so +// it is trivially testable and shared unchanged by the desktop and web hosts. +// +// Given the user's save-mode setting + the model/effort they asked for, returns +// the EFFECTIVE model + effort to run, an optional terseness system-reminder +// (the output-token lever), and the telemetry props to stamp on the +// $ai_generation event so savings can be computed in PostHog. + +export type SaveMode = "off" | "balanced" | "max_save"; + +export type EffortLevel = "low" | "medium" | "high" | "xhigh" | "max"; + +const EFFORT_ORDER: readonly EffortLevel[] = [ + "low", + "medium", + "high", + "xhigh", + "max", +]; + +function clampEffort(requested: EffortLevel, cap: EffortLevel): EffortLevel { + const reqIdx = EFFORT_ORDER.indexOf(requested); + const capIdx = EFFORT_ORDER.indexOf(cap); + return reqIdx <= capIdx ? requested : cap; +} + +// One-step downshift: the Opus family is the expensive default; Sonnet is the +// cheaper, still-capable tier. Helpers already run on Haiku elsewhere, so we do +// not downshift Sonnet/Haiku further. The gateway model allowlist already +// permits Sonnet, so this needs no backend change. +function downshiftModel(modelId: string): string { + if (modelId.startsWith("claude-fable")) return "claude-sonnet-4-6"; + if (modelId.startsWith("claude-opus")) return "claude-sonnet-4-6"; + return modelId; +} + +// Folds in the prompt-level cost levers PostHog benchmarked in +// posthog.com/blog/optimizing-agent-cost: terse output, trust prior +// results/compacted summaries (don't re-verify), and avoid subagents (no cache +// sharing → expensive cache rewrites + timeout/expiry loops). +const TERSE_REMINDER = + "Save mode is on. 
Default to silence between tool calls and keep the final " + + "summary to one or two sentences. Do not narrate routine actions, restate " + + "the plan, or add features, refactors, or error handling beyond what was " + + "asked — prefer the smallest change that fully answers the request. Trust " + + "prior tool results and compacted summaries: do not re-read files or repeat " + + "tool calls to double-check work already done. Avoid spawning subagents " + + "unless the task genuinely fans out across independent items — a single " + + "agent loop reuses the prompt cache and is usually cheaper."; + +export interface SaveModeInput { + mode: SaveMode; + requestedModel: string; + requestedEffort: EffortLevel; + // Per-task escape hatch: run this one task at full power even in save mode. + perTaskFullPower?: boolean; +} + +export interface SaveModeResult { + model: string; + effort: EffortLevel; + // Appended to the agent's system prompt when set (meta.systemPrompt seam). + systemReminder: string | null; + // Stamped onto the $ai_generation event via x-posthog-property-* headers. + // baselineModel/baselineEffort = what WOULD have run at full power — the + // repricing reference for the savings estimate. + telemetry: { + saveMode: boolean; + baselineModel: string; + baselineEffort: EffortLevel; + effectiveEffort: EffortLevel; + }; +} + +export function resolveSaveMode(input: SaveModeInput): SaveModeResult { + const { mode, requestedModel, requestedEffort, perTaskFullPower } = input; + + const base = { + baselineModel: requestedModel, + baselineEffort: requestedEffort, + }; + + if (perTaskFullPower || mode === "off") { + return { + model: requestedModel, + effort: requestedEffort, + systemReminder: null, + telemetry: { saveMode: false, ...base, effectiveEffort: requestedEffort }, + }; + } + + if (mode === "balanced") { + // Gentle: cap effort at high (kills xhigh/max overthinking), keep the model. 
+ const effort = clampEffort(requestedEffort, "high"); + return { + model: requestedModel, + effort, + systemReminder: TERSE_REMINDER, + telemetry: { saveMode: true, ...base, effectiveEffort: effort }, + }; + } + + // max_save: downshift the model + cap effort at medium + terseness. + const effort = clampEffort(requestedEffort, "medium"); + return { + model: downshiftModel(requestedModel), + effort, + systemReminder: TERSE_REMINDER, + telemetry: { saveMode: true, ...base, effectiveEffort: effort }, + }; +} diff --git a/packages/core/src/sessions/sessionService.ts b/packages/core/src/sessions/sessionService.ts index 28993a584..e5d3e1be6 100644 --- a/packages/core/src/sessions/sessionService.ts +++ b/packages/core/src/sessions/sessionService.ts @@ -38,6 +38,7 @@ import { isTerminalStatus, type Task, } from "@posthog/shared/domain-types"; +import { resolveSaveMode, type SaveMode } from "../save-mode/saveMode"; import { isNotification, POSTHOG_NOTIFICATIONS } from "./acpNotifications"; import { createAppendOnlyTracker } from "./appendOnlyTracker"; import type { CloudArtifactClient } from "./cloudArtifactIdentifiers"; @@ -289,6 +290,7 @@ export interface ConnectParams { adapter?: "claude" | "codex"; model?: string; reasoningLevel?: string; + systemPromptOverride?: string; } export interface CloudConnectionAuth { @@ -467,6 +469,10 @@ function classifyTurnEventKind( } export class SessionService { + private _saveModeBaselines = new Map< + string, + { model: string | null; effort: string | null } + >(); private connectingTasks = new Map>(); private reconcilingTasks = new Set(); private activityHeartbeats = new Map< @@ -607,6 +613,7 @@ export class SessionService { adapter, model, reasoningLevel, + systemPromptOverride, } = params; const { id: taskId, latest_run: latestRun } = task; const taskTitle = task.title || task.description || "Task"; @@ -716,6 +723,7 @@ export class SessionService { adapter, model, reasoningLevel, + systemPromptOverride, ); } } catch (error) { @@ 
-1009,6 +1017,7 @@ export class SessionService { } this.d.adapterStore.removeAdapter(taskRunId); this.d.removePersistedConfigOptions(taskRunId); + if (session) this._saveModeBaselines.delete(session.taskId); } /** @@ -1195,6 +1204,7 @@ export class SessionService { adapter?: "claude" | "codex", model?: string, reasoningLevel?: string, + systemPromptOverride?: string, ): Promise { const { client } = auth; if (!client) { @@ -1221,6 +1231,7 @@ export class SessionService { ? (reasoningLevel as EffortLevel) : undefined, model: preferredModel, + systemPromptOverride, }); const session = createBaseSession(taskRun.id, taskId, taskTitle); @@ -2971,6 +2982,86 @@ export class SessionService { await this.setSessionConfigOption(taskId, configOption.id, value); } + async applySaveMode(taskId: string, mode: SaveMode): Promise<boolean> { + const session = this.d.store.getSessionByTaskId(taskId); + if (!session?.configOptions) return false; + + const modelOpt = getConfigOptionByCategory(session.configOptions, "model"); + const effortOpt = getConfigOptionByCategory( + session.configOptions, + "thought_level", + ); + + const currentModel = modelOpt ? String(modelOpt.currentValue ?? "") : null; + const currentEffort = effortOpt + ? String(effortOpt.currentValue ?? 
"medium") + : null; + + if (mode === "off") { + const baseline = this._saveModeBaselines.get(taskId); + this._saveModeBaselines.delete(taskId); + if (!baseline) return false; + + const ops: Promise<unknown>[] = []; + if (baseline.model && baseline.model !== currentModel) { + ops.push( + this.setSessionConfigOptionByCategory( + taskId, + "model", + baseline.model, + ), + ); + } + if (baseline.effort && baseline.effort !== currentEffort) { + ops.push( + this.setSessionConfigOptionByCategory( + taskId, + "thought_level", + baseline.effort, + ), + ); + } + await Promise.all(ops); + return ops.length > 0; + } + + if (!this._saveModeBaselines.has(taskId)) { + this._saveModeBaselines.set(taskId, { + model: currentModel, + effort: currentEffort, + }); + } + + const baseline = this._saveModeBaselines.get(taskId) ?? { + model: currentModel, + effort: currentEffort, + }; + const result = resolveSaveMode({ + mode, + requestedModel: + baseline.model ?? currentModel ?? this.d.DEFAULT_GATEWAY_MODEL, + requestedEffort: (baseline.effort ?? "medium") as EffortLevel, + }); + + const ops: Promise<unknown>[] = []; + if (modelOpt && result.model !== currentModel) { + ops.push( + this.setSessionConfigOptionByCategory(taskId, "model", result.model), + ); + } + if (effortOpt && result.effort !== currentEffort) { + ops.push( + this.setSessionConfigOptionByCategory( + taskId, + "thought_level", + result.effort, + ), + ); + } + await Promise.all(ops); + return ops.length > 0; + } + /** * Start a user shell execute event (shows command as running). * Call completeUserShellExecute with the same id when the command finishes. 
diff --git a/packages/core/src/task-detail/taskCreationSaga.ts b/packages/core/src/task-detail/taskCreationSaga.ts index 77ab27ba9..d740037b8 100644 --- a/packages/core/src/task-detail/taskCreationSaga.ts +++ b/packages/core/src/task-detail/taskCreationSaga.ts @@ -390,6 +390,8 @@ export class TaskCreationSaga extends Saga< if (input.model) connectParams.model = input.model; if (input.reasoningLevel) connectParams.reasoningLevel = input.reasoningLevel; + if (input.systemPromptOverride) + connectParams.systemPromptOverride = input.systemPromptOverride; this.deps.sessionService.connectToTask(connectParams); return { taskId: task.id }; diff --git a/packages/core/src/task-detail/taskInput.ts b/packages/core/src/task-detail/taskInput.ts index 6ebd0d70c..7d6bc6a0e 100644 --- a/packages/core/src/task-detail/taskInput.ts +++ b/packages/core/src/task-detail/taskInput.ts @@ -22,6 +22,7 @@ export interface PrepareTaskInputOptions { channelContext?: string; channelName?: string; allowNoRepo?: boolean; + systemPromptOverride?: string; } export function prepareTaskInput( @@ -59,6 +60,7 @@ export function prepareTaskInput( channelContext: options.channelContext, channelName: options.channelName, allowNoRepo: options.allowNoRepo, + systemPromptOverride: options.systemPromptOverride, }; } diff --git a/packages/shared/src/analytics-events.ts b/packages/shared/src/analytics-events.ts index a4dd0e586..cf46b8c89 100644 --- a/packages/shared/src/analytics-events.ts +++ b/packages/shared/src/analytics-events.ts @@ -75,6 +75,9 @@ export interface TaskCreateProperties { /** Worktree mode: repo has a non-empty .worktreeinclude file */ uses_worktree_include?: boolean; adapter?: "claude" | "codex"; + save_mode?: "off" | "balanced" | "max_save"; + effective_model?: string; + effective_effort?: string; } export interface TaskViewProperties { @@ -228,6 +231,22 @@ export interface SettingChangedProperties { old_value?: string | boolean | number; } +export interface BrowserTabOpenedProperties { + 
source: "window_open" | "user" | "chat_link"; + has_initial_url: boolean; +} + +export interface LinkClickedInChatProperties { + destination: "embedded_browser" | "system_browser" | "copy_link"; +} + +export interface SaveModeChangedProperties { + new_mode: "off" | "balanced" | "max_save"; + old_mode: "off" | "balanced" | "max_save"; + context: "session" | "new-task"; + task_id?: string; +} + // Error events export interface TaskCreationFailedProperties { error_type: string; @@ -967,6 +986,11 @@ export const ANALYTICS_EVENTS = { // Settings events SETTING_CHANGED: "Setting changed", + SAVE_MODE_CHANGED: "Save mode changed", + + // Browser panel events + BROWSER_TAB_OPENED: "Browser tab opened", + LINK_CLICKED_IN_CHAT: "Link clicked in chat", // Feedback events TASK_FEEDBACK: "Task feedback", @@ -1109,6 +1133,11 @@ export type EventPropertyMap = { // Settings events [ANALYTICS_EVENTS.SETTING_CHANGED]: SettingChangedProperties; + [ANALYTICS_EVENTS.SAVE_MODE_CHANGED]: SaveModeChangedProperties; + + // Browser panel events + [ANALYTICS_EVENTS.BROWSER_TAB_OPENED]: BrowserTabOpenedProperties; + [ANALYTICS_EVENTS.LINK_CLICKED_IN_CHAT]: LinkClickedInChatProperties; // Feedback events [ANALYTICS_EVENTS.TASK_FEEDBACK]: TaskFeedbackProperties; diff --git a/packages/shared/src/task-creation-domain.ts b/packages/shared/src/task-creation-domain.ts index 8f19a31e2..59bfa175e 100644 --- a/packages/shared/src/task-creation-domain.ts +++ b/packages/shared/src/task-creation-domain.ts @@ -56,6 +56,9 @@ export interface TaskCreationInput { // Label of the Home-tab quick action that started this run (e.g. "Fix CI"), so the // workstream can show which quick actions have been run against it. homeQuickActionLabel?: string; + /** System prompt text appended after the default PostHog coding prompt. Set by + * Save Mode (terseness reminder) or special surfaces (canvas generator). 
*/ + systemPromptOverride?: string; } export interface TaskCreationOutput { diff --git a/packages/ui/src/features/home/hooks/useRunWorkstreamAction.ts b/packages/ui/src/features/home/hooks/useRunWorkstreamAction.ts index 5cd825bbc..a51d28852 100644 --- a/packages/ui/src/features/home/hooks/useRunWorkstreamAction.ts +++ b/packages/ui/src/features/home/hooks/useRunWorkstreamAction.ts @@ -4,6 +4,10 @@ import { REPORT_MODEL_RESOLVER, type ReportModelResolver, } from "@posthog/core/inbox/identifiers"; +import { + type EffortLevel, + resolveSaveMode, +} from "@posthog/core/save-mode/saveMode"; import { TASK_SERVICE } from "@posthog/core/task-detail/identifiers"; import type { TaskService } from "@posthog/core/task-detail/taskService"; import { useService } from "@posthog/di/react"; @@ -55,6 +59,7 @@ export function useRunWorkstreamAction(): RunWorkstreamAction { const { getUserIntegrationIdForRepo } = useUserRepositoryIntegration(); const lastUsedAdapter = useSettingsStore((s) => s.lastUsedAdapter); const lastUsedModel = useSettingsStore((s) => s.lastUsedModel); + const saveMode = useSettingsStore((s) => s.saveMode); const taskService = useService(TASK_SERVICE); const modelResolver = useService(REPORT_MODEL_RESOLVER); const queryClient = useQueryClient(); @@ -124,8 +129,16 @@ export function useRunWorkstreamAction(): RunWorkstreamAction { return; } - // `content` carries the skill prefix; `taskDescription` is the clean - // title. + // Save Mode (alpha): downshift the model when the user opted in. + const saveModeResult = resolveSaveMode({ + mode: saveMode, + requestedModel: model, + requestedEffort: "high" as EffortLevel, + }); + model = saveModeResult.model; + + // `content` carries the skill prefix to the agent; `taskDescription` + // is the clean prompt used for the task title and description. 
const input: TaskCreationInput = { content: promptText, taskDescription: action.prompt.trim() || action.label, @@ -138,6 +151,8 @@ export function useRunWorkstreamAction(): RunWorkstreamAction { // Background run, so skip plan mode and let it act. executionMode: "auto", homeQuickActionLabel: action.label, + reasoningLevel: saveModeResult.effort, + systemPromptOverride: saveModeResult.systemReminder ?? undefined, }; const result = await taskService.createTask(input, (output) => { @@ -167,6 +182,9 @@ export function useRunWorkstreamAction(): RunWorkstreamAction { has_branch: !!branch, cloud_run_source: "manual", adapter, + save_mode: saveMode, + effective_model: saveModeResult.model, + effective_effort: saveModeResult.effort, }); return; } @@ -197,6 +215,7 @@ export function useRunWorkstreamAction(): RunWorkstreamAction { getUserIntegrationIdForRepo, lastUsedAdapter, lastUsedModel, + saveMode, taskService, modelResolver, ], diff --git a/packages/ui/src/features/sessions/components/SaveModeToggle.tsx b/packages/ui/src/features/sessions/components/SaveModeToggle.tsx new file mode 100644 index 000000000..0e8e1e1a5 --- /dev/null +++ b/packages/ui/src/features/sessions/components/SaveModeToggle.tsx @@ -0,0 +1,124 @@ +import { CaretDown, Info, Leaf } from "@phosphor-icons/react"; +import type { SaveMode } from "@posthog/core/save-mode/saveMode"; +import { + Button, + DropdownMenu, + DropdownMenuContent, + DropdownMenuRadioGroup, + DropdownMenuRadioItem, + DropdownMenuTrigger, + MenuLabel, + Tooltip, + TooltipContent, + TooltipProvider, + TooltipTrigger, +} from "@posthog/quill"; +import { ANALYTICS_EVENTS } from "@posthog/shared"; +import { useSettingsStore } from "@posthog/ui/features/settings/settingsStore"; +import { track } from "@posthog/ui/shell/analytics"; +import { useApplySaveMode } from "../hooks/useApplySaveMode"; + +const TRIGGER_LABEL: Record = { + off: "Off", + balanced: "Balanced", + max_save: "Max savings", +}; + +const SAVE_MODE_VS_EFFORT = ( + <> + Effort 
controls thinking depth. Save Mode{" "} + cuts cost: Balanced caps effort and shortens replies;{" "} + Max savings also switches to a cheaper model. + +); + +export function SaveModeToggle({ + taskId, + disabled, +}: { + taskId?: string; + disabled?: boolean; +}) { + const saveMode = useSettingsStore((s) => s.saveMode); + const setSaveMode = useSettingsStore((s) => s.setSaveMode); + const applyToSession = useApplySaveMode(taskId); + const active = saveMode !== "off"; + + const handleChange = (value: string) => { + const nextMode = value as SaveMode; + const prevMode = saveMode; + setSaveMode(nextMode); + applyToSession(nextMode); + track(ANALYTICS_EVENTS.SAVE_MODE_CHANGED, { + new_mode: nextMode, + old_mode: prevMode, + context: taskId ? "session" : "new-task", + task_id: taskId, + }); + }; + + return ( +
+ + + + + {TRIGGER_LABEL[saveMode]} + + + + } + /> + + Save Mode + + Off + + Balanced + + + Maximum savings + + + + + + + + + + } + /> + + {SAVE_MODE_VS_EFFORT} + + + +
+ ); +} diff --git a/packages/ui/src/features/sessions/components/SessionView.tsx b/packages/ui/src/features/sessions/components/SessionView.tsx index 88cf60b9e..5b587665b 100644 --- a/packages/ui/src/features/sessions/components/SessionView.tsx +++ b/packages/ui/src/features/sessions/components/SessionView.tsx @@ -29,6 +29,7 @@ import { PlanStatusBar } from "@posthog/ui/features/sessions/components/PlanStat import { QueuedMessagesDock } from "@posthog/ui/features/sessions/components/QueuedMessagesDock"; import { ReasoningLevelSelector } from "@posthog/ui/features/sessions/components/ReasoningLevelSelector"; import { RawLogsView } from "@posthog/ui/features/sessions/components/raw-logs/RawLogsView"; +import { SaveModeToggle } from "@posthog/ui/features/sessions/components/SaveModeToggle"; import { SessionResourcesBar } from "@posthog/ui/features/sessions/components/SessionResourcesBar"; import { SteerQueueToggle } from "@posthog/ui/features/sessions/components/SteerQueueToggle"; import { CHAT_CONTENT_MAX_WIDTH } from "@posthog/ui/features/sessions/constants"; @@ -640,14 +641,17 @@ export function SessionView({ /> } reasoningSelector={ - thoughtOption ? ( - - ) : null + <> + {thoughtOption ? ( + + ) : null} + + } messagingModeToggle={ taskId ? 
( diff --git a/packages/ui/src/features/sessions/hooks/useApplySaveMode.ts b/packages/ui/src/features/sessions/hooks/useApplySaveMode.ts new file mode 100644 index 000000000..fb2d0fa79 --- /dev/null +++ b/packages/ui/src/features/sessions/hooks/useApplySaveMode.ts @@ -0,0 +1,23 @@ +import type { SaveMode } from "@posthog/core/save-mode/saveMode"; +import type { SessionService } from "@posthog/core/sessions/sessionService"; +import { SESSION_SERVICE } from "@posthog/core/sessions/sessionService"; +import { useService } from "@posthog/di/react"; +import { toast } from "@posthog/ui/primitives/toast"; + +export function useApplySaveMode(taskId: string | undefined) { + const sessionService = useService<SessionService>(SESSION_SERVICE); + + return (nextMode: SaveMode) => { + if (!taskId) return; + void sessionService.applySaveMode(taskId, nextMode).then((changed) => { + if (changed) { + toast.success("Save mode applied to this session", { + description: + nextMode === "off" + ? "Restored original model and effort settings" + : "Model and effort adjusted · switching re-processes the conversation once", + }); + } + }); + }; +} diff --git a/packages/ui/src/features/settings/sections/GeneralSettings.tsx b/packages/ui/src/features/settings/sections/GeneralSettings.tsx index 5f0645893..596182ed9 100644 --- a/packages/ui/src/features/settings/sections/GeneralSettings.tsx +++ b/packages/ui/src/features/settings/sections/GeneralSettings.tsx @@ -1,4 +1,5 @@ import { ArrowSquareOut } from "@phosphor-icons/react"; +import type { SaveMode } from "@posthog/core/save-mode/saveMode"; import { buildPostHogUrl } from "@posthog/core/settings/posthogUrl"; import { useHostTRPC } from "@posthog/host-router/react"; import { ANALYTICS_EVENTS } from "@posthog/shared"; @@ -85,6 +86,7 @@ export function GeneralSettings() { defaultInitialTaskMode, defaultMessagingMode, defaultReasoningEffort, + saveMode, diffOpenMode, sendMessagesWith, conversationCollapseMode, @@ -99,6 +101,7 @@ export function GeneralSettings()
{ setDefaultInitialTaskMode, setDefaultMessagingMode, setDefaultReasoningEffort, + setSaveMode, setDiffOpenMode, setSendMessagesWith, setConversationCollapseMode, @@ -492,6 +495,24 @@ export function GeneralSettings() { + + setSaveMode(value as SaveMode)} + size="1" + > + + + Off + Balanced + Maximum savings + + + + void; markHintLearned: (key: string) => void; + // Save mode + saveMode: SaveMode; + setSaveMode: (mode: SaveMode) => void; + _hasHydrated: boolean; setHasHydrated: (hydrated: boolean) => void; } @@ -271,6 +276,10 @@ export const useSettingsStore = create()( setMcpAppsDisabledServers: (servers) => set({ mcpAppsDisabledServers: servers }), + // Save mode + saveMode: "off", + setSaveMode: (mode) => set({ saveMode: mode }), + // Onboarding hints hints: {}, shouldShowHint: (key, max = 3) => { @@ -355,6 +364,9 @@ export const useSettingsStore = create()( slotMachineMode: state.slotMachineMode, mcpAppsDisabledServers: state.mcpAppsDisabledServers, + // Save mode + saveMode: state.saveMode, + // Onboarding hints hints: state.hints, }), diff --git a/packages/ui/src/features/task-detail/components/TaskInput.tsx b/packages/ui/src/features/task-detail/components/TaskInput.tsx index 93631a781..88a4e2bf0 100644 --- a/packages/ui/src/features/task-detail/components/TaskInput.tsx +++ b/packages/ui/src/features/task-detail/components/TaskInput.tsx @@ -47,6 +47,7 @@ import { useAutoFocusOnTyping } from "../../message-editor/useAutoFocusOnTyping" import { resolveAndAttachDroppedFiles } from "../../message-editor/utils/persistFile"; import { DropZoneOverlay } from "../../sessions/components/DropZoneOverlay"; import { ReasoningLevelSelector } from "../../sessions/components/ReasoningLevelSelector"; +import { SaveModeToggle } from "../../sessions/components/SaveModeToggle"; import { UnifiedModelSelector } from "../../sessions/components/UnifiedModelSelector"; import { getCurrentModeFromConfigOptions } from "../../sessions/sessionStore"; import { @@ -922,14 +923,17 @@ export 
function TaskInput({ /> } reasoningSelector={ - !isPreviewLoading && ( - - ) + <> + {!isPreviewLoading && ( + + )} + + } getPromptHistory={getPromptHistory} onEmptyChange={handleEditorEmptyChange} diff --git a/packages/ui/src/features/task-detail/hooks/useTaskCreation.ts b/packages/ui/src/features/task-detail/hooks/useTaskCreation.ts index 1af3a833c..0c9a8548d 100644 --- a/packages/ui/src/features/task-detail/hooks/useTaskCreation.ts +++ b/packages/ui/src/features/task-detail/hooks/useTaskCreation.ts @@ -1,3 +1,7 @@ +import { + resolveSaveMode, + type SaveMode, +} from "@posthog/core/save-mode/saveMode"; import { getErrorTitle, prepareTaskInput, @@ -89,6 +93,7 @@ async function trackTaskCreated( input: TaskCreationInput, selectedDirectory: string, hostClient: HostTrpcClient, + saveMode?: SaveMode, ): Promise { try { const workspaceMode = input.workspaceMode ?? "local"; @@ -129,6 +134,7 @@ async function trackTaskCreated( uses_worktree_link: usesWorktreeLink, uses_worktree_include: usesWorktreeInclude, adapter: input.adapter, + save_mode: saveMode, }); } catch (error) { log.warn("Failed to track Task created event", { error }); @@ -181,6 +187,7 @@ export function useTaskCreation({ const { isOnline } = useConnectivity(); // Used to name the task occupying a branch's worktree when reuse is blocked. const { data: tasks } = useTasks(); + const saveMode = useSettingsStore((s) => s.saveMode); const hasRequiredPath = allowNoRepo ? true @@ -285,6 +292,20 @@ export function useTaskCreation({ } } + const saveModeResult = + workspaceMode !== "cloud" + ? resolveSaveMode({ + mode: saveMode, + requestedModel: model ?? "", + requestedEffort: + (reasoningLevel as + | "low" + | "medium" + | "high" + | "xhigh" + | "max") ?? "medium", + }) + : null; const input = prepareTaskInput(serializedContent, filePaths, { // In channels chat-box mode no repo is attached up front, even if a // directory/repo is lingering in the persisted picker state. 
@@ -298,8 +319,8 @@ export function useTaskCreation({ reuseExistingWorktree, executionMode, adapter, - model, - reasoningLevel, + model: saveModeResult?.model ?? model, + reasoningLevel: saveModeResult?.effort ?? reasoningLevel, environmentId, sandboxEnvironmentId, signalReportId, @@ -307,6 +328,7 @@ export function useTaskCreation({ channelContext, channelName, allowNoRepo, + systemPromptOverride: saveModeResult?.systemReminder ?? undefined, }); if (executionMode) { @@ -351,7 +373,7 @@ export function useTaskCreation({ if (!contentOverride) { useDraftStore.getState().actions.setDraft(sessionId, null); } - void trackTaskCreated(input, selectedDirectory, hostClient); + void trackTaskCreated(input, selectedDirectory, hostClient, saveMode); // Repo-less channel tasks create no workspace row (the agent runs in // a scratch dir surfaced as a synthetic workspace), so the normal // workspace.create invalidation never fires. Refresh the workspace @@ -426,6 +448,7 @@ export function useTaskCreation({ queryClient, taskService, tasks, + saveMode, ], ); diff --git a/packages/workspace-server/src/services/agent/agent.ts b/packages/workspace-server/src/services/agent/agent.ts index d8ee377ef..73081f87b 100644 --- a/packages/workspace-server/src/services/agent/agent.ts +++ b/packages/workspace-server/src/services/agent/agent.ts @@ -274,6 +274,8 @@ interface SessionConfig { model?: string; /** JSON Schema for structured task output — when set, the agent gets a create_output tool */ jsonSchema?: Record | null; + /** Newline-delimited x-posthog-property-* headers for save-mode telemetry. 
*/ + saveModeHeaders?: string; } interface ManagedSession { @@ -674,6 +676,7 @@ If a repository IS genuinely required, attach one in this priority order: effort, model, jsonSchema, + saveModeHeaders, } = config; // Preview config doesn't need a real repo — use a temp directory @@ -954,6 +957,7 @@ If a repository IS genuinely required, attach one in this priority order: ...(permissionMode && { permissionMode }), ...(model != null && { model }), ...(jsonSchema && { jsonSchema }), + ...(saveModeHeaders && { saveModeHeaders }), claudeCode: { options: claudeCodeOptions, }, @@ -980,6 +984,7 @@ If a repository IS genuinely required, attach one in this priority order: ...(permissionMode && { permissionMode }), ...(model != null && { model }), ...(jsonSchema && { jsonSchema }), + ...(saveModeHeaders && { saveModeHeaders }), claudeCode: { options: claudeCodeOptions, }, @@ -1863,6 +1868,8 @@ For git operations while detached: effort: "effort" in params ? params.effort : undefined, model: "model" in params ? params.model : undefined, jsonSchema: "jsonSchema" in params ? params.jsonSchema : undefined, + saveModeHeaders: + "saveModeHeaders" in params ? params.saveModeHeaders : undefined, }; } diff --git a/packages/workspace-server/src/services/agent/schemas.ts b/packages/workspace-server/src/services/agent/schemas.ts index 5cfb8cd01..825346c73 100644 --- a/packages/workspace-server/src/services/agent/schemas.ts +++ b/packages/workspace-server/src/services/agent/schemas.ts @@ -63,6 +63,8 @@ export const startSessionInput = z.object({ effort: effortLevelSchema.optional(), model: z.string().optional(), jsonSchema: z.record(z.string(), z.unknown()).nullish(), + /** Newline-delimited x-posthog-property-* lines for save-mode telemetry. 
*/ + saveModeHeaders: z.string().optional(), }); export type StartSessionInput = z.infer; diff --git a/scripts/dev-cost-controls.sh b/scripts/dev-cost-controls.sh new file mode 100755 index 000000000..87b68efce --- /dev/null +++ b/scripts/dev-cost-controls.sh @@ -0,0 +1,87 @@ +#!/usr/bin/env bash +# dev-cost-controls.sh — run & verify the Save Mode / Cost Controls work locally. +# +# Subcommands: +# check Preflight: venv, ANTHROPIC_API_KEY, local PostHog (:8010), port :3308 +# test Run BE (gateway) + FE (core) unit tests + lints [no stack needed] +# gateway Start the LLM gateway on :3308 against local PostHog (cache-TTL wire is live) +# help This message +# +# Repos are assumed siblings; override with env vars: +# POSTHOG_DIR (default ~/Developer/posthog) CODE_DIR (default ~/Developer/code) +# +# Full flow (4 terminals, in order): +# 1) cd ~/Developer/posthog && hogli dev:setup # pick: Postgres, Redis, ClickHouse, Django/API +# then: hogli start # (first run also: hogli dev:reset) +# 2) ./scripts/dev-cost-controls.sh test # confirm everything is green +# 3) ANTHROPIC_API_KEY=sk-ant-... 
./scripts/dev-cost-controls.sh gateway +# 4) cd ~/Developer/code && pnpm dev # then pick the "Dev" region at login +set -euo pipefail + +POSTHOG_DIR="${POSTHOG_DIR:-$HOME/Developer/posthog}" +CODE_DIR="${CODE_DIR:-$HOME/Developer/code}" +GW_DIR="$POSTHOG_DIR/services/llm-gateway" +PY="$GW_DIR/.venv/bin/python" + +c_ok() { printf ' \033[32m✓\033[0m %s\n' "$1"; } +c_warn() { printf ' \033[33m!\033[0m %s\n' "$1"; } +c_err() { printf ' \033[31m✗\033[0m %s\n' "$1"; } +hr() { printf '\n=== %s ===\n' "$1"; } + +cmd_check() { + hr "Preflight" + [ -x "$PY" ] && c_ok "gateway .venv present" || c_err "no gateway .venv — run: cd $GW_DIR && uv sync" + [ -n "${ANTHROPIC_API_KEY:-}" ] && c_ok "ANTHROPIC_API_KEY set" || c_warn "ANTHROPIC_API_KEY unset (gateway can't proxy)" + if curl -fsS -m 2 http://localhost:8010/_health >/dev/null 2>&1 || curl -fsS -m 2 http://localhost:8010 >/dev/null 2>&1; then + c_ok "local PostHog reachable on :8010" + else + c_warn "local PostHog NOT reachable on :8010 — start it: cd $POSTHOG_DIR && hogli start" + fi + if nc -z -w1 localhost 3308 >/dev/null 2>&1; then + c_warn "port :3308 already in use (gateway may already be running)" + else + c_ok "port :3308 free" + fi +} + +cmd_test() { + hr "BE — gateway unit tests + lints" + cd "$GW_DIR" + "$PY" -m pytest tests/test_cost_efficiency.py tests/test_batch_routing.py \ + tests/test_budget_guard.py tests/test_cost_controls.py tests/test_cache_ttl.py -q + .venv/bin/ruff check src/llm_gateway/cost_efficiency.py src/llm_gateway/batch_routing.py \ + src/llm_gateway/budget_guard.py src/llm_gateway/cost_controls.py src/llm_gateway/cache_ttl.py \ + src/llm_gateway/api/anthropic.py + MYPYPATH=src .venv/bin/mypy src/llm_gateway/cost_efficiency.py src/llm_gateway/batch_routing.py \ + src/llm_gateway/budget_guard.py src/llm_gateway/cost_controls.py src/llm_gateway/cache_ttl.py + c_ok "BE green" + + hr "FE — core unit tests + lint" + cd "$CODE_DIR" + pnpm --filter @posthog/core exec vitest run src/save-mode + 
node_modules/.bin/biome check packages/core/src/save-mode + c_ok "FE green" +} + +cmd_gateway() { + cmd_check + hr "Starting gateway on :3308 (cache-TTL wire live, alpha-gated)" + cd "$GW_DIR" + export DATABASE_URL="${DATABASE_URL:-postgres://posthog:posthog@localhost:5432/posthog}" + export REDIS_URL="${REDIS_URL:-redis://localhost:6379}" + export POSTHOG_HOST="${POSTHOG_HOST:-http://localhost:8010}" + export POSTHOG_API_BASE_URL="${POSTHOG_API_BASE_URL:-http://localhost:8010}" + : "${ANTHROPIC_API_KEY:?set ANTHROPIC_API_KEY before running 'gateway'}" + if command -v uv >/dev/null 2>&1; then + exec uv run uvicorn llm_gateway.main:app --reload --port 3308 + else + exec "$PY" -m uvicorn llm_gateway.main:app --reload --port 3308 + fi +} + +case "${1:-help}" in + check) cmd_check ;; + test) cmd_test ;; + gateway) cmd_gateway ;; + *) sed -n '2,18p' "$0" ;; +esac
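
Review note on `applySaveMode` (the session-service change earlier in this diff): the method records a per-task baseline the first time a save mode is turned on, resolves every later mode against that baseline rather than against the current settings, and restores the baseline when the mode returns to `"off"`. The following is a minimal sketch of that idea only. `SaveModeBaselines`, `Settings`, `Resolver`, `demoResolver`, and the model names are hypothetical stand-ins, and the demo policy is an assumption rather than what `resolveSaveMode()` actually does.

```typescript
// Hypothetical stand-ins for illustration; not the real saveMode.ts exports.
type SaveMode = "off" | "balanced" | "max_save";
type Settings = { model: string; effort: string };
type Resolver = (mode: SaveMode, requested: Settings) => Settings;

class SaveModeBaselines {
  private readonly baselines = new Map<string, Settings>();
  private readonly resolve: Resolver;

  constructor(resolve: Resolver) {
    this.resolve = resolve;
  }

  /** Returns the settings the session should end up with after switching to `mode`. */
  apply(taskId: string, mode: SaveMode, current: Settings): Settings {
    if (mode === "off") {
      // Restore what the user had before save mode was first enabled.
      const baseline = this.baselines.get(taskId);
      this.baselines.delete(taskId);
      return baseline ?? current;
    }
    // Capture the baseline once, on the first transition away from "off".
    if (!this.baselines.has(taskId)) {
      this.baselines.set(taskId, { ...current });
    }
    const baseline = this.baselines.get(taskId) ?? current;
    // Resolve against the baseline so repeated mode switches do not compound.
    return this.resolve(mode, baseline);
  }
}

// Assumed policy for the demo only: cap "high" effort at "medium", and swap
// the model when mode is "max_save".
const demoResolver: Resolver = (mode, requested) => ({
  model: mode === "max_save" ? "cheap-model" : requested.model,
  effort: requested.effort === "high" ? "medium" : requested.effort,
});

const tracker = new SaveModeBaselines(demoResolver);
const original: Settings = { model: "big-model", effort: "high" };
const balanced = tracker.apply("task-1", "balanced", original);
const maxSave = tracker.apply("task-1", "max_save", balanced);
const restored = tracker.apply("task-1", "off", maxSave);
console.log(balanced, maxSave, restored);
```

Resolving against the stored baseline is what lets a switch from Balanced to Maximum savings and back to Off land on the user's original model and effort, instead of on whatever the previous mode left behind.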