From 076973655b473e6aebe421e460b189e0baf8cea8 Mon Sep 17 00:00:00 2001
From: Shrey Pandya
Date: Fri, 14 Nov 2025 14:21:50 -0800
Subject: [PATCH 1/2] print stagehand metrics

---
 .cursor/plans/stage-d93ad3e4.plan.md | 116 +++++++++++++++++++++++++++
 src/mcp/usage.ts                     | 115 ++++++++++++++++++++++++++
 src/tools/act.ts                     |   8 ++
 src/tools/agent.ts                   |   8 ++
 src/tools/extract.ts                 |   8 ++
 src/tools/index.ts                   |   3 +
 src/tools/navigate.ts                |   8 ++
 src/tools/observe.ts                 |   8 ++
 src/tools/url.ts                     |   8 ++
 src/tools/usage.ts                   | 102 +++++++++++++++++++++++
 10 files changed, 384 insertions(+)
 create mode 100644 .cursor/plans/stage-d93ad3e4.plan.md
 create mode 100644 src/mcp/usage.ts
 create mode 100644 src/tools/usage.ts

diff --git a/.cursor/plans/stage-d93ad3e4.plan.md b/.cursor/plans/stage-d93ad3e4.plan.md
new file mode 100644
index 0000000..51e3d90
--- /dev/null
+++ b/.cursor/plans/stage-d93ad3e4.plan.md
@@ -0,0 +1,116 @@
+
+
+# Plan: Integrate Gemini usage metadata into Stagehand MCP usage stats
+
+## Overview
+
+We will update the existing Stagehand usage tracking in the Browserbase MCP server so that, when Gemini is the backing model, it records **exact token usage** from Gemini's `usage_metadata` instead of local estimates. These richer metrics will be surfaced via the existing `browserbase_usage_stats` tool in a shape similar to Claude Agent SDK's `message.usage` (tokens + cost).
+
+## Key facts from Gemini API
+
+- Gemini responses expose **usage metadata** on each call, e.g. `usage_metadata` / `usageMetadata` with:
+- `prompt_token_count` / `promptTokenCount`
+- `candidates_token_count` / `candidatesTokenCount`
+- `total_token_count` / `totalTokenCount`
+- We should rely on this **post-request usage metadata** for accurate accounting rather than local token counting.
+- Cost is not returned directly by Gemini but can be derived from published pricing using the token counts; we will compute `total_cost_usd` from those counts and a configured price table for the Gemini model in use.
+
+## Design decisions
+
+- **Scope**: Only track detailed token metrics when the underlying LLM is **Gemini via the official API**; for other models we keep the current call-count-only behavior.
+- **Metrics shape**: For each operation we will aggregate fields that mirror Claude Agent SDK usage:
+- `input_tokens` (mapped from Gemini `prompt_token_count`)
+- `output_tokens` (mapped from Gemini `candidates_token_count`)
+- `cache_read_input_tokens` (0 for now, unless Gemini exposes cache reads explicitly)
+- `cache_creation_input_tokens` (0 for now, unless Gemini exposes cache creation explicitly)
+- `total_cost_usd` (derived from `input_tokens`/`output_tokens` and a simple pricing table keyed by model name).
+- **Where to plug in**: Prefer to capture usage **at the point where the Gemini client is called**. If Stagehand exposes `usage_metadata` on its result, use that; otherwise, configure Stagehand's model to use a **wrapped Gemini client** that returns both the normal result and usage metadata for accounting.
+- **Configuration**: Add an internal mapping of Gemini model names to per-token prices, so cost computation is centralized and easy to update.
+
+## Implementation steps
+
+### 1. Extend the usage tracker types and API
+
+1. Update `src/mcp/usage.ts`:
+
+- Introduce a `StagehandUsageMetrics` type with:
+- `inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`, `totalCostUsd` (all numbers, defaulting to 0 when absent).
+- Extend `StagehandOperationStats` to include aggregated fields:
+- `inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`, `totalCostUsd`.
+- Change `recordStagehandCall` to accept an optional `metrics?: StagehandUsageMetrics` argument and, when present, add those values to both the **global** and **per-session** aggregates.
+- Keep the existing call-count and `toolCallCounts` logic intact.
+
+### 2. Add a Gemini pricing helper for cost computation
+
+1. Create a small helper (new module or inside `usage.ts`) that:
+
+- Defines a mapping from Gemini model names used in this project (e.g. `"gemini-2.0-flash"`, `"google/gemini-2.5-computer-use-preview-10-2025"`) to per-token prices for input and output tokens.
+- Exposes a function `computeGeminiCostUsd({ modelName, inputTokens, outputTokens }): number` that multiplies token counts by the configured prices and returns `total_cost_usd`.
+
+2. This helper should be easy to update if pricing changes, but it should not make any external API calls at runtime.
+
+### 3. Determine how to access Gemini usage metadata through Stagehand
+
+1. Review Stagehand documentation / types (and, if available in the codebase, its usage patterns) to confirm whether it exposes Gemini `usage_metadata` on:
+
+- The return value of the operations we already call (`agent.execute`, `extract`, `observe`, etc.), or
+- Some internal hook/callback or logging mechanism.
+
+2. If Stagehand surfaces `usage_metadata` directly on results (e.g. `result.usage_metadata` or similar):
+
+- For each Stagehand-using tool, extract `prompt_token_count` and `candidates_token_count` from the result.
+
+3. If Stagehand does **not** surface usage metadata, adapt the Stagehand configuration in `sessionManager.ts` or `createStagehandInstance` so that it uses a **wrapped Gemini client** that:
+
+- Calls the real Gemini client.
+- Reads `response.usage_metadata`.
+- Returns the normal content to Stagehand but also provides usage metadata in a way our tools can read (e.g. attached to the result, or via a shared metrics callback that calls `recordStagehandCall`).
+
+### 4. Feed Gemini metrics into recordStagehandCall
+
+1. For each Stagehand-backed tool (`agent.ts`, `extract.ts`, `observe.ts`, `act.ts`, `navigate.ts`, `screenshot.ts`, `url.ts`):
+
+- After the Gemini-driven Stagehand call completes and the result is available, derive a `StagehandUsageMetrics` object when **and only when** Gemini usage metadata is available.
+- Map fields as:
+- `inputTokens = usage_metadata.prompt_token_count`
+- `outputTokens = usage_metadata.candidates_token_count`
+- `cacheReadInputTokens = 0` (until caching semantics are exposed)
+- `cacheCreationInputTokens = 0`
+- `totalCostUsd = computeGeminiCostUsd({ modelName, inputTokens, outputTokens })`
+- Call `recordStagehandCall({ sessionId, toolName, operation }, metrics)` instead of the current call-count-only signature.
+
+2. If usage metadata is not available for a particular call (e.g., non-Gemini model or legacy code path), call `recordStagehandCall` without the `metrics` argument so the system still tracks call counts.
+
+### 5. Adjust the usage stats tool output format
+
+1. Update `src/tools/usage.ts` to:
+
+- Return the new numeric fields (input/output tokens, cache\_\* tokens, total_cost_usd) as part of each operation’s JSON object.
+- Optionally adjust the JSON keys to snake_case (`input_tokens`, `output_tokens`, etc.) to mirror Claude Agent SDK’s `message.usage` naming.
+
+2. Keep the existing `scope` (`global`/`perSession`/`all`), `sessionId`, and `reset` behavior unchanged so existing integrations continue to work.
+
+### 6. Documentation updates
+
+1. Update the README section for **Stagehand usage metrics** to:
+
+- Mention that, when using Gemini as the model, `browserbase_usage_stats` returns:
+- `input_tokens`, `output_tokens`, `total_cost_usd`, and placeholder `cache_*` fields (currently zero unless Gemini exposes explicit cache usage).
+- Clarify that these token counts come directly from Gemini’s `usage_metadata` and that cost is derived from them using a simple internal price table.
+
+2. Optionally add a short example snippet showing how a Claude Agent SDK-based agent can:
+
+- Call `browserbase_usage_stats` at the end of a run.
+- Read `total_cost_usd` and token counts for reporting alongside Claude’s own `message.usage`.
+
+## Notes and future extensions
+
+- If Stagehand later adds first-class support for exposing underlying model usage/cost, we can simplify our wrapping logic and rely on that instead.
+- If Gemini introduces direct cost reporting in responses, we can remove local pricing tables and use the API’s `total_cost_usd` directly, simplifying `computeGeminiCostUsd`.
+
+### To-dos
+
+- [ ] Create shared in-memory usage tracker in src/mcp/usage.ts with record/get/reset functions and appropriate types.
+- [ ] Import and call the usage tracker from all Stagehand-using tools (agent, act, navigate, observe, extract, screenshot, url) to record each Stagehand operation with session and tool info.
+- [ ] Add a new MCP tool browserbase_usage_stats that returns a snapshot of usage metrics via MCP call_tool, and register it in the tools index.
+- [ ] Update README and any relevant Agent SDK integration examples to show how to call the usage stats tool and interpret its output.
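Steps 1, 2 and 4 of the plan describe one data path: Gemini usage metadata is mapped to a `StagehandUsageMetrics` object, which is then summed into per-operation stats next to the existing call counts. The following is a minimal sketch of that path, assuming Gemini reports `promptTokenCount` / `candidatesTokenCount` in the camelCase form the plan mentions. `metricsFromGeminiUsage`, `emptyOperationStats`, `addCall` and `OperationStatsWithTokens` are hypothetical names, and the price table holds placeholder numbers, not Gemini's published pricing.

```typescript
// Sketch only: field names follow the plan; GEMINI_PRICES_PER_MILLION holds
// PLACEHOLDER numbers, not real Gemini pricing.

// Subset of Gemini's usage metadata (camelCase form, as the plan notes).
interface GeminiUsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  totalTokenCount?: number;
}

// Per-call metrics in the shape the plan proposes for StagehandUsageMetrics.
interface StagehandUsageMetrics {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  totalCostUsd: number;
}

// Existing call-count stats extended with the aggregated token/cost fields.
interface OperationStatsWithTokens extends StagehandUsageMetrics {
  callCount: number;
  toolCallCounts: Record<string, number>;
}

// Hypothetical USD prices per 1M tokens. PLACEHOLDERS: replace with current pricing.
const GEMINI_PRICES_PER_MILLION: Record<string, { input: number; output: number }> = {
  "gemini-2.0-flash": { input: 1, output: 2 },
};

function computeGeminiCostUsd(args: {
  modelName: string;
  inputTokens: number;
  outputTokens: number;
}): number {
  const price = GEMINI_PRICES_PER_MILLION[args.modelName];
  if (!price) return 0; // Unknown model: keep token counts, report no cost.
  return (args.inputTokens * price.input + args.outputTokens * price.output) / 1e6;
}

// Maps Gemini usage metadata to metrics; undefined means "record the call count only".
function metricsFromGeminiUsage(
  modelName: string,
  usage: GeminiUsageMetadata | undefined,
): StagehandUsageMetrics | undefined {
  if (!usage) return undefined;
  const inputTokens = usage.promptTokenCount ?? 0;
  const outputTokens = usage.candidatesTokenCount ?? 0;
  return {
    inputTokens,
    outputTokens,
    cacheReadInputTokens: 0, // Not exposed yet, per the plan.
    cacheCreationInputTokens: 0,
    totalCostUsd: computeGeminiCostUsd({ modelName, inputTokens, outputTokens }),
  };
}

function emptyOperationStats(): OperationStatsWithTokens {
  return {
    callCount: 0,
    toolCallCounts: {},
    inputTokens: 0,
    outputTokens: 0,
    cacheReadInputTokens: 0,
    cacheCreationInputTokens: 0,
    totalCostUsd: 0,
  };
}

// Same counting as the shipped recordStagehandCall, plus optional metric sums.
function addCall(
  stats: OperationStatsWithTokens,
  toolName: string,
  metrics?: StagehandUsageMetrics,
): void {
  stats.callCount += 1;
  stats.toolCallCounts[toolName] = (stats.toolCallCounts[toolName] ?? 0) + 1;
  if (!metrics) return;
  stats.inputTokens += metrics.inputTokens;
  stats.outputTokens += metrics.outputTokens;
  stats.cacheReadInputTokens += metrics.cacheReadInputTokens;
  stats.cacheCreationInputTokens += metrics.cacheCreationInputTokens;
  stats.totalCostUsd += metrics.totalCostUsd;
}
```

Returning `undefined` when no metadata is present keeps the plain call-count path intact, matching the plan's rule that non-Gemini calls still record counts. An unknown model name yields tokens with zero cost instead of throwing, so a missing price entry cannot break a tool call.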
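Step 3's fallback, wrapping the Gemini client, is a plain decorator: forward each request, report the usage metadata, and hand the response back untouched. The sketch below assumes a minimal `GeminiLikeClient` interface with a single `generateContent` method; that interface, `GeminiUsageListener` and `withUsageTracking` are hypothetical and are not the Gemini SDK's or Stagehand's actual types. How the wrapped client would be passed to Stagehand is not shown.

```typescript
// Hypothetical minimal shapes: NOT the real Gemini SDK or Stagehand types.
interface GeminiLikeResponse {
  text?: string;
  usageMetadata?: { promptTokenCount?: number; candidatesTokenCount?: number };
}

interface GeminiLikeClient {
  generateContent(request: unknown): Promise<GeminiLikeResponse>;
}

type GeminiUsageListener = (usage: {
  promptTokenCount: number;
  candidatesTokenCount: number;
}) => void;

// Decorator: forwards every call to the real client, reports usage metadata to
// a listener (e.g. code that eventually calls recordStagehandCall), and returns
// the response unchanged so the caller sees exactly what the inner client sent.
function withUsageTracking(
  inner: GeminiLikeClient,
  onUsage: GeminiUsageListener,
): GeminiLikeClient {
  return {
    async generateContent(request: unknown): Promise<GeminiLikeResponse> {
      const response = await inner.generateContent(request);
      const usage = response.usageMetadata;
      if (usage) {
        onUsage({
          promptTokenCount: usage.promptTokenCount ?? 0,
          candidatesTokenCount: usage.candidatesTokenCount ?? 0,
        });
      }
      return response;
    },
  };
}
```

Keeping the wrapper's return type identical to the inner client's is what lets it slot in without changing any Stagehand-side code, which is the property step 3 asks for.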
diff --git a/src/mcp/usage.ts b/src/mcp/usage.ts
new file mode 100644
index 0000000..9b2b9bf
--- /dev/null
+++ b/src/mcp/usage.ts
@@ -0,0 +1,115 @@
+import type { Stagehand } from "@browserbasehq/stagehand";
+
+export type StagehandUsageOperation = string;
+
+export type StagehandUsageKey = {
+  sessionId: string;
+  toolName: string;
+  operation: StagehandUsageOperation;
+};
+
+export type StagehandOperationStats = {
+  callCount: number;
+  toolCallCounts: Record<string, number>;
+};
+
+export type StagehandSessionUsage = {
+  operations: Record<string, StagehandOperationStats>;
+};
+
+export type StagehandUsageSnapshot = {
+  global: Record<string, StagehandOperationStats>;
+  perSession: Record<string, StagehandSessionUsage>;
+};
+
+const globalUsage: Record<string, StagehandOperationStats> = {};
+const perSessionUsage: Record<string, StagehandSessionUsage> = {};
+
+function getOrCreateOperationStats(
+  container: Record<string, StagehandOperationStats>,
+  operation: StagehandUsageOperation,
+): StagehandOperationStats {
+  if (!container[operation]) {
+    container[operation] = {
+      callCount: 0,
+      toolCallCounts: {},
+    };
+  }
+  return container[operation];
+}
+
+async function logStagehandMetrics(
+  stagehand: Stagehand | undefined,
+  key: StagehandUsageKey,
+): Promise<void> {
+  if (!stagehand) return;
+
+
+  const rawMetrics: any = (stagehand as any).metrics;
+  const metrics =
+    rawMetrics && typeof rawMetrics.then === "function"
+      ? await rawMetrics
+      : rawMetrics;
+
+  if (!metrics) return;
+
+  // Structured JSON on stderr (stdout carries the MCP stdio transport), easy to grep/pipe.
+
+  console.error(
+    JSON.stringify(
+      {
+        source: "stagehand-mcp",
+        event: "stagehand_metrics",
+        ...key,
+        metrics,
+      },
+      null,
+      2,
+    ),
+  );
+}
+
+export async function recordStagehandCall(
+  args: StagehandUsageKey & { stagehand?: Stagehand },
+): Promise<void> {
+  const { sessionId, toolName, operation, stagehand } = args;
+
+  // Update global aggregate
+  const globalStats = getOrCreateOperationStats(globalUsage, operation);
+  globalStats.callCount += 1;
+  globalStats.toolCallCounts[toolName] =
+    (globalStats.toolCallCounts[toolName] ?? 0) + 1;
+
+  // Update per-session usage
+  if (!perSessionUsage[sessionId]) {
+    perSessionUsage[sessionId] = { operations: {} };
+  }
+
+  const sessionStats = getOrCreateOperationStats(
+    perSessionUsage[sessionId].operations,
+    operation,
+  );
+  sessionStats.callCount += 1;
+  sessionStats.toolCallCounts[toolName] =
+    (sessionStats.toolCallCounts[toolName] ?? 0) + 1;
+
+  await logStagehandMetrics(stagehand, { sessionId, toolName, operation });
+}
+
+export function getUsageSnapshot(): StagehandUsageSnapshot {
+  return structuredClone({
+    global: globalUsage,
+    perSession: perSessionUsage,
+  });
+}
+
+export function resetUsage(): void {
+  for (const key of Object.keys(globalUsage)) {
+
+    delete globalUsage[key];
+  }
+  for (const key of Object.keys(perSessionUsage)) {
+
+    delete perSessionUsage[key];
+  }
+}
diff --git a/src/tools/act.ts b/src/tools/act.ts
index d395a66..2d22102 100644
--- a/src/tools/act.ts
+++ b/src/tools/act.ts
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Act
@@ -45,6 +46,13 @@ async function handleAct(
         variables: params.variables,
       });
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: actSchema.name,
+        operation: "act",
+        stagehand,
+      });
+
       return {
         content: [
           {
diff --git a/src/tools/agent.ts b/src/tools/agent.ts
index e333079..2a62313 100644
--- a/src/tools/agent.ts
+++ b/src/tools/agent.ts
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Agent
@@ -54,6 +55,13 @@ async function handleAgent(
         maxSteps: 20,
       });
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: agentSchema.name,
+        operation: "agent.execute",
+        stagehand,
+      });
+
       return {
         content: [
           {
diff --git a/src/tools/extract.ts b/src/tools/extract.ts
index 7bfb56b..2c37942 100644
--- a/src/tools/extract.ts
+++ b/src/tools/extract.ts
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Extract
@@ -39,6 +40,13 @@ async function handleExtract(
 
       const extraction = await stagehand.extract(params.instruction);
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: extractSchema.name,
+        operation: "extract",
+        stagehand,
+      });
+
       return {
         content: [
           {
diff --git a/src/tools/index.ts b/src/tools/index.ts
index f0a19da..5ed63f6 100644
--- a/src/tools/index.ts
+++ b/src/tools/index.ts
@@ -6,6 +6,7 @@ import screenshotTool from "./screenshot.js";
 import sessionTools from "./session.js";
 import getUrlTool from "./url.js";
 import agentTool from "./agent.js";
+import usageTool from "./usage.js";
 
 // Export individual tools
 export { default as navigateTool } from "./navigate.js";
@@ -16,6 +17,7 @@ export { default as screenshotTool } from "./screenshot.js";
 export { default as sessionTools } from "./session.js";
 export { default as getUrlTool } from "./url.js";
 export { default as agentTool } from "./agent.js";
+export { default as usageTool } from "./usage.js";
 
 // Export all tools as array
 export const TOOLS = [
@@ -27,6 +29,7 @@ export const TOOLS = [
   screenshotTool,
   getUrlTool,
   agentTool,
+  usageTool,
 ];
 
 export const sessionManagementTools = sessionTools;
diff --git a/src/tools/navigate.ts b/src/tools/navigate.ts
index c6309a0..11ece5f 100644
--- a/src/tools/navigate.ts
+++ b/src/tools/navigate.ts
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 const NavigateInputSchema = z.object({
   url: z.string().describe("The URL to navigate to"),
@@ -37,6 +38,13 @@ async function handleNavigate(
         throw new Error("No Browserbase session ID available");
       }
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: navigateSchema.name,
+        operation: "navigate.goto",
+        stagehand,
+      });
+
       return {
         content: [
           {
diff --git a/src/tools/observe.ts b/src/tools/observe.ts
index b473300..9905b73 100644
--- a/src/tools/observe.ts
+++ b/src/tools/observe.ts
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Observe
@@ -42,6 +43,13 @@ async function handleObserve(
 
       const observations = await stagehand.observe(params.instruction);
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: observeSchema.name,
+        operation: "observe",
+        stagehand,
+      });
+
       return {
         content: [
           {
diff --git a/src/tools/url.ts b/src/tools/url.ts
index da7e124..f73a296 100644
--- a/src/tools/url.ts
+++ b/src/tools/url.ts
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Get URL
@@ -37,6 +38,13 @@ async function handleGetUrl(
 
       const currentUrl = page.url();
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: getUrlSchema.name,
+        operation: "get_url",
+        stagehand,
+      });
+
       return {
         content: [
           {
diff --git a/src/tools/usage.ts b/src/tools/usage.ts
new file mode 100644
index 0000000..0f6e37b
--- /dev/null
+++ b/src/tools/usage.ts
@@ -0,0 +1,102 @@
+import { z } from "zod";
+import type { Tool, ToolSchema, ToolResult } from "./tool.js";
+import type { Context } from "../context.js";
+import type { ToolActionResult } from "../types/types.js";
+import { getUsageSnapshot, resetUsage } from "../mcp/usage.js";
+
+const UsageInputSchema = z
+  .object({
+    sessionId: z
+      .string()
+      .optional()
+      .describe(
+        "Optional: filter per-session stats to a specific internal MCP session ID.",
+      ),
+    scope: z
+      .enum(["global", "perSession", "all"])
+      .optional()
+      .describe(
+        'Optional: which portion of the snapshot to return: "global", "perSession", or "all" (default).',
+      ),
+    reset: z
+      .boolean()
+      .optional()
+      .describe(
+        "Optional: when true, reset accumulated usage counters after returning the snapshot.",
+      ),
+  })
+  .optional()
+  .default({});
+
+type UsageInput = z.infer<typeof UsageInputSchema>;
+
+const usageSchema: ToolSchema<typeof UsageInputSchema> = {
+  name: "browserbase_usage_stats",
+  description:
+    "Return a snapshot of Stagehand usage metrics (call counts) for this MCP process, optionally filtered by session.",
+  inputSchema: UsageInputSchema,
+};
+
+async function handleUsage(
+
+  context: Context,
+  params: UsageInput,
+): Promise<ToolResult> {
+  const action = async (): Promise<ToolActionResult> => {
+    const snapshot = getUsageSnapshot();
+
+    const scope = params.scope ?? "all";
+    let result: unknown = snapshot;
+
+    if (scope === "global") {
+      result = { global: snapshot.global };
+    } else if (scope === "perSession") {
+      if (params.sessionId) {
+        result = {
+          perSession: {
+            [params.sessionId]: snapshot.perSession[params.sessionId] ?? {
+              operations: {},
+            },
+          },
+        };
+      } else {
+        result = { perSession: snapshot.perSession };
+      }
+    } else if (scope === "all" && params.sessionId) {
+      result = {
+        global: snapshot.global,
+        perSession: {
+          [params.sessionId]: snapshot.perSession[params.sessionId] ?? {
+            operations: {},
+          },
+        },
+      };
+    }
+
+    if (params.reset) {
+      resetUsage();
+    }
+
+    return {
+      content: [
+        {
+          type: "text",
+          text: JSON.stringify(result, null, 2),
+        },
+      ],
+    };
+  };
+
+  return {
+    action,
+    waitForNetwork: false,
+  };
+}
+
+const usageTool: Tool<typeof UsageInputSchema> = {
+  capability: "core",
+  schema: usageSchema,
+  handle: handleUsage,
+};
+
+export default usageTool;

From bafd75bae8b3523cc91c6e476839dde86ab6b025 Mon Sep 17 00:00:00 2001
From: Shrey Pandya
Date: Fri, 14 Nov 2025 14:22:31 -0800
Subject: [PATCH 2/2] remove cursor plan

---
 .cursor/plans/stage-d93ad3e4.plan.md | 116 ---------------------------
 1 file changed, 116 deletions(-)
 delete mode 100644 .cursor/plans/stage-d93ad3e4.plan.md

diff --git a/.cursor/plans/stage-d93ad3e4.plan.md b/.cursor/plans/stage-d93ad3e4.plan.md
deleted file mode 100644
index 51e3d90..0000000
--- a/.cursor/plans/stage-d93ad3e4.plan.md
+++ /dev/null
@@ -1,116 +0,0 @@
-
-
-# Plan: Integrate Gemini usage metadata into Stagehand MCP usage stats
-
-## Overview
-
-We will update the existing Stagehand usage tracking in the Browserbase MCP server so that, when Gemini is the backing model, it records **exact token usage** from Gemini's `usage_metadata` instead of local estimates. These richer metrics will be surfaced via the existing `browserbase_usage_stats` tool in a shape similar to Claude Agent SDK's `message.usage` (tokens + cost).
-
-## Key facts from Gemini API
-
-- Gemini responses expose **usage metadata** on each call, e.g. `usage_metadata` / `usageMetadata` with:
-- `prompt_token_count` / `promptTokenCount`
-- `candidates_token_count` / `candidatesTokenCount`
-- `total_token_count` / `totalTokenCount`
-- We should rely on this **post-request usage metadata** for accurate accounting rather than local token counting.
-- Cost is not returned directly by Gemini but can be derived from published pricing using the token counts; we will compute `total_cost_usd` from those counts and a configured price table for the Gemini model in use.
-
-## Design decisions
-
-- **Scope**: Only track detailed token metrics when the underlying LLM is **Gemini via the official API**; for other models we keep the current call-count-only behavior.
-- **Metrics shape**: For each operation we will aggregate fields that mirror Claude Agent SDK usage:
-- `input_tokens` (mapped from Gemini `prompt_token_count`)
-- `output_tokens` (mapped from Gemini `candidates_token_count`)
-- `cache_read_input_tokens` (0 for now, unless Gemini exposes cache reads explicitly)
-- `cache_creation_input_tokens` (0 for now, unless Gemini exposes cache creation explicitly)
-- `total_cost_usd` (derived from `input_tokens`/`output_tokens` and a simple pricing table keyed by model name).
-- **Where to plug in**: Prefer to capture usage **at the point where the Gemini client is called**. If Stagehand exposes `usage_metadata` on its result, use that; otherwise, configure Stagehand's model to use a **wrapped Gemini client** that returns both the normal result and usage metadata for accounting.
-- **Configuration**: Add an internal mapping of Gemini model names to per-token prices, so cost computation is centralized and easy to update.
-
-## Implementation steps
-
-### 1. Extend the usage tracker types and API
-
-1. Update `src/mcp/usage.ts`:
-
-- Introduce a `StagehandUsageMetrics` type with:
-- `inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`, `totalCostUsd` (all numbers, defaulting to 0 when absent).
-- Extend `StagehandOperationStats` to include aggregated fields:
-- `inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`, `totalCostUsd`.
-- Change `recordStagehandCall` to accept an optional `metrics?: StagehandUsageMetrics` argument and, when present, add those values to both the **global** and **per-session** aggregates.
-- Keep the existing call-count and `toolCallCounts` logic intact.
-
-### 2. Add a Gemini pricing helper for cost computation
-
-1. Create a small helper (new module or inside `usage.ts`) that:
-
-- Defines a mapping from Gemini model names used in this project (e.g. `"gemini-2.0-flash"`, `"google/gemini-2.5-computer-use-preview-10-2025"`) to per-token prices for input and output tokens.
-- Exposes a function `computeGeminiCostUsd({ modelName, inputTokens, outputTokens }): number` that multiplies token counts by the configured prices and returns `total_cost_usd`.
-
-2. This helper should be easy to update if pricing changes, but it should not make any external API calls at runtime.
-
-### 3. Determine how to access Gemini usage metadata through Stagehand
-
-1. Review Stagehand documentation / types (and, if available in the codebase, its usage patterns) to confirm whether it exposes Gemini `usage_metadata` on:
-
-- The return value of the operations we already call (`agent.execute`, `extract`, `observe`, etc.), or
-- Some internal hook/callback or logging mechanism.
-
-2. If Stagehand surfaces `usage_metadata` directly on results (e.g. `result.usage_metadata` or similar):
-
-- For each Stagehand-using tool, extract `prompt_token_count` and `candidates_token_count` from the result.
-
-3. If Stagehand does **not** surface usage metadata, adapt the Stagehand configuration in `sessionManager.ts` or `createStagehandInstance` so that it uses a **wrapped Gemini client** that:
-
-- Calls the real Gemini client.
-- Reads `response.usage_metadata`.
-- Returns the normal content to Stagehand but also provides usage metadata in a way our tools can read (e.g. attached to the result, or via a shared metrics callback that calls `recordStagehandCall`).
-
-### 4. Feed Gemini metrics into recordStagehandCall
-
-1. For each Stagehand-backed tool (`agent.ts`, `extract.ts`, `observe.ts`, `act.ts`, `navigate.ts`, `screenshot.ts`, `url.ts`):
-
-- After the Gemini-driven Stagehand call completes and the result is available, derive a `StagehandUsageMetrics` object when **and only when** Gemini usage metadata is available.
-- Map fields as:
-- `inputTokens = usage_metadata.prompt_token_count`
-- `outputTokens = usage_metadata.candidates_token_count`
-- `cacheReadInputTokens = 0` (until caching semantics are exposed)
-- `cacheCreationInputTokens = 0`
-- `totalCostUsd = computeGeminiCostUsd({ modelName, inputTokens, outputTokens })`
-- Call `recordStagehandCall({ sessionId, toolName, operation }, metrics)` instead of the current call-count-only signature.
-
-2. If usage metadata is not available for a particular call (e.g., non-Gemini model or legacy code path), call `recordStagehandCall` without the `metrics` argument so the system still tracks call counts.
-
-### 5. Adjust the usage stats tool output format
-
-1. Update `src/tools/usage.ts` to:
-
-- Return the new numeric fields (input/output tokens, cache\_\* tokens, total_cost_usd) as part of each operation’s JSON object.
-- Optionally adjust the JSON keys to snake_case (`input_tokens`, `output_tokens`, etc.) to mirror Claude Agent SDK’s `message.usage` naming.
-
-2. Keep the existing `scope` (`global`/`perSession`/`all`), `sessionId`, and `reset` behavior unchanged so existing integrations continue to work.
-
-### 6. Documentation updates
-
-1. Update the README section for **Stagehand usage metrics** to:
-
-- Mention that, when using Gemini as the model, `browserbase_usage_stats` returns:
-- `input_tokens`, `output_tokens`, `total_cost_usd`, and placeholder `cache_*` fields (currently zero unless Gemini exposes explicit cache usage).
-- Clarify that these token counts come directly from Gemini’s `usage_metadata` and that cost is derived from them using a simple internal price table.
-
-2. Optionally add a short example snippet showing how a Claude Agent SDK-based agent can:
-
-- Call `browserbase_usage_stats` at the end of a run.
-- Read `total_cost_usd` and token counts for reporting alongside Claude’s own `message.usage`.
-
-## Notes and future extensions
-
-- If Stagehand later adds first-class support for exposing underlying model usage/cost, we can simplify our wrapping logic and rely on that instead.
-- If Gemini introduces direct cost reporting in responses, we can remove local pricing tables and use the API’s `total_cost_usd` directly, simplifying `computeGeminiCostUsd`.
-
-### To-dos
-
-- [ ] Create shared in-memory usage tracker in src/mcp/usage.ts with record/get/reset functions and appropriate types.
-- [ ] Import and call the usage tracker from all Stagehand-using tools (agent, act, navigate, observe, extract, screenshot, url) to record each Stagehand operation with session and tool info.
-- [ ] Add a new MCP tool browserbase_usage_stats that returns a snapshot of usage metrics via MCP call_tool, and register it in the tools index.
-- [ ] Update README and any relevant Agent SDK integration examples to show how to call the usage stats tool and interpret its output.
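The `browserbase_usage_stats` tool added in patch 1 returns its snapshot as pretty-printed JSON in a single `text` content item, with `callCount` and `toolCallCounts` per operation; the token and cost fields from the removed plan are not emitted by these patches. A client that wants a per-run total could fold that JSON roughly as in this minimal sketch. It assumes the caller has already pulled the text out of the MCP tool result (the MCP client call itself is omitted), and `summarizeUsageStats` / `summarizeOperations` are hypothetical helper names.

```typescript
// Shapes mirror StagehandOperationStats / StagehandSessionUsage in src/mcp/usage.ts.
interface OperationStats {
  callCount: number;
  toolCallCounts: Record<string, number>;
}

interface UsageStatsPayload {
  global?: Record<string, OperationStats>;
  perSession?: Record<string, { operations: Record<string, OperationStats> }>;
}

interface UsageSummary {
  totalCalls: number;
  callsByTool: Record<string, number>;
}

// Sums call counts across one operations map (global, or one session's operations).
function summarizeOperations(ops: Record<string, OperationStats>): UsageSummary {
  let totalCalls = 0;
  const callsByTool: Record<string, number> = {};
  for (const stats of Object.values(ops)) {
    totalCalls += stats.callCount;
    for (const [tool, count] of Object.entries(stats.toolCallCounts)) {
      callsByTool[tool] = (callsByTool[tool] ?? 0) + count;
    }
  }
  return { totalCalls, callsByTool };
}

// `text` is the JSON string from the tool result's text content item.
// With a sessionId, only that session's operations are summed.
function summarizeUsageStats(text: string, sessionId?: string): UsageSummary {
  const payload = JSON.parse(text) as UsageStatsPayload;
  const ops = sessionId
    ? payload.perSession?.[sessionId]?.operations
    : payload.global;
  return summarizeOperations(ops ?? {});
}
```

A missing scope or unknown session ID folds to zero calls rather than throwing, which matches how the tool itself returns `{ operations: {} }` for a session it has not seen.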