From 076973655b473e6aebe421e460b189e0baf8cea8 Mon Sep 17 00:00:00 2001
From: Shrey Pandya <shrey@browserbase.com>
Date: Fri, 14 Nov 2025 14:21:50 -0800
Subject: [PATCH 1/2] track stagehand call counts and print stagehand metrics

---
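Reviewer note: the counting logic added in src/mcp/usage.ts can be sketched
on its own, without a Stagehand instance. The block below is a minimal
sketch under stated assumptions, not the code in this patch: the type and
function names are shortened, and the tool names ("example_act_tool",
"example_extract_tool") and session IDs are hypothetical. The sketch's
`snapshot` returns a deep copy so that `reset` cannot empty a snapshot that
is still waiting to be serialized.

```typescript
// Hypothetical stand-in for the tracker: same shapes as the patch, shorter names.
type OperationStats = {
  callCount: number;
  toolCallCounts: Record<string, number>;
};
type SessionUsage = { operations: Record<string, OperationStats> };
type UsageSnapshot = {
  global: Record<string, OperationStats>;
  perSession: Record<string, SessionUsage>;
};

const globalUsage: Record<string, OperationStats> = {};
const perSessionUsage: Record<string, SessionUsage> = {};

// Increment the per-operation and per-tool counters in one container.
function bump(
  container: Record<string, OperationStats>,
  operation: string,
  toolName: string,
): void {
  if (!container[operation]) {
    container[operation] = { callCount: 0, toolCallCounts: {} };
  }
  const stats = container[operation];
  stats.callCount += 1;
  stats.toolCallCounts[toolName] = (stats.toolCallCounts[toolName] ?? 0) + 1;
}

// Record one call in both the global and the per-session aggregate.
function record(sessionId: string, toolName: string, operation: string): void {
  bump(globalUsage, operation, toolName);
  if (!perSessionUsage[sessionId]) {
    perSessionUsage[sessionId] = { operations: {} };
  }
  bump(perSessionUsage[sessionId].operations, operation, toolName);
}

// Deep copy via a JSON round trip; the data is plain numbers and objects.
function snapshot(): UsageSnapshot {
  return JSON.parse(
    JSON.stringify({ global: globalUsage, perSession: perSessionUsage }),
  ) as UsageSnapshot;
}

// Clear both containers in place.
function reset(): void {
  for (const key of Object.keys(globalUsage)) delete globalUsage[key];
  for (const key of Object.keys(perSessionUsage)) delete perSessionUsage[key];
}

record("session-a", "example_act_tool", "act");
record("session-a", "example_act_tool", "act");
record("session-b", "example_extract_tool", "extract");

const beforeReset = snapshot();
reset();
const afterReset = snapshot();

console.log(JSON.stringify(beforeReset, null, 2));
```

Because `beforeReset` is a copy, it still holds the three recorded calls after
`reset()` runs, while `afterReset` is empty. This is the ordering a caller of
`browserbase_usage_stats` with `reset: true` would expect.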
 .cursor/plans/stage-d93ad3e4.plan.md | 116 +++++++++++++++++++++++++++
 src/mcp/usage.ts                     | 115 ++++++++++++++++++++++++++
 src/tools/act.ts                     |   8 ++
 src/tools/agent.ts                   |   8 ++
 src/tools/extract.ts                 |   8 ++
 src/tools/index.ts                   |   3 +
 src/tools/navigate.ts                |   8 ++
 src/tools/observe.ts                 |   8 ++
 src/tools/url.ts                     |   8 ++
 src/tools/usage.ts                   | 102 +++++++++++++++++++++++
 10 files changed, 384 insertions(+)
 create mode 100644 .cursor/plans/stage-d93ad3e4.plan.md
 create mode 100644 src/mcp/usage.ts
 create mode 100644 src/tools/usage.ts

diff --git a/.cursor/plans/stage-d93ad3e4.plan.md b/.cursor/plans/stage-d93ad3e4.plan.md
new file mode 100644
index 0000000..51e3d90
--- /dev/null
+++ b/.cursor/plans/stage-d93ad3e4.plan.md
@@ -0,0 +1,116 @@
+<!-- d93ad3e4-1668-4333-8e39-d878faf06005 132f9ffa-0c5f-4fdc-a079-cc5c673f4d77 -->
+
+# Plan: Integrate Gemini usage metadata into Stagehand MCP usage stats
+
+## Overview
+
+We will update the existing Stagehand usage tracking in the Browserbase MCP server so that, when Gemini is the backing model, it records **exact token usage** from Gemini's `usage_metadata` instead of local estimates. These richer metrics will be surfaced via the existing `browserbase_usage_stats` tool in a shape similar to Claude Agent SDK's `message.usage` (tokens + cost).
+
+## Key facts from Gemini API
+
+- Gemini responses expose **usage metadata** on each call, e.g. `usage_metadata` / `usageMetadata` with:
+- `prompt_token_count` / `promptTokenCount`
+- `candidates_token_count` / `candidatesTokenCount`
+- `total_token_count` / `totalTokenCount`
+- We should rely on this **post-request usage metadata** for accurate accounting rather than local token counting.
+- Cost is not returned directly by Gemini but can be derived from published pricing using the token counts; we will compute `total_cost_usd` from those counts and a configured price table for the Gemini model in use.
+
+## Design decisions
+
+- **Scope**: Only track detailed token metrics when the underlying LLM is **Gemini via the official API**; for other models we keep the current call-count-only behavior.
+- **Metrics shape**: For each operation we will aggregate fields that mirror Claude Agent SDK usage:
+- `input_tokens` (mapped from Gemini `prompt_token_count`)
+- `output_tokens` (mapped from Gemini `candidates_token_count`)
+- `cache_read_input_tokens` (0 for now, unless Gemini exposes cache reads explicitly)
+- `cache_creation_input_tokens` (0 for now, unless Gemini exposes cache creation explicitly)
+- `total_cost_usd` (derived from `input_tokens`/`output_tokens` and a simple pricing table keyed by model name).
+- **Where to plug in**: Prefer to capture usage **at the point where the Gemini client is called**. If Stagehand exposes `usage_metadata` on its result, use that; otherwise, configure Stagehand's model to use a **wrapped Gemini client** that returns both the normal result and usage metadata for accounting.
+- **Configuration**: Add an internal mapping of Gemini model names to per-token prices, so cost computation is centralized and easy to update.
+
+## Implementation steps
+
+### 1. Extend the usage tracker types and API
+
+1. Update `src/mcp/usage.ts`:
+
+- Introduce a `StagehandUsageMetrics` type with:
+- `inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`, `totalCostUsd` (all numbers, defaulting to 0 when absent).
+- Extend `StagehandOperationStats` to include aggregated fields:
+- `inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`, `totalCostUsd`.
+- Change `recordStagehandCall` to accept an optional `metrics?: StagehandUsageMetrics` argument and, when present, add those values to both the **global** and **per-session** aggregates.
+- Keep the existing call-count and `toolCallCounts` logic intact.
+
+### 2. Add a Gemini pricing helper for cost computation
+
+1. Create a small helper (new module or inside `usage.ts`) that:
+
+- Defines a mapping from Gemini model names used in this project (e.g. `"gemini-2.0-flash"`, `"google/gemini-2.5-computer-use-preview-10-2025"`) to per-token prices for input and output tokens.
+- Exposes a function `computeGeminiCostUsd({ modelName, inputTokens, outputTokens }): number` that multiplies token counts by the configured prices and returns `total_cost_usd`.
+
+2. This helper should be easy to update if pricing changes, but it should not make any external API calls at runtime.
+
+### 3. Determine how to access Gemini usage metadata through Stagehand
+
+1. Review Stagehand documentation / types (and, if available in the codebase, its usage patterns) to confirm whether it exposes Gemini `usage_metadata` on:
+
+- The return value of the operations we already call (`agent.execute`, `extract`, `observe`, etc.), or
+- Some internal hook/callback or logging mechanism.
+
+2. If Stagehand surfaces `usage_metadata` directly on results (e.g. `result.usage_metadata` or similar):
+
+- For each Stagehand-using tool, extract `prompt_token_count` and `candidates_token_count` from the result.
+
+3. If Stagehand does **not** surface usage metadata, adapt the Stagehand configuration in `sessionManager.ts` or `createStagehandInstance` so that it uses a **wrapped Gemini client** that:
+
+- Calls the real Gemini client.
+- Reads `response.usage_metadata`.
+- Returns the normal content to Stagehand but also provides usage metadata in a way our tools can read (e.g. attached to the result, or via a shared metrics callback that calls `recordStagehandCall`).
+
+### 4. Feed Gemini metrics into recordStagehandCall
+
+1. For each Stagehand-backed tool (`agent.ts`, `extract.ts`, `observe.ts`, `act.ts`, `navigate.ts`, `screenshot.ts`, `url.ts`):
+
+- After the Gemini-driven Stagehand call completes and the result is available, derive a `StagehandUsageMetrics` object when **and only when** Gemini usage metadata is available.
+- Map fields as:
+- `inputTokens = usage_metadata.prompt_token_count`
+- `outputTokens = usage_metadata.candidates_token_count`
+- `cacheReadInputTokens = 0` (until caching semantics are exposed)
+- `cacheCreationInputTokens = 0`
+- `totalCostUsd = computeGeminiCostUsd({ modelName, inputTokens, outputTokens })`
+- Call `recordStagehandCall({ sessionId, toolName, operation }, metrics)` instead of the current call-count-only signature.
+
+2. If usage metadata is not available for a particular call (e.g., non-Gemini model or legacy code path), call `recordStagehandCall` without the `metrics` argument so the system still tracks call counts.
+
+### 5. Adjust the usage stats tool output format
+
+1. Update `src/tools/usage.ts` to:
+
+- Return the new numeric fields (input/output tokens, cache\_\* tokens, total_cost_usd) as part of each operation’s JSON object.
+- Optionally adjust the JSON keys to snake_case (`input_tokens`, `output_tokens`, etc.) to mirror Claude Agent SDK’s `message.usage` naming.
+
+2. Keep the existing `scope` (`global`/`perSession`/`all`), `sessionId`, and `reset` behavior unchanged so existing integrations continue to work.
+
+### 6. Documentation updates
+
+1. Update the README section for **Stagehand usage metrics** to:
+
+- Mention that, when using Gemini as the model, `browserbase_usage_stats` returns:
+- `input_tokens`, `output_tokens`, `total_cost_usd`, and placeholder `cache_*` fields (currently zero unless Gemini exposes explicit cache usage).
+- Clarify that these token counts come directly from Gemini’s `usage_metadata` and that cost is derived from them using a simple internal price table.
+
+2. Optionally add a short example snippet showing how a Claude Agent SDK-based agent can:
+
+- Call `browserbase_usage_stats` at the end of a run.
+- Read `total_cost_usd` and token counts for reporting alongside Claude’s own `message.usage`.
+
+## Notes and future extensions
+
+- If Stagehand later adds first-class support for exposing underlying model usage/cost, we can simplify our wrapping logic and rely on that instead.
+- If Gemini introduces direct cost reporting in responses, we can remove local pricing tables and use the API’s `total_cost_usd` directly, simplifying `computeGeminiCostUsd`.
+
+### To-dos
+
+- [ ] Create shared in-memory usage tracker in src/mcp/usage.ts with record/get/reset functions and appropriate types.
+- [ ] Import and call the usage tracker from all Stagehand-using tools (agent, act, navigate, observe, extract, screenshot, url) to record each Stagehand operation with session and tool info.
+- [ ] Add a new MCP tool browserbase_usage_stats that returns a snapshot of usage metrics via MCP call_tool, and register it in the tools index.
+- [ ] Update README and any relevant Agent SDK integration examples to show how to call the usage stats tool and interpret its output.
diff --git a/src/mcp/usage.ts b/src/mcp/usage.ts
new file mode 100644
index 0000000..9b2b9bf
--- /dev/null
+++ b/src/mcp/usage.ts
@@ -0,0 +1,115 @@
+import type { Stagehand } from "@browserbasehq/stagehand";
+
+export type StagehandUsageOperation = string;
+
+export type StagehandUsageKey = {
+  sessionId: string;
+  toolName: string;
+  operation: StagehandUsageOperation;
+};
+
+export type StagehandOperationStats = {
+  callCount: number;
+  toolCallCounts: Record<string, number>;
+};
+
+export type StagehandSessionUsage = {
+  operations: Record<StagehandUsageOperation, StagehandOperationStats>;
+};
+
+export type StagehandUsageSnapshot = {
+  global: Record<StagehandUsageOperation, StagehandOperationStats>;
+  perSession: Record<string, StagehandSessionUsage>;
+};
+
+const globalUsage: Record<string, StagehandOperationStats> = {};
+const perSessionUsage: Record<string, StagehandSessionUsage> = {};
+
+function getOrCreateOperationStats(
+  container: Record<StagehandUsageOperation, StagehandOperationStats>,
+  operation: StagehandUsageOperation,
+): StagehandOperationStats {
+  if (!container[operation]) {
+    container[operation] = {
+      callCount: 0,
+      toolCallCounts: {},
+    };
+  }
+  return container[operation];
+}
+
+async function logStagehandMetrics(
+  stagehand: Stagehand | undefined,
+  key: StagehandUsageKey,
+): Promise<void> {
+  if (!stagehand) return;
+
+  // `metrics` is read untyped and may be a plain value or a promise.
+  const rawMetrics: any = (stagehand as any).metrics;
+  const metrics =
+    rawMetrics && typeof rawMetrics.then === "function"
+      ? await rawMetrics
+      : rawMetrics;
+
+  if (!metrics) return;
+
+  // Emit structured JSON on stderr; with the stdio transport, stdout carries
+  // MCP protocol messages and must stay free of log output.
+  console.error(
+    JSON.stringify(
+      {
+        source: "stagehand-mcp",
+        event: "stagehand_metrics",
+        ...key,
+        metrics,
+      },
+      null,
+      2,
+    ),
+  );
+}
+
+export async function recordStagehandCall(
+  args: StagehandUsageKey & { stagehand?: Stagehand },
+): Promise<void> {
+  const { sessionId, toolName, operation, stagehand } = args;
+
+  // Update global aggregate
+  const globalStats = getOrCreateOperationStats(globalUsage, operation);
+  globalStats.callCount += 1;
+  globalStats.toolCallCounts[toolName] =
+    (globalStats.toolCallCounts[toolName] ?? 0) + 1;
+
+  // Update per-session usage
+  if (!perSessionUsage[sessionId]) {
+    perSessionUsage[sessionId] = { operations: {} };
+  }
+
+  const sessionStats = getOrCreateOperationStats(
+    perSessionUsage[sessionId].operations,
+    operation,
+  );
+  sessionStats.callCount += 1;
+  sessionStats.toolCallCounts[toolName] =
+    (sessionStats.toolCallCounts[toolName] ?? 0) + 1;
+
+  await logStagehandMetrics(stagehand, { sessionId, toolName, operation });
+}
+
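+export function getUsageSnapshot(): StagehandUsageSnapshot {
+  // Deep-copy so a later resetUsage() cannot empty a snapshot still in use.
+  return JSON.parse(
+    JSON.stringify({ global: globalUsage, perSession: perSessionUsage }),
+  );
+}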
+
+export function resetUsage(): void {
+  // Clear both containers in place; they are module-level constants.
+  for (const key of Object.keys(globalUsage)) {
+    delete globalUsage[key];
+  }
+
+  for (const key of Object.keys(perSessionUsage)) {
+    delete perSessionUsage[key];
+  }
+}
diff --git a/src/tools/act.ts b/src/tools/act.ts
index d395a66..2d22102 100644
--- a/src/tools/act.ts
+++ b/src/tools/act.ts
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Act
@@ -45,6 +46,13 @@ async function handleAct(
         variables: params.variables,
       });
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: actSchema.name,
+        operation: "act",
+        stagehand,
+      });
+
       return {
         content: [
           {
diff --git a/src/tools/agent.ts b/src/tools/agent.ts
index e333079..2a62313 100644
--- a/src/tools/agent.ts
+++ b/src/tools/agent.ts
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Agent
@@ -54,6 +55,13 @@ async function handleAgent(
         maxSteps: 20,
       });
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: agentSchema.name,
+        operation: "agent.execute",
+        stagehand,
+      });
+
       return {
         content: [
           {
diff --git a/src/tools/extract.ts b/src/tools/extract.ts
index 7bfb56b..2c37942 100644
--- a/src/tools/extract.ts
+++ b/src/tools/extract.ts
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Extract
@@ -39,6 +40,13 @@ async function handleExtract(
 
       const extraction = await stagehand.extract(params.instruction);
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: extractSchema.name,
+        operation: "extract",
+        stagehand,
+      });
+
       return {
         content: [
           {
diff --git a/src/tools/index.ts b/src/tools/index.ts
index f0a19da..5ed63f6 100644
--- a/src/tools/index.ts
+++ b/src/tools/index.ts
@@ -6,6 +6,7 @@ import screenshotTool from "./screenshot.js";
 import sessionTools from "./session.js";
 import getUrlTool from "./url.js";
 import agentTool from "./agent.js";
+import usageTool from "./usage.js";
 
 // Export individual tools
 export { default as navigateTool } from "./navigate.js";
@@ -16,6 +17,7 @@ export { default as screenshotTool } from "./screenshot.js";
 export { default as sessionTools } from "./session.js";
 export { default as getUrlTool } from "./url.js";
 export { default as agentTool } from "./agent.js";
+export { default as usageTool } from "./usage.js";
 
 // Export all tools as array
 export const TOOLS = [
@@ -27,6 +29,7 @@ export const TOOLS = [
   screenshotTool,
   getUrlTool,
   agentTool,
+  usageTool,
 ];
 
 export const sessionManagementTools = sessionTools;
diff --git a/src/tools/navigate.ts b/src/tools/navigate.ts
index c6309a0..11ece5f 100644
--- a/src/tools/navigate.ts
+++ b/src/tools/navigate.ts
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 const NavigateInputSchema = z.object({
   url: z.string().describe("The URL to navigate to"),
@@ -37,6 +38,13 @@ async function handleNavigate(
         throw new Error("No Browserbase session ID available");
       }
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: navigateSchema.name,
+        operation: "navigate.goto",
+        stagehand,
+      });
+
       return {
         content: [
           {
diff --git a/src/tools/observe.ts b/src/tools/observe.ts
index b473300..9905b73 100644
--- a/src/tools/observe.ts
+++ b/src/tools/observe.ts
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Observe
@@ -42,6 +43,13 @@ async function handleObserve(
 
       const observations = await stagehand.observe(params.instruction);
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: observeSchema.name,
+        operation: "observe",
+        stagehand,
+      });
+
       return {
         content: [
           {
diff --git a/src/tools/url.ts b/src/tools/url.ts
index da7e124..f73a296 100644
--- a/src/tools/url.ts
+++ b/src/tools/url.ts
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Get URL
@@ -37,6 +38,13 @@ async function handleGetUrl(
 
       const currentUrl = page.url();
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: getUrlSchema.name,
+        operation: "get_url",
+        stagehand,
+      });
+
       return {
         content: [
           {
diff --git a/src/tools/usage.ts b/src/tools/usage.ts
new file mode 100644
index 0000000..0f6e37b
--- /dev/null
+++ b/src/tools/usage.ts
@@ -0,0 +1,102 @@
+import { z } from "zod";
+import type { Tool, ToolSchema, ToolResult } from "./tool.js";
+import type { Context } from "../context.js";
+import type { ToolActionResult } from "../types/types.js";
+import { getUsageSnapshot, resetUsage } from "../mcp/usage.js";
+
+const UsageInputSchema = z
+  .object({
+    sessionId: z
+      .string()
+      .optional()
+      .describe(
+        "Optional: filter per-session stats to a specific internal MCP session ID.",
+      ),
+    scope: z
+      .enum(["global", "perSession", "all"])
+      .optional()
+      .describe(
+        'Optional: which portion of the snapshot to return: "global", "perSession", or "all" (default).',
+      ),
+    reset: z
+      .boolean()
+      .optional()
+      .describe(
+        "Optional: when true, reset accumulated usage counters after returning the snapshot.",
+      ),
+  })
+  .optional()
+  .default({});
+
+type UsageInput = z.infer<typeof UsageInputSchema>;
+
+const usageSchema: ToolSchema<typeof UsageInputSchema> = {
+  name: "browserbase_usage_stats",
+  description:
+    "Return a snapshot of Stagehand usage metrics (call counts) for this MCP process, optionally filtered by session.",
+  inputSchema: UsageInputSchema,
+};
+
+async function handleUsage(
+  // `context` is not used by this handler; usage state is module-level.
+  context: Context,
+  params: UsageInput,
+): Promise<ToolResult> {
+  const action = async (): Promise<ToolActionResult> => {
+    const snapshot = getUsageSnapshot();
+
+    const scope = params.scope ?? "all";
+    let result: unknown = snapshot;
+
+    if (scope === "global") {
+      result = { global: snapshot.global };
+    } else if (scope === "perSession") {
+      if (params.sessionId) {
+        result = {
+          perSession: {
+            [params.sessionId]: snapshot.perSession[params.sessionId] ?? {
+              operations: {},
+            },
+          },
+        };
+      } else {
+        result = { perSession: snapshot.perSession };
+      }
+    } else if (scope === "all" && params.sessionId) {
+      result = {
+        global: snapshot.global,
+        perSession: {
+          [params.sessionId]: snapshot.perSession[params.sessionId] ?? {
+            operations: {},
+          },
+        },
+      };
+    }
+
+    if (params.reset) {
+      resetUsage();
+    }
+
+    return {
+      content: [
+        {
+          type: "text",
+          text: JSON.stringify(result, null, 2),
+        },
+      ],
+    };
+  };
+
+  return {
+    action,
+    waitForNetwork: false,
+  };
+}
+
+const usageTool: Tool<typeof UsageInputSchema> = {
+  capability: "core",
+  schema: usageSchema,
+  handle: handleUsage,
+};
+
+export default usageTool;

From bafd75bae8b3523cc91c6e476839dde86ab6b025 Mon Sep 17 00:00:00 2001
From: Shrey Pandya <shrey@browserbase.com>
Date: Fri, 14 Nov 2025 14:22:31 -0800
Subject: [PATCH 2/2] remove cursor plan

---
 .cursor/plans/stage-d93ad3e4.plan.md | 116 ---------------------------
 1 file changed, 116 deletions(-)
 delete mode 100644 .cursor/plans/stage-d93ad3e4.plan.md

diff --git a/.cursor/plans/stage-d93ad3e4.plan.md b/.cursor/plans/stage-d93ad3e4.plan.md
deleted file mode 100644
index 51e3d90..0000000
--- a/.cursor/plans/stage-d93ad3e4.plan.md
+++ /dev/null
@@ -1,116 +0,0 @@
-<!-- d93ad3e4-1668-4333-8e39-d878faf06005 132f9ffa-0c5f-4fdc-a079-cc5c673f4d77 -->
-
-# Plan: Integrate Gemini usage metadata into Stagehand MCP usage stats
-
-## Overview
-
-We will update the existing Stagehand usage tracking in the Browserbase MCP server so that, when Gemini is the backing model, it records **exact token usage** from Gemini's `usage_metadata` instead of local estimates. These richer metrics will be surfaced via the existing `browserbase_usage_stats` tool in a shape similar to Claude Agent SDK's `message.usage` (tokens + cost).
-
-## Key facts from Gemini API
-
-- Gemini responses expose **usage metadata** on each call, e.g. `usage_metadata` / `usageMetadata` with:
-- `prompt_token_count` / `promptTokenCount`
-- `candidates_token_count` / `candidatesTokenCount`
-- `total_token_count` / `totalTokenCount`
-- We should rely on this **post-request usage metadata** for accurate accounting rather than local token counting.
-- Cost is not returned directly by Gemini but can be derived from published pricing using the token counts; we will compute `total_cost_usd` from those counts and a configured price table for the Gemini model in use.
-
-## Design decisions
-
-- **Scope**: Only track detailed token metrics when the underlying LLM is **Gemini via the official API**; for other models we keep the current call-count-only behavior.
-- **Metrics shape**: For each operation we will aggregate fields that mirror Claude Agent SDK usage:
-- `input_tokens` (mapped from Gemini `prompt_token_count`)
-- `output_tokens` (mapped from Gemini `candidates_token_count`)
-- `cache_read_input_tokens` (0 for now, unless Gemini exposes cache reads explicitly)
-- `cache_creation_input_tokens` (0 for now, unless Gemini exposes cache creation explicitly)
-- `total_cost_usd` (derived from `input_tokens`/`output_tokens` and a simple pricing table keyed by model name).
-- **Where to plug in**: Prefer to capture usage **at the point where the Gemini client is called**. If Stagehand exposes `usage_metadata` on its result, use that; otherwise, configure Stagehand's model to use a **wrapped Gemini client** that returns both the normal result and usage metadata for accounting.
-- **Configuration**: Add an internal mapping of Gemini model names to per-token prices, so cost computation is centralized and easy to update.
-
-## Implementation steps
-
-### 1. Extend the usage tracker types and API
-
-1. Update `src/mcp/usage.ts`:
-
-- Introduce a `StagehandUsageMetrics` type with:
-- `inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`, `totalCostUsd` (all numbers, defaulting to 0 when absent).
-- Extend `StagehandOperationStats` to include aggregated fields:
-- `inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`, `totalCostUsd`.
-- Change `recordStagehandCall` to accept an optional `metrics?: StagehandUsageMetrics` argument and, when present, add those values to both the **global** and **per-session** aggregates.
-- Keep the existing call-count and `toolCallCounts` logic intact.
-
-### 2. Add a Gemini pricing helper for cost computation
-
-1. Create a small helper (new module or inside `usage.ts`) that:
-
-- Defines a mapping from Gemini model names used in this project (e.g. `"gemini-2.0-flash"`, `"google/gemini-2.5-computer-use-preview-10-2025"`) to per-token prices for input and output tokens.
-- Exposes a function `computeGeminiCostUsd({ modelName, inputTokens, outputTokens }): number` that multiplies token counts by the configured prices and returns `total_cost_usd`.
-
-2. This helper should be easy to update if pricing changes, but it should not make any external API calls at runtime.
-
-### 3. Determine how to access Gemini usage metadata through Stagehand
-
-1. Review Stagehand documentation / types (and, if available in the codebase, its usage patterns) to confirm whether it exposes Gemini `usage_metadata` on:
-
-- The return value of the operations we already call (`agent.execute`, `extract`, `observe`, etc.), or
-- Some internal hook/callback or logging mechanism.
-
-2. If Stagehand surfaces `usage_metadata` directly on results (e.g. `result.usage_metadata` or similar):
-
-- For each Stagehand-using tool, extract `prompt_token_count` and `candidates_token_count` from the result.
-
-3. If Stagehand does **not** surface usage metadata, adapt the Stagehand configuration in `sessionManager.ts` or `createStagehandInstance` so that it uses a **wrapped Gemini client** that:
-
-- Calls the real Gemini client.
-- Reads `response.usage_metadata`.
-- Returns the normal content to Stagehand but also provides usage metadata in a way our tools can read (e.g. attached to the result, or via a shared metrics callback that calls `recordStagehandCall`).
-
-### 4. Feed Gemini metrics into recordStagehandCall
-
-1. For each Stagehand-backed tool (`agent.ts`, `extract.ts`, `observe.ts`, `act.ts`, `navigate.ts`, `screenshot.ts`, `url.ts`):
-
-- After the Gemini-driven Stagehand call completes and the result is available, derive a `StagehandUsageMetrics` object when **and only when** Gemini usage metadata is available.
-- Map fields as:
-- `inputTokens = usage_metadata.prompt_token_count`
-- `outputTokens = usage_metadata.candidates_token_count`
-- `cacheReadInputTokens = 0` (until caching semantics are exposed)
-- `cacheCreationInputTokens = 0`
-- `totalCostUsd = computeGeminiCostUsd({ modelName, inputTokens, outputTokens })`
-- Call `recordStagehandCall({ sessionId, toolName, operation }, metrics)` instead of the current call-count-only signature.
-
-2. If usage metadata is not available for a particular call (e.g., non-Gemini model or legacy code path), call `recordStagehandCall` without the `metrics` argument so the system still tracks call counts.
-
-### 5. Adjust the usage stats tool output format
-
-1. Update `src/tools/usage.ts` to:
-
-- Return the new numeric fields (input/output tokens, cache\_\* tokens, total_cost_usd) as part of each operation’s JSON object.
-- Optionally adjust the JSON keys to snake_case (`input_tokens`, `output_tokens`, etc.) to mirror Claude Agent SDK’s `message.usage` naming.
-
-2. Keep the existing `scope` (`global`/`perSession`/`all`), `sessionId`, and `reset` behavior unchanged so existing integrations continue to work.
-
-### 6. Documentation updates
-
-1. Update the README section for **Stagehand usage metrics** to:
-
-- Mention that, when using Gemini as the model, `browserbase_usage_stats` returns:
-- `input_tokens`, `output_tokens`, `total_cost_usd`, and placeholder `cache_*` fields (currently zero unless Gemini exposes explicit cache usage).
-- Clarify that these token counts come directly from Gemini’s `usage_metadata` and that cost is derived from them using a simple internal price table.
-
-2. Optionally add a short example snippet showing how a Claude Agent SDK-based agent can:
-
-- Call `browserbase_usage_stats` at the end of a run.
-- Read `total_cost_usd` and token counts for reporting alongside Claude’s own `message.usage`.
-
-## Notes and future extensions
-
-- If Stagehand later adds first-class support for exposing underlying model usage/cost, we can simplify our wrapping logic and rely on that instead.
-- If Gemini introduces direct cost reporting in responses, we can remove local pricing tables and use the API’s `total_cost_usd` directly, simplifying `computeGeminiCostUsd`.
-
-### To-dos
-
-- [ ] Create shared in-memory usage tracker in src/mcp/usage.ts with record/get/reset functions and appropriate types.
-- [ ] Import and call the usage tracker from all Stagehand-using tools (agent, act, navigate, observe, extract, screenshot, url) to record each Stagehand operation with session and tool info.
-- [ ] Add a new MCP tool browserbase_usage_stats that returns a snapshot of usage metrics via MCP call_tool, and register it in the tools index.
-- [ ] Update README and any relevant Agent SDK integration examples to show how to call the usage stats tool and interpret its output.