Commit 0769736

print stagehand metrics
1 parent b4d1d16 commit 0769736

10 files changed (+384 -0 lines changed)

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
<!-- d93ad3e4-1668-4333-8e39-d878faf06005 132f9ffa-0c5f-4fdc-a079-cc5c673f4d77 -->

# Plan: Integrate Gemini usage metadata into Stagehand MCP usage stats

## Overview

We will update the existing Stagehand usage tracking in the Browserbase MCP server so that, when Gemini is the backing model, it records **exact token usage** from Gemini's `usage_metadata` instead of local estimates. These richer metrics will be surfaced via the existing `browserbase_usage_stats` tool in a shape similar to Claude Agent SDK's `message.usage` (tokens + cost).

## Key facts from Gemini API

- Gemini responses expose **usage metadata** on each call, e.g. `usage_metadata` / `usageMetadata` with:
  - `prompt_token_count` / `promptTokenCount`
  - `candidates_token_count` / `candidatesTokenCount`
  - `total_token_count` / `totalTokenCount`
- We should rely on this **post-request usage metadata** for accurate accounting rather than local token counting.
- Cost is not returned directly by Gemini but can be derived from published pricing using the token counts; we will compute `total_cost_usd` from those counts and a configured price table for the Gemini model in use.
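
Because the usage metadata can arrive under either casing, a small normalizer may be useful before any accounting happens. The following is a minimal sketch, not part of this commit: `GeminiUsage` and `readGeminiUsage` are hypothetical names, and the fallback for a missing total is an assumption of the sketch.

```typescript
// Token counts in one normalized shape, whichever casing the client used.
type GeminiUsage = {
  promptTokens: number;
  candidatesTokens: number;
  totalTokens: number;
};

// Hypothetical helper: reads `usageMetadata` (camelCase) or `usage_metadata`
// (snake_case) from a raw response object, if either is present.
function readGeminiUsage(response: unknown): GeminiUsage | undefined {
  if (typeof response !== "object" || response === null) return undefined;
  const r = response as Record<string, unknown>;
  const raw = r["usageMetadata"] ?? r["usage_metadata"];
  if (typeof raw !== "object" || raw === null) return undefined;
  const u = raw as Record<string, unknown>;

  // Prefer the camelCase key, fall back to snake_case, default to 0.
  const num = (camel: string, snake: string): number => {
    const v = u[camel] ?? u[snake];
    return typeof v === "number" && Number.isFinite(v) ? v : 0;
  };

  const promptTokens = num("promptTokenCount", "prompt_token_count");
  const candidatesTokens = num("candidatesTokenCount", "candidates_token_count");
  const reportedTotal = num("totalTokenCount", "total_token_count");

  return {
    promptTokens,
    candidatesTokens,
    // Assumption of this sketch: if no total is reported, sum the two counts.
    totalTokens: reportedTotal > 0 ? reportedTotal : promptTokens + candidatesTokens,
  };
}
```

Returning `undefined` when no metadata is present lets callers distinguish "no usage reported" from "zero tokens used".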

## Design decisions

- **Scope**: Only track detailed token metrics when the underlying LLM is **Gemini via the official API**; for other models we keep the current call-count-only behavior.
- **Metrics shape**: For each operation we will aggregate fields that mirror Claude Agent SDK usage:
  - `input_tokens` (mapped from Gemini `prompt_token_count`)
  - `output_tokens` (mapped from Gemini `candidates_token_count`)
  - `cache_read_input_tokens` (0 for now, unless Gemini exposes cache reads explicitly)
  - `cache_creation_input_tokens` (0 for now, unless Gemini exposes cache creation explicitly)
  - `total_cost_usd` (derived from `input_tokens`/`output_tokens` and a simple pricing table keyed by model name).
- **Where to plug in**: Prefer to capture usage **at the point where the Gemini client is called**. If Stagehand exposes `usage_metadata` on its result, use that; otherwise, configure Stagehand's model to use a **wrapped Gemini client** that returns both the normal result and usage metadata for accounting.
- **Configuration**: Add an internal mapping of Gemini model names to per-token prices, so cost computation is centralized and easy to update.

## Implementation steps

### 1. Extend the usage tracker types and API

1. Update `src/mcp/usage.ts`:

   - Introduce a `StagehandUsageMetrics` type with:
     - `inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`, `totalCostUsd` (all numbers, defaulting to 0 when absent).
   - Extend `StagehandOperationStats` to include aggregated fields:
     - `inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`, `totalCostUsd`.
   - Change `recordStagehandCall` to accept an optional `metrics?: StagehandUsageMetrics` argument and, when present, add those values to both the **global** and **per-session** aggregates.
   - Keep the existing call-count and `toolCallCounts` logic intact.
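
The type changes in step 1 could look roughly like the sketch below. The type names follow the plan, but `emptyStats`, `addCall`, and `METRIC_KEYS` are hypothetical helpers introduced only for illustration; the real `recordStagehandCall` would apply the same update to both the global and the per-session entry.

```typescript
type StagehandUsageMetrics = {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  totalCostUsd: number;
};

// Existing call counters plus the aggregated token and cost fields.
type StagehandOperationStats = StagehandUsageMetrics & {
  callCount: number;
  toolCallCounts: Record<string, number>;
};

const METRIC_KEYS = [
  "inputTokens",
  "outputTokens",
  "cacheReadInputTokens",
  "cacheCreationInputTokens",
  "totalCostUsd",
] as const;

// Hypothetical helper: a zeroed stats entry.
function emptyStats(): StagehandOperationStats {
  return {
    callCount: 0,
    toolCallCounts: {},
    inputTokens: 0,
    outputTokens: 0,
    cacheReadInputTokens: 0,
    cacheCreationInputTokens: 0,
    totalCostUsd: 0,
  };
}

// Hypothetical helper: always counts the call; adds metrics only when given.
function addCall(
  stats: StagehandOperationStats,
  toolName: string,
  metrics?: Partial<StagehandUsageMetrics>,
): void {
  stats.callCount += 1;
  stats.toolCallCounts[toolName] = (stats.toolCallCounts[toolName] ?? 0) + 1;
  if (!metrics) return;
  for (const key of METRIC_KEYS) {
    stats[key] += metrics[key] ?? 0; // absent fields default to 0
  }
}
```

Keeping the counter update ahead of the metrics check preserves the current call-count-only behavior for non-Gemini models.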

### 2. Add a Gemini pricing helper for cost computation

1. Create a small helper (new module or inside `usage.ts`) that:

   - Defines a mapping from Gemini model names used in this project (e.g. `"gemini-2.0-flash"`, `"google/gemini-2.5-computer-use-preview-10-2025"`) to per-token prices for input and output tokens.
   - Exposes a function `computeGeminiCostUsd({ modelName, inputTokens, outputTokens }): number` that multiplies token counts by the configured prices and returns `total_cost_usd`.

2. This helper should be easy to update if pricing changes, but it should not make any external API calls at runtime.
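
A pricing helper along these lines might look like the sketch below. The prices are placeholders chosen for easy arithmetic, not real Gemini pricing, and storing them per million tokens (rather than per token, as the plan words it) is an assumption made to keep the table readable. Returning 0 for an unknown model is likewise only one possible policy.

```typescript
type GeminiPrice = {
  inputUsdPerMillion: number;
  outputUsdPerMillion: number;
};

// PLACEHOLDER prices for illustration only; replace with published pricing.
const GEMINI_PRICES: Record<string, GeminiPrice> = {
  "gemini-2.0-flash": { inputUsdPerMillion: 1, outputUsdPerMillion: 2 },
};

function computeGeminiCostUsd(args: {
  modelName: string;
  inputTokens: number;
  outputTokens: number;
}): number {
  const price = GEMINI_PRICES[args.modelName];
  // Unknown model: report zero cost rather than guess a price.
  if (!price) return 0;
  return (
    (args.inputTokens * price.inputUsdPerMillion +
      args.outputTokens * price.outputUsdPerMillion) /
    1_000_000
  );
}
```

The function is pure and reads only the in-module table, so it makes no external calls at runtime.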

### 3. Determine how to access Gemini usage metadata through Stagehand

1. Review Stagehand documentation / types (and, if available in the codebase, its usage patterns) to confirm whether it exposes Gemini `usage_metadata` on:

   - The return value of the operations we already call (`agent.execute`, `extract`, `observe`, etc.), or
   - Some internal hook/callback or logging mechanism.

2. If Stagehand surfaces `usage_metadata` directly on results (e.g. `result.usage_metadata` or similar):

   - For each Stagehand-using tool, extract `prompt_token_count` and `candidates_token_count` from the result.

3. If Stagehand does **not** surface usage metadata, adapt the Stagehand configuration in `sessionManager.ts` or `createStagehandInstance` so that it uses a **wrapped Gemini client** that:

   - Calls the real Gemini client.
   - Reads `response.usage_metadata`.
   - Returns the normal content to Stagehand but also provides usage metadata in a way our tools can read (e.g. attached to the result, or via a shared metrics callback that calls `recordStagehandCall`).
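
The wrapped-client fallback in item 3 can be pictured with the sketch below. `GenerateClient`, `GenerateResponse`, and `withUsageCallback` are hypothetical names for illustration; they are not the Gemini SDK's or Stagehand's actual interfaces, which would need to be checked before adapting this.

```typescript
type UsageMetadata = {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  totalTokenCount?: number;
};

type GenerateResponse = {
  text: string;
  usageMetadata?: UsageMetadata;
};

// Hypothetical minimal client shape, standing in for the real model client.
interface GenerateClient<Req> {
  generate(request: Req): Promise<GenerateResponse>;
}

// Wraps a client so every response's usage metadata is reported through a
// callback, while the response itself is passed through unchanged.
function withUsageCallback<Req>(
  inner: GenerateClient<Req>,
  onUsage: (usage: UsageMetadata) => void,
): GenerateClient<Req> {
  return {
    async generate(request: Req): Promise<GenerateResponse> {
      const response = await inner.generate(request);
      if (response.usageMetadata) {
        onUsage(response.usageMetadata);
      }
      return response;
    },
  };
}
```

In this shape the callback is where a call to `recordStagehandCall` (or an intermediate accumulator) would go, so the caller of the wrapped client sees no difference from the unwrapped one.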

### 4. Feed Gemini metrics into recordStagehandCall

1. For each Stagehand-backed tool (`agent.ts`, `extract.ts`, `observe.ts`, `act.ts`, `navigate.ts`, `screenshot.ts`, `url.ts`):

   - After the Gemini-driven Stagehand call completes and the result is available, derive a `StagehandUsageMetrics` object when **and only when** Gemini usage metadata is available.
   - Map fields as:
     - `inputTokens = usage_metadata.prompt_token_count`
     - `outputTokens = usage_metadata.candidates_token_count`
     - `cacheReadInputTokens = 0` (until caching semantics are exposed)
     - `cacheCreationInputTokens = 0`
     - `totalCostUsd = computeGeminiCostUsd({ modelName, inputTokens, outputTokens })`
   - Call `recordStagehandCall({ sessionId, toolName, operation }, metrics)` instead of the current call-count-only signature.

2. If usage metadata is not available for a particular call (e.g., non-Gemini model or legacy code path), call `recordStagehandCall` without the `metrics` argument so the system still tracks call counts.
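
The field mapping in step 4 can be sketched as a single function that returns `undefined` when no usage metadata exists, which matches the "when and only when" rule. `toStagehandUsageMetrics` is a hypothetical name, and the cost function is passed in as a parameter here only so the sketch stays self-contained; in the plan it would be `computeGeminiCostUsd`.

```typescript
type GeminiUsageMetadata = {
  prompt_token_count?: number;
  candidates_token_count?: number;
};

type StagehandUsageMetrics = {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  totalCostUsd: number;
};

// Hypothetical helper: maps Gemini usage metadata to the tracker's metrics.
function toStagehandUsageMetrics(
  usage: GeminiUsageMetadata | undefined,
  costUsd: (inputTokens: number, outputTokens: number) => number,
): StagehandUsageMetrics | undefined {
  // No metadata (non-Gemini model or legacy path): record the call count only.
  if (!usage) return undefined;

  const inputTokens = usage.prompt_token_count ?? 0;
  const outputTokens = usage.candidates_token_count ?? 0;

  return {
    inputTokens,
    outputTokens,
    cacheReadInputTokens: 0, // placeholder until caching semantics are exposed
    cacheCreationInputTokens: 0, // placeholder, as above
    totalCostUsd: costUsd(inputTokens, outputTokens),
  };
}
```

A tool handler could then pass the result straight through as the optional `metrics` argument, since `undefined` already means "count the call only".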

### 5. Adjust the usage stats tool output format

1. Update `src/tools/usage.ts` to:

   - Return the new numeric fields (input/output tokens, `cache_*` tokens, `total_cost_usd`) as part of each operation’s JSON object.
   - Optionally adjust the JSON keys to snake_case (`input_tokens`, `output_tokens`, etc.) to mirror Claude Agent SDK’s `message.usage` naming.

2. Keep the existing `scope` (`global`/`perSession`/`all`), `sessionId`, and `reset` behavior unchanged so existing integrations continue to work.
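
If the optional snake_case renaming is adopted, the conversion could be a plain mapping function such as the sketch below. `toOperationJson` and both type names are hypothetical, and leaving `callCount` and `toolCallCounts` in their current camelCase form is an assumption made here so that existing consumers of those two keys keep working.

```typescript
type OperationStats = {
  callCount: number;
  toolCallCounts: Record<string, number>;
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  totalCostUsd: number;
};

type OperationStatsJson = {
  callCount: number;
  toolCallCounts: Record<string, number>;
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
  total_cost_usd: number;
};

// Hypothetical helper: shapes one operation's stats for the tool's JSON output.
function toOperationJson(stats: OperationStats): OperationStatsJson {
  return {
    callCount: stats.callCount,
    // Copy so callers cannot mutate the tracker's internal counters.
    toolCallCounts: { ...stats.toolCallCounts },
    input_tokens: stats.inputTokens,
    output_tokens: stats.outputTokens,
    cache_read_input_tokens: stats.cacheReadInputTokens,
    cache_creation_input_tokens: stats.cacheCreationInputTokens,
    total_cost_usd: stats.totalCostUsd,
  };
}
```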

### 6. Documentation updates

1. Update the README section for **Stagehand usage metrics** to:

   - Mention that, when using Gemini as the model, `browserbase_usage_stats` returns:
     - `input_tokens`, `output_tokens`, `total_cost_usd`, and placeholder `cache_*` fields (currently zero unless Gemini exposes explicit cache usage).
   - Clarify that these token counts come directly from Gemini’s `usage_metadata` and that cost is derived from them using a simple internal price table.

2. Optionally add a short example snippet showing how a Claude Agent SDK-based agent can:

   - Call `browserbase_usage_stats` at the end of a run.
   - Read `total_cost_usd` and token counts for reporting alongside Claude’s own `message.usage`.

## Notes and future extensions

- If Stagehand later adds first-class support for exposing underlying model usage/cost, we can simplify our wrapping logic and rely on that instead.
- If Gemini introduces direct cost reporting in responses, we can remove local pricing tables and use the API’s `total_cost_usd` directly, simplifying `computeGeminiCostUsd`.

### To-dos

- [ ] Create shared in-memory usage tracker in src/mcp/usage.ts with record/get/reset functions and appropriate types.
- [ ] Import and call the usage tracker from all Stagehand-using tools (agent, act, navigate, observe, extract, screenshot, url) to record each Stagehand operation with session and tool info.
- [ ] Add a new MCP tool browserbase_usage_stats that returns a snapshot of usage metrics via MCP call_tool, and register it in the tools index.
- [ ] Update README and any relevant Agent SDK integration examples to show how to call the usage stats tool and interpret its output.

src/mcp/usage.ts

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
import type { Stagehand } from "@browserbasehq/stagehand";

export type StagehandUsageOperation = string;

export type StagehandUsageKey = {
  sessionId: string;
  toolName: string;
  operation: StagehandUsageOperation;
};

export type StagehandOperationStats = {
  callCount: number;
  toolCallCounts: Record<string, number>;
};

export type StagehandSessionUsage = {
  operations: Record<StagehandUsageOperation, StagehandOperationStats>;
};

export type StagehandUsageSnapshot = {
  global: Record<StagehandUsageOperation, StagehandOperationStats>;
  perSession: Record<string, StagehandSessionUsage>;
};

// In-memory aggregates; they live only as long as the server process.
const globalUsage: Record<string, StagehandOperationStats> = {};
const perSessionUsage: Record<string, StagehandSessionUsage> = {};

// Returns the stats entry for `operation`, creating a zeroed one on first use.
function getOrCreateOperationStats(
  container: Record<StagehandUsageOperation, StagehandOperationStats>,
  operation: StagehandUsageOperation,
): StagehandOperationStats {
  if (!container[operation]) {
    container[operation] = {
      callCount: 0,
      toolCallCounts: {},
    };
  }
  return container[operation];
}

// Prints the Stagehand instance's current metrics, tagged with the call's key.
async function logStagehandMetrics(
  stagehand: Stagehand | undefined,
  key: StagehandUsageKey,
): Promise<void> {
  if (!stagehand) return;

  // `metrics` is read untyped and may be a plain value or a promise,
  // so await it only when it is thenable.
  const rawMetrics: any = (stagehand as any).metrics;
  const metrics =
    rawMetrics && typeof rawMetrics.then === "function"
      ? await rawMetrics
      : rawMetrics;

  if (!metrics) return;

  // Emit structured JSON (pretty-printed over several lines) so it is easy to
  // pipe elsewhere. Note: this writes to stdout; under an MCP stdio transport
  // stdout carries protocol messages, so console.error may be the safer sink.
  console.log(
    JSON.stringify(
      {
        source: "stagehand-mcp",
        event: "stagehand_metrics",
        ...key,
        metrics,
      },
      null,
      2,
    ),
  );
}

// Counts one Stagehand call globally and per session, then logs metrics.
export async function recordStagehandCall(
  args: StagehandUsageKey & { stagehand?: Stagehand },
): Promise<void> {
  const { sessionId, toolName, operation, stagehand } = args;

  // Update global aggregate
  const globalStats = getOrCreateOperationStats(globalUsage, operation);
  globalStats.callCount += 1;
  globalStats.toolCallCounts[toolName] =
    (globalStats.toolCallCounts[toolName] ?? 0) + 1;

  // Update per-session usage
  if (!perSessionUsage[sessionId]) {
    perSessionUsage[sessionId] = { operations: {} };
  }

  const sessionStats = getOrCreateOperationStats(
    perSessionUsage[sessionId].operations,
    operation,
  );
  sessionStats.callCount += 1;
  sessionStats.toolCallCounts[toolName] =
    (sessionStats.toolCallCounts[toolName] ?? 0) + 1;

  await logStagehandMetrics(stagehand, { sessionId, toolName, operation });
}

// Returns the live aggregate objects, not a copy: later calls to
// recordStagehandCall or resetUsage are visible through the returned value.
export function getUsageSnapshot(): StagehandUsageSnapshot {
  return {
    global: globalUsage,
    perSession: perSessionUsage,
  };
}

// Clears both aggregates in place so existing references stay valid.
export function resetUsage(): void {
  for (const key of Object.keys(globalUsage)) {
    delete globalUsage[key];
  }
  for (const key of Object.keys(perSessionUsage)) {
    delete perSessionUsage[key];
  }
}

src/tools/act.ts

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Act
@@ -45,6 +46,13 @@ async function handleAct(
         variables: params.variables,
       });
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: actSchema.name,
+        operation: "act",
+        stagehand,
+      });
+
       return {
         content: [
           {

src/tools/agent.ts

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Agent
@@ -54,6 +55,13 @@ async function handleAgent(
         maxSteps: 20,
       });
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: agentSchema.name,
+        operation: "agent.execute",
+        stagehand,
+      });
+
       return {
         content: [
           {

src/tools/extract.ts

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Extract
@@ -39,6 +40,13 @@ async function handleExtract(
 
       const extraction = await stagehand.extract(params.instruction);
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: extractSchema.name,
+        operation: "extract",
+        stagehand,
+      });
+
       return {
         content: [
           {

src/tools/index.ts

Lines changed: 3 additions & 0 deletions
@@ -6,6 +6,7 @@ import screenshotTool from "./screenshot.js";
 import sessionTools from "./session.js";
 import getUrlTool from "./url.js";
 import agentTool from "./agent.js";
+import usageTool from "./usage.js";
 
 // Export individual tools
 export { default as navigateTool } from "./navigate.js";
@@ -16,6 +17,7 @@ export { default as screenshotTool } from "./screenshot.js";
 export { default as sessionTools } from "./session.js";
 export { default as getUrlTool } from "./url.js";
 export { default as agentTool } from "./agent.js";
+export { default as usageTool } from "./usage.js";
 
 // Export all tools as array
 export const TOOLS = [
@@ -27,6 +29,7 @@ export const TOOLS = [
   screenshotTool,
   getUrlTool,
   agentTool,
+  usageTool,
 ];
 
 export const sessionManagementTools = sessionTools;

src/tools/navigate.ts

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 const NavigateInputSchema = z.object({
   url: z.string().describe("The URL to navigate to"),
@@ -37,6 +38,13 @@ async function handleNavigate(
         throw new Error("No Browserbase session ID available");
       }
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: navigateSchema.name,
+        operation: "navigate.goto",
+        stagehand,
+      });
+
       return {
         content: [
           {

src/tools/observe.ts

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Observe
@@ -42,6 +43,13 @@ async function handleObserve(
 
       const observations = await stagehand.observe(params.instruction);
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: observeSchema.name,
+        operation: "observe",
+        stagehand,
+      });
+
       return {
         content: [
           {

src/tools/url.ts

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,7 @@ import { z } from "zod";
 import type { Tool, ToolSchema, ToolResult } from "./tool.js";
 import type { Context } from "../context.js";
 import type { ToolActionResult } from "../types/types.js";
+import { recordStagehandCall } from "../mcp/usage.js";
 
 /**
  * Stagehand Get URL
@@ -37,6 +38,13 @@ async function handleGetUrl(
 
       const currentUrl = page.url();
 
+      await recordStagehandCall({
+        sessionId: context.currentSessionId,
+        toolName: getUrlSchema.name,
+        operation: "get_url",
+        stagehand,
+      });
+
       return {
         content: [
           {
