
Commit c2f55c3

Feat/capture iframes (#72)
## Summary by CodeRabbit

* **New Features**
  * Added JavaScript execution capability via new /page/execute endpoint
  * Extended page content retrieval to optionally include iframes
  * Added frame count information to content responses
* **Documentation**
  * Added comprehensive API documentation for JavaScript execution endpoint with examples and use cases
  * Updated server capability documentation to reflect JavaScript execution support
1 parent 4ab6446 commit c2f55c3

4 files changed, +355 -19 lines changed

agent-server/README.md

Lines changed: 86 additions & 0 deletions
@@ -210,6 +210,92 @@ Get HTML or text content of a page.
 }
 ```

+#### `POST /page/execute`
+
+Execute JavaScript code in the context of a specific browser tab via Chrome DevTools Protocol.
+
+**Request:**
+```json
+{
+  "clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
+  "tabId": "482D56EE57B1931A3B9D1BFDAF935429",
+  "expression": "document.title",
+  "returnByValue": true,
+  "awaitPromise": false
+}
+```
+
+**Parameters:**
+- `clientId` (required): The client ID from `/v1/responses` metadata
+- `tabId` (required): The tab ID from `/v1/responses` metadata
+- `expression` (required): JavaScript code to execute (string)
+- `returnByValue` (optional, default: `true`): Whether to return result by value or as object reference
+- `awaitPromise` (optional, default: `false`): Whether to await if the result is a Promise
+
+**Response:**
+```json
+{
+  "clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
+  "tabId": "482D56EE57B1931A3B9D1BFDAF935429",
+  "result": {
+    "type": "string",
+    "value": "Example Page Title"
+  },
+  "exceptionDetails": null,
+  "timestamp": 1234567890
+}
+```
+
+**Response Fields:**
+- `clientId`: Base client ID (without tab suffix)
+- `tabId`: The tab ID where JavaScript was executed
+- `result`: CDP `Runtime.evaluate` result object containing:
+  - `type`: Result type (string, number, object, etc.)
+  - `value`: The actual value (if `returnByValue: true`)
+- `exceptionDetails`: Error details if execution failed, otherwise `null`
+- `timestamp`: Unix timestamp in milliseconds
258+
**Example Usage:**
259+
260+
```bash
261+
# Get page title
262+
curl -X POST http://localhost:8080/page/execute \
263+
-H "Content-Type: application/json" \
264+
-d '{
265+
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
266+
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
267+
"expression": "document.title"
268+
}'
269+
270+
# Count elements
271+
curl -X POST http://localhost:8080/page/execute \
272+
-H "Content-Type: application/json" \
273+
-d '{
274+
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
275+
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
276+
"expression": "document.querySelectorAll(\"button\").length"
277+
}'
278+
279+
# Execute async code with await
280+
curl -X POST http://localhost:8080/page/execute \
281+
-H "Content-Type: application/json" \
282+
-d '{
283+
"clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
284+
"tabId": "482D56EE57B1931A3B9D1BFDAF935429",
285+
"expression": "fetch(\"https://api.example.com/data\").then(r => r.json())",
286+
"awaitPromise": true
287+
}'
288+
```
289+
290+
**Use Cases:**
291+
- Extract specific data from the page (e.g., element counts, text content)
292+
- Verify JavaScript state/variables for evaluations
293+
- Check DOM state programmatically
294+
- Execute custom validation logic
295+
- Interact with page APIs directly
296+
297+
This endpoint complements `/page/content` by allowing precise JavaScript execution rather than just fetching full HTML/text content.
298+
213299
#### `POST /tabs/open`
214300

215301
Open a new browser tab.
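
For callers working from JavaScript rather than curl, the request and response shapes documented in the README hunk above could be handled roughly as follows. This is a minimal sketch, not code from the commit: `buildExecuteRequest`, `unwrapExecuteResponse`, and `executeInTab` are hypothetical helper names, the defaults mirror the documented ones (`returnByValue: true`, `awaitPromise: false`), and reading `exceptionDetails.text` assumes the server passes the CDP exception object through unchanged.

```javascript
// Hypothetical client-side helpers for POST /page/execute (not part of the commit).

// Build the JSON body, applying the defaults documented in the README.
function buildExecuteRequest({ clientId, tabId, expression, returnByValue = true, awaitPromise = false }) {
  if (!clientId) throw new Error('clientId is required');
  if (!tabId) throw new Error('tabId is required');
  if (typeof expression !== 'string' || expression.length === 0) {
    throw new Error('expression must be a non-empty string');
  }
  return { clientId, tabId, expression, returnByValue, awaitPromise };
}

// Return the evaluated value, or throw if the page-side code raised an exception.
// Note: `value` is only documented as present when returnByValue is true.
function unwrapExecuteResponse(response) {
  if (response.exceptionDetails) {
    // Assumption: exceptionDetails follows the CDP shape and carries a `text` field.
    const reason = response.exceptionDetails.text ?? 'unknown error';
    throw new Error(`Execution failed: ${reason}`);
  }
  return response.result.value;
}

// Send the request with the global fetch available in Node 18+.
// Assumption: the server listens on http://localhost:8080, as in the curl examples.
async function executeInTab(params, baseUrl = 'http://localhost:8080') {
  const res = await fetch(`${baseUrl}/page/execute`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildExecuteRequest(params)),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return unwrapExecuteResponse(await res.json());
}
```

Keeping the payload builder and the response check separate from the network call makes both easy to unit test without a running server.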

agent-server/nodejs/CLAUDE.md

Lines changed: 118 additions & 2 deletions
@@ -39,7 +39,7 @@ The eval-server is a **thin HTTP API wrapper for Browser Operator**. It provides
 ### HTTP API Server (src/api-server.js)
 - Exposes REST endpoints for external callers (e.g., Python evals)
 - Main endpoint: `POST /v1/responses` - Send task to agent
-- CDP endpoints: screenshot, page content, tab management
+- CDP endpoints: screenshot, page content, JavaScript execution, tab management
 - Returns metadata (clientId, tabId) for subsequent operations

 ### RPC Client (src/rpc-client.js)
@@ -57,6 +57,7 @@ The eval-server is a **thin HTTP API wrapper for Browser Operator**. It provides
 - Direct Chrome DevTools Protocol communication
 - Screenshot capture via `Page.captureScreenshot`
 - Page content access via `Runtime.evaluate`
+- JavaScript execution via `Runtime.evaluate` (with configurable options)
 - Tab management via `Target.createTarget` / `Target.closeTarget`

 ### Logger (src/logger.js)
@@ -208,10 +209,124 @@ Get HTML or text content of a page.
 {
   "clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
   "tabId": "482D56EE57B1931A3B9D1BFDAF935429",
-  "format": "html"
+  "format": "html",
+  "includeIframes": true
 }
 ```

+**Parameters:**
+- `clientId` (required): The client ID from `/v1/responses` metadata
+- `tabId` (required): The tab ID from `/v1/responses` metadata
+- `format` (optional, default: `"html"`): Content format - either `"html"` or `"text"`
+- `includeIframes` (optional, default: `false`): Whether to include HTML content from iframes. When `true`, recursively captures content from all iframe elements on the page.
+
+**Response:**
+```json
+{
+  "clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
+  "tabId": "482D56EE57B1931A3B9D1BFDAF935429",
+  "content": "<html>...</html>",
+  "format": "html",
+  "length": 12345,
+  "frameCount": 3,
+  "timestamp": 1234567890
+}
+```
+
+**Response fields:**
+- `frameCount` (number, optional): Number of frames included in the content. Only present when `includeIframes: true` is used.
+
+### POST /page/execute
+
+Execute JavaScript code in the context of a specific browser tab via Chrome DevTools Protocol.
+
+**Request:**
+```json
+{
+  "clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
+  "tabId": "482D56EE57B1931A3B9D1BFDAF935429",
+  "expression": "document.title",
+  "returnByValue": true,
+  "awaitPromise": false
+}
+```
+
+**Parameters:**
+- `clientId` (required): The client ID from `/v1/responses` metadata
+- `tabId` (required): The tab ID from `/v1/responses` metadata
+- `expression` (required): JavaScript code to execute (string)
+- `returnByValue` (optional, default: `true`): Whether to return result by value or as object reference
+- `awaitPromise` (optional, default: `false`): Whether to await if the result is a Promise
+
+**Response:**
+```json
+{
+  "clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
+  "tabId": "482D56EE57B1931A3B9D1BFDAF935429",
+  "result": {
+    "type": "string",
+    "value": "Example Page Title"
+  },
+  "exceptionDetails": null,
+  "timestamp": 1234567890
+}
+```
+
+**Response Fields:**
+- `clientId`: Base client ID (without tab suffix)
+- `tabId`: The tab ID where JavaScript was executed
+- `result`: CDP `Runtime.evaluate` result object containing:
+  - `type`: Result type (string, number, object, etc.)
+  - `value`: The actual value (if `returnByValue: true`)
+- `exceptionDetails`: Error details if execution failed, otherwise `null`
+- `timestamp`: Unix timestamp in milliseconds
+
+**Implementation:**
+- Uses CDP `Runtime.evaluate` via `browserAgentServer.evaluateJavaScript()`
+- Executes code in the page's main JavaScript context
+- First 100 characters of expression logged for debugging
+
+**Example Usage:**
+
+```bash
+# Get page title
+curl -X POST http://localhost:8080/page/execute \
+  -H "Content-Type: application/json" \
+  -d '{
+    "clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
+    "tabId": "482D56EE57B1931A3B9D1BFDAF935429",
+    "expression": "document.title"
+  }'
+
+# Count elements
+curl -X POST http://localhost:8080/page/execute \
+  -H "Content-Type: application/json" \
+  -d '{
+    "clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
+    "tabId": "482D56EE57B1931A3B9D1BFDAF935429",
+    "expression": "document.querySelectorAll(\"button\").length"
+  }'
+
+# Execute async code with await
+curl -X POST http://localhost:8080/page/execute \
+  -H "Content-Type: application/json" \
+  -d '{
+    "clientId": "9907fd8d-92a8-4a6a-bce9-458ec8c57306",
+    "tabId": "482D56EE57B1931A3B9D1BFDAF935429",
+    "expression": "fetch(\"https://api.example.com/data\").then(r => r.json())",
+    "awaitPromise": true
+  }'
+```
+
+**Use Cases:**
+- Extract specific data from the page (e.g., element counts, text content)
+- Verify JavaScript state/variables for evaluations
+- Check DOM state programmatically
+- Execute custom validation logic
+- Interact with page APIs directly
+
+This endpoint complements `/page/content` by allowing precise JavaScript execution rather than just fetching full HTML/text content.
+
 ### POST /tabs/open, POST /tabs/close

 Tab management via CDP.
@@ -412,5 +527,6 @@ Removed dependencies:
 - ✅ HTTP REST API endpoints
 - ✅ CDP screenshot capture
 - ✅ CDP page content retrieval
+- ✅ CDP JavaScript execution
 - ✅ CDP tab management
 - ✅ Return metadata (clientId, tabId) for screenshot capture
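
The `/page/content` parameters documented in this file (`format`, `includeIframes`) and the optional `frameCount` response field might be handled on the caller side along these lines. This is a sketch under stated assumptions: `buildContentRequest` and `summarizeContentResponse` are hypothetical names that do not appear in the commit, and the client-side format check is an extra safeguard, not a description of what the server enforces.

```javascript
// Hypothetical client-side helpers for POST /page/content (not part of the commit).

// The two formats named in the documented parameter list.
const ALLOWED_FORMATS = ['html', 'text'];

// Build the JSON body, applying the documented defaults.
function buildContentRequest({ clientId, tabId, format = 'html', includeIframes = false }) {
  if (!clientId) throw new Error('clientId is required');
  if (!tabId) throw new Error('tabId is required');
  if (!ALLOWED_FORMATS.includes(format)) {
    throw new Error(`format must be one of: ${ALLOWED_FORMATS.join(', ')}`);
  }
  return { clientId, tabId, format, includeIframes };
}

// frameCount is documented as optional, so a missing value is reported
// explicitly instead of being treated as zero.
function summarizeContentResponse(response) {
  const frames = response.frameCount === undefined
    ? 'not reported'
    : String(response.frameCount);
  return `format=${response.format} length=${response.length} frames=${frames}`;
}
```

Treating an absent `frameCount` as "not reported" rather than `0` keeps responses from requests without `includeIframes` distinguishable from pages that were scanned and had no extra frames.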

agent-server/nodejs/src/api-server.js

Lines changed: 12 additions & 5 deletions
@@ -284,7 +284,7 @@ class APIServer {
   }

   async getPageContent(payload) {
-    const { clientId, tabId, format = 'html' } = payload;
+    const { clientId, tabId, format = 'html', includeIframes = false } = payload;

     if (!clientId) {
       throw new Error('Client ID is required');
@@ -300,21 +300,28 @@ class APIServer {

     const baseClientId = clientId.split(':')[0];

-    logger.info('Getting page content', { baseClientId, tabId, format });
+    logger.info('Getting page content', { baseClientId, tabId, format, includeIframes });

     // Call appropriate method based on format
     const result = format === 'html'
-      ? await this.browserAgentServer.getPageHTML(tabId)
-      : await this.browserAgentServer.getPageText(tabId);
+      ? await this.browserAgentServer.getPageHTML(tabId, { includeIframes })
+      : await this.browserAgentServer.getPageText(tabId, { includeIframes });

-    return {
+    const response = {
       clientId: baseClientId,
       tabId: result.tabId,
       content: result.content,
       format: result.format,
       length: result.length,
       timestamp: Date.now()
     };
+
+    // Include frame count if iframes were captured
+    if (result.frameCount !== undefined) {
+      response.frameCount = result.frameCount;
+    }
+
+    return response;
   }

   async getScreenshot(payload) {
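
The response shaping introduced in the second hunk can be exercised in isolation. The sketch below is a simplified stand-in, not the actual method: `buildContentResponse` is a hypothetical extraction of the logic shown in the diff, and the result objects are made-up values standing in for whatever `browserAgentServer.getPageHTML()` returns.

```javascript
// Standalone sketch of the response shaping in getPageContent (no server required).
// buildContentResponse is a hypothetical name; the body mirrors the diff above.
function buildContentResponse(clientId, result) {
  // IDs with a ":"-separated suffix are reduced to the base client ID, as in the diff.
  const baseClientId = clientId.split(':')[0];

  const response = {
    clientId: baseClientId,
    tabId: result.tabId,
    content: result.content,
    format: result.format,
    length: result.length,
    timestamp: Date.now(),
  };

  // Attach frameCount only when the lower layer reported one.
  if (result.frameCount !== undefined) {
    response.frameCount = result.frameCount;
  }
  return response;
}

// Made-up results standing in for the browserAgentServer return value.
const withFrames = buildContentResponse('9907fd8d:tab-1', {
  tabId: 'T1', content: '<html></html>', format: 'html', length: 13, frameCount: 3,
});
const withoutFrames = buildContentResponse('9907fd8d', {
  tabId: 'T1', content: '<html></html>', format: 'html', length: 13,
});

console.log('frameCount' in withFrames);    // true
console.log('frameCount' in withoutFrames); // false
```

Because the field is attached conditionally, the key is absent from the JSON response rather than serialized as `null` when no frame count is available.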
