LLM-driven browser automation. Reads page state via accessibility tree, decides actions via LLM, executes in a loop until the goal is done.
90% pass rate on WebBench-50. Default model: gpt-5.4.
curl -fsSL https://raw.githubusercontent.com/tangle-network/browser-agent-driver/main/scripts/install.sh | sh

Installs the bad command to ~/.local/bin, downloads Playwright Chromium, and adds PATH instructions. Requires Node.js 20+.
Or via npm:
npm i -g @tangle-network/browser-agent-driver
npx playwright install chromium

pnpm add @tangle-network/browser-agent-driver
pnpm add -D playwright

import { chromium } from 'playwright'
import { PlaywrightDriver, BrowserAgent } from '@tangle-network/browser-agent-driver'
const browser = await chromium.launch()
const page = await browser.newPage()
const driver = new PlaywrightDriver(page)
const runner = new BrowserAgent({
  driver,
  config: { model: 'gpt-5.4' },
})
const result = await runner.run({
  goal: 'Sign in and navigate to settings',
  startUrl: 'https://app.example.com',
  maxTurns: 30,
})
console.log(result.success, `${result.turns.length} turns`)
await browser.close()

# single task
bad run --goal "Sign up for account" --url http://localhost:3000
# test suite from case file
bad run --cases ./cases.json
# authenticated session
bad run --goal "Open settings" --url https://app.example.com \
--storage-state ./.auth/session.json
# speed-optimized mode
bad run --cases ./cases.json --mode fast-explore
# evidence-rich mode for signoff
bad run --cases ./cases.json --mode full-evidence

Create browser-agent-driver.config.ts in your project root:
import { defineConfig } from '@tangle-network/browser-agent-driver'
export default defineConfig({
  model: 'gpt-5.4',
  headless: true,
  concurrency: 4,
  maxTurns: 30,
  timeoutMs: 300_000,
  outputDir: './test-results',
  reporters: ['junit', 'html'],
})

Auto-detected by CLI and programmatic API. CLI flags override config values. Supports .ts, .js, .mjs.
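The precedence rule above (CLI flags override config values) can be pictured as a layered merge. The following is a minimal sketch, not the package's actual loader: `SketchConfig`, `sketchDefaults`, `definedOnly`, and `resolveConfig` are hypothetical names, and the default values are illustrative. Only the field names come from the config example.

```typescript
// Hypothetical subset of the config fields shown in the example above.
interface SketchConfig {
  model: string
  headless: boolean
  concurrency: number
  maxTurns: number
}

// Illustrative defaults; the real defaults may differ.
const sketchDefaults: SketchConfig = {
  model: 'gpt-5.4',
  headless: true,
  concurrency: 1,
  maxTurns: 30,
}

// Drop undefined entries so an unset CLI flag never masks a config-file value.
function definedOnly<T extends object>(layer: T): Partial<T> {
  const out: Partial<T> = {}
  for (const key of Object.keys(layer) as (keyof T)[]) {
    if (layer[key] !== undefined) out[key] = layer[key]
  }
  return out
}

// Later layers win: defaults < config file < CLI flags.
function resolveConfig(
  fileConfig: Partial<SketchConfig>,
  cliFlags: Partial<SketchConfig>,
): SketchConfig {
  return { ...sketchDefaults, ...definedOnly(fileConfig), ...definedOnly(cliFlags) }
}

const resolved = resolveConfig(
  { concurrency: 4, maxTurns: 30 },        // from the config file
  { maxTurns: 10, headless: undefined },   // from CLI flags; headless was not passed
)
console.log(resolved)
```

Here `maxTurns` comes from the CLI layer, `concurrency` from the config file, and `headless` falls back to the default because the flag was not set.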
import { TestRunner } from '@tangle-network/browser-agent-driver'
// `runner` is assumed to be a TestRunner instance; its construction is not shown here.
const suite = await runner.runSuite([
  {
    id: 'login',
    name: 'User login flow',
    startUrl: 'https://app.example.com/login',
    goal: 'Log in with test credentials',
    successCriteria: [
      { type: 'url-contains', value: '/dashboard' },
      { type: 'element-visible', selector: '[data-testid="user-menu"]' },
    ],
  },
])

The LLM can perform: click, type, press, hover, select, scroll, navigate, wait, evaluate, verifyPreview, complete, abort.
Each turn: observe page (a11y tree + optional screenshot) → LLM decides action → execute → verify effect → repeat.
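The turn cycle described above can be sketched in a few lines. This is an illustration of the observe, decide, execute, verify pattern only: `PageObservation`, `AgentAction`, `SketchDriver`, `runTurns`, and the in-memory stand-ins are all hypothetical and do not reflect the package's real types or its Driver interface.

```typescript
// All names are hypothetical; they mirror the prose, not the real API.
interface PageObservation { url: string; a11yTree: string }

type AgentAction =
  | { action: 'click'; selector: string }
  | { action: 'navigate'; url: string }
  | { action: 'complete'; result: string }
  | { action: 'abort'; reason: string }

interface SketchDriver {
  observe(): Promise<PageObservation>
  execute(action: AgentAction): Promise<void>
}

type Decide = (goal: string, obs: PageObservation) => Promise<AgentAction>

interface TurnRecord { action: AgentAction; changed: boolean }

async function runTurns(
  driver: SketchDriver,
  decide: Decide,
  goal: string,
  maxTurns: number,
): Promise<{ success: boolean; turns: TurnRecord[] }> {
  const turns: TurnRecord[] = []
  for (let i = 0; i < maxTurns; i++) {
    const before = await driver.observe()      // observe page state
    const action = await decide(goal, before)  // the LLM call would go here
    if (action.action === 'complete' || action.action === 'abort') {
      turns.push({ action, changed: false })
      return { success: action.action === 'complete', turns }
    }
    await driver.execute(action)               // execute the chosen action
    const after = await driver.observe()       // verify the action had an effect
    const changed = after.url !== before.url || after.a11yTree !== before.a11yTree
    turns.push({ action, changed })
  }
  return { success: false, turns }             // turn budget exhausted
}

// In-memory stand-ins so the sketch runs without a browser or an LLM.
function makeFakeDriver(): SketchDriver {
  let url = 'https://app.example.com'
  return {
    async observe() { return { url, a11yTree: `page at ${url}` } },
    async execute(action) {
      if (action.action === 'navigate') url = action.url
    },
  }
}

const scriptedDecide: Decide = async (_goal, obs) =>
  obs.url.endsWith('/settings')
    ? { action: 'complete', result: 'reached settings' }
    : { action: 'navigate', url: 'https://app.example.com/settings' }

runTurns(makeFakeDriver(), scriptedDecide, 'Open settings', 5).then((run) => {
  console.log(run.success, `${run.turns.length} turns`)
})
```

Keeping the loop behind a small driver interface is what lets a scripted stand-in replace both the browser and the model here.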
Recovery is automatic: cookie consent, modal blockers, stuck loops (A-B-A-B oscillation), and selector failures are handled before the agent continues.
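One way to spot the A-B-A-B oscillation mentioned above is to compare the most recent action signatures. This is a hedged sketch of that idea, not the detector the package ships: `isOscillating` and the `click:#next` style signature strings are hypothetical.

```typescript
// Returns true when the last `cycles * 2` action signatures alternate
// between exactly two distinct values, e.g. A-B-A-B for cycles = 2.
function isOscillating(history: string[], cycles = 2): boolean {
  const span = cycles * 2
  if (history.length < span) return false
  const recent = history.slice(-span)
  const [a, b] = recent
  if (a === b) return false // a plain repeat (A-A-A-A) is a different failure mode
  return recent.every((sig, i) => sig === (i % 2 === 0 ? a : b))
}

// Hypothetical signatures: action name plus target.
const recentActions = ['navigate:/cart', 'click:#next', 'click:#back', 'click:#next', 'click:#back']
console.log(isOscillating(recentActions))
```

A caller could use a positive result to change strategy instead of spending another turn on the same pair of actions.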
bad design-audit is a vision-powered design quality analyzer with a closed-loop improvement mode. It auto-classifies the page, runs ground-truth measurements (axe-core + WCAG contrast math), then evaluates visual quality with a composable rubric — and ranks the top fixes by ROI.
# Audit any URL — auto-classifies, no profile needed
bad design-audit --url https://your-app.com
# Multi-page crawl with cross-page systemic detection
bad design-audit --url https://your-app.com --pages 10
# Closed-loop fix: dispatch findings to a coding agent that edits source files
bad design-audit --url http://localhost:3000 \
--evolve claude-code \
--project-dir ~/my-app
# Other evolve modes: codex, opencode, css (browser injection), or any custom CLI
bad design-audit --url http://localhost:3000 --evolve "aider --message"
# Pure DOM token extraction (no LLM)
bad design-audit --url https://your-app.com --extract-tokens

Reports open with Top Fixes (by ROI): the 5 highest-leverage fixes ranked by (impact × blast / effort). Findings appearing on multiple pages collapse into systemic findings. Verified end-to-end: a deliberately bad fixture went 3.0 → 5.0 (+2.0) over 2 evolve rounds with claude-code rewriting actual source files.
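The ranking formula above (impact × blast / effort) and the collapse of repeated findings can be sketched as follows. The types, the scoring scales, and the reading of "blast" as the number of affected pages are assumptions made for illustration; the actual audit pipeline may define them differently.

```typescript
// Hypothetical finding shape; scales are assumed, effort must be > 0.
interface AuditFinding {
  id: string      // same id on several pages = same underlying issue
  page: string
  impact: number  // higher = larger quality gain
  effort: number  // higher = more work
}

interface RankedFix { id: string; pages: string[]; blast: number; roi: number }

function topFixes(findings: AuditFinding[], limit = 5): RankedFix[] {
  // Collapse findings that share an id into one systemic entry.
  const byId = new Map<string, { impact: number; effort: number; pages: Set<string> }>()
  for (const f of findings) {
    const entry = byId.get(f.id) ?? { impact: f.impact, effort: f.effort, pages: new Set<string>() }
    entry.pages.add(f.page)
    byId.set(f.id, entry)
  }
  // Assumption: blast = number of distinct pages the finding appears on.
  return Array.from(byId.entries())
    .map(([id, e]) => ({
      id,
      pages: Array.from(e.pages),
      blast: e.pages.size,
      roi: (e.impact * e.pages.size) / e.effort,
    }))
    .sort((x, y) => y.roi - x.roi)
    .slice(0, limit)
}

const ranked = topFixes([
  { id: 'low-contrast', page: '/', impact: 3, effort: 1 },
  { id: 'low-contrast', page: '/pricing', impact: 3, effort: 1 },
  { id: 'low-contrast', page: '/docs', impact: 3, effort: 1 },
  { id: 'hero-spacing', page: '/', impact: 5, effort: 2 },
  { id: 'font-mix', page: '/', impact: 2, effort: 4 },
  { id: 'font-mix', page: '/docs', impact: 2, effort: 4 },
])
console.log(ranked.map((r) => `${r.id}: ${r.roi}`))
```

In this example the low-impact contrast issue outranks the higher-impact hero issue because it appears on three pages and is cheap to fix.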
See Design Audit Guide for the full pipeline, custom rubric fragments, and starter-foundry integration.
bad view opens any run in a polished web UI:
bad view audit-results/stripe.com-1775502457141

- Sidebar lists every page (or turn) in the run
- Top Fixes section opens by default for design audits, ranked by ROI
- Per-page screenshots, design system breakdown, findings table, classification
- Per-turn action JSON, reasoning, expected effect, result for agent runs
- Self-contained — no build pipeline, single static HTML, no external dependencies (the viewer is served by a local loopback HTTP server on port 7777)
Pair with --show-cursor to record runs with an animated cursor + element highlights overlaid on every screenshot.
bad's agent loop is decoupled from the browser layer via the Driver interface. The default is local Playwright, but you can run the same agent against managed cloud infra by swapping the driver; the agent code stays the same:
import { BrowserAgent, SteelDriver } from '@tangle-network/browser-agent-driver'
// Local Playwright (default) — see Quick Start above
// Steel cloud browser with anti-bot, residential proxies, CAPTCHA solving
const driver = await SteelDriver.create({
  apiKey: process.env.STEEL_API_KEY,
  sessionOptions: { useProxy: true, solveCaptcha: true },
})
const agent = new BrowserAgent({ driver, config: { model: 'sonnet' } })
await agent.run({ goal: '...', startUrl: '...' })
await driver.close()

The same agent (design audit, evolve loops, wallet automation, knowledge memory) runs against any driver. Steel handles infra you don't want to build; bad handles the agent layer Steel doesn't.
Drop bad design-audit into any PR pipeline:
- uses: tangle-network/browser-agent-driver/.github/actions/design-audit@main
  with:
    url: ${{ steps.deploy.outputs.preview_url }}
    pages: 5
    fail-on-score-below: '6.5'
    evolve: claude-code # optional auto-fix
    openai-api-key: ${{ secrets.OPENAI_API_KEY }}

The action posts the Top Fixes (by ROI) as a PR comment, uploads the full report as a workflow artifact, and optionally fails the build on score regressions or critical findings. See .github/actions/design-audit.
- Configuration Reference — all config options
- CLI Reference — commands, modes, profiles, auth
- Design Audit — vision-powered design quality + ROI-ranked closed-loop improvement
- Memory System — trajectory store, app knowledge, selector cache
- Benchmarks & Experiments — tiered gates, AB specs, research cycles
- Wallet & EVM Apps — MetaMask, DeFi testing, RPC interception, Anvil forks
- Providers — OpenAI, Anthropic, Codex CLI, Claude Code, sandbox backend
- Reporters & Sinks — JUnit, HTML, webhooks, custom sinks
- Custom Drivers — implement the Driver interface
Ships Codex skills under skills/ for test execution discipline and agent-friendly UX conventions.
npm run skills:install

Versioning and releases are automated via Changesets.
Contributors: add a changeset to your PR with pnpm changeset — pick patch / minor / major and write a one-line summary. The CLI creates a markdown file under .changeset/ to commit alongside your code.
Maintainers: when PRs with changesets merge to main, the changesets workflow automatically opens (or updates) a "Release: version packages" PR that bumps package.json and writes CHANGELOG.md. Merging that PR pushes a browser-agent-driver-vX.Y.Z git tag, which fires the existing release.yml and publish-npm.yml workflows that create the GitHub release tarball and publish to npm with provenance.
You stay in control of when releases ship; the bump math, changelog, tagging, and publishing are all automated. See .changeset/README.md for the full contributor flow.
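For contributors unsure which bump to pick, the "bump math" can be illustrated with a small sketch. It assumes the largest requested bump among pending changesets wins, which is how Changesets combines multiple changesets for a single package, and it handles only plain X.Y.Z versions. `highestBump` and `nextVersion` are hypothetical helpers, not part of Changesets or this repo.

```typescript
type Bump = 'patch' | 'minor' | 'major'

const bumpRank: Record<Bump, number> = { patch: 0, minor: 1, major: 2 }

// The largest bump among the pending changesets decides the release type.
function highestBump(bumps: Bump[]): Bump | undefined {
  return bumps.reduce<Bump | undefined>(
    (best, b) => (best === undefined || bumpRank[b] > bumpRank[best] ? b : best),
    undefined,
  )
}

// Apply a bump to a plain X.Y.Z version (pre-release tags are not handled).
function nextVersion(current: string, bumps: Bump[]): string {
  const bump = highestBump(bumps)
  if (bump === undefined) return current // no changesets, nothing to release
  const [major = 0, minor = 0, patch = 0] = current.split('.').map(Number)
  if (bump === 'major') return `${major + 1}.0.0`
  if (bump === 'minor') return `${major}.${minor + 1}.0`
  return `${major}.${minor}.${patch + 1}`
}

// Three merged PRs: two patches and one minor.
console.log(nextVersion('1.4.2', ['patch', 'minor', 'patch'])) // prints "1.5.0"
```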
pnpm build # TypeScript → dist/
pnpm test # vitest
pnpm lint # type-check
pnpm check:boundaries

Dual-licensed under MIT and Apache 2.0. See LICENSE.