An 8-stage AI vulnerability-discovery agent.
Many narrow agents · deliberate disagreement · reachability as the gate.
`audit` finds real vulnerabilities in a codebase by running many narrow agents
in parallel instead of one big "find bugs here" prompt. Each hunter looks for
exactly one attack class; a different model then tries to disprove every
finding; and the stage that decides what ships asks the only question that
matters: can an attacker actually reach this sink from outside?
It runs on Bun + TypeScript, is billed to your Claude Pro / Max subscription through the official Claude Agent SDK (no API key needed), and validates every agent output against a JSON Schema.
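The README doesn't show the project's schemas or which validator it uses. As a rough illustration of what "validates every agent output against a JSON Schema" means in practice, here is a minimal sketch that checks a small subset of JSON Schema (`type`, `required`, `properties`, `items`, `enum`). The `findingSchema` and its field names are hypothetical; a real setup would more likely use a full JSON Schema validator.

```typescript
// Minimal sketch: validate parsed agent output against a small JSON Schema subset.
// Only type/required/properties/items/enum are supported; this is NOT a full validator.
type Schema = {
  type?: "object" | "array" | "string" | "number" | "boolean";
  required?: string[];
  properties?: Record<string, Schema>;
  items?: Schema;
  enum?: unknown[];
};

// Returns human-readable errors; an empty list means the value conforms.
function validate(value: unknown, schema: Schema, path = "$"): string[] {
  const errors: string[] = [];
  if (schema.enum && !schema.enum.includes(value)) {
    errors.push(`${path}: not one of ${JSON.stringify(schema.enum)}`);
  }
  switch (schema.type) {
    case "object": {
      if (typeof value !== "object" || value === null || Array.isArray(value)) {
        return [...errors, `${path}: expected object`];
      }
      const obj = value as Record<string, unknown>;
      for (const key of schema.required ?? []) {
        if (!(key in obj)) errors.push(`${path}.${key}: missing`);
      }
      for (const [key, sub] of Object.entries(schema.properties ?? {})) {
        if (key in obj) errors.push(...validate(obj[key], sub, `${path}.${key}`));
      }
      break;
    }
    case "array": {
      if (!Array.isArray(value)) return [...errors, `${path}: expected array`];
      const items = schema.items;
      if (items) value.forEach((v, i) => errors.push(...validate(v, items, `${path}[${i}]`)));
      break;
    }
    case undefined:
      break; // no type constraint
    default:
      // Primitive schema types map directly onto `typeof`.
      if (typeof value !== schema.type) errors.push(`${path}: expected ${schema.type}`);
  }
  return errors;
}

// Hypothetical schema for a single Hunt finding (field names are illustrative).
const findingSchema: Schema = {
  type: "object",
  required: ["title", "severity", "attack_class"],
  properties: {
    title: { type: "string" },
    severity: { type: "string", enum: ["critical", "high", "medium", "low"] },
    attack_class: { type: "string" },
  },
};

const ok = validate({ title: "SQLi in /lookup", severity: "high", attack_class: "sql_injection" }, findingSchema);
const bad = validate({ title: "x", severity: "urgent" }, findingSchema);
console.log(ok); // []
console.log(bad); // two errors: attack_class missing, severity not in the enum
```

A non-empty error list is the signal to reject the agent's output rather than pass it downstream.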
- Narrow agents, not one mega-prompt: one attack class per task, with the trust boundary spelled out. This is what actually surfaces bugs.
- Deliberate disagreement: Validate runs on a different model than Hunt and is paid in rejections. It filters out the noise that single-pass tools ship.
- Reachability is the gate: a "buggy" sink that no attacker input can reach is dropped. Only confirmed, reachable findings make the report.
- It learns: a proven-reachable bug seeds new hunts for the same pattern elsewhere in the repo.
- Schema-validated, resumable, budgeted: every output is shape-checked, every run is checkpointed in SQLite, and a cost ceiling aborts the run cleanly.
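To make the reachability gate and the learning loop concrete, here is a hedged sketch of how they could fit together. The `Finding` and `HuntTask` shapes, the `sinkIndex` lookup (assumed to come from a grep-like pass over the repo), and the file paths are all hypothetical; the README doesn't document the real data model.

```typescript
// Hypothetical record shapes; the real pipeline's schemas are not shown in the README.
interface Finding {
  id: string;
  attackClass: string; // e.g. "command_injection"
  sinkPattern: string; // a searchable signature of the dangerous sink
  file: string;
  confirmed: boolean; // survived the adversarial Validate stage
  reachable: boolean; // Trace proved attacker input reaches the sink
}

interface HuntTask {
  attackClass: string;
  file: string;
  hint: string;
}

// The gate: anything not both confirmed and reachable is dropped.
function gate(findings: Finding[]): Finding[] {
  return findings.filter((f) => f.confirmed && f.reachable);
}

// Feedback: every reachable finding seeds one hunt per other file that contains
// the same sink pattern. `sinkIndex` maps pattern -> files (assumed precomputed).
function seedSiblingHunts(reachable: Finding[], sinkIndex: Map<string, string[]>): HuntTask[] {
  const seen = new Set<string>();
  const tasks: HuntTask[] = [];
  for (const f of reachable) {
    for (const file of sinkIndex.get(f.sinkPattern) ?? []) {
      const key = `${f.attackClass}|${file}`;
      if (file === f.file || seen.has(key)) continue; // skip the original site and duplicates
      seen.add(key);
      tasks.push({ attackClass: f.attackClass, file, hint: `same sink as ${f.id}: ${f.sinkPattern}` });
    }
  }
  return tasks;
}

const findings: Finding[] = [
  { id: "F1", attackClass: "command_injection", sinkPattern: "shell=True", file: "app/ping.py", confirmed: true, reachable: true },
  { id: "F2", attackClass: "deserialization", sinkPattern: "pickle.loads", file: "scripts/migrate.py", confirmed: true, reachable: false },
];
const sinkIndex = new Map([["shell=True", ["app/ping.py", "app/admin/backup.py"]]]);

const reported = gate(findings); // F2 is dropped: confirmed, but no attacker input reaches it
const nextHunts = seedSiblingHunts(reported, sinkIndex); // one new task, for app/admin/backup.py
console.log(reported.map((f) => f.id), nextHunts);
```

Seeding only from findings that passed the gate keeps the loop from amplifying noise: an unreachable sink never spawns sibling hunts.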
Recon → Hunt → Validate → Gapfill ↺ → Dedupe → Trace → Feedback ↺ → Report
| # | Stage | Model | Does |
|---|---|---|---|
| 1 | Recon | Opus | Maps the repo + git history, emits narrowly-scoped Hunt tasks |
| 2 | Hunt | Sonnet | One attack class per agent; compiles/runs PoCs |
| 3 | Validate | Opus | Adversarial re-read: tries to disprove each finding (different model) |
| 4 | Gapfill | Sonnet | Re-queues under-covered subsystem × attack class cells |
| 5 | Dedupe | Sonnet | Clusters findings by root cause |
| 6 | Trace | Opus | Proves attacker input reaches the sink (the gate) |
| 7 | Feedback | Sonnet | Turns reachable traces into new hunts for siblings |
| 8 | Report | Sonnet | Schema-validated structured report |
Full details in docs/stages.md.
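Gapfill (stage 4) is easiest to picture as a coverage matrix: subsystems on one axis, attack classes on the other, and a count of Hunt tasks in each cell. The sketch below is a minimal illustration, not the project's code: it returns the empty or thin cells that would be re-queued. The `Task` shape and the `minPerCell` threshold are assumptions.

```typescript
// Hypothetical unit of coverage: one Hunt task aimed at one subsystem and attack class.
interface Task {
  subsystem: string;
  attackClass: string;
}

// Returns every (subsystem × attack class) cell with fewer than `minPerCell` tasks.
function underCovered(
  tasks: Task[],
  subsystems: string[],
  attackClasses: string[],
  minPerCell = 1,
): Task[] {
  const cell = (s: string, a: string) => `${s}|${a}`;
  const counts = new Map<string, number>();
  for (const t of tasks) {
    const key = cell(t.subsystem, t.attackClass);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const gaps: Task[] = [];
  for (const subsystem of subsystems) {
    for (const attackClass of attackClasses) {
      if ((counts.get(cell(subsystem, attackClass)) ?? 0) < minPerCell) {
        gaps.push({ subsystem, attackClass }); // would be re-queued as a new Hunt task
      }
    }
  }
  return gaps;
}

const done: Task[] = [
  { subsystem: "auth", attackClass: "sql_injection" },
  { subsystem: "api", attackClass: "sql_injection" },
  { subsystem: "api", attackClass: "ssrf" },
];
const gaps = underCovered(done, ["auth", "api"], ["sql_injection", "ssrf"]);
console.log(gaps); // only the auth × ssrf cell is empty
```

Because each re-queued hunt changes coverage, a loop like this would presumably be bounded by a configured iteration count (the configuration docs list loop counts).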
```bash
# 1. Install globally (requires Bun ≥ 1.3, https://bun.sh)
bun add -g @usex/audit

# 2. Auth: already logged in via `claude login`? You're done.
# Or, for CI / headless:
claude setup-token && echo "CLAUDE_CODE_OAUTH_TOKEN=<paste>" > .env

# 3. Verify
audit auth-check

# 4. Run: `cd` into the repo you want to audit; results/ and state.db land there
cd /path/to/target
audit run --run-id my-run
audit status --run-id my-run
audit report --run-id my-run --format md > report.md
```

The global install exposes an `audit` binary on your `PATH`. It runs under the Bun runtime (the bundled CLI uses `bun:sqlite`), so Bun must be installed even when the CLI is invoked via npm.

Where state lives: scan artifacts (`results/`, `work/`, `state.db`) are written to the current working directory, i.e. the repo you're auditing, not the install location. Set `AUDIT_DATA_DIR=/some/path` to redirect them.
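The artifact-location rule can be stated as a few lines of code. This is a hypothetical helper, not the CLI's actual implementation: it uses `AUDIT_DATA_DIR` when set (resolving a relative value against the working directory is an assumption) and otherwise falls back to the current working directory.

```typescript
import { join, resolve } from "node:path";

// Hypothetical: where scan artifacts would land under the documented rule.
function dataDir(
  env: Record<string, string | undefined> = process.env,
  cwd: string = process.cwd(),
): string {
  const override = env.AUDIT_DATA_DIR?.trim();
  // Assumption: a relative override is resolved against the working directory.
  return override ? resolve(cwd, override) : cwd;
}

// The three artifact locations named in the README, rooted at the data dir.
function artifactPaths(root: string) {
  return {
    results: join(root, "results"),
    work: join(root, "work"),
    stateDb: join(root, "state.db"),
  };
}

console.log(artifactPaths(dataDir()));
```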
```bash
bun install
bun run src/cli.ts auth-check   # run directly from source
bun link                        # or expose the `audit` binary locally
```

A real codebase yields 15–50 Hunt tasks, and the Gapfill and Feedback loops expand coverage from there. To rein it in:

```bash
#   --max-concurrency 1   one agent at a time
#   --max-recon-tasks 15  smaller initial fan-out
#   --max-cost-usd 30     hard ceiling; resume an aborted run with --resume
bun run src/cli.ts run --repo /path/to/target \
  --max-concurrency 1 \
  --max-recon-tasks 15 \
  --max-cost-usd 30
```

Pointed at a small Flask app with two planted bugs, `audit` reported seven
confirmed and reachable findings, including a chain that the planted bugs didn't
spell out:
```text
total: 7 - critical: 3, high: 2, medium: 2
• critical command_injection Unauthenticated OS command injection in GET /ping via `host` (shell=True)
• critical logic_chain Zero-credential SQLi→RCE: /lookup exfiltration pivots to /ping execution
• critical broken_access_control Missing auth on GET /ping exposes command execution to anonymous callers
• high sql_injection Unauthenticated SQL injection in GET /lookup via `name`
• high ssrf Unauthenticated SSRF via /ping `host` enables internal host enumeration
• medium information_disclosure Werkzeug debug console discloses source + secrets when FLASK_DEBUG=1
• medium resource_exhaustion Unbounded subprocess in /ping enables WSGI worker exhaustion
```
The adversarial Validate stage also rejected several lower-confidence findings before they reached the report. That's the point.
Scan only what the branch changed, suppress findings you've already accepted, and fail the build only on new issues at or above a severity:
```bash
# Once, on the default branch: accept the current findings as the baseline
audit run --run-id main && audit baseline --run-id main --out .audit-baseline.json

# On every PR: diff-scoped scan, new-only gate, SARIF for the Security tab
audit run --base main --baseline .audit-baseline.json --fail-on high
audit report --run-id <id> --format sarif --out audit.sarif
```

`--base`/`--since` feed Recon only the changed files plus their blast radius
(callers, callees, importers), so a PR scan costs cents. Findings are matched by a
line-shift-robust fingerprint, and the exit code is 4 when the gate trips. The
SARIF output carries the reachability trace as `codeFlows`, so reviewers see the
entry-point→sink path, not just the sink. See the CLI reference.
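The README doesn't say how the fingerprint is computed. One common way to make a fingerprint survive line shifts is to hash what the finding is (attack class, file, normalized sink text) and leave the line number out. The sketch below does that with Node's `crypto.createHash`, then applies a new-only severity gate that returns 4 when tripped, matching the documented exit code. The `Finding` shape, the severity ranking, and `0` as the passing exit code are assumptions.

```typescript
import { createHash } from "node:crypto";

type Severity = "low" | "medium" | "high" | "critical";

// Hypothetical finding shape for illustration.
interface Finding {
  attackClass: string;
  file: string;
  line: number; // deliberately excluded from the fingerprint
  sinkSnippet: string; // source text at the sink
  severity: Severity;
}

// Hash what the bug is, not where it sits: whitespace is collapsed and the line
// number is ignored, so edits above the sink (or re-indentation) keep the same ID.
function fingerprint(f: Finding): string {
  const sink = f.sinkSnippet.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(`${f.attackClass}\n${f.file}\n${sink}`).digest("hex").slice(0, 16);
}

const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };
const EXIT_GATE_TRIPPED = 4; // documented; 0 on success is an assumption

// New-only gate: fail only on findings absent from the baseline at or above `failOn`.
function gateExitCode(current: Finding[], baseline: Set<string>, failOn: Severity): number {
  const blocking = current.filter(
    (f) => !baseline.has(fingerprint(f)) && RANK[f.severity] >= RANK[failOn],
  );
  return blocking.length > 0 ? EXIT_GATE_TRIPPED : 0;
}

const known: Finding = { attackClass: "sql_injection", file: "app.py", line: 40, sinkSnippet: "cur.execute(q)", severity: "high" };
const baseline = new Set([fingerprint(known)]);

// Same bug, moved 12 lines down and re-indented: still suppressed by the baseline.
const shifted: Finding = { ...known, line: 52, sinkSnippet: "    cur.execute(q)" };
// A genuinely new high-severity finding in another file.
const fresh: Finding = { ...known, file: "admin.py" };
// A new medium finding stays below --fail-on high.
const minor: Finding = { ...known, file: "util.py", severity: "medium" };

console.log(gateExitCode([shifted], baseline, "high")); // 0
console.log(gateExitCode([shifted, fresh], baseline, "high")); // 4
console.log(gateExitCode([minor], baseline, "high")); // 0
```

Keeping the file path in the hash means a file rename surfaces as a "new" finding; that trade-off is a choice of this sketch, not something the README specifies.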
- Subscription billing by default: `ANTHROPIC_API_KEY` is scrubbed so it can't silently route to metered billing (auth docs).
- Bring your own model / gateway: OpenRouter and any Anthropic-compatible proxy, or opt into a metered API key.
- Live-target mode: reproduce findings against a running deployment with real HTTP (docs).
- Scope notes: exclude intentional-by-design surfaces (docs).
- Diff / PR mode: `--base`/`--since` scope the scan to changed files + blast radius.
- Baseline & delta: fingerprint findings, suppress known ones, surface NEW/FIXED.
- SARIF + exit-code gating: `--format sarif` and `--fail-on high` for CI / the GitHub Security tab.
- Auto-fix (opt-in): `audit fix` writes a minimal patch + regression test per reachable finding in an isolated worktree; `--open-pr` opens a draft PR (never auto-merges).
- Code-grounded fix guidance: `audit advise` (and a "Generate fix" button in the viewer) reads the real sink and explains the fix for your code; the report surfaces it inline.
- Triage viewer: `audit report --serve` is a local web UI to confirm / dismiss findings and export suppressions to a baseline.
- Cost observability: `audit stats` breaks spend down by stage/model and reports cost-per-finding.
- Bug-bounty / VDP triage: `audit triage --report` reproduces an inbound submission, runs it through the adversarial reviewer and the reachability gate, dedupes it, and emits an accept/reject/duplicate verdict.
- Git-history mining: seeds hunts against unpatched siblings of past fixes.
- Resumable runs, per-stage concurrency, and a hard cost ceiling.
- Background runs: `audit run -d` detaches the pipeline; `audit sessions` lists active runs and whether each is still alive.
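SARIF 2.1.0 represents an execution path as `codeFlows` → `threadFlows` → `locations`, which is how a reachability trace can travel alongside the sink location in the SARIF output. This is a minimal sketch of that mapping, not the tool's actual exporter: the `TracedFinding` shape and the severity-to-`level` mapping are assumptions, and optional SARIF fields (rule metadata, `$schema`, fingerprints) are omitted.

```typescript
// Hypothetical hop on the entry-point → sink path produced by the Trace stage.
interface TraceStep {
  file: string;
  line: number;
  note: string;
}

interface TracedFinding {
  ruleId: string; // e.g. the attack class
  title: string;
  severity: "critical" | "high" | "medium" | "low";
  trace: TraceStep[]; // first step = attacker entry point, last step = sink
}

// Assumed mapping onto SARIF result levels ("error" | "warning" | "note").
const LEVEL = { critical: "error", high: "error", medium: "warning", low: "note" } as const;

// SARIF location: file URI + start line, with a message describing the hop.
function toLocation(step: TraceStep) {
  return {
    physicalLocation: {
      artifactLocation: { uri: step.file },
      region: { startLine: step.line },
    },
    message: { text: step.note },
  };
}

function toSarif(findings: TracedFinding[]) {
  return {
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: "audit" } },
        results: findings.map((f) => {
          const sink = f.trace[f.trace.length - 1];
          if (sink === undefined) throw new Error(`finding ${f.ruleId} has an empty trace`);
          return {
            ruleId: f.ruleId,
            level: LEVEL[f.severity],
            message: { text: f.title },
            // The result itself points at the sink...
            locations: [toLocation(sink)],
            // ...while the code flow shows how attacker input gets there.
            codeFlows: [{ threadFlows: [{ locations: f.trace.map((s) => ({ location: toLocation(s) })) }] }],
          };
        }),
      },
    ],
  };
}

// Illustrative trace; file names and line numbers are made up.
const sarif = toSarif([
  {
    ruleId: "command_injection",
    title: "OS command injection in GET /ping via `host`",
    severity: "critical",
    trace: [
      { file: "app.py", line: 10, note: "request.args['host'] is attacker-controlled" },
      { file: "app.py", line: 14, note: "host reaches subprocess.run(..., shell=True)" },
    ],
  },
]);
console.log(JSON.stringify(sarif, null, 2));
```

Putting only the sink in `locations` keeps the alert anchored where the fix belongs, while viewers that understand `codeFlows` can still walk the full path.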
| Doc | |
|---|---|
| Architecture | Pipeline graph, data flow, loops |
| Stages | All 8 stages in detail |
| CLI reference | Every command and flag |
| Configuration | stages.yaml, env vars, loop counts |
| Authentication | Subscription, gateways, API key |
| Live-target · Scope notes | Opt-in modes |
| State & artifacts | SQLite, JSONL, resume |
| Programmatic API | Use it as a library |
| Troubleshooting | Quota, schema failures, cost |
```bash
bun test             # unit tests (offline)
bun run test:types   # tsc --noEmit
bun run build        # bundle to dist/
```

See CONTRIBUTING.md.
Hunt agents have Bash and run inside per-task scratch dirs β they are not
OS-sandboxed. Run the audit inside a disposable VM or container when you don't
trust the target source; a malicious build script could otherwise execute on
your host during PoC compilation. The agent reads everything under the target
(including any .env / secrets), and outputs in results/<run-id>/ are
gitignored but not scrubbed of those reads. Only point --target-url at
systems you're authorized to test.
MIT. Reuse freely. No warranty.
The pipeline design is from Cloudflare's Project Glasswing. Built on the official Claude Agent SDK.
Made with ❤️ by Ali Torki