feat(ci): add weekly performance monitoring workflow #1272
Conversation
Add a benchmark script and GitHub Actions workflow to track key performance metrics (container startup, proxy latency, memory, network creation) with statistical analysis and automatic regression detection.

Closes #337

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✅ Coverage Check Passed: Overall Coverage
📁 Per-file Coverage Changes (1 file)
Pull request overview
This PR adds a weekly performance monitoring workflow for the Agentic Workflow Firewall (AWF). It introduces a TypeScript benchmark script that measures container startup times, HTTPS proxy latency, memory footprint, and Docker network creation time, along with a GitHub Actions workflow that runs these benchmarks on a schedule and auto-creates issues on regressions.
Changes:
- New `scripts/ci/benchmark-performance.ts` benchmark script with statistical analysis (mean, median, p95, p99) and threshold-based regression detection
- New `.github/workflows/performance-monitor.yml` weekly CI workflow with manual dispatch, artifact storage, summary tables, and auto-issue creation
- Added `npm run benchmark` script entry in `package.json`
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `scripts/ci/benchmark-performance.ts` | TypeScript benchmark script measuring 5 performance metrics with configurable iterations and threshold checks |
| `.github/workflows/performance-monitor.yml` | GitHub Actions workflow running benchmarks weekly, uploading artifacts, generating summaries, and creating issues on regressions |
| `package.json` | Adds `benchmark` npm script |
Comments suppressed due to low confidence (1)
scripts/ci/benchmark-performance.ts:181
- The `--keep-containers` flag causes AWF to preserve containers, but after `echo measuring_memory` exits, `awf` may still tear down containers before the `docker stats` commands run (since `--keep-containers` keeps containers but the main command has already completed). The timing between the `exec()` returning and the `docker stats` calls creates a race condition. Consider running a longer-lived command (e.g., `sleep 5`) instead of `echo measuring_memory` and querying stats while the command is still running, or restructuring to use `docker compose` directly.
```typescript
// Run a sleep command so containers stay up, then check memory
const output = exec(
  `${AWF_CMD} --allow-domains ${ALLOWED_DOMAIN} --log-level error --keep-containers -- ` +
  `echo measuring_memory`
);
// Get memory stats for both containers
const squidMem = exec(
  "sudo docker stats awf-squid --no-stream --format '{{.MemUsage}}' 2>/dev/null || echo '0MiB'"
);
const agentMem = exec(
  "sudo docker stats awf-agent --no-stream --format '{{.MemUsage}}' 2>/dev/null || echo '0MiB'"
);
```
```yaml
- name: Run benchmarks
  id: benchmark
  run: |
    npx tsx scripts/ci/benchmark-performance.ts > benchmark-results.json 2>&1 || true
```
```yaml
inputs:
  iterations:
    description: "Number of iterations per metric"
    required: false
    default: "5"
```
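The `iterations` input presumably reaches the script through an environment variable or argument; a minimal sketch of defensive parsing, assuming an env var named `BENCHMARK_ITERATIONS` (that name is an assumption, not confirmed by the diff):

```typescript
// Parse the iteration count from the environment, falling back to the
// workflow default of 5 and rejecting zero, negative, or non-numeric input.
function resolveIterations(raw: string | undefined, fallback = 5): number {
  const parsed = parseInt(raw ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

const ITERATIONS = resolveIterations(process.env.BENCHMARK_ITERATIONS);
```

Guarding here keeps a typo in the manual-dispatch form from producing a zero-iteration run whose empty sample set would skew the stats.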
```typescript
if (values.length === 0) {
  values.push(0);
}
```
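This guard presumably exists because running the stats helper over an empty sample set would divide by zero and index past the array. A quick sketch of the failure mode (`meanOf` and `guarded` are illustrative names, not from the PR):

```typescript
// Without the guard, an all-failed benchmark yields 0 / 0 = NaN for the mean
// (and undefined for the percentile index lookups).
function meanOf(values: number[]): number {
  return Math.round(values.reduce((a, b) => a + b, 0) / values.length);
}

// Mirroring the PR's guard: substitute a single zero sample when empty.
const guarded = (v: number[]): number[] => (v.length === 0 ? [0] : v);
```

Pushing a sentinel `0` keeps downstream JSON and threshold comparisons well-formed, at the cost of a misleading "0ms" datapoint when every iteration failed.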
```typescript
function benchmarkHttpsLatency(): BenchmarkResult {
  console.error("  Benchmarking HTTPS latency through Squid...");
  const values: number[] = [];

  for (let i = 0; i < ITERATIONS; i++) {
    cleanup();
    try {
      // Use curl's time_total to measure end-to-end HTTPS request latency
      const output = exec(
        `${AWF_CMD} --allow-domains ${ALLOWED_DOMAIN} --log-level error -- ` +
        `curl -fsS -o /dev/null -w '%{time_total}' https://${ALLOWED_DOMAIN}/zen`
      );
      const seconds = parseFloat(output);
      if (!isNaN(seconds)) {
        values.push(Math.round(seconds * 1000));
      }
    } catch {
      console.error(`  Iteration ${i + 1}/${ITERATIONS}: failed (skipped)`);
      continue;
    }
    console.error(`  Iteration ${i + 1}/${ITERATIONS}: ${values[values.length - 1]}ms`);
  }
```
```typescript
    // may not exist
  }
  const ms = timeMs(() => {
    exec(`sudo docker network create --subnet=172.${31 + i}.0.0/24 ${netName}`, { stdio: "ignore" });
```
```shell
npx tsx scripts/ci/benchmark-performance.ts > benchmark-results.json 2>&1 || true
cat benchmark-results.json
```
```typescript
function stats(values: number[]): Pick<BenchmarkResult, "mean" | "median" | "p95" | "p99"> {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  return {
    mean: Math.round(sorted.reduce((a, b) => a + b, 0) / n),
    median: sorted[Math.floor(n / 2)],
    p95: sorted[Math.min(Math.floor(n * 0.95), n - 1)],
    p99: sorted[Math.min(Math.floor(n * 0.99), n - 1)],
  };
}
```
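The percentiles above use a nearest-rank pick on the sorted array, so with small sample counts the high percentiles collapse to the maximum. A quick exercise of that behavior (the `stats` logic is copied from the snippet above with the `Pick<BenchmarkResult, ...>` return type inlined so the sketch is self-contained):

```typescript
// Nearest-rank stats over millisecond samples, as in the PR's helper.
function stats(values: number[]): { mean: number; median: number; p95: number; p99: number } {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  return {
    mean: Math.round(sorted.reduce((a, b) => a + b, 0) / n),
    median: sorted[Math.floor(n / 2)],
    p95: sorted[Math.min(Math.floor(n * 0.95), n - 1)],
    p99: sorted[Math.min(Math.floor(n * 0.99), n - 1)],
  };
}

// With five samples, floor(5 * 0.95) = 4, so p95 and p99 both land on the
// last (maximum) element.
const sample = stats([5, 1, 4, 2, 3]);
```

At the default 5 iterations this means p95/p99 are effectively "worst observed run", which is arguably the right sensitivity for regression alerting.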
Smoke Test Results
Overall: ✅ PASS
Chroot Version Comparison Results
Result: Not all runtimes match — Go matches, but Python and Node.js differ between host and chroot.
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — ✅ PASS
Summary
- `scripts/ci/benchmark-performance.ts`: TypeScript benchmark script that measures container startup (cold/warm), HTTPS proxy latency, memory footprint, and Docker network creation time with statistical analysis (mean, median, p95, p99)
- `.github/workflows/performance-monitor.yml`: weekly workflow (Monday 06:00 UTC) with manual dispatch that runs benchmarks, stores results as artifacts, generates step summary tables, and auto-creates labeled issues on regression
- Adds an `npm run benchmark` script to `package.json`

Fixes #337
Test plan
- `npx tsx scripts/ci/benchmark-performance.ts` runs locally and outputs valid JSON
- Workflow triggered manually via `workflow_dispatch`

🤖 Generated with Claude Code