feat(ci): add weekly performance monitoring workflow#1272

Merged
Mossaka merged 1 commit into main from feat/083-performance-monitoring on Mar 13, 2026

Conversation

@Mossaka (Collaborator) commented Mar 13, 2026

Summary

  • Add scripts/ci/benchmark-performance.ts — TypeScript benchmark script that measures container startup (cold/warm), HTTPS proxy latency, memory footprint, and Docker network creation time with statistical analysis (mean, median, p95, p99)
  • Add .github/workflows/performance-monitor.yml — weekly workflow (Monday 06:00 UTC) with manual dispatch that runs benchmarks, stores results as artifacts, generates step summary tables, and auto-creates labeled issues on regression
  • Add npm run benchmark script to package.json
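
To illustrate the sample-collection side of such a benchmark script, here is a minimal sketch. The `timeMs` helper and the loop shape are assumptions for illustration, not necessarily the code in `benchmark-performance.ts`:

```typescript
// Minimal sketch of timing an operation and collecting repeated samples.
// `timeMs` and `collectSamples` are illustrative names, not quoted from the PR.
function timeMs(fn: () => void): number {
  const start = process.hrtime.bigint();
  fn();
  const end = process.hrtime.bigint();
  return Number(end - start) / 1e6; // nanoseconds -> milliseconds
}

function collectSamples(iterations: number, op: () => void): number[] {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    // Round to whole milliseconds, matching the integer stats downstream.
    samples.push(Math.round(timeMs(op)));
  }
  return samples;
}
```

The collected samples would then feed the statistical summary (mean, median, p95, p99) described above.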

Fixes #337

Test plan

  • Verify npx tsx scripts/ci/benchmark-performance.ts runs locally and outputs valid JSON
  • Verify workflow triggers on workflow_dispatch
  • Verify step summary shows results table
  • Verify regression detection creates issue when p95 exceeds critical threshold
  • Verify benchmark completes within 30-minute timeout

🤖 Generated with Claude Code

Add benchmark script and GitHub Actions workflow to track key performance
metrics (container startup, proxy latency, memory, network creation) with
statistical analysis and automatic regression detection.
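
The threshold-based regression check could look roughly like this. The metric names and threshold values below are invented for illustration; the real thresholds live in the PR's `benchmark-performance.ts` and may differ:

```typescript
interface MetricResult {
  name: string;
  p95: number; // milliseconds
}

// Hypothetical per-metric critical thresholds (ms) -- assumed values,
// not taken from the PR.
const CRITICAL_P95_MS: Record<string, number> = {
  "container-startup-cold": 15000,
  "https-proxy-latency": 2000,
};

// Returns one human-readable message per metric whose p95 exceeds its
// critical threshold; an empty array means no regression detected.
function findRegressions(results: MetricResult[]): string[] {
  return results
    .filter(
      (r) => CRITICAL_P95_MS[r.name] !== undefined && r.p95 > CRITICAL_P95_MS[r.name]
    )
    .map(
      (r) =>
        `${r.name}: p95 ${r.p95}ms exceeds critical threshold ${CRITICAL_P95_MS[r.name]}ms`
    );
}
```

A non-empty result would be what drives the workflow's auto-created, labeled issue.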

Closes #337

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 13, 2026 00:04
@github-actions (Contributor) commented:

✅ Coverage Check Passed

Overall Coverage

| Metric     | Base   | PR     | Delta      |
|------------|--------|--------|------------|
| Lines      | 84.23% | 84.36% | 📈 +0.13% |
| Statements | 84.21% | 84.34% | 📈 +0.13% |
| Functions  | 84.37% | 84.37% | ➡️ +0.00% |
| Branches   | 77.09% | 77.17% | 📈 +0.08% |

📁 Per-file Coverage Changes (1 file)

| File                  | Lines (Before → After)  | Statements (Before → After) |
|-----------------------|-------------------------|-----------------------------|
| src/docker-manager.ts | 86.8% → 87.3% (+0.52%)  | 86.1% → 86.6% (+0.50%)      |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copilot AI (Contributor) left a comment


Pull request overview

This PR adds a weekly performance monitoring workflow for the Agentic Workflow Firewall (AWF). It introduces a TypeScript benchmark script that measures container startup times, HTTPS proxy latency, memory footprint, and Docker network creation time, along with a GitHub Actions workflow that runs these benchmarks on a schedule and auto-creates issues on regressions.

Changes:

  • New scripts/ci/benchmark-performance.ts benchmark script with statistical analysis (mean, median, p95, p99) and threshold-based regression detection
  • New .github/workflows/performance-monitor.yml weekly CI workflow with manual dispatch, artifact storage, summary tables, and auto-issue creation
  • Added npm run benchmark script entry in package.json

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File                                      | Description                                                                                                         |
|-------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| scripts/ci/benchmark-performance.ts       | TypeScript benchmark script measuring 5 performance metrics with configurable iterations and threshold checks        |
| .github/workflows/performance-monitor.yml | GitHub Actions workflow running benchmarks weekly, uploading artifacts, generating summaries, and creating issues on regressions |
| package.json                              | Adds benchmark npm script                                                                                            |
Comments suppressed due to low confidence (1)

scripts/ci/benchmark-performance.ts:181

  • The --keep-containers flag causes AWF to preserve containers, but after echo measuring_memory exits, awf may still tear down containers before the docker stats commands run (since --keep-containers keeps containers but the main command has already completed). The timing between the exec() returning and the docker stats calls creates a race condition. Consider running a longer-lived command (e.g., sleep 5) instead of echo measuring_memory and querying stats while the command is still running, or restructuring to use docker compose directly.
      // Run a sleep command so containers stay up, then check memory
      const output = exec(
        `${AWF_CMD} --allow-domains ${ALLOWED_DOMAIN} --log-level error --keep-containers -- ` +
          `echo measuring_memory`
      );
      // Get memory stats for both containers
      const squidMem = exec(
        "sudo docker stats awf-squid --no-stream --format '{{.MemUsage}}' 2>/dev/null || echo '0MiB'"
      );
      const agentMem = exec(
        "sudo docker stats awf-agent --no-stream --format '{{.MemUsage}}' 2>/dev/null || echo '0MiB'"
      );
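
Related to this memory measurement: however the container-lifetime race is resolved, the `docker stats` output still has to be parsed. A sketch of converting Docker's `{{.MemUsage}}` format (e.g. `"123.4MiB / 7.5GiB"`) into megabytes might look like this; the unit handling is an assumption based on Docker's typical output, not code quoted from the PR:

```typescript
// Parse `docker stats --format '{{.MemUsage}}'` output into MiB.
// Returns 0 for unparseable input, matching the snippet's '0MiB' fallback.
function parseMemUsageMiB(raw: string): number {
  const used = raw.split("/")[0].trim(); // e.g. "123.4MiB" from "123.4MiB / 7.5GiB"
  const match = used.match(/^([\d.]+)\s*(KiB|MiB|GiB|B)$/);
  if (!match) return 0;
  const value = parseFloat(match[1]);
  switch (match[2]) {
    case "GiB":
      return value * 1024;
    case "MiB":
      return value;
    case "KiB":
      return value / 1024;
    default:
      return value / (1024 * 1024); // plain bytes
  }
}
```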


- name: Run benchmarks
id: benchmark
run: |
npx tsx scripts/ci/benchmark-performance.ts > benchmark-results.json 2>&1 || true
Comment on lines +8 to +12
inputs:
iterations:
description: "Number of iterations per metric"
required: false
default: "5"
Comment on lines +155 to +157
if (values.length === 0) {
values.push(0);
}
Comment on lines +132 to +153
function benchmarkHttpsLatency(): BenchmarkResult {
console.error(" Benchmarking HTTPS latency through Squid...");
const values: number[] = [];

for (let i = 0; i < ITERATIONS; i++) {
cleanup();
try {
// Use curl's time_total to measure end-to-end HTTPS request latency
const output = exec(
`${AWF_CMD} --allow-domains ${ALLOWED_DOMAIN} --log-level error -- ` +
`curl -fsS -o /dev/null -w '%{time_total}' https://${ALLOWED_DOMAIN}/zen`
);
const seconds = parseFloat(output);
if (!isNaN(seconds)) {
values.push(Math.round(seconds * 1000));
}
} catch {
console.error(` Iteration ${i + 1}/${ITERATIONS}: failed (skipped)`);
continue;
}
console.error(` Iteration ${i + 1}/${ITERATIONS}: ${values[values.length - 1]}ms`);
}
(truncated snippet from the Docker network creation benchmark)

      // may not exist
      }
      const ms = timeMs(() => {
        exec(`sudo docker network create --subnet=172.${31 + i}.0.0/24 ${netName}`, { stdio: "ignore" });
Comment on lines +50 to +51
npx tsx scripts/ci/benchmark-performance.ts > benchmark-results.json 2>&1 || true
cat benchmark-results.json
Comment on lines +64 to +73
function stats(values: number[]): Pick<BenchmarkResult, "mean" | "median" | "p95" | "p99"> {
const sorted = [...values].sort((a, b) => a - b);
const n = sorted.length;
return {
mean: Math.round(sorted.reduce((a, b) => a + b, 0) / n),
median: sorted[Math.floor(n / 2)],
p95: sorted[Math.min(Math.floor(n * 0.95), n - 1)],
p99: sorted[Math.min(Math.floor(n * 0.99), n - 1)],
};
}
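
For reference, applying this nearest-rank style `stats` function to a small sample shows a consequence worth noting in review: with the default 5 iterations, p95 and p99 both index the largest sample (`Math.floor(5 * 0.95) = 4`, the last element of the sorted array). The function below is copied from the snippet above into a standalone form:

```typescript
interface StatsResult {
  mean: number;
  median: number;
  p95: number;
  p99: number;
}

// Copied from the quoted stats() snippet for a self-contained demonstration.
function stats(values: number[]): StatsResult {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  return {
    mean: Math.round(sorted.reduce((a, b) => a + b, 0) / n),
    median: sorted[Math.floor(n / 2)],
    p95: sorted[Math.min(Math.floor(n * 0.95), n - 1)],
    p99: sorted[Math.min(Math.floor(n * 0.99), n - 1)],
  };
}

// Five samples: mean 120, median 120, and p95/p99 both equal the max, 140.
const demo = stats([120, 100, 140, 110, 130]);
```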
@github-actions (Contributor) commented:

Smoke Test Results ✅ PASS

| Test                               | Result |
|------------------------------------|--------|
| GitHub MCP - Last 2 merged PRs     | #1267 "fix: drop -f from curl to avoid GitHub API rate-limit flakiness", #1265 "fix: add missing formatItem and program imports in cli.test.ts" |
| Playwright - github.com title check | ✅ "GitHub · Change is constant. GitHub keeps you ahead. · GitHub" |
| File write                         | /tmp/gh-aw/agent/smoke-test-copilot-23029774287.txt created |
| Bash verify                        | ✅ File content confirmed |

PR author: @Mossaka

📰 BREAKING: Report filed by Smoke Copilot for issue #1272

@github-actions (Contributor) commented:

Smoke Test Results

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude for issue #1272

@github-actions (Contributor) commented:

  • feat(proxy): add GitHub Enterprise Cloud/Server support with automatic endpoint detection
  • fix: drop -f from curl to avoid GitHub API rate-limit flakiness

Tests: GitHub MCP ✅; safeinputs-gh ✅; Playwright ✅; Tavily ❌; file write ✅; bash cat ✅; discussion ✅; build ✅
Overall: FAIL

🔮 The oracle has spoken through Smoke Codex for issue #1272

@github-actions (Contributor) commented:

Chroot Version Comparison Results

| Runtime | Host Version   | Chroot Version | Match? |
|---------|----------------|----------------|--------|
| Python  | Python 3.12.12 | Python 3.12.3  | ❌     |
| Node.js | v24.14.0       | v20.20.0       | ❌     |
| Go      | go1.22.12      | go1.22.12      | ✅     |

Result: Not all runtimes match — Go matches, but Python and Node.js differ between host and chroot.

Tested by Smoke Chroot for issue #1272

@github-actions github-actions bot mentioned this pull request Mar 13, 2026
@Mossaka Mossaka enabled auto-merge (squash) March 13, 2026 00:11
@github-actions (Contributor) commented:

🏗️ Build Test Suite Results

| Ecosystem | Project     | Build/Install | Tests      | Status  |
|-----------|-------------|---------------|------------|---------|
| Bun       | elysia      | —             | 1/1 passed | ✅ PASS |
| Bun       | hono        | —             | 1/1 passed | ✅ PASS |
| C++       | fmt         | —             | N/A        | ✅ PASS |
| C++       | json        | —             | N/A        | ✅ PASS |
| Deno      | oak         | N/A           | 1/1 passed | ✅ PASS |
| Deno      | std         | N/A           | 1/1 passed | ✅ PASS |
| .NET      | hello-world | —             | N/A        | ✅ PASS |
| .NET      | json-parse  | —             | N/A        | ✅ PASS |
| Go        | color       | —             | 1/1 passed | ✅ PASS |
| Go        | env         | —             | 1/1 passed | ✅ PASS |
| Go        | uuid        | —             | 1/1 passed | ✅ PASS |
| Java      | gson        | —             | 1/1 passed | ✅ PASS |
| Java      | caffeine    | —             | 1/1 passed | ✅ PASS |
| Node.js   | clsx        | —             | passed     | ✅ PASS |
| Node.js   | execa       | —             | passed     | ✅ PASS |
| Node.js   | p-limit     | —             | passed     | ✅ PASS |
| Rust      | fd          | —             | 1/1 passed | ✅ PASS |
| Rust      | zoxide      | —             | 1/1 passed | ✅ PASS |

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #1272


Projects

None yet

Development

Successfully merging this pull request may close these issues.

[plan] Establish performance monitoring baseline and workflow

2 participants