
feat: add DLP scanning to block credential exfiltration in URLs #1288

Merged
Mossaka merged 1 commit into main from feat/093-content-inspection-dlp
Mar 13, 2026

Conversation

@Mossaka
Collaborator

@Mossaka Mossaka commented Mar 13, 2026

Summary

  • Add opt-in --enable-dlp CLI flag that enables Data Loss Prevention scanning on outbound HTTP/HTTPS traffic
  • Implement URL regex pattern matching in Squid proxy ACLs to detect and block requests containing credential patterns (GitHub tokens ghp_/gho_/ghs_/ghu_/github_pat_, OpenAI sk-/sk-proj-, Anthropic sk-ant-, AWS AKIA, Google AIza, Slack xoxb-/xoxp-, and generic bearer/authorization patterns)
  • New src/dlp.ts module with pattern definitions, credential scanning function, and Squid ACL generation
  • Comprehensive unit tests for pattern matching and Squid config integration
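The scanning helper described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/dlp.ts: the DlpPattern shape and the PATTERNS table are assumptions, and only the OpenAI regex is quoted from the reviewed diff. The GitHub and AWS regexes are guesses based on the token shapes mentioned in the review (ghp_ + 36 chars, AKIA + 16 chars).

```typescript
// Hypothetical pattern entry; the real module may carry more fields.
interface DlpPattern {
  name: string;
  regex: string;
}

// Illustrative subset of patterns. Only the OpenAI regex appears in the PR diff;
// the other two are assumed from the token shapes named in the review comments.
const PATTERNS: DlpPattern[] = [
  { name: 'GitHub Personal Access Token (classic)', regex: 'ghp_[a-zA-Z0-9]{36}' },
  { name: 'AWS Access Key ID', regex: 'AKIA[A-Z0-9]{16}' },
  { name: 'OpenAI API Key', regex: 'sk-[a-zA-Z0-9]{20}T3BlbkFJ[a-zA-Z0-9]{20}' },
];

// Returns the names of every pattern found anywhere in the input string.
function scanForCredentials(input: string): string[] {
  return PATTERNS
    .filter((p) => new RegExp(p.regex).test(input))
    .map((p) => p.name);
}

// Build the sample token at runtime so no full token shape sits in the source.
const sampleUrl = 'https://example.com/?token=' + 'gh' + 'p_' + 'a'.repeat(36);
console.log(scanForCredentials(sampleUrl));
```

Keeping the regexes as plain strings rather than RegExp objects would let the same table feed both the in-process scanner and the generated Squid ACL lines.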

Test plan

  • Unit tests for all credential pattern detection (positive and negative cases)
  • Unit tests for Squid ACL generation
  • Integration tests for DLP rules in generated squid.conf (placement, interaction with blocked domains, SSL Bump)
  • Build passes
  • Lint passes (0 errors)
  • Manual test: sudo awf --enable-dlp --allow-domains example.com -- curl "https://example.com/?token=ghp_testtoken123456789012345678901234"

Closes #308

🤖 Generated with Claude Code

Add opt-in --enable-dlp flag that configures Squid proxy URL regex ACLs
to detect and block outbound requests containing credential patterns
(GitHub tokens, OpenAI/Anthropic API keys, AWS keys, Slack tokens, etc.)
in URLs. This protects against accidental credential leakage via query
parameters, path segments, and encoded URL content.

Closes #308

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 13, 2026 02:27
@Mossaka Mossaka enabled auto-merge (squash) March 13, 2026 02:27
@github-actions
Contributor

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
| --- | --- | --- | --- |
| Lines | 84.58% | 84.77% | 📈 +0.19% |
| Statements | 84.53% | 84.72% | 📈 +0.19% |
| Functions | 85.28% | 85.40% | 📈 +0.12% |
| Branches | 77.45% | 77.47% | 📈 +0.02% |

📁 Per-file Coverage Changes (3 files)

| File | Lines (Before → After) | Statements (Before → After) |
| --- | --- | --- |
| src/cli.ts | 56.0% → 55.7% (-0.22%) | 56.5% → 56.2% (-0.23%) |
| src/squid-config.ts | 99.4% → 99.4% (+0.03%) | 99.4% → 99.4% (+0.03%) |
| src/docker-manager.ts | 87.0% → 87.6% (+0.51%) | 86.4% → 86.9% (+0.49%) |

✨ New Files (1 files)
  • src/dlp.ts: 100.0% lines

Coverage comparison generated by scripts/ci/compare-coverage.ts

@github-actions
Contributor

Smoke Test Results

Last 2 merged PRs:

Test Result
GitHub MCP
Playwright (github.com title)
File write
Bash verify

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude for issue #1288

@github-actions
Contributor

PRs (last merged):
feat(cli): add --ruleset-file for YAML domain rule configuration
feat(cli): add --enable-dind flag to opt-in to Docker socket access
Tests: GitHub MCP ✅ | safeinputs-gh ✅ | Playwright ✅
Tests: Tavily ❌ | File write+cat ✅ | Discussion ✅
Tests: Build ✅
Overall: FAIL

🔮 The oracle has spoken through Smoke Codex for issue #1288

@github-actions
Contributor

Smoke test results for @Mossaka:

Overall: PASS

📰 BREAKING: Report filed by Smoke Copilot for issue #1288

Contributor

Copilot AI left a comment

Pull request overview

Adds an opt-in Data Loss Prevention (DLP) capability to the firewall by generating Squid url_regex ACL rules that deny requests containing credential-like patterns in URLs, wired through a new CLI flag and config plumbing.

Changes:

  • Add --enable-dlp CLI flag and plumb enableDlp through WrapperConfig → SquidConfig → Squid config generation.
  • Introduce src/dlp.ts with built-in credential regex patterns, a scanner helper, and Squid ACL/rule generation.
  • Add unit/integration tests for DLP pattern matching and Squid config integration.
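As a rough illustration of the ACL generation step, the sketch below turns a list of regex strings into Squid directives. The return shape (aclLines, accessRules) and the dlp_blocked ACL name are taken from the reviewed diff; the SAMPLE_REGEXES list and the optional parameter are assumptions made for the example, and apart from the OpenAI entry the regexes are guesses.

```typescript
// Shape inferred from the reviewed diff, which reads dlp.aclLines and dlp.accessRules.
interface DlpSquidConfig {
  aclLines: string[];
  accessRules: string[];
}

// Illustrative subset; apart from the OpenAI entry these regexes are assumptions.
const SAMPLE_REGEXES: string[] = [
  'ghp_[a-zA-Z0-9]{36}',
  'AKIA[A-Z0-9]{16}',
  'sk-[a-zA-Z0-9]{20}T3BlbkFJ[a-zA-Z0-9]{20}',
];

// Sketch only: the real function in the PR takes no arguments.
function generateDlpSquidConfig(regexes: string[] = SAMPLE_REGEXES): DlpSquidConfig {
  // Squid merges repeated "acl <name>" lines, so a request matching ANY regex
  // listed under dlp_blocked matches the ACL.
  const aclLines = regexes.map((re) => `acl dlp_blocked url_regex ${re}`);
  // A single deny rule then covers every pattern.
  const accessRules = ['http_access deny dlp_blocked'];
  return { aclLines, accessRules };
}

const dlp = generateDlpSquidConfig();
console.log([...dlp.aclLines, ...dlp.accessRules].join('\n'));
```

Splitting the output into ACL definitions and access rules lets the caller place each group in the appropriate part of squid.conf.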

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/types.ts | Adds enableDlp config fields and documents intended behavior. |
| src/cli.ts | Adds --enable-dlp flag and logs when DLP is enabled; passes into WrapperConfig. |
| src/docker-manager.ts | Passes enableDlp into generateSquidConfig() when writing squid.conf. |
| src/squid-config.ts | Injects generated DLP ACLs and http_access deny dlp_blocked into Squid config output. |
| src/squid-config.test.ts | Adds integration tests asserting presence/placement of DLP rules in generated config. |
| src/dlp.ts | Defines credential patterns and functions to scan strings and generate Squid ACL/rules. |
| src/dlp.test.ts | Adds unit tests for pattern matching and Squid ACL generation. |

Comments suppressed due to low confidence (5)

src/dlp.test.ts:58

  • This test includes a fine‑grained GitHub PAT-shaped string (github_pat_...) as a contiguous literal, which can trigger secret scanning / push protection. Generate it dynamically (split/concatenate) so the full token format never appears verbatim in the repo.
    it('should detect GitHub fine-grained PAT (github_pat_)', () => {
      const matches = scanForCredentials(
        'https://api.example.com/?key=github_pat_1234567890abcdefghijkl_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456'
      );
      expect(matches).toContain('GitHub Fine-Grained PAT');

src/dlp.test.ts:44

  • This test includes a GitHub token-shaped string (ghs_ + 36 chars) as a contiguous literal, which can trigger secret scanning / push protection. Build the token dynamically so the full pattern is not present verbatim in the file.
    it('should detect GitHub App installation token (ghs_)', () => {
      const matches = scanForCredentials(
        'https://api.example.com/?key=ghs_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij'
      );
      expect(matches).toContain('GitHub App Installation Token');

src/dlp.test.ts:51

  • This test includes a GitHub token-shaped string (ghu_ + 36 chars) as a contiguous literal, which can trigger secret scanning / push protection. Build the token dynamically so the full pattern is not present verbatim in the file.
    it('should detect GitHub App user-to-server token (ghu_)', () => {
      const matches = scanForCredentials(
        'https://api.example.com/?key=ghu_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij'
      );
      expect(matches).toContain('GitHub App User-to-Server Token');

src/dlp.test.ts:90

  • This test includes an AWS Access Key ID-shaped string (AKIA + 16 chars) as a contiguous literal, which is frequently flagged by secret scanning / push protection. Construct it dynamically so the full key pattern is not present verbatim in the source.
    // AWS
    it('should detect AWS access key ID (AKIA)', () => {
      const matches = scanForCredentials(
        'https://api.example.com/?key=AKIAIOSFODNN7EXAMPLE'
      );
      expect(matches).toContain('AWS Access Key ID');

src/dlp.test.ts:30

  • This test includes a GitHub token-shaped string (ghp_ + 36 chars) as a contiguous literal. GitHub secret scanning / push protection commonly blocks commits containing these formats even when they are fake. Generate the token at runtime (e.g., via concatenation/repeat) so the full token pattern never appears verbatim in the repository.
    it('should detect GitHub personal access token (ghp_)', () => {
      const matches = scanForCredentials(
        'https://api.example.com/data?token=ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij'
      );
      expect(matches).toContain('GitHub Personal Access Token (classic)');
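One way to follow the suggestion in these suppressed comments is a small test helper that assembles token-shaped strings at runtime. The sketch below is hypothetical (fakeToken is not part of the PR); it splits each prefix across a concatenation and fills the body from an alphabet, so the full token shape never appears as a literal in the file.

```typescript
const ALNUM = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Hypothetical helper: returns prefix + `length` characters cycled from `alphabet`.
function fakeToken(prefix: string, length: number, alphabet: string = ALNUM): string {
  let body = '';
  for (let i = 0; i < length; i++) {
    body += alphabet.charAt(i % alphabet.length);
  }
  return prefix + body;
}

// Prefixes are split so a secret scanner sees no contiguous token-shaped literal.
const classicPat = fakeToken('gh' + 'p_', 36);
const awsKeyId = fakeToken('AK' + 'IA', 16, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789');

console.log(classicPat.length, awsKeyId.length);
```

A test could then call scanForCredentials('https://api.example.com/?key=' + classicPat) instead of embedding the literal.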


Comment on lines +27 to +56
      const matches = scanForCredentials(
        'https://api.example.com/data?token=ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij'
      );
      expect(matches).toContain('GitHub Personal Access Token (classic)');
    });

    it('should detect GitHub OAuth token (gho_)', () => {
      const matches = scanForCredentials(
        'https://api.example.com/gho_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij/resource'
      );
      expect(matches).toContain('GitHub OAuth Access Token');
    });

    it('should detect GitHub App installation token (ghs_)', () => {
      const matches = scanForCredentials(
        'https://api.example.com/?key=ghs_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij'
      );
      expect(matches).toContain('GitHub App Installation Token');
    });

    it('should detect GitHub App user-to-server token (ghu_)', () => {
      const matches = scanForCredentials(
        'https://api.example.com/?key=ghu_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij'
      );
      expect(matches).toContain('GitHub App User-to-Server Token');
    });

    it('should detect GitHub fine-grained PAT (github_pat_)', () => {
      const matches = scanForCredentials(
        'https://api.example.com/?key=github_pat_1234567890abcdefghijkl_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456'

    if (enableDlp) {
      const dlp = generateDlpSquidConfig();
      dlpAclSection = '\n' + dlp.aclLines.join('\n') + '\n';
      dlpAccessSection = '\n' + dlp.accessRules.join('\n') + '\n';
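Where the DLP access rules land in the generated config matters, because Squid evaluates http_access lines top to bottom and stops at the first match. The sketch below shows one possible ordering, with the DLP deny placed ahead of any allow rule. The buildHttpAccess helper and the allowed_domains ACL name are hypothetical and are not taken from src/squid-config.ts.

```typescript
// Hypothetical assembly step: DLP denies first, then allow rules, then a final catch-all deny.
function buildHttpAccess(dlpAccessRules: string[], allowRules: string[]): string[] {
  return [...dlpAccessRules, ...allowRules, 'http_access deny all'];
}

const httpAccess = buildHttpAccess(
  ['http_access deny dlp_blocked'],
  ['http_access allow allowed_domains'], // assumed ACL name, for illustration only
);

console.log(httpAccess.join('\n'));
```

With this ordering, a request to an allowed domain is still denied if its URL matches a credential pattern, which appears to be the behaviour the placement tests in this PR check for.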
Comment on lines +68 to +73
  // OpenAI
  {
    name: 'OpenAI API Key',
    description: 'OpenAI API key (sk-)',
    regex: 'sk-[a-zA-Z0-9]{20}T3BlbkFJ[a-zA-Z0-9]{20}',
  },
Comment on lines +568 to +578
   * Enable Data Loss Prevention (DLP) scanning
   *
   * When true, Squid proxy will block outgoing requests that contain
   * credential-like patterns (API keys, tokens, secrets) in URLs.
   * This protects against accidental credential exfiltration via
   * query parameters, path segments, or encoded URL content.
   *
   * Detected patterns include: GitHub tokens (ghp_, gho_, ghs_, ghu_,
   * github_pat_), OpenAI keys (sk-), Anthropic keys (sk-ant-),
   * AWS access keys (AKIA), Google API keys (AIza), Slack tokens,
   * and generic credential patterns.
@github-actions
Contributor

Chroot Version Comparison Results

| Runtime | Host Version | Chroot Version | Match? |
| --- | --- | --- | --- |
| Python | Python 3.12.12 | Python 3.12.3 | ❌ NO |
| Node.js | v24.14.0 | v20.20.0 | ❌ NO |
| Go | go1.22.12 | go1.22.12 | ✅ YES |

Overall: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot environments.

Tested by Smoke Chroot for issue #1288

@github-actions
Contributor

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color passed ✅ PASS
Go env passed ✅ PASS
Go uuid passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx passed ✅ PASS
Node.js execa passed ✅ PASS
Node.js p-limit passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Note: Java required using the Squid proxy IP (172.30.0.10) directly instead of the squid-proxy hostname (which is not DNS-resolvable outside the Docker network), and a custom Maven local repository path (/tmp/gh-aw/agent/m2-repo) since ~/.m2/repository was not writable by the runner user.

Generated by Build Test Suite for issue #1288

@Mossaka Mossaka merged commit 9c69ea6 into main Mar 13, 2026
62 checks passed
@Mossaka Mossaka deleted the feat/093-content-inspection-dlp branch March 13, 2026 02:58

Development

Successfully merging this pull request may close these issues.

[plan] add content inspection for sensitive data patterns
