Fix corrupted regex patterns in secret key detection by francose · Pull Request #63 · Comcast/xGitGuard

francose · 2026-05-10T16:07:00Z

Several regex patterns in keys_extractor() have been non-functional since the initial commit. The curly brace quantifiers got mangled during copy-paste (likely from a rendered source like a PDF or webpage) — {32} became f32g, escaped dots \. became n., and escaped dollar signs \$ became n$.

This means Google YouTube OAuth IDs, Amazon MWS tokens, and PayPal/Braintree access tokens were never being detected by the key extractor, regardless of how many scans were run.

What was broken:

[0-9]+-[0-9A-Za-z_]f32gn.appsn.googleusercontentn.com — matches nothing
access_tokenn$productionn$[0-9a-z]f16gn$[0-9a-f]f32g — matches nothing
amznn.mwsn.[0-9a-f]f8g-[0-9a-f]f4g-... — matches nothing

What it should be:

[0-9]+-[0-9A-Za-z_]{32}\.apps\.googleusercontent\.com
amzn\.mws\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
access_token\$production\$[0-9a-z]{16}\$[0-9a-f]{32}
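As a quick sanity check, the corrected patterns above can be run against synthetic values. This is a minimal sketch, not the project's test suite: the dictionary keys, the `matches` helper, and the sample values are made up for illustration and only follow the shapes the patterns describe.

```python
import re

# Corrected patterns, copied from the list above.
FIXED = {
    "google_oauth_id": r"[0-9]+-[0-9A-Za-z_]{32}\.apps\.googleusercontent\.com",
    "amazon_mws": r"amzn\.mws\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    "braintree": r"access_token\$production\$[0-9a-z]{16}\$[0-9a-f]{32}",
}

# Corrupted form of the Google pattern, as quoted in this PR.
BROKEN_GOOGLE = "[0-9]+-[0-9A-Za-z_]f32gn.appsn.googleusercontentn.com"

# Synthetic, obviously fake values (assumption: shape only, not real secrets).
SAMPLES = {
    "google_oauth_id": "123456789012-" + "a" * 32 + ".apps.googleusercontent.com",
    "amazon_mws": "amzn.mws." + "-".join(["0" * 8, "0" * 4, "0" * 4, "0" * 4, "0" * 12]),
    "braintree": "access_token$production$" + "a" * 16 + "$" + "0" * 32,
}


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# Each corrected pattern is checked against its own sample.
results = {name: matches(FIXED[name], SAMPLES[name]) for name in FIXED}

# The corrupted pattern requires the literal text "f32gn", which a
# well-formed client ID does not contain.
broken_hit = matches(BROKEN_GOOGLE, SAMPLES["google_oauth_id"])
```

Here every entry in `results` is `True` while `broken_hit` is `False`, which is the behavior difference this PR describes.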

Also fixed a label swap: the "PayPal" label was attached to what is actually the Amazon MWS pattern (amzn.mws.UUID), and the "Amazon MWS" label was attached to the PayPal/Braintree pattern (access_token$production$...). Each label now matches its token format, and "PayPal" is renamed to "PayPal Braintree". Fixed the "Slack Webook" typo ("Slack Webhook").

All patterns are now raw strings to prevent future escape issues.
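To illustrate why raw strings are the safer default for regex literals, here is a small sketch using `\b`, which is not one of the patterns in this PR. In a normal Python string `\b` is a valid escape (backspace), so the pattern silently changes meaning before `re` sees it. Sequences like `\.` and `\$` are not recognized string escapes, so Python currently leaves the backslash in place, but recent versions emit a warning for them; the `r` prefix avoids both problems.

```python
import re

# Non-raw: "\b" is converted to a backspace character by the string parser.
plain = "\bkey\b"

# Raw: the backslash reaches the regex engine, where \b means "word boundary".
raw = r"\bkey\b"

text = "api key here"

plain_hit = re.search(plain, text) is not None  # looks for backspace + "key" + backspace
raw_hit = re.search(raw, text) is not None      # looks for the whole word "key"
```

The two literals look identical apart from the prefix, but only the raw one matches the word `key` in `text`.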

Tested against known token formats — all six previously-broken patterns now correctly match real secrets.

Several secret detection patterns in keys_extractor() were corrupted since the initial commit: curly braces {32} rendered as f32g, escaped dots \. rendered as n., and escaped dollars \$ rendered as n$. This caused Google YouTube OAuth, Amazon MWS, and PayPal/Braintree token patterns to never match any real secrets. Also fixed: the "PayPal" label was actually describing the Amazon MWS token format (amzn.mws.*), and "Amazon MWS" had the PayPal/Braintree access_token$production$ format. Labels are now correct. Fixed "Slack Webook" typo to "Slack Webhook". Converted affected patterns to raw strings to prevent future escape corruption.

CLAassistant · 2026-05-10T16:07:08Z

All committers have signed the CLA.

Copilot

Pull request overview

This PR updates keys_extractor() secret-detection regex patterns that were previously corrupted (e.g., mangled quantifiers and escaped characters), aiming to restore detection for several token formats (Google OAuth-related values, Amazon MWS tokens, and PayPal/Braintree access tokens) and correct a Slack webhook label typo.

Changes:

Replaced corrupted Google OAuth/YouTube OAuth regex patterns with corrected patterns, using raw string literals for safer escaping.
Fixed Amazon MWS and PayPal/Braintree token regex patterns and adjusted labels to match the actual formats.
Corrected the "Slack Webook" label typo to "Slack Webhook".

francose · 2026-05-10T17:58:38Z

+        "Google OAuth Secret": r"[0-9a-zA-Z\-_]{24}",
+        "Google OAuth Auth Code": r"4/[0-9A-Za-z\-_]+",
+        "Google OAuth Refresh Token": r"1/[0-9A-Za-z\-_]{43}|1/[0-9A-Za-z\-_]{64}",
+        "Google OAuth Access Token": r"ya29\.[0-9A-Za-z\-_]+",
+        "Google API Key": r"AIza[0-9A-Za-z\-_]{35}",


Acknowledged — these patterns were completely non-functional before (corrupted quantifiers like f32g instead of {32}). This PR makes them syntactically correct. The remove_url_from_keys() stripping issue is pre-existing and orthogonal — will open a follow-up for that.

francose · 2026-05-10T17:58:47Z

+        "Google YouTube OAuth ID Gmail, GCloud": r"[0-9]+-[0-9A-Za-z_]{32}\.apps\.googleusercontent\.com",
+        "Amazon MWS": r"amzn\.mws\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
+        "PayPal Braintree": r"access_token\$production\$[0-9a-z]{16}\$[0-9a-f]{32}",


Same as above — corrected the regex syntax. The dot/underscore stripping by the sanitizer is a separate pre-existing issue. Also converted this to a raw string for consistency in the follow-up commit.

francose · 2026-05-10T17:58:54Z

        "AWS": "(?:.*awsSecretKey|.*aws_secret|.*api-key|.*aws_account_secret).*"
        "(?=.*[A-Z])(?<![A-Za-z0-9/+=])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=])",
-        "Slack Webook": "T[a-zA-Z0-9_]{8}/B[a-zA-Z0-9_]{8}/[a-zA-Z0-9_]{24}",
+        "Slack Webhook": "T[a-zA-Z0-9_]{8}/B[a-zA-Z0-9_]{8}/[a-zA-Z0-9_]{24}",


Converted to raw string in the follow-up commit for consistency. The slash stripping is part of the same sanitizer issue — will address in a separate PR.

francose · 2026-05-10T16:30:06Z

Re: Copilot's review about remove_url_from_keys() stripping ., _, / before keys_extractor() runs —

Valid point. The sanitizer does strip characters that some of these patterns depend on (dots in ya29., amzn.mws., googleusercontent.com, slashes in webhooks, underscores in access_token).

However, this is a pre-existing issue that's separate from the regex corruption. Before this fix, these patterns used mangled quantifiers like f32g instead of {32} and n. instead of \. — they matched literally nothing regardless of preprocessing. This PR makes the regex syntax correct so the patterns are at least valid.

The sanitizer stripping issue existed before and should be addressed separately — either by running keys_extractor() on the raw content (after URL/email removal but before special char stripping), or by splitting the special_chars list into "safe to strip for credential detection" vs "not safe."
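The ordering problem can be sketched as follows. This is an illustration under stated assumptions: `strip_special_chars`, `SPECIAL_CHARS`, and `extract` are hypothetical stand-ins, since the real remove_url_from_keys() and special_chars list are not shown in this PR and may behave differently. The sample token is synthetic.

```python
import re

# Amazon MWS pattern as corrected in this PR.
AMAZON_MWS = r"amzn\.mws\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Hypothetical subset of the stripped characters (assumption for illustration).
SPECIAL_CHARS = [".", "_", "/"]


def strip_special_chars(text: str) -> str:
    """Hypothetical sanitizer step: replace each special character with a space."""
    for ch in SPECIAL_CHARS:
        text = text.replace(ch, " ")
    return text


def extract(text: str) -> list:
    """Hypothetical extractor: return every Amazon MWS-shaped match."""
    return re.findall(AMAZON_MWS, text)


# Synthetic line containing a fake token with the documented shape.
line = "token = amzn.mws." + "-".join(["0" * 8, "0" * 4, "0" * 4, "0" * 4, "0" * 12])

# Option from the comment above: extract before special characters are stripped.
found_on_raw = extract(line)

# Order described in the review: strip first, then extract.
found_after_strip = extract(strip_special_chars(line))
```

With this stand-in sanitizer, `found_on_raw` holds the token while `found_after_strip` is empty, because the dots the pattern requires are gone by the time it runs.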

Happy to open a follow-up issue for that if maintainers agree on the approach.

Also converting the Slack Webhook pattern to a raw string for consistency as suggested.

Copilot AI review requested due to automatic review settings May 10, 2026 16:07

Copilot started reviewing on behalf of francose May 10, 2026 16:07

Copilot AI reviewed May 10, 2026

Use raw string for Slack Webhook pattern for consistency

14d9e5c

francose mentioned this pull request May 10, 2026

remove_url_from_keys() strips characters needed by secret detection patterns #64

Open

francose wants to merge 2 commits into Comcast:main from francose:fix/corrupted-secret-regexes
