
security: Unicode integrity check — detect hidden characters in instruction files#12

Open
stuckvgn wants to merge 1 commit into main from
security/unicode-integrity-check

@stuckvgn
Contributor

Summary

  • This repo distributes instruction files to thousands of projects, making it a high-value target for the Rules File Backdoor attack — hidden Unicode characters (zero-width spaces, directional overrides) embedded in .md/.yaml instruction files that carry invisible AI instructions.
  • Adds scripts/check-unicode-integrity.py: a scanner that reports every suspicious character with filename, line number, and hex code, then exits 1 if any are found.
  • Adds .github/workflows/integrity-check.yml: runs on every push/PR to main and on a daily schedule (to catch supply-chain attacks that land between PRs).
  • Adds .pre-commit-config.yaml: enforces the same check before every commit locally.
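The scanner described above could be sketched roughly as follows. This is a minimal illustration, not the code in scripts/check-unicode-integrity.py. The scanned extensions (`.md`, `.yaml`, `.yml`), the reduced `SUSPICIOUS_RANGES` table, the `.git` skip, and the `path:line: U+XXXX` report format are all assumptions.

```python
from pathlib import Path

# Assumed file types to scan; the PR's script may select files differently.
SCANNED_SUFFIXES = {".md", ".yaml", ".yml"}

# Assumed, reduced set of suspicious codepoint ranges (inclusive bounds).
SUSPICIOUS_RANGES = [
    (0x200B, 0x200D),    # zero-width space, non-joiner, joiner
    (0xFEFF, 0xFEFF),    # zero-width no-break space / BOM
    (0x202A, 0x202E),    # bidi embeddings and overrides
    (0x2066, 0x2069),    # bidi isolates
    (0xE0000, 0xE007F),  # tag characters
]


def is_suspicious(cp: int) -> bool:
    """True if the codepoint falls inside any suspicious range."""
    return any(lo <= cp <= hi for lo, hi in SUSPICIOUS_RANGES)


def scan_file(path: Path) -> list:
    """Return (line_number, codepoint) pairs for every suspicious character."""
    text = path.read_text(encoding="utf-8", errors="replace")
    findings = []
    # Split on "\n" only: str.splitlines() also breaks on U+0085, U+2028, etc.,
    # which would shift the reported line numbers.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for ch in line:
            if is_suspicious(ord(ch)):
                findings.append((lineno, ord(ch)))
    return findings


def main(root: str) -> int:
    """Print one report line per finding; return 1 if anything was found."""
    found = False
    for path in sorted(Path(root).rglob("*")):
        if ".git" in path.parts or not path.is_file():
            continue
        if path.suffix not in SCANNED_SUFFIXES:
            continue
        for lineno, cp in scan_file(path):
            print(f"{path}:{lineno}: U+{cp:04X}")
            found = True
    return 1 if found else 0
```

Calling `sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "."))` under a `__main__` guard would give the exit-0/exit-1 behavior that CI and pre-commit rely on.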

What's detected

  • Zero-width characters (U+200B zero-width space, U+200C zero-width non-joiner, U+200D zero-width joiner, U+FEFF zero-width no-break space/BOM)
  • Bidirectional text overrides (U+202A–U+202E, U+2066–U+2069) — used to visually reverse text
  • Invisible formatting characters (U+2060–U+2064: word joiner and invisible math operators) and soft hyphens (U+00AD)
  • Tag characters (U+E0000–U+E007F): invisible counterparts of ASCII that can encode hidden text, used in prompt injection
  • C0/C1 control characters (excluding tab, newline, and CR)
  • Private Use Area codepoints (no legitimate use in shared instruction files)
  • Variation selectors (can silently alter glyph rendering)

What's allowed

Standard whitespace (space, tab, newline, CR) and all standard Unicode text including accented characters and CJK for i18n READMEs are explicitly permitted.
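To make the detected and allowed categories concrete, here is a hedged sketch of a per-character classifier. The labels, the `classify` name, and the variation-selector ranges (U+FE00–U+FE0F and U+E0100–U+E01EF) are assumptions for illustration. C0/C1 controls and private-use codepoints are found through Python's `unicodedata.category` (`Cc` and `Co`) rather than hard-coded ranges.

```python
import unicodedata
from typing import Optional

# Whitespace the PR explicitly allows.
ALLOWED = {"\t", "\n", "\r", " "}

# Hypothetical category table mirroring the "What's detected" list.
RANGES = [
    ("zero-width", [(0x200B, 0x200D), (0xFEFF, 0xFEFF)]),
    ("bidi-override", [(0x202A, 0x202E), (0x2066, 0x2069)]),
    ("invisible-separator", [(0x2060, 0x2064)]),
    ("soft-hyphen", [(0x00AD, 0x00AD)]),
    ("tag", [(0xE0000, 0xE007F)]),
    ("variation-selector", [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]),
]


def classify(ch: str) -> Optional[str]:
    """Return a label for a suspicious character, or None if it is allowed."""
    if ch in ALLOWED:
        return None
    cp = ord(ch)
    for label, spans in RANGES:
        if any(lo <= cp <= hi for lo, hi in spans):
            return label
    category = unicodedata.category(ch)
    if category == "Cc":  # C0 controls, DEL, and C1 controls
        return "control"
    if category == "Co":  # Private Use Area codepoints
        return "private-use"
    return None  # ordinary text: accented letters, CJK, punctuation, ...
```

Characters outside these buckets, such as é or 日, return `None`, matching the rule above. One trade-off worth checking: U+FE0F commonly follows emoji to request emoji presentation, so a blanket variation-selector rule may flag emoji in READMEs.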

Test plan

  • python3 scripts/check-unicode-integrity.py . exits 0 on the clean repo
  • Script exits 1 and reports line number + hex code when U+200B is introduced into any .md file
  • CI workflow triggers on this PR
  • Daily schedule cron entry is present in the workflow YAML
  • Pre-commit hook runs the Unicode integrity check locally after pre-commit install

Adds scripts/check-unicode-integrity.py, a CI workflow, and a pre-commit
hook to scan all instruction files for hidden Unicode characters that could
be used to embed invisible AI instructions (zero-width spaces, bidi overrides,
tag characters, C0/C1 controls, and more).

This repo distributes instruction files to thousands of projects, making it
a high-value target for supply-chain attacks via the Rules File Backdoor
technique. The scanner runs on every push/PR and on a daily schedule.
@coderabbitai

coderabbitai bot commented Mar 31, 2026

Warning

Rate limit exceeded

@stuckvgn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 39 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 11 minutes and 39 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 04090baa-17e2-4ae4-b95f-28ca3c3ebbb3

📥 Commits

Reviewing files that changed from the base of the PR and between dc85f44 and 6e9e9ed.

📒 Files selected for processing (3)
  • .github/workflows/integrity-check.yml
  • .pre-commit-config.yaml
  • scripts/check-unicode-integrity.py

Comment @coderabbitai help to get the list of available commands and usage tips.

@stuckvgn
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 31, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@stuckvgn
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 31, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

