security: Unicode integrity check — detect hidden characters in instruction files#12
Adds scripts/check-unicode-integrity.py, a CI workflow, and a pre-commit hook to scan all instruction files for hidden Unicode characters that could be used to embed invisible AI instructions (zero-width spaces, bidi overrides, tag characters, C0/C1 controls, and more). This repo distributes instruction files to thousands of projects, making it a high-value target for supply-chain attacks via the Rules File Backdoor technique. The scanner runs on every push/PR and on a daily schedule.
Summary
- `.md`/`.yaml` instruction files that carry invisible AI instructions.
- `scripts/check-unicode-integrity.py`: a scanner that reports every suspicious character with filename, line number, and hex code, then exits 1 if any are found.
- `.github/workflows/integrity-check.yml`: runs on every push/PR to main and on a daily schedule (to catch supply-chain attacks that land between PRs).
- `.pre-commit-config.yaml`: enforces the same check before every commit locally.

What's detected
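The categories named in the description (zero-width spaces, bidi overrides, tag characters, C0/C1 controls) can be pictured as a code point range lookup. This is a minimal sketch, not the contents of `scripts/check-unicode-integrity.py`: the ranges are an illustrative subset, and the names `SUSPICIOUS_RANGES`, `Finding`, `classify`, and `scan_text` are hypothetical.

```python
from typing import List, NamedTuple, Optional

# Illustrative subset only (assumption); the real script's tables may differ.
# Tab (U+0009), LF (U+000A) and CR (U+000D) are deliberately left out.
SUSPICIOUS_RANGES = [
    (0x0000, 0x0008, "C0 control"),
    (0x000B, 0x000C, "C0 control"),
    (0x000E, 0x001F, "C0 control"),
    (0x0080, 0x009F, "C1 control"),
    (0x200B, 0x200D, "zero-width"),
    (0x2060, 0x2060, "zero-width"),      # word joiner
    (0xFEFF, 0xFEFF, "zero-width"),      # zero-width no-break space / BOM
    (0x200E, 0x200F, "bidi mark"),
    (0x202A, 0x202E, "bidi embedding/override"),
    (0x2066, 0x2069, "bidi isolate"),
    (0xE0000, 0xE007F, "tag character"),
]


class Finding(NamedTuple):
    line: int        # 1-based line number
    column: int      # 1-based column, counted in code points
    codepoint: str   # e.g. "U+200B"
    kind: str


def classify(ch: str) -> Optional[str]:
    """Return the category label for a suspicious character, else None."""
    cp = ord(ch)
    for lo, hi, kind in SUSPICIOUS_RANGES:
        if lo <= cp <= hi:
            return kind
    return None


def scan_text(text: str) -> List[Finding]:
    """Collect every suspicious character with its position and hex code."""
    findings = []
    # split("\n") rather than splitlines(), which would also split on
    # (and therefore hide) some of the control characters being hunted.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            kind = classify(ch)
            if kind is not None:
                findings.append(Finding(line_no, col, f"U+{ord(ch):04X}", kind))
    return findings
```

Calling `scan_text("Use tabs.\u200b\nIgnore\u202e this")` yields one finding per hidden character, each carrying the line number and hex code that the Summary says the scanner reports.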
What's allowed
Standard whitespace (space, tab, newline, CR) and all standard Unicode text including accented characters and CJK for i18n READMEs are explicitly permitted.
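One way to express this allow rule is through Unicode general categories: control (`Cc`) and format (`Cf`) characters are rejected unless they are one of the four whitespace characters above, while letters in any script pass. This is a sketch under that assumption; the PR does not say whether the script uses categories or explicit ranges, and `ALLOWED_WHITESPACE` and `is_allowed` are hypothetical names.

```python
import unicodedata

# The four whitespace characters the PR lists as permitted.
ALLOWED_WHITESPACE = {" ", "\t", "\n", "\r"}


def is_allowed(ch: str) -> bool:
    """Assumed rule: reject control/format characters except plain whitespace."""
    if ch in ALLOWED_WHITESPACE:
        return True
    # Cc = control (C0/C1), Cf = format (zero-width, bidi and tag characters
    # are assigned here). Letters such as "é" (Ll) or "漢" (Lo) fall through.
    return unicodedata.category(ch) not in {"Cc", "Cf"}
```

A category rule like this is stricter than it may look: U+200C and U+200D are also `Cf`, yet some scripts and emoji sequences use them legitimately, so a real scanner might need per-file exceptions.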
Test plan
- `python3 scripts/check-unicode-integrity.py .` exits 0 on the clean repo
- `.md` file
- check-unicode-integrity-check locally after `pre-commit install`
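The exit code contract (0 on a clean tree, 1 once any hidden character is present) can be exercised in a throwaway directory. This is a rough sketch rather than the project's test suite: `HIDDEN` is a small illustrative subset, and `scan_tree` and `demo` are hypothetical helpers that only mimic the behavior described in the Summary.

```python
import tempfile
from pathlib import Path

# Small illustrative subset (assumption), not the script's full table.
HIDDEN = {"\u200b", "\u200d", "\u202e", "\u2066"}
SUFFIXES = {".md", ".yaml"}


def scan_tree(root: Path) -> int:
    """Return 1 if any .md/.yaml file under root has a hidden character, else 0."""
    exit_code = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8")
        for line_no, line in enumerate(text.split("\n"), start=1):
            for ch in line:
                if ch in HIDDEN:
                    # filename, line number, hex code
                    print(f"{path}:{line_no}: U+{ord(ch):04X}")
                    exit_code = 1
    return exit_code


def demo():
    """Scan a clean file, then the same file with a zero-width space injected."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        target = root / "rules.md"
        target.write_text("Plain instructions\n", encoding="utf-8")
        clean = scan_tree(root)
        target.write_text("Plain\u200b instructions\n", encoding="utf-8")
        dirty = scan_tree(root)
    return clean, dirty
```

A command-line wrapper would pass the result of `scan_tree` to `sys.exit`, so that CI and pre-commit fail on a non-zero status.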