From f43b2adc20920685aa45ae3c87e5469bd3ece133 Mon Sep 17 00:00:00 2001
From: Andrew Beltrano
Date: Mon, 4 May 2026 14:53:58 -0600
Subject: [PATCH 1/3] Add encoding discipline rule to operational-constraints

Templates that post comment / reply / description bodies via gh or
az rest are vulnerable to silent character corruption (mojibake) on
Windows PowerShell 5.x. Adds Rule 10 with the temp-file +
UTF-8-without-BOM pattern, the WriteAllText recipe for PowerShell 5.x,
an explicit warning against round-tripping through
gh pr view --jq | Out-File, and a verify-after-posting step.

Fixes #255

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../guardrails/operational-constraints.md | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/protocols/guardrails/operational-constraints.md b/protocols/guardrails/operational-constraints.md
index fda60f8..4edc388 100644
--- a/protocols/guardrails/operational-constraints.md
+++ b/protocols/guardrails/operational-constraints.md
@@ -145,3 +145,43 @@ Every analysis MUST include a coverage statement:
 - **Excluded**:
 - **Limitations**:
 ```
+
+### 10. Encoding Discipline for External Posts
+
+When drafting comment, reply, description, or release-note bodies that
+will be posted to an external API (e.g., `gh api`, `gh pr edit`,
+`gh pr comment`, `az rest`), the body **MUST** reach the API as
+**UTF-8 without a BOM**. Non-ASCII characters (em-dashes, smart quotes,
+accented names, currency symbols, non-Latin scripts) corrupt silently
+when the shell uses a non-UTF-8 codepage.
+
+- **Always pass bodies via a temp file**, not as inline command-line
+  strings. (The temp-file pattern is already required for ADO POSTs to
+  avoid JSON escaping pitfalls; reuse it everywhere for the same
+  reason and for encoding safety.)
+- **bash / zsh / PowerShell 7+**: default UTF-8 is fine. Write with
+  `cat > body.json <<'EOF' … EOF` (bash/zsh) or `Set-Content -Encoding
+  utf8NoBOM` (PowerShell 7+).
+- **Windows PowerShell 5.x** (the default on Windows 10 / 11 without
+  PowerShell 7+ installed): do NOT use `Out-File` or `Set-Content`
+  for body files containing non-ASCII characters. `Out-File` defaults
+  to Windows-1252; `Out-File -Encoding utf8` writes UTF-8 **with a
+  BOM**. Use:
+
+  ```powershell
+  [System.IO.File]::WriteAllText($path, $content,
+      [System.Text.UTF8Encoding]::new($false))
+  ```
+
+- **Never round-trip existing posted content** through
+  `gh pr view --jq … | Out-File` (or `Set-Content`) for editing on
+  Windows PowerShell 5.x. The pipe decodes the UTF-8 byte stream from
+  `gh` as the console codepage, then re-encodes it — producing
+  classic UTF-8 → CP1252 → UTF-8 mojibake (e.g., `—` becomes
+  `╫ô├ç├╣`). Write the new content from scratch in clean UTF-8.
+
+- **Verify after posting** when the body contained non-ASCII
+  characters — fetch the posted artifact (e.g., `gh pr view`,
+  `gh api`) and visually confirm em-dashes and accented characters
+  rendered correctly. If corruption is detected, repost using the
+  encoding-safe pattern above.

From 8b3fcf7dd7e0bcde56276cbbd4e51da3125ab245 Mon Sep 17 00:00:00 2001
From: Andrew Beltrano
Date: Mon, 4 May 2026 15:04:55 -0600
Subject: [PATCH 2/3] Address PR #256 review: correct PS5 cmdlet defaults and fix heredoc example

Out-File defaults to UTF-16LE in PowerShell 5.1, not Windows-1252
(Set-Content is the one that defaults to ANSI). Reworded to give
per-cmdlet defaults so the rule is authoritative.

Replaced the collapsed-onto-one-line heredoc pseudo-example with a
properly-terminated multi-line block, switched the filename to body.md
(since gh pr edit --body-file consumes Markdown), and added a note
that body.json is appropriate only for JSON-consuming APIs like
az rest.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../guardrails/operational-constraints.md | 29 +++++++++++++++----
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/protocols/guardrails/operational-constraints.md b/protocols/guardrails/operational-constraints.md
index 4edc388..50c27e7 100644
--- a/protocols/guardrails/operational-constraints.md
+++ b/protocols/guardrails/operational-constraints.md
@@ -159,14 +159,31 @@ when the shell uses a non-UTF-8 codepage.
   strings. (The temp-file pattern is already required for ADO POSTs to
   avoid JSON escaping pitfalls; reuse it everywhere for the same
   reason and for encoding safety.)
-- **bash / zsh / PowerShell 7+**: default UTF-8 is fine. Write with
-  `cat > body.json <<'EOF' … EOF` (bash/zsh) or `Set-Content -Encoding
-  utf8NoBOM` (PowerShell 7+).
+- **bash / zsh / PowerShell 7+**: default UTF-8 is fine. Use a
+  heredoc (bash/zsh):
+
+  ```bash
+  cat > body.md <<'EOF'
+  Comment body — em-dashes and accented names like Ångström survive.
+  EOF
+  ```
+
+  Or in PowerShell 7+:
+
+  ```powershell
+  Set-Content -Encoding utf8NoBOM -Path body.md -Value $content
+  ```
+
+  Use `body.md` (or `body.txt`) for Markdown bodies and `body.json`
+  only when the API actually consumes JSON (e.g., `az rest --body
+  @body.json`).
 - **Windows PowerShell 5.x** (the default on Windows 10 / 11 without
   PowerShell 7+ installed): do NOT use `Out-File` or `Set-Content`
-  for body files containing non-ASCII characters. `Out-File` defaults
-  to Windows-1252; `Out-File -Encoding utf8` writes UTF-8 **with a
-  BOM**. Use:
+  for body files containing non-ASCII characters. Their defaults are
+  not UTF-8: `Out-File` defaults to UTF-16LE (with a BOM),
+  `Set-Content` defaults to the system ANSI codepage (typically
+  Windows-1252 on en-US), and `Out-File -Encoding utf8` writes UTF-8
+  **with a BOM**. Use:
 
   ```powershell
   [System.IO.File]::WriteAllText($path, $content,

From f3e396f8a21c2b4478046f540874722664ee1873 Mon Sep 17 00:00:00 2001
From: Andrew Beltrano
Date: Mon, 4 May 2026 15:21:39 -0600
Subject: [PATCH 3/3] Quote @body.json in az rest example for PowerShell compatibility

Bare @body.json risks being parsed as a splat token in PowerShell.
Quoting it is harmless in bash and required in PowerShell.

Addresses PR #256 review feedback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 protocols/guardrails/operational-constraints.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/protocols/guardrails/operational-constraints.md b/protocols/guardrails/operational-constraints.md
index 50c27e7..088ba2a 100644
--- a/protocols/guardrails/operational-constraints.md
+++ b/protocols/guardrails/operational-constraints.md
@@ -176,7 +176,8 @@ when the shell uses a non-UTF-8 codepage.
 
   Use `body.md` (or `body.txt`) for Markdown bodies and `body.json`
   only when the API actually consumes JSON (e.g., `az rest --body
-  @body.json`).
+  "@body.json"` — the quotes are required in PowerShell to prevent
+  `@body.json` from being parsed as a splat token; harmless in bash).
 - **Windows PowerShell 5.x** (the default on Windows 10 / 11 without
   PowerShell 7+ installed): do NOT use `Out-File` or `Set-Content`
   for body files containing non-ASCII characters. Their defaults are
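
As a complement to the verify-after-posting step in Rule 10, a body file can also be checked before it is posted. The following is a minimal sketch, not part of this patch series, assuming a bash-compatible shell where `head`, `od`, `tr`, and `iconv` are available; the function name `check_body_encoding` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an illustration, not part of the patch series).
# Returns 0 when the file is valid UTF-8 and does not start with a
# UTF-8 BOM (bytes EF BB BF); returns 1 otherwise.
check_body_encoding() {
  local path="$1"
  local head3

  # First three bytes as lowercase hex with all whitespace removed,
  # e.g. "efbbbf" when the file begins with a UTF-8 BOM.
  head3="$(head -c 3 < "$path" | od -An -tx1 | tr -d ' \t\n')"
  if [ "$head3" = "efbbbf" ]; then
    echo "BOM found in $path" >&2
    return 1
  fi

  # iconv exits non-zero when the input contains byte sequences that
  # are not valid UTF-8, so a same-encoding conversion acts as a validator.
  if ! iconv -f UTF-8 -t UTF-8 < "$path" > /dev/null 2>&1; then
    echo "invalid UTF-8 in $path" >&2
    return 1
  fi

  return 0
}
```

One possible use is to gate the post on the check, for example `check_body_encoding body.md && gh pr edit --body-file body.md`. This only confirms that the bytes are BOM-free, well-formed UTF-8; it cannot detect mojibake that has already been re-encoded as valid UTF-8, so the visual verification step still applies.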