Add encoding discipline rule to operational-constraints#256
Open
Templates that post comment / reply / description bodies via gh or az rest are vulnerable to silent character corruption (mojibake) on Windows PowerShell 5.x. Adds Rule 10 with the temp-file + UTF-8-without-BOM pattern, the WriteAllText recipe for PowerShell 5.x, an explicit warning against round-tripping through gh pr view --jq | Out-File, and a verify-after-posting step.

Fixes #255

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
Pull request overview
This PR updates the cross-cutting operational-constraints guardrail protocol to add an explicit “encoding discipline” rule aimed at preventing silent mojibake when templates post externally visible text (e.g., via gh / az)—especially under Windows PowerShell 5.x.
Changes:
- Add Rule 10: Encoding Discipline for External Posts, requiring UTF-8 without BOM for externally posted bodies.
- Document recommended shell-specific patterns (bash/zsh, PowerShell 7+, Windows PowerShell 5.x) and warn against unsafe round-trips.
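To make "UTF-8 without BOM" concrete at the byte level, here is a minimal Python sketch. It is not taken from the PR: the helper names `write_body` and `has_bom` and the file name `body-bom.md` are hypothetical, and only `body.md` follows the naming used in the rule. It writes the same text with and without a BOM and checks the first three bytes of each file.

```python
import os
import tempfile

UTF8_BOM = b"\xef\xbb\xbf"  # the three bytes a UTF-8 BOM adds at the start of a file


def write_body(path: str, text: str, with_bom: bool = False) -> None:
    """Write text as UTF-8. 'utf-8-sig' prepends a BOM; plain 'utf-8' does not."""
    encoding = "utf-8-sig" if with_bom else "utf-8"
    # newline="" keeps line endings exactly as given on every platform.
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)


def has_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 BOM bytes."""
    with open(path, "rb") as f:
        return f.read(3) == UTF8_BOM


# Sample body with an accented letter, an em-dash, and smart quotes.
body = "Caf\u00e9 \u2014 \u201cquoted\u201d"

with tempfile.TemporaryDirectory() as tmp:
    clean = os.path.join(tmp, "body.md")        # UTF-8 without BOM
    bommed = os.path.join(tmp, "body-bom.md")   # UTF-8 with BOM (hypothetical name)
    write_body(clean, body)
    write_body(bommed, body, with_bom=True)
    clean_has_bom = has_bom(clean)
    bommed_has_bom = has_bom(bommed)
    with open(clean, "rb") as f:
        clean_bytes = f.read()

print(clean_has_bom, bommed_has_bom)
```

A check like `has_bom` could run just before the body file is handed to a posting command, but where such a check belongs in a given template is an assumption rather than something the rule prescribes.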
…xample

Out-File defaults to UTF-16LE in PowerShell 5.1, not Windows-1252 (Set-Content is the one that defaults to ANSI). Reworded to give per-cmdlet defaults so the rule is authoritative. Replaced the collapsed-onto-one-line heredoc pseudo-example with a properly terminated multi-line block, switched the filename to body.md (since gh pr edit --body-file consumes Markdown), and added a note that body.json is appropriate only for JSON-consuming APIs like az rest.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Bare @body.json risks being parsed as a splat token in PowerShell. Quoting it is harmless in bash and required in PowerShell. Addresses PR #256 review feedback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes #255.

Adds a new rule (Rule 10: Encoding Discipline for External Posts) to protocols/guardrails/operational-constraints.md. Templates that post comment, reply, description, or release-note bodies via gh api / gh pr edit / az rest are vulnerable to silent character corruption (mojibake) on Windows PowerShell 5.x when the body contains non-ASCII characters (em-dashes, smart quotes, accented names, currency symbols, non-Latin scripts).

This was observed in real use while editing PR #254's description, where em-dashes (—) appeared in the GitHub UI as ╫ô├ç├╣ and apostrophes (') as ''.

What the new rule says

- Do not use Out-File or Set-Content for non-ASCII bodies on Windows PowerShell 5.x (Out-File defaults to UTF-16LE and Set-Content to the ANSI codepage; Out-File -Encoding utf8 writes a BOM). Use [System.IO.File]::WriteAllText($path, $content, [System.Text.UTF8Encoding]::new($false)).
- Do not round-trip through gh pr view --jq … | Out-File for editing on Windows PowerShell 5.x: the pipe decodes UTF-8 as the console codepage and re-encodes, producing classic UTF-8 → CP1252 → UTF-8 mojibake.

Why operational-constraints

operational-constraints is included by all consuming templates (applicable_to: all), so this single edit covers respond-to-pr-comments, review-pull-request, and any future template that posts external text, without per-template churn or new manifest entries.

Validation

python tests/validate-manifest.py passes.
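The round-trip corruption this PR describes can be reproduced in a few lines of Python. This is a sketch of the general mechanism only, assuming CP1252 as the wrongly applied codepage. The exact garbage depends on the console codepage in use, so the output here need not match the characters observed on PR #254. The function name `round_trip` is hypothetical.

```python
def round_trip(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then mis-decode those bytes with a legacy codepage.

    This mimics a pipe that receives UTF-8 bytes but interprets them using
    the console codepage (assumed here to be CP1252).
    """
    return text.encode("utf-8").decode(wrong_codec, errors="replace")


# A right single quote (U+2019) and an em-dash (U+2014), written as escapes.
original = "it\u2019s done \u2014 ok"
garbled = round_trip(original)

# Each 3-byte UTF-8 sequence turns into three unrelated CP1252 characters.
print(garbled)

# When every byte maps to a defined CP1252 character, reversing the
# mis-decoding recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

A verify-after-posting step could use the same idea in reverse: fetch the posted body and compare it with the intended text, treating any difference as a sign that an encoding hop went wrong. How that comparison is wired into a template is left open here.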