Add encoding discipline rule to operational-constraints #256

Open
abeltrano wants to merge 3 commits into main from fix-shell-encoding-mojibake
Conversation

@abeltrano
Collaborator

Fixes #255.

Adds a new rule (Rule 10: Encoding Discipline for External Posts) to protocols/guardrails/operational-constraints.md. Templates that post comment, reply, description, or release-note bodies via gh api / gh pr edit / az rest are vulnerable to silent character corruption (mojibake) on Windows PowerShell 5.x when the body contains non-ASCII characters (em-dashes, smart quotes, accented names, currency symbols, non-Latin scripts).

This was observed in real use while editing PR #254's description, where em-dashes (U+2014) appeared in the GitHub UI as דÇù and apostrophes (') as ''.
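The corruption can be reproduced outside PowerShell. The following is a minimal Python sketch, assuming the intermediate mis-decode uses CP1252; the console code page on the affected machine may differ, so the garbled characters can look different from the ones observed here. The helper name `mojibake` is illustrative only.

```python
def mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then mis-decode those bytes with a legacy code page."""
    return text.encode("utf-8").decode(wrong_codec)


em_dash = "\u2014"
garbled = mojibake(em_dash)

# ascii() keeps the output printable on any console encoding.
print(ascii(em_dash), "->", ascii(garbled))

# When no byte was dropped, the damage is reversible by undoing the two steps.
repaired = garbled.encode("cp1252").decode("utf-8")
assert repaired == em_dash
```

One UTF-8 character becomes several legacy-code-page characters, which is why a single em-dash shows up as a short run of unrelated symbols.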

What the new rule says

  • Always pass bodies via a temp file (the temp-file pattern is already required for ADO POSTs to avoid JSON escaping pitfalls; reuse it for encoding safety too).
  • bash / zsh / PowerShell 7+: defaults are fine.
  • Windows PowerShell 5.x: do NOT use Out-File or Set-Content for non-ASCII bodies (Out-File defaults to UTF-16LE; Set-Content defaults to the system ANSI code page, typically Windows-1252; Out-File -Encoding utf8 writes a BOM). Use [System.IO.File]::WriteAllText($path, $content, [System.Text.UTF8Encoding]::new($false)).
  • Never round-trip existing posted content through gh pr view --jq … | Out-File for editing on Windows PowerShell 5.x — the pipe decodes UTF-8 as the console codepage and re-encodes, producing classic UTF-8 → CP1252 → UTF-8 mojibake.
  • Verify after posting when the body contained non-ASCII characters; repost if corruption is detected.
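The temp-file and verify-after-posting steps above can be sketched in Python. This is an illustration under stated assumptions, not the rule's own recipe: the helper names (`write_body`, `has_bom`, `missing_non_ascii`) are hypothetical, and fetching the posted body back from the service is left out.

```python
import os
import tempfile

UTF8_BOM = b"\xef\xbb\xbf"


def write_body(text: str) -> str:
    """Write text to a temp file as UTF-8 without a BOM and return its path."""
    fd, path = tempfile.mkstemp(suffix=".md")
    # encoding="utf-8" (not "utf-8-sig") writes no BOM; newline="\n" avoids CRLF translation.
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)
    return path


def has_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(3) == UTF8_BOM


def missing_non_ascii(expected: str, posted: str) -> set:
    """Non-ASCII characters in the intended body that are absent from the posted body."""
    return {ch for ch in expected if ord(ch) > 127 and ch not in posted}


body = "Caf\u00e9 \u2014 done"
path = write_body(body)
assert not has_bom(path)
os.remove(path)

# A posted body that went through a UTF-8 -> CP1252 mis-decode loses both characters.
corrupted = body.encode("utf-8").decode("cp1252")
print(sorted(ascii(ch) for ch in missing_non_ascii(body, corrupted)))
```

A non-empty result from `missing_non_ascii` is the signal to repost. The check is deliberately coarse: it detects lost characters, not every possible difference between the two bodies.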

Why operational-constraints

operational-constraints is included by all consuming templates (applicable_to: all), so this single edit covers respond-to-pr-comments, review-pull-request, and any future template that posts external text — without per-template churn or new manifest entries.

Validation

  • python tests/validate-manifest.py passes.
  • No protocol or template frontmatter changes; no manifest edits required.

Templates that post comment / reply / description bodies via gh or az rest are vulnerable to silent character corruption (mojibake) on Windows PowerShell 5.x. Adds Rule 10 with the temp-file + UTF-8-without-BOM pattern, the WriteAllText recipe for PowerShell 5.x, an explicit warning against round-tripping through gh pr view --jq | Out-File, and a verify-after-posting step.

Fixes #255

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the cross-cutting operational-constraints guardrail protocol to add an explicit “encoding discipline” rule aimed at preventing silent mojibake when templates post externally visible text (e.g., via gh / az)—especially under Windows PowerShell 5.x.

Changes:

  • Add Rule 10: Encoding Discipline for External Posts, requiring UTF-8 without BOM for externally posted bodies.
  • Document recommended shell-specific patterns (bash/zsh, PowerShell 7+, Windows PowerShell 5.x) and warn against unsafe round-trips.

Comment thread protocols/guardrails/operational-constraints.md Outdated
Comment thread protocols/guardrails/operational-constraints.md Outdated
…xample

Out-File defaults to UTF-16LE in PowerShell 5.1, not Windows-1252 (Set-Content is the one that defaults to ANSI). Reworded to give per-cmdlet defaults so the rule is authoritative.
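To make the per-cmdlet defaults concrete, here is a small Python sketch of the bytes each encoding would put on disk for a single em-dash. It assumes the system ANSI code page is Windows-1252, which varies by locale, and the dictionary keys are descriptive labels rather than PowerShell parameter values.

```python
import codecs


def file_bytes(text: str) -> dict:
    """Bytes written for `text` under each encoding discussed in the rule."""
    return {
        "utf-16le with BOM": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
        "cp1252 (ANSI, assumed)": text.encode("cp1252"),
        "utf-8 with BOM": text.encode("utf-8-sig"),
        "utf-8 without BOM": text.encode("utf-8"),
    }


for label, data in file_bytes("\u2014").items():
    print(f"{label:24} {data.hex(' ')}")
```

Only the last row is what a UTF-8 consumer expects: the first is a different encoding entirely, the second is a single legacy byte, and the third carries three extra leading bytes that may surface as stray characters in the posted text.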

Replaced the collapsed-onto-one-line heredoc pseudo-example with a properly-terminated multi-line block, switched the filename to body.md (since gh pr edit --body-file consumes Markdown), and added a note that body.json is appropriate only for JSON-consuming APIs like az rest.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread protocols/guardrails/operational-constraints.md Outdated
Bare @body.json risks being parsed as a splat token in PowerShell.

Quoting it is harmless in bash and required in PowerShell.

Addresses PR #256 review feedback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

PR comment bodies posted from Windows PowerShell 5.x can be mojibake-corrupted