@hannesrudolph hannesrudolph commented Feb 7, 2026

Summary

Adds 5 missing Unicode character mappings to NORMALIZATION_MAPS.TYPOGRAPHIC in text-normalization.ts so that apply_diff can match files containing these characters.

Problem

When using apply_diff on a Markdown file containing Unicode typographic characters, the tool reported low similarity scores (73-97%) even when the search content appeared visually identical to the file content. This prevented any edits from being applied.

Repro Steps

  1. Create a Markdown file containing a middle dot character: > **Status:** 17 of 36 providers migrated · 19 remaining
  2. Attempt to use apply_diff with a search block matching that line
  3. The tool reports ~73% similarity and fails to match

The root cause is that the LLM generates ASCII equivalents (- for ·, -> for →, etc.), but normalizeString() did not map the file's Unicode characters to the same ASCII output.
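To make the mismatch concrete, here is a minimal sketch that compares the file's line with the ASCII line an LLM would typically produce. The helper `firstDifference` and both sample strings are illustrative assumptions; they are not part of apply_diff or text-normalization.ts.

```typescript
// Illustrative only: the two strings look alike but differ at the code-unit level.
const fileLine = "17 of 36 providers migrated \u00B7 19 remaining"; // contains U+00B7 (middle dot)
const searchLine = "17 of 36 providers migrated - 19 remaining"; // ASCII hyphen instead

// Hypothetical helper: index of the first differing UTF-16 code unit, or -1 if equal.
function firstDifference(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  for (let i = 0; i < limit; i++) {
    if (a.charCodeAt(i) !== b.charCodeAt(i)) {
      return i;
    }
  }
  return a.length === b.length ? -1 : limit;
}

const exactMatch = fileLine === searchLine;
const diffIndex = firstDifference(fileLine, searchLine);

// An emoji followed by U+FE0F renders the same as one without it in many fonts,
// yet it is one code unit longer, which also lowers a character-based similarity score.
const withSelector = "\u2139\uFE0F";
const withoutSelector = "\u2139";
const lengthGap = withSelector.length - withoutSelector.length;

console.log({ exactMatch, diffIndex, lengthGap });
```

Unless both sides are normalized to the same ASCII form before comparison, every such character counts against the similarity score.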

Changes

src/utils/text-normalization.ts — Added 5 entries to NORMALIZATION_MAPS.TYPOGRAPHIC:

| Character | Codepoint | Maps To | Name |
|---|---|---|---|
| · | U+00B7 | - | Middle dot |
| → | U+2192 | -> | Right arrow |
| ← | U+2190 | <- | Left arrow |
| ↔ | U+2194 | <-> | Left-right arrow |
| (invisible) | U+FE0F | (stripped) | Variation selector |
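The mappings in the table can be sketched as a lookup plus a single replace pass. This is an assumption-laden illustration: `TYPOGRAPHIC_SKETCH` and `normalizeSketch` are hypothetical names, and the actual shape of NORMALIZATION_MAPS.TYPOGRAPHIC and normalizeString() in text-normalization.ts may differ.

```typescript
// Hypothetical stand-in for the five new entries; not the real NORMALIZATION_MAPS.TYPOGRAPHIC.
const TYPOGRAPHIC_SKETCH: Record<string, string> = {
  "\u00B7": "-", // middle dot
  "\u2192": "->", // right arrow
  "\u2190": "<-", // left arrow
  "\u2194": "<->", // left-right arrow
  "\uFE0F": "", // variation selector, stripped entirely
};

// All five characters are in the Basic Multilingual Plane, so a plain
// character class matches each one as a single UTF-16 code unit.
const TYPOGRAPHIC_PATTERN = /[\u00B7\u2192\u2190\u2194\uFE0F]/g;

// Hypothetical normalizer: replaces each mapped character with its ASCII form.
function normalizeSketch(text: string): string {
  return text.replace(TYPOGRAPHIC_PATTERN, (ch) => TYPOGRAPHIC_SKETCH[ch]);
}

const normalizedStatus = normalizeSketch("17 of 36 providers migrated \u00B7 19 remaining");
const normalizedArrows = normalizeSketch("a \u2192 b \u2190 c \u2194 d");
const normalizedEmoji = normalizeSketch("\u2139\uFE0F note");

console.log(normalizedStatus);
console.log(normalizedArrows);
console.log(normalizedEmoji);
```

With both the file content and the search block passed through the same mapping, a line containing a middle dot and its ASCII-hyphen counterpart compare equal.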

src/utils/__tests__/text-normalization.spec.ts — Added 2 test cases covering all new character mappings.

Testing

cd src && npx vitest run utils/__tests__/text-normalization.spec.ts

All 19 tests pass.


Important

Adds missing Unicode character mappings to NORMALIZATION_MAPS.TYPOGRAPHIC in text-normalization.ts and updates tests to improve text normalization accuracy.

  • Behavior:
    • Adds 5 Unicode character mappings to NORMALIZATION_MAPS.TYPOGRAPHIC in text-normalization.ts for better text normalization.
    • Characters include middle dot, right arrow, left arrow, left-right arrow, and variation selector.
  • Testing:
    • Adds 2 test cases in text-normalization.spec.ts to cover new character mappings.
    • All tests pass successfully.

This description was created by Ellipsis for a6c46f7.

Add middle dot (U+00B7), arrows (U+2192, U+2190, U+2194), and
variation selector (U+FE0F) to NORMALIZATION_MAPS.TYPOGRAPHIC so
apply_diff can match files containing these characters.

Includes test coverage for all new entries.
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. bug Something isn't working labels Feb 7, 2026
roomote bot commented Feb 7, 2026

Reviewed the 5 new Unicode character mappings in NORMALIZATION_MAPS.TYPOGRAPHIC and the 2 new test cases. No issues found -- the changes are correct, well-scoped, and adequately tested. All 19 tests pass.
