fix: add missing Unicode chars to text normalization map #11293
Summary

Adds 5 missing Unicode character mappings to `NORMALIZATION_MAPS.TYPOGRAPHIC` in `text-normalization.ts` so that `apply_diff` can match files containing these characters.

Problem

When using `apply_diff` on a Markdown file containing Unicode typographic characters, the tool reported low similarity scores (73-97%) even when the search content appeared visually identical to the file content. This prevented any edits from being applied.

Repro Steps

1. Start with a Markdown file containing a line such as `> **Status:** 17 of 36 providers migrated · 19 remaining`.
2. Run `apply_diff` with a search block matching that line.

The root cause is that the LLM generates ASCII equivalents (`-` for `·`, `->` for `→`, etc.) but `normalizeString()` didn't know how to map the file's Unicode characters to the same ASCII output.

Changes

- `src/utils/text-normalization.ts`: Added 5 entries to `NORMALIZATION_MAPS.TYPOGRAPHIC`, including:

  | Character | Replacement |
  | --- | --- |
  | `·` | `-` |
  | `→` | `->` |
  | `←` | `<-` |
  | `↔` | `<->` |

- `src/utils/__tests__/text-normalization.spec.ts`: Added 2 test cases covering all new character mappings.

Testing

All 19 tests pass.
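
For reviewers who want to see the root cause in isolation, here is a minimal sketch of the idea. It is not the code from `text-normalization.ts`: `TYPOGRAPHIC_SKETCH` and `normalizeSketch` are hypothetical names invented for this illustration, and the map holds only the four mappings listed under Changes. It shows why the file line and the ASCII search line compare as different until both pass through the same character map.

```typescript
// Hypothetical stand-in for the typographic map (NOT the real NORMALIZATION_MAPS).
// Only the four mappings listed in this PR description are included.
const TYPOGRAPHIC_SKETCH: Record<string, string> = {
  "·": "-",
  "→": "->",
  "←": "<-",
  "↔": "<->",
}

// Hypothetical helper: replace every mapped character with its ASCII equivalent
// and leave all other characters untouched.
function normalizeSketch(input: string): string {
  return input
    .split("")
    .map((ch) => TYPOGRAPHIC_SKETCH[ch] ?? ch)
    .join("")
}

// The line as it exists in the file, and the line as an LLM might write it.
const fileLine = "> **Status:** 17 of 36 providers migrated · 19 remaining"
const searchLine = "> **Status:** 17 of 36 providers migrated - 19 remaining"

// Raw comparison fails because "·" and "-" are different characters.
const rawMatch = fileLine === searchLine

// After normalization both sides reduce to the same ASCII string.
const normalizedMatch = normalizeSketch(fileLine) === normalizeSketch(searchLine)

console.log({ rawMatch, normalizedMatch })
```

In this sketch the ASCII search line passes through unchanged, so only the file side is rewritten; applying the same function to both sides keeps the comparison symmetric if either side contains the Unicode form.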
Important

Adds missing Unicode character mappings to `NORMALIZATION_MAPS.TYPOGRAPHIC` in `text-normalization.ts` and updates tests to improve text normalization accuracy.

- `NORMALIZATION_MAPS.TYPOGRAPHIC` in `text-normalization.ts`: new mappings for better text normalization.
- `text-normalization.spec.ts`: tests cover the new character mappings.

This description was auto-generated for a6c46f7.