feat(ui): quieter plan diffs on prose edits#603
Merged
backnotprop merged 3 commits intobacknotprop:mainfrom Apr 24, 2026
Merged
feat(ui): quieter plan diffs on prose edits#603backnotprop merged 3 commits intobacknotprop:mainfrom
backnotprop merged 3 commits intobacknotprop:mainfrom
Conversation
Before word-diffing, replace each balanced `**…**`, `__…__`, `~~…~~`, `*…*`, `_…_` (and triples `***…***` / `___…___`) with a unique word-char sentinel — same pattern as the existing code-span / link atomization passes. Identical phrases pair as unchanged; different phrases produce a single remove+add. Fixes the "preliminary analysis" → "final analysis" demo case (⑯), which previously orphaned the closing `**` into the unchanged tail and rendered as literal asterisks. Now renders as one clean bold-struck → bold-green swap. Pair matching uses CommonMark-ish flanking rules so stray `2**3` or intraword `my__var` / `snake_case` stay literal. Longest-first ordering prevents single delimiters from eating the inside of a double-delim pair.
After `diffWordsWithSpace` and sentinel restoration, merge dirty runs of ≥2 change sites separated only by thin unchanged tokens (whitespace, commas, periods, semicolons, colons, dashes, quotes) into a single phrase-level swap. Parens and brackets are excluded so inline links and bracketed content stay as hard boundaries. Turns alternating red/green word-noise (e.g. paragraph reworks with multiple adjacent word swaps) into a readable before/after. Also rescues the atomization edge case where wrapping a previously-plain phrase in emphasis (`foo bar baz` → `foo **bar baz**`) would otherwise surface as fragmented literal delimiters inside colored tags. Single-site dirty runs pass through unchanged so isolated word swaps keep word-level highlighting.
Hyphens between word chars (`ninety-five`, `64-byte`, `state-of-the-art`) are semantic compound words, not two tokens. `diffWordsWithSpace` splits on word boundaries, so without this pass `ninety-five` → `ninety-nine` fragments into an unchanged `ninety-` prefix and a swapped `five`/`nine` suffix — a visually noisy partial-word diff. Added a sentinel pass that replaces infix hyphens with a word-char marker before diffing and restores them afterwards. Runs after the code/link/emphasis passes so hyphens inside those constructs stay hidden. Unlike the other sentinels this one uses a fixed marker — all hyphens restore to the same character, so uniqueness isn't needed. Leading/trailing dashes and em-dash-like separators (dash with space on one side) are not substituted; only true compound infixes.
b004737 to
8366e7d
Compare
Owner
|
Ok great! Yea I wasnt totally sure about the original approach, yours is looking better, Ill get a review in |
Owner
|
nice work |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I appreciate you adding #565 for #560 which has made finding changes easier in my plans - but I wanted to make reading my diffs even easier 😀
Plan diffs use
diffWordsWithSpace, which breaks on word boundaries. For most edits that's the right level of granularity, but it struggles with three specific shapes: markdown emphasis delimiters, adjacent word-level swaps, and hyphenated compounds. The results are technically correct but hard to read at a glance. This PR layers three passes on top of the existing engine to fix them.The three cases
Bold phrase swaps (known issue ⑯). For
**preliminary analysis**→**final analysis**, the opening**attaches to the first word-run while the closing**slides into the unchanged tail. The result has no balanced**…**pair outside the<del>/<ins>wrappers, soInlineMarkdown's bold regex misses it and the asterisks render literally.Adjacent word swaps. When a rewritten sentence changes several nearby words, each swap is highlighted independently. Output alternates red and green between thin spaces — accurate, but the eye can't pick out the shape of the edit.
Hyphenated compounds.
ninety-five→ninety-ninediffs as an unchangedninety-prefix and a swappedfive/ninesuffix. The hyphen ends up on the wrong side of the change and the compound reads as two separate tokens.What changed
9e2eace— balanced emphasis pairs. Before diffing, replace each balanced**…**,__…__,~~…~~,*…*,_…_(and the triple forms) with a unique word-char sentinel. Restore after. Same pattern used by the existing code-span and link passes. Pair matching follows CommonMark's flanking rules so stray asterisks like2**3aren't captured.f455ac6— coalesce adjacent change sites. After the diff returns, walk the token stream and merge any run of two or more change sites separated only by "thin" unchanged tokens (whitespace,, . ; : — – " '). Isolated swaps pass through, which keeps word-level highlighting for the common case. Parens and brackets remain hard boundaries so inline links aren't absorbed into coalesced runs.b004737— hyphen atomization. A final sentinel pass replaces infix hyphens between word chars with a fixed marker. Leading and trailing dashes, and em-dash-like separators (a dash with whitespace on one side), are left alone.Before and after
The clearest demonstration is case ③ — a paragraph with many adjacent word-level changes. Before, the output interleaves red and green at the word level. After, each side reads as one coherent phrase:
Case ① — scattered edits, including the two-word "load balancer" → "service mesh" swap and the
ninety-five→ninety-ninecompound:Case ② — bold phrases (a quieter improvement; included for completeness):
Case ⑯ — the previously-known bold-phrase limitation, now resolved. Before: literal
**asterisks leak out andanalysisloses its weight. After: a clean bold-struck → bold-green swap:Approaches considered an discarded
Rather than inventing my own I looked at existing algorithms first and Google's diff-match-patch came up as an alternative: a 45kb dependency using Myers diff with semantic cleanup, and well-tested. I considered it, but it's character-level by default, which would fragment
ninety-five→ninety-ninemore than the current approach does, plus the library's documented word-mode is a DIY tokeniser, so every token-shape decision in this PR would still need to be made after the swap.Test plan
All tested locally and working.
bun test packages/ui/utils/planDiffEngine.test.tspasses. Covers each atomization pass, the coalescing rules, and anti-regressions for isolated swaps and hard boundaries.packages/editor/demoPlanDiffDemo.ts:244renders as a single bold-struck → bold-green swap.PlanCleanDiffViewcontinues to work — block-level, so it should be unaffected, but worth a manual check.