
feat(ui): quieter plan diffs on prose edits #603

Merged
backnotprop merged 3 commits into backnotprop:main from pbowyer:diff-improvements on Apr 24, 2026

@pbowyer (Contributor) commented Apr 23, 2026

I appreciate you adding #565 for #560, which has made finding changes in my plans easier, but I wanted to make reading my diffs even easier 😀

Plan diffs use diffWordsWithSpace, which breaks on word boundaries. For most edits that's the right level of granularity, but it struggles with three specific shapes: markdown emphasis delimiters, adjacent word-level swaps, and hyphenated compounds. The results are technically correct but hard to read at a glance. This PR layers three passes on top of the existing engine to fix them.
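A rough, self-contained stand-in for word-level diffing helps show where the fragments come from. This is a sketch under stated assumptions, not jsdiff's actual `diffWordsWithSpace`: the hypothetical `tokenize` splits text into word runs, whitespace runs, and runs of other characters, and `diffTokens` runs a textbook LCS over those tokens, emitting `{ value, added?, removed? }` objects shaped like jsdiff's change objects. Because the hyphen is its own token, `ninety-five` → `ninety-nine` comes back as an unchanged `ninety-` plus a `five`/`nine` swap.

```typescript
// Minimal sketch of a word-level diff in the spirit of diffWordsWithSpace.
// NOT jsdiff's implementation: the tokenizer and tie-breaking are assumptions.
interface Change {
  value: string;
  added?: boolean;
  removed?: boolean;
}

// Split into word runs, whitespace runs, and runs of other characters.
function tokenize(text: string): string[] {
  return text.match(/\w+|\s+|[^\w\s]+/g) ?? [];
}

function diffTokens(oldText: string, newText: string): Change[] {
  const a = tokenize(oldText);
  const b = tokenize(newText);

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const out: Change[] = [];
  // Append a token, merging it into the previous change object of the same kind.
  const push = (value: string, kind: "equal" | "removed" | "added"): void => {
    const last = out[out.length - 1];
    const lastKind = !last ? null : last.removed ? "removed" : last.added ? "added" : "equal";
    if (last && lastKind === kind) {
      last.value += value;
    } else if (kind === "equal") {
      out.push({ value });
    } else if (kind === "removed") {
      out.push({ value, removed: true });
    } else {
      out.push({ value, added: true });
    }
  };

  // Walk the LCS table, preferring removals before additions on ties.
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      push(a[i++], "equal");
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      push(a[i++], "removed");
    } else {
      push(b[j++], "added");
    }
  }
  while (i < a.length) push(a[i++], "removed");
  while (j < b.length) push(b[j++], "added");
  return out;
}

console.log(JSON.stringify(diffTokens("ninety-five", "ninety-nine")));
// [{"value":"ninety-"},{"value":"five","removed":true},{"value":"nine","added":true}]
```

The same tokenizer also treats `**` as a token separate from the words it wraps, which is why emphasis delimiters can drift away from the phrase they belong to.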

The three cases

Bold phrase swaps (known issue ⑯). For **preliminary analysis** → **final analysis**, the opening ** attaches to the first word-run while the closing ** slides into the unchanged tail. The result has no balanced **…** pair outside the <del>/<ins> wrappers, so InlineMarkdown's bold regex misses it and the asterisks render literally.

Adjacent word swaps. When a rewritten sentence changes several nearby words, each swap is highlighted independently. Output alternates red and green with only spaces or punctuation between the swaps. It's accurate, but the eye can't pick out the shape of the edit.

Hyphenated compounds. ninety-five → ninety-nine diffs as an unchanged ninety- prefix and a swapped five/nine suffix. The hyphen ends up on the wrong side of the change and the compound reads as two separate tokens.

What changed

  • 9e2eace — balanced emphasis pairs. Before diffing, replace each balanced **…**, __…__, ~~…~~, *…*, _…_ (and the triple forms) with a unique word-char sentinel. Restore after. Same pattern used by the existing code-span and link passes. Pair matching follows CommonMark's flanking rules so stray asterisks like 2**3 aren't captured.
  • f455ac6 — coalesce adjacent change sites. After the diff returns, walk the token stream and merge any run of two or more change sites separated only by "thin" unchanged tokens (whitespace, commas, periods, semicolons, colons, em and en dashes, and quotes). Isolated swaps pass through, which keeps word-level highlighting for the common case. Parens and brackets remain hard boundaries so inline links aren't absorbed into coalesced runs.
  • b004737 — hyphen atomization. A final sentinel pass replaces infix hyphens between word chars with a fixed marker. Leading and trailing dashes, and em-dash-like separators (a dash with whitespace on one side), are left alone.
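The sentinel idea behind these passes can be sketched generically. Everything here is hypothetical: the `SentinelRegistry` name, the `ZQC<n>ZQC` sentinel format, and the `CODE_SPAN` regex are illustrative assumptions, not the engine's actual code. The detail that matters is that one registry serves both sides of the diff, so the same source span always maps to the same word-char sentinel and diffs as unchanged, while a different span gets a different sentinel and shows up as one whole-token swap.

```typescript
// Hypothetical generic sentinel pass; names and sentinel format are assumptions.
class SentinelRegistry {
  private toSentinel = new Map<string, string>();
  private fromSentinel = new Map<string, string>();
  private prefix: string;

  // `prefix` must be word chars only (so the sentinel tokenizes as one word)
  // and must never appear in real plan text.
  constructor(prefix: string) {
    this.prefix = prefix;
  }

  // One registry serves both sides of the diff: the same span always maps to
  // the same sentinel, so unchanged spans diff as unchanged.
  sentinelFor(span: string): string {
    let s = this.toSentinel.get(span);
    if (s === undefined) {
      s = `${this.prefix}${this.toSentinel.size}${this.prefix}`;
      this.toSentinel.set(span, s);
      this.fromSentinel.set(s, span);
    }
    return s;
  }

  // `pattern` must carry the g flag so every occurrence is replaced.
  atomize(text: string, pattern: RegExp): string {
    return text.replace(pattern, (span) => this.sentinelFor(span));
  }

  // Swap every sentinel back to the original span it stands for.
  restore(text: string): string {
    const re = new RegExp(`${this.prefix}\\d+${this.prefix}`, "g");
    return text.replace(re, (s) => this.fromSentinel.get(s) ?? s);
  }
}

// Illustrative code-span pattern: a backtick-delimited run on a single line.
const CODE_SPAN = /`[^`\n]+`/g;

const registry = new SentinelRegistry("ZQC");
const oldAtoms = registry.atomize("run `bun test` now", CODE_SPAN);
const newAtoms = registry.atomize("run `bun test` later", CODE_SPAN);
console.log(oldAtoms); // run ZQC0ZQC now
console.log(newAtoms); // run ZQC0ZQC later
console.log(registry.restore(newAtoms)); // run `bun test` later
```

Keying sentinels by content, rather than numbering each occurrence, is what lets identical phrases on both sides pair up during the word diff.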

Before and after

The clearest demonstration is case ③ — a paragraph with many adjacent word-level changes. Before, the output interleaves red and green at the word level. After, each side reads as one coherent phrase:

[Screenshots: before-03-inline-code, after-03-inline-code]

Case ① — scattered edits, including the two-word "load balancer" → "service mesh" swap and the ninety-five → ninety-nine compound:

[Screenshots: before-01-scattered-edits, after-01-scattered-edits]

Case ② — bold phrases (a quieter improvement; included for completeness):

[Screenshots: before-02-bold-phrases, after-02-bold-phrases]

Case ⑯ — the previously known bold-phrase limitation, now resolved. Before: literal ** asterisks leak out and "analysis" loses its bold weight. After: a clean bold-struck → bold-green swap:

[Screenshots: before-16-bold-phrase, after-16-bold-phrase]

Approaches considered and discarded

Rather than inventing my own, I looked at existing algorithms first, and Google's diff-match-patch came up as an alternative: a well-tested 45 KB dependency using Myers diff with semantic cleanup. I considered it, but it's character-level by default, which would fragment ninety-five → ninety-nine even more than the current approach does. On top of that, the library's documented word mode relies on a DIY tokeniser, so every token-shape decision in this PR would still need to be made after the swap.

Test plan

All tested locally and working.

  • bun test packages/ui/utils/planDiffEngine.test.ts passes. Covers each atomization pass, the coalescing rules, and anti-regressions for isolated swaps and hard boundaries.
  • The ⑯ demo at packages/editor/demoPlanDiffDemo.ts:244 renders as a single bold-struck → bold-green swap.
  • A paragraph with three adjacent word swaps coalesces into one phrase-level change.
  • An isolated single-word swap still highlights at word level.
  • A swap next to an unchanged inline link splits cleanly at the link boundary.
  • Annotation capture in PlanCleanDiffView continues to work — block-level, so it should be unaffected, but worth a manual check.

pbowyer added 3 commits April 23, 2026 09:30
Before word-diffing, replace each balanced `**…**`, `__…__`, `~~…~~`,
`*…*`, `_…_` (and triples `***…***` / `___…___`) with a unique
word-char sentinel — same pattern as the existing code-span / link
atomization passes. Identical phrases pair as unchanged; different
phrases produce a single remove+add.

Fixes the "preliminary analysis" → "final analysis" demo case (⑯),
which previously orphaned the closing `**` into the unchanged tail and
rendered as literal asterisks. Now renders as one clean bold-struck →
bold-green swap.

Pair matching uses CommonMark-ish flanking rules so stray `2**3` or
intraword `my__var` / `snake_case` stay literal. Longest-first ordering
prevents single delimiters from eating the inside of a double-delim pair.
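A minimal sketch of this pass, assuming the simplified flanking rule spelled out in the code comments. That rule is stricter than CommonMark, which for example allows intraword `*` emphasis. `EmphasisAtomizer`, `pairPattern`, and the `QEMPH<n>Q` sentinel format are hypothetical names, not the code in this PR. Patterns run longest-first, and one atomizer instance is shared by both sides so identical phrases receive identical sentinels.

```typescript
// Hypothetical sketch of emphasis atomization with simplified flanking rules.
// Sentinel format (QEMPH<n>Q) and all names are illustrative assumptions.

// Longest delimiters first so `*` never eats the inside of a `**` pair.
const DELIMITERS = ["***", "___", "**", "__", "~~", "*", "_"];

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Simplified flanking (stricter than CommonMark):
//  - the opener is not preceded by a word char or the same delimiter char,
//  - the content neither starts nor ends with whitespace or the delimiter char,
//  - the closer is not followed by a word char or the same delimiter char.
// This keeps `2**3`, `my__var`, and `snake_case` literal.
function pairPattern(delim: string): RegExp {
  const d = escapeRegExp(delim);
  const c = escapeRegExp(delim[0]);
  return new RegExp(
    `(?<![\\w${c}])${d}(?![\\s${c}])[^\\n]*?[^\\s${c}]${d}(?![\\w${c}])`,
    "g",
  );
}

const PATTERNS = DELIMITERS.map(pairPattern);

class EmphasisAtomizer {
  private toSentinel = new Map<string, string>();
  private fromSentinel = new Map<string, string>();

  // Shared across old and new text: identical phrases get identical sentinels.
  private sentinelFor(span: string): string {
    let s = this.toSentinel.get(span);
    if (s === undefined) {
      s = `QEMPH${this.toSentinel.size}Q`;
      this.toSentinel.set(span, s);
      this.fromSentinel.set(s, span);
    }
    return s;
  }

  atomize(text: string): string {
    for (const re of PATTERNS) {
      text = text.replace(re, (span) => this.sentinelFor(span));
    }
    return text;
  }

  restore(text: string): string {
    return text.replace(/QEMPH\d+Q/g, (s) => this.fromSentinel.get(s) ?? s);
  }
}

const atomizer = new EmphasisAtomizer();
console.log(atomizer.atomize("The **preliminary analysis** shows")); // The QEMPH0Q shows
console.log(atomizer.atomize("The **final analysis** shows")); // The QEMPH1Q shows
console.log(atomizer.atomize("2**3 and snake_case")); // 2**3 and snake_case
```

With both bold phrases reduced to single word-char tokens, a word diff can only report one remove plus one add for the whole phrase, and the delimiters travel with it on restore.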

After `diffWordsWithSpace` and sentinel restoration, merge dirty runs
of ≥2 change sites separated only by thin unchanged tokens (whitespace,
commas, periods, semicolons, colons, dashes, quotes) into a single
phrase-level swap. Parens and brackets are excluded so inline links
and bracketed content stay as hard boundaries.

Turns alternating red/green word-noise (e.g. paragraph reworks with
multiple adjacent word swaps) into a readable before/after. Also
rescues the atomization edge case where wrapping a previously plain
phrase in emphasis (`foo bar baz` → `foo **bar baz**`) would otherwise
surface as fragmented literal delimiters inside colored tags.

Single-site dirty runs pass through unchanged so isolated word swaps
keep word-level highlighting.
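A rough sketch of the coalescing walk over jsdiff-style `{ value, added?, removed? }` change objects. The `coalesce` name and the exact thin-token class (whitespace, commas, periods, semicolons, colons, em and en dashes written as `\u2014`/`\u2013`, and straight quotes) are assumptions based on the description above. Parens and brackets are simply absent from the class, which is what makes them hard boundaries. A run with two or more change sites becomes one removal plus one addition, and each side keeps the absorbed thin tokens so both the old and the new text still reconstruct exactly.

```typescript
// Hypothetical sketch: merge nearby change sites into one phrase-level swap.
// Change mirrors the { value, added?, removed? } shape of jsdiff's output.
interface Change {
  value: string;
  added?: boolean;
  removed?: boolean;
}

// Assumed "thin" class: whitespace , . ; : em dash, en dash, " and '.
// Parens and brackets are deliberately absent, so they act as hard boundaries.
const THIN = /^[\s,.;:\u2014\u2013"']+$/;

const isChange = (c: Change): boolean => Boolean(c.added || c.removed);

function coalesce(changes: Change[]): Change[] {
  const out: Change[] = [];
  let i = 0;
  while (i < changes.length) {
    if (!isChange(changes[i])) {
      out.push(changes[i++]);
      continue;
    }
    // Collect a dirty run: change sites joined only by thin unchanged tokens.
    const run: Change[] = [];
    let sites = 0;
    let j = i;
    while (j < changes.length) {
      const c = changes[j];
      if (isChange(c)) {
        // A new site starts when the previous entry was not itself a change.
        if (j === i || !isChange(changes[j - 1])) sites++;
        run.push(c);
        j++;
      } else if (THIN.test(c.value) && j + 1 < changes.length && isChange(changes[j + 1])) {
        run.push(c); // thin separator sandwiched between two change sites
        j++;
      } else {
        break;
      }
    }
    if (sites < 2) {
      out.push(...run); // isolated swap: keep word-level highlighting
    } else {
      // Thin tokens go on both sides, so old and new text still reconstruct.
      let removed = "";
      let added = "";
      for (const c of run) {
        if (c.removed) removed += c.value;
        else if (c.added) added += c.value;
        else {
          removed += c.value;
          added += c.value;
        }
      }
      out.push({ value: removed, removed: true }, { value: added, added: true });
    }
    i = j;
  }
  return out;
}

const merged = coalesce([
  { value: "The " },
  { value: "quick", removed: true },
  { value: "slow", added: true },
  { value: " " },
  { value: "brown", removed: true },
  { value: "red", added: true },
  { value: " fox" },
]);
console.log(JSON.stringify(merged));
// [{"value":"The "},{"value":"quick brown","removed":true},{"value":"slow red","added":true},{"value":" fox"}]
```

Counting sites rather than individual change objects matters here: a single remove+add pair is one site, so the common isolated swap never triggers a merge.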

Hyphens between word chars (`ninety-five`, `64-byte`, `state-of-the-art`)
are semantic compound words, not two tokens. `diffWordsWithSpace` splits
on word boundaries, so without this pass `ninety-five` → `ninety-nine`
fragments into an unchanged `ninety-` prefix and a swapped `five`/`nine`
suffix — a visually noisy partial-word diff.

Added a sentinel pass that replaces infix hyphens with a word-char
marker before diffing and restores them afterwards. Runs after the
code/link/emphasis passes so hyphens inside those constructs stay
hidden. Unlike the other sentinels this one uses a fixed marker — all
hyphens restore to the same character, so uniqueness isn't needed.

Leading/trailing dashes and em-dash-like separators (dash with space on
one side) are not substituted; only true compound infixes.
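A minimal sketch of this pass, assuming a plain `\w`-based definition of "word char" and a fixed marker string (`QHYPHENQ` here, a made-up value that must never occur in real plan text). Because every marker restores to the same `-`, restoration is a single global substitution rather than a registry lookup. In the pipeline described above this would run after the code, link, and emphasis passes, so hyphens already hidden inside their sentinels never reach it.

```typescript
// Hypothetical sketch of the hyphen pass. HYPHEN_MARKER is an assumed
// word-char marker that must never occur in real plan text.
const HYPHEN_MARKER = "QHYPHENQ";

// Only infix hyphens: a word char on both sides. Leading/trailing dashes
// and separators with a space on either side ("a - b", "a -b") are left alone.
const INFIX_HYPHEN = /(?<=\w)-(?=\w)/g;

function atomizeHyphens(text: string): string {
  return text.replace(INFIX_HYPHEN, HYPHEN_MARKER);
}

// Every marker maps back to the same character, so no registry is needed.
function restoreHyphens(text: string): string {
  return text.split(HYPHEN_MARKER).join("-");
}

console.log(atomizeHyphens("ninety-five and state-of-the-art"));
// ninetyQHYPHENQfive and stateQHYPHENQofQHYPHENQtheQHYPHENQart
console.log(atomizeHyphens("-flag, trailing- and a -b"));
// -flag, trailing- and a -b
```

With the marker in place, a word tokenizer sees `ninetyQHYPHENQfive` as a single word, so `ninety-five` → `ninety-nine` becomes one whole-token swap instead of a partial-word diff.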
@pbowyer force-pushed the diff-improvements branch from b004737 to 8366e7d on April 23, 2026 08:33
@backnotprop (Owner)

Ok, great! Yeah, I wasn't totally sure about the original approach; yours is looking better. I'll get a review in.

@backnotprop merged commit 4d8d3a2 into backnotprop:main on Apr 24, 2026
7 checks passed
@backnotprop (Owner)

nice work
