fix(math): split multi-char m:r text per character to match Word (SD-2632) by caio-pizzol · Pull Request #2875 · superdoc-dev/superdoc

caio-pizzol · 2026-04-20T20:56:56Z

Word's OMML→MathML converter classifies each character in an m:r run individually — consecutive digits group into one <mn>, each operator character becomes its own <mo>, everything else becomes its own <mi>. SuperDoc was emitting the full run text as a single element, so →∞ came out as <mi>→∞</mi> and lost operator spacing and semantics.

tokenizeMathText does per-char classification with digit grouping (one optional decimal point). convertMathRun returns a DocumentFragment when a run splits, so siblings flow directly into the parent <mrow> with no extra wrapper.
m:fName is the documented exception — multi-letter function names like sin/lim stay whole. convertFunction routes m:r children through a new convertMathRunWhole helper, and collapseFunctionNameBases re-merges the base slot of any structural element Word nested inside m:fName (e.g. m:limLow → <munder>).
MathObjectConverter's return widens from Element|null to Node|null so the fragment return works without casting. All 19 other converters already return Element, which is assignable.
Drops U+221E (∞) from OPERATOR_CHARS — it's a constant per Word, not an operator.
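
The classification rule above can be illustrated with a minimal sketch. This is not the PR's actual `tokenizeMathText`: the `MathAtom` type, the `SKETCH_OPERATORS` set, and the function name are hypothetical, the operator set is deliberately tiny, and the loop walks UTF-16 code units, so it only handles BMP characters.

```typescript
// Hypothetical atom shape for illustration; not the converter's real type.
type MathAtom = { tag: 'mn' | 'mo' | 'mi'; text: string };

// Assumption: a small illustrative operator set. The real OPERATOR_CHARS is
// larger; per the description above it does not contain U+221E (∞).
const SKETCH_OPERATORS = new Set<string>(['+', '-', '=', '<', '>', '→', '(', ')']);

// ASCII digits only.
const isAsciiDigit = (ch: string): boolean => ch.length === 1 && ch >= '0' && ch <= '9';

function tokenizeMathTextSketch(text: string): MathAtom[] {
  const atoms: MathAtom[] = [];
  let i = 0;
  while (i < text.length) {
    const ch = text.charAt(i);
    if (isAsciiDigit(ch)) {
      // Group consecutive digits, allowing one decimal point between digits.
      let end = i + 1;
      let seenPoint = false;
      while (end < text.length) {
        const next = text.charAt(end);
        if (isAsciiDigit(next)) {
          end += 1;
        } else if (next === '.' && !seenPoint && isAsciiDigit(text.charAt(end + 1))) {
          seenPoint = true;
          end += 1;
        } else {
          break;
        }
      }
      atoms.push({ tag: 'mn', text: text.slice(i, end) });
      i = end;
    } else {
      // Every other character becomes its own atom: <mo> for operators, <mi> otherwise.
      atoms.push({ tag: SKETCH_OPERATORS.has(ch) ? 'mo' : 'mi', text: ch });
      i += 1;
    }
  }
  return atoms;
}
```

Under these assumptions, `tokenizeMathTextSketch('n→∞')` yields an `mi`, an `mo`, and an `mi` atom, mirroring the `<mi>n</mi><mo>→</mo><mi>∞</mi>` shape described for Word.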

Verified against Word's own OMML2MML.XSL across 15+ input shapes — byte-identical output for bare math, and matching whole-name behavior inside m:fName.

Rejected: narrowing the split to "only when the run contains an operator" (Option B in investigation). It would have fixed the reported →∞ case without cascading, but left SuperDoc rendering <mi>sin</mi> vs Word <mi>s</mi><mi>i</mi><mi>n</mi> for bare function names — and downstream rendering issues (operator spacing inside the run) would keep surfacing. Chose full parity.
Review: the collapseFunctionNameBases pass is the subtlest part — it walks the post-split tree and re-merges <mi> siblings in structural bases. Confirm the BASE_BEARING_ELEMENTS set covers every shape Word nests under m:fName that we've actually seen. Also worth checking that a same-variant requirement is correct (currently only merges when all siblings share the same mathvariant).
Verified: 172/172 unit tests, 79/79 behavior tests, type-check clean. Pre-existing 3 m:groupChr tests updated — their munder.querySelector('mo') descendant-search now picks up split operators inside the base expression, so they were re-scoped to :scope > mo (the group character, which is what they were actually testing).
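
To make the review question about `collapseFunctionNameBases` concrete, here is a rough sketch of a base-merging pass over a plain object tree rather than real DOM nodes. Everything in it is an assumption for illustration: the `MathNodeSketch` shape, the partial `BASE_BEARING_SKETCH` set, and the idea that a split base arrives wrapped in an `<mrow>`. It only shows the same-variant merge rule, not the PR's actual code.

```typescript
// Hypothetical, DOM-free stand-in for a MathML element.
interface MathNodeSketch {
  tag: string;
  text: string; // empty for container elements
  mathvariant: string | null;
  children: MathNodeSketch[];
}

// Assumption: a partial list; the real BASE_BEARING_ELEMENTS may differ.
const BASE_BEARING_SKETCH = new Set<string>(['munder', 'mover', 'msub']);

function collapseFunctionNameBasesSketch(node: MathNodeSketch): void {
  const base = node.children[0];
  if (BASE_BEARING_SKETCH.has(node.tag) && base !== undefined && base.tag === 'mrow') {
    const first = base.children[0];
    // Merge only when the base holds several leaf <mi> atoms sharing one mathvariant.
    const mergeable =
      first !== undefined &&
      base.children.length > 1 &&
      base.children.every(
        (c) => c.tag === 'mi' && c.children.length === 0 && c.mathvariant === first.mathvariant,
      );
    if (mergeable && first !== undefined) {
      node.children[0] = {
        tag: 'mi',
        text: base.children.map((c) => c.text).join(''),
        mathvariant: first.mathvariant,
        children: [],
      };
    }
  }
  // Recurse so bases nested deeper in the tree are visited as well.
  for (const child of node.children) {
    collapseFunctionNameBasesSketch(child);
  }
}
```

In this sketch only the first child (the base slot) is merged; the limit expression in the second slot keeps its split atoms.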

…2632)

Word's OMML2MML.XSL classifies each character in an m:r run individually: digits group into a single <mn> (with one optional decimal point between digits), operator characters each become their own <mo>, and everything else becomes its own <mi>. SuperDoc was emitting the entire run text as one element, so runs like "→∞" or "x+1" rendered as a single <mi>, losing operator spacing and semantics.

tokenizeMathText implements the per-character classification. convertMathRun returns a single Element for one-atom runs and a DocumentFragment when multiple atoms are emitted, so siblings flow directly into the parent's <mrow> without an extra wrapper.

m:fName is the documented exception: Word keeps multi-letter function names like "sin" or "lim" as one <mi> inside the function-name slot. convertFunction routes m:r children through convertMathRunWhole (no splitting), and a new collapseFunctionNameBases pass re-merges the base slot of any structural MathML element (munder/mover/msub/…) that Word nested inside m:fName. Without this, "lim" inside m:limLow would incorrectly split into three <mi> elements.

Also drops U+221E (∞) from OPERATOR_CHARS: it is a mathematical constant per Word's XSL, not an operator.

MathObjectConverter's return type widens from Element|null to Node|null so convertMathRun can return a DocumentFragment. All other converters already return Element, which is assignable to Node, so no other changes are needed.

Verified against real Word-native fixtures: `→∞` in the limit-tests fixture case 1 now renders as <mi>n</mi><mo>→</mo><mi>∞</mi> (matches Word OMML2MML.XSL byte-for-byte), and nested limits keep their function names intact.

Ref ECMA-376 §22.1.2.116, Annex L.6.1.13, §22.1.2.58.

linear · 2026-04-20T20:56:59Z

SD-2632 Math: multi-char operator strings render as `<mi>` instead of splitting into `<mo>` / `<mi>`

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9ceec55257


codecov-commenter · 2026-04-20T21:03:23Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


…pts (SD-2632)

Follow-up to /review feedback backed by fresh Word OMML2MML.XSL evidence:

- `convertMathRunAsFunctionName` (renamed from `convertMathRunWhole` since "whole" no longer fits) now groups consecutive non-digit / non-operator characters into one <mi> while still splitting digits and operators. Word's XSL for `<m:fName><m:r>log_2</m:r></m:fName>` produces `<mi>log</mi><mo>_</mo><mn>2</mn>`, not `<mi>l</mi><mi>o</mi><mi>g</mi>…`.
- `BASE_BEARING_ELEMENTS` gains `mmultiscripts`. Word emits it when an `m:sPre` sits inside `m:fName`, so our base-collapse pass needs to know to merge the first-child <mi> run.
- CONTRIBUTING.md now documents the widened `Node | null` return type.

Tests added:

- Direct `tokenizeMathText` edge cases: `.5` / `5.` / `1.2.3` / `2x+1` / consecutive operators / empty / standalone ∞.
- m:fName mixed-content: `log_2` stays `<mi>log</mi><mo>_</mo><mn>2</mn>`.
- Base collapse inside nested `m:sSub` under `m:fName`.
- Base collapse inside nested `m:sPre` (mmultiscripts) under `m:fName`.
- Behavior test tightened to pin the full 3-atom sequence for `n→∞`.

Disputed during review and deferred with evidence:

- Opus claim that standalone `m:limLow` with a "lim" base regresses to italic: Word's XSL itself splits "lim" per character in that shape (with or without m:sty=p), so our output matches Word.
- Codex claim that Arabic-Indic digits should be `<mn>`: Word's XSL also classifies them as `<mi>`, so our behavior matches.
- Non-BMP surrogate-pair support: an edge case limited to astral-plane mathematical alphanumerics; Word's XSL itself errored on U+1D465. Worth a separate ticket.
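
The function-name grouping rule from this commit can be sketched as follows. It is an illustration under assumptions, not the PR's `convertMathRunAsFunctionName`: the type, function name, and operator set are hypothetical, `_` is included as an operator only because the `log_2` example above shows Word emitting `<mo>_</mo>`, and decimal-point handling is left out for brevity.

```typescript
// Hypothetical atom shape for illustration.
type FnNameAtom = { tag: 'mn' | 'mo' | 'mi'; text: string };

// Assumption: a tiny illustrative operator set, not the real OPERATOR_CHARS.
const FN_SKETCH_OPERATORS = new Set<string>(['_', '+', '-', '=', '→']);

const isDigitChar = (ch: string): boolean => ch.length === 1 && ch >= '0' && ch <= '9';

function tokenizeFunctionNameSketch(text: string): FnNameAtom[] {
  const atoms: FnNameAtom[] = [];
  let i = 0;
  while (i < text.length) {
    const ch = text.charAt(i);
    if (FN_SKETCH_OPERATORS.has(ch)) {
      // Operators still split: one <mo> per character.
      atoms.push({ tag: 'mo', text: ch });
      i += 1;
      continue;
    }
    // Digits and letters both group, but never with each other.
    const wantDigits = isDigitChar(ch);
    let end = i + 1;
    while (end < text.length) {
      const next = text.charAt(end);
      if (FN_SKETCH_OPERATORS.has(next) || isDigitChar(next) !== wantDigits) break;
      end += 1;
    }
    atoms.push({ tag: wantDigits ? 'mn' : 'mi', text: text.slice(i, end) });
    i = end;
  }
  return atoms;
}
```

With this sketch, `log_2` produces three atoms (`mi` "log", `mo` "_", `mn` "2") and `sin` stays a single `mi`, which is the grouping the commit message describes for text inside `m:fName`.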

Addresses Codex bot review on PR #2875. Astral-plane mathematical alphanumerics (e.g. U+1D465 mathematical italic x, U+1D7D9 mathematical double-struck 1) are UTF-16 surrogate pairs. Walking text by code unit split them into two half-pair <mi> atoms with invalid content. `codePointUnitLength` returns 2 when the current position starts a surrogate pair so tokenizeMathText and tokenizeFunctionNameText step across the full code point.
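
A minimal sketch of the surrogate-pair check described above. The function names carry a `Sketch` suffix because this is an illustration, not the PR's `codePointUnitLength`; the `splitByCodePointSketch` helper is hypothetical and only shows how a tokenizer loop could advance by the returned length.

```typescript
// Returns 2 when `index` starts a well-formed UTF-16 surrogate pair, else 1.
function codePointUnitLengthSketch(text: string, index: number): 1 | 2 {
  const high = text.charCodeAt(index);
  if (high >= 0xd800 && high <= 0xdbff && index + 1 < text.length) {
    const low = text.charCodeAt(index + 1);
    if (low >= 0xdc00 && low <= 0xdfff) return 2;
  }
  return 1;
}

// Hypothetical helper: walk a string one code point at a time.
function splitByCodePointSketch(text: string): string[] {
  const parts: string[] = [];
  let i = 0;
  while (i < text.length) {
    const len = codePointUnitLengthSketch(text, i);
    parts.push(text.slice(i, i + len));
    i += len;
  }
  return parts;
}
```

A lone (unpaired) surrogate is treated as length 1 in this sketch, so the loop always makes progress on malformed input.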

…s-render-as-mi-instead-of

superdoc-bot bot added review: thorough review: careful and removed review: thorough labels Apr 20, 2026

chatgpt-codex-connector bot reviewed Apr 20, 2026


Comment thread packages/layout-engine/painters/dom/src/features/math/converters/math-run.ts Outdated

caio-pizzol and others added 3 commits April 20, 2026 18:18

Merge branch 'main' into caio/sd-2632-math-multi-char-operator-string…

54462b1

…s-render-as-mi-instead-of

caio-pizzol enabled auto-merge April 20, 2026 21:54

caio-pizzol added this pull request to the merge queue Apr 20, 2026

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 20, 2026

caio-pizzol wants to merge 4 commits into main from caio/sd-2632-math-multi-char-operator-strings-render-as-mi-instead-of

2 participants
