Status: Proposed — deferred until after the next release (a major feature lands first; this is a broad cross-cutting change we don't want to collide with it).
Proposal doc in-repo: docs-src/developer/word-identity-data-hex.md.
Problem
In the reading view, every occurrence of the same term shares an identity token so a status change can restyle all occurrences client-side at once. That identity is currently carried two ways:
- a CSS class TERM<hex> on each word span, and
- a data_hex attribute on JS-rendered spans (the same value, duplicated).
The <hex> comes from StringUtils::toClassName() (original-LWT 2011 ¤/hex encoder). Issues:
- Dual identity: the same value is carried as both a class and an attribute; lookups are split across .TERM<hex> selectors and data_hex reads.
- Hacky, subtly broken encoding: it iterates per character (mb_substr) but tests per byte (ord), so the ¤-sentinel/165-threshold unambiguity scheme was never actually realized. PHP 8.5 surfaced it by deprecating ord() on a multi-byte string.
- Fragile extraction: JS TERM([a-f0-9]+) extractors mis-handle tokens containing g-z/G-Z/¤ and silently fall back to data_hex.
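The extraction failure can be reproduced in isolation. The sketch below is illustrative only: extractHex is a hypothetical stand-in for the JS extractors, and the sample class strings are made up to show the shape of the problem. They are not real toClassName() output.

```typescript
// Hypothetical stand-in for the current extractors: pull the token out of a
// class string using the hex-only pattern quoted above.
function extractHex(className: string): string | null {
  const match = /TERM([a-f0-9]+)/.exec(className);
  return match?.[1] ?? null;
}

// Made-up class strings, for illustration only.
const hexOnly = extractHex("word TERM68656c6c6f status1");
const truncated = extractHex("word TERM68¤6cz status1");
const missed = extractHex("word TERM¤68 status1");

console.log(hexOnly);   // the full token is captured
console.log(truncated); // capture stops at the first non-hex character
console.log(missed);    // no match at all, so a caller has to fall back
```

The truncated case is the more dangerous one: the extractor returns a plausible-looking token that does not identify the word.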
Proposal
Make data_hex the single identity:
- Select via [data_hex="…"].
- Drop the TERM class entirely (zero CSS dependencies; it is purely an index).
- toClassName → substr(hash('sha256', $s), 0, 16) (pure [0-9a-f]).
Token stays opaque/recomputable/contained → no API wire-format change, no CSS.escape needed, and the extractor regexes become correct by construction. The ¤/165/mb_ord-vs-ord question disappears.
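For reference, the proposed derivation can be mirrored outside PHP. This is a minimal sketch, not part of the proposal: wordToken is a hypothetical name, it uses Node's built-in crypto module, and it assumes the PHP side hashes the UTF-8 bytes of the string. The real token would still be computed by StringUtils::toClassName().

```typescript
import { createHash } from "node:crypto";

// TypeScript mirror of the proposed PHP expression
// substr(hash('sha256', $s), 0, 16). wordToken is a hypothetical name.
function wordToken(text: string): string {
  // The string is hashed as UTF-8; digest("hex") yields lowercase hex.
  return createHash("sha256").update(text, "utf8").digest("hex").slice(0, 16);
}

const token = wordToken("abc");
console.log(token); // 16 lowercase hex characters

// Non-ASCII input, including the old ¤ sentinel, still yields pure [0-9a-f].
const nonAscii = wordToken("çà ¤ 日本語");
console.log(/^[0-9a-f]{16}$/.test(nonAscii));
```

Sixteen hex characters correspond to the first 64 bits of the digest, and the output alphabet no longer depends on the input text.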
Why it's safe
- The token is never reversed back to text (backend re-derives it from WoTextLC).
- .TERM has no CSS rules.
- Tokens are computed per render, never stored, so there is no desync risk.
Trade-off: the token is opaque in devtools (accepted).
Scope (post-release)
- PHP token: StringUtils::toClassName() → hash (keep toHex()).
- PHP emit (5 spans): TextReadingService ×3, ExpressionService ×2; drop TERM from class, add data_hex.
- JS emit: remove the TERM${word.hex} push in text_renderer.ts (data_hex already emitted).
- JS selectors (~9): .TERM${hex} → [data_hex="${hex}"] (word_dom_updates.ts, word_result_init.ts, text_renderer.ts).
- JS extractors (4): read data_hex (text_reader.ts, text_keyboard.ts, word_actions.ts, text_events.ts).
- Tests: PHP toClassName assertions (IntegrationTest, TextProcessingTest) + frontend fixtures (tests/frontend/reading/*, tests/frontend/words/*, texts/text_reader.test.ts).
Out of scope: toHex(); table_review_row.php's id="TERM<woId>" (different mechanism).
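The selector and extractor changes in the scope list could look roughly like the sketch below. The helper names (dataHexSelector, readDataHex) and the AttributeSource type are hypothetical and do not exist in the codebase. The hex-only check assumes the hashed token is already in place.

```typescript
// Minimal structural type so the sketch runs without a DOM; a real Element
// satisfies it.
interface AttributeSource {
  getAttribute(name: string): string | null;
}

// Build the attribute selector that replaces `.TERM${hex}`.
// Rejecting non-hex input keeps the value safe to interpolate unescaped.
function dataHexSelector(hex: string): string {
  if (!/^[0-9a-f]+$/.test(hex)) {
    throw new Error(`unexpected token: ${hex}`);
  }
  return `[data_hex="${hex}"]`;
}

// Read the token directly instead of parsing it out of the class list.
// data_hex is not a data-* attribute, so element.dataset does not expose it.
function readDataHex(el: AttributeSource): string | null {
  return el.getAttribute("data_hex");
}

// Stand-in for a word span, for illustration only.
const fakeSpan: AttributeSource = {
  getAttribute: (name) => (name === "data_hex" ? "ba7816bf8f01cfea" : null),
};

const hex = readDataHex(fakeSpan);
const selector = hex === null ? null : dataHexSelector(hex);
console.log(selector);
```

In the browser, passing the result to document.querySelectorAll() would return every occurrence of the term in one call.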