
fix(frontend): preserve \u00XX escapes in Avro JSON viewer #2425

Open
twmb wants to merge 1 commit into master from tb/avro-bytes-display-escape

@twmb (Contributor) commented Apr 24, 2026

Summary

  • Fixes the copy-paste loss for Avro bytes / fixed fields reported in #2421 ("[Bug] Avro bytes fields rendered as Latin-1 codepoint strings instead of Base64 since v3.7.0").
  • Root cause: Avro JSON encodes bytes as ISO-8859-1 code points (\u00XX), but the frontend runs the payload through JSON.parse then JSON.stringify for display. JSON.stringify does not re-escape non-ASCII characters, so a byte like 0xDB renders as the literal glyph Û, and "Copy Value" places its UTF-8 encoding (0xC3 0x9B) on the clipboard instead of the original byte.
  • Fix is frontend-only and scoped: KowlJsonView gains an optional escapeLatin1 prop that re-escapes code points in 0x80-0xFF back to \u00XX before display. PayloadComponent sets it only when payload.encoding === 'avro', so non-avro payloads are unchanged.
  • The transform is lossless for both Avro bytes fields (recovers the exact byte) and legitimate Latin-1 strings (valid JSON escape for the same Unicode character).
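The root cause and the fix can be sketched end to end in TypeScript. This is a minimal sketch under stated assumptions, not the PR's code: `escapeLatin1` as a standalone function and `latin1ToBytes` are hypothetical helpers, and the sample record is made up. It runs the same parse/stringify pipeline before and after the re-escape and checks that both the bytes field and the Latin-1 string survive.

```typescript
// Hypothetical standalone version of the replace the PR adds to KowlJsonView:
// re-escape U+0080..U+00FF as \u00XX, leave everything else untouched.
function escapeLatin1(json: string): string {
  return json.replace(/[\u0080-\u00ff]/g, (c) => `\\u00${c.charCodeAt(0).toString(16).padStart(2, '0')}`);
}

// Hypothetical helper: Avro JSON maps each byte 0x00..0xFF to U+0000..U+00FF,
// so a bytes field is read back one code point per byte.
function latin1ToBytes(s: string): number[] {
  return Array.from(s, (c) => {
    const code = c.charCodeAt(0);
    if (code > 0xff) throw new RangeError(`U+${code.toString(16)} is not a byte`);
    return code;
  });
}

// Made-up on-wire Avro JSON: a bytes field [0x01, 0xDB] and a Latin-1 string.
const wire = '{"raw":"\\u0001\\u00db","label":"caf\\u00e9"}';

// Before the fix: parse + stringify turns \u00db and \u00e9 into literal glyphs.
const before = JSON.stringify(JSON.parse(wire));
// Copying that text yields UTF-8, so U+00DB becomes the two bytes 0xC3 0x9B.
const copiedBefore = Array.from(new TextEncoder().encode(before));

// After the fix: same pipeline, then re-escape the Latin-1 range.
const after = escapeLatin1(before);

// Both fields survive: the exact bytes, and the same Unicode string.
const parsed = JSON.parse(after) as { raw: string; label: string };
const rawBytes = latin1ToBytes(parsed.raw);
```

In this example the fixed display text is identical to the on-wire JSON, because JSON.stringify already escapes U+0000..U+001F and the replace covers U+0080..U+00FF; wire JSON that escapes other characters (or is pretty-printed) would still differ in form while parsing to the same values.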

Test plan

  • Open an Avro topic with a bytes field (e.g. Debezium VariableScaleDecimal.value); confirm the value renders as \u00XX escapes and "Copy Value" yields the on-wire JSON.
  • Confirm non-avro payloads (proto, json, text) render unchanged.
  • bun run type:check && bun run lint && bun run test pass in CI.

🤖 Generated with Claude Code

Avro's JSON encoding represents bytes/fixed as ISO-8859-1 code points
(\u00XX for anything above 0x7F). Running the normalized payload through
JSON.parse then JSON.stringify for display launders those escapes into
literal Unicode glyphs (e.g. \u00db -> U+00DB -> "Û"), so "Copy Value"
placed UTF-8 bytes on the clipboard instead of the original byte.

Scoped to payload.encoding === 'avro' only. KowlJsonView now re-escapes
code points in 0x80-0xFF back to \u00XX before display when called from
an avro payload view, which is lossless for both Avro bytes fields
(recovers the exact byte) and legitimate Latin-1 strings (valid JSON
escape for the same Unicode character). Non-avro encodings (proto, json,
text, etc.) are unchanged.

Fixes #2421.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@weeco (Contributor) commented Apr 26, 2026

@claude review

@claude (bot) commented Apr 26, 2026

Claude finished @weeco's task in 1m 55s.


Code Review

  • Gather context and read changed files
  • Review kowl-json-view.tsx changes
  • Review payload-component.tsx changes
  • Post final review

The fix is correct and well-scoped. Root cause analysis is accurate and the implementation addresses it cleanly. A few observations:

frontend/src/components/misc/kowl-json-view.tsx

Correctness: The useMemo replacement is sound.

return raw.replace(/[\u0080-\u00ff]/g, (c) => `\\u00${c.charCodeAt(0).toString(16).padStart(2, '0')}`);
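  • The regex range \u0080-\u00ff is exactly the set of code points at risk: Avro maps bytes 0x00-0xFF to U+0000-U+00FF, and only those above 0x7F turn into multi-byte UTF-8 glyphs after the parse/stringify round trip. Correct.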
  • charCodeAt(0) for this range always gives a 2-digit hex value (80-ff), so padStart(2, '0') is technically a no-op but harmless.
  • Dependencies [props.srcObj, props.escapeLatin1] are correct.
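Because the regex matches only 128 code points, the claims about the callback can be checked exhaustively. This sketch lifts the PR's replacement callback into a standalone function, `toEscape` (a hypothetical name), and verifies that every match becomes a six-character escape that JSON.parse maps back to the same character.

```typescript
// Hypothetical standalone copy of the replacement callback from the PR.
const toEscape = (c: string): string => `\\u00${c.charCodeAt(0).toString(16).padStart(2, '0')}`;

// Run the callback over every code point the regex can match (U+0080..U+00FF).
const results: Array<[number, string]> = [];
for (let code = 0x80; code <= 0xff; code++) {
  results.push([code, toEscape(String.fromCharCode(code))]);
}

// Every escape is "\u00" plus exactly two hex digits, so padStart(2, '0')
// never adds a digit in this range.
const allSixChars = results.every(([, esc]) => esc.length === 6);

// Each escape, read as a JSON string literal, decodes to the original character.
const allRoundTrip = results.every(([code, esc]) => JSON.parse(`"${esc}"`) === String.fromCharCode(code));
```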

Comment verbosity: The 4-line comment on the escapeLatin1 prop explains the "why" well, but it's longer than typical for a prop JSDoc. Minor style nit only.

frontend/src/components/pages/topics/Tab.Messages/message-display/payload-component.tsx

return <KowlJsonView escapeLatin1={payload.encoding === 'avro'} srcObj={renderData.content} />;

Observation: all Avro fields are affected, not just bytes. escapeLatin1 is enabled for the entire Avro payload, so Avro string fields containing legitimate Latin-1 text (e.g. "café") will also be re-escaped in the viewer (é becomes \u00e9). This is valid JSON, but it may surprise users reading human-readable string fields. It is worth being aware of, even if it is an acceptable tradeoff for now; fixing it would require schema-level awareness of field types.
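To make the scope of that tradeoff concrete, here is a sketch applying the same replace to made-up Avro string fields. Only U+0080 to U+00FF is matched, so other non-ASCII characters (for example € at U+20AC) keep rendering as literal glyphs, and a single string can show a mix of escapes and glyphs. `escapeLatin1` is a hypothetical standalone name for the PR's replace.

```typescript
// Hypothetical standalone version of the PR's replace.
function escapeLatin1(json: string): string {
  return json.replace(/[\u0080-\u00ff]/g, (c) => `\\u00${c.charCodeAt(0).toString(16).padStart(2, '0')}`);
}

// Made-up Avro string fields with mixed non-ASCII text.
const record = { name: 'café', price: '5 €' };

// é (U+00E9) falls in the escaped range; € (U+20AC) does not.
const shown = escapeLatin1(JSON.stringify(record));

// The displayed text still parses back to the original strings.
const back = JSON.parse(shown) as typeof record;
```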

Missing test

The test file kowl-json-view.test.tsx already has the mock infrastructure to capture props.value passed to the Monaco editor via editorPropsSpy. It would be straightforward to add a test:

test('escapeLatin1 re-escapes Latin-1 code points in the editor value', async () => {
  render(<KowlJsonView escapeLatin1 srcObj={{ v: 'Û' }} />);
  await waitFor(() => {
    const value = editorPropsSpy.mock.lastCall?.[0].value as string;
    expect(value).toContain('\\u00db');
    expect(value).not.toContain('Û');
  });
});

Not a blocker, but given this is a regression fix it would be good to lock the behavior in.


Overall: LGTM. The fix is correct, appropriately scoped, and doesn't touch non-Avro code paths. The Avro string field display change is a known tradeoff. The missing test is the main thing worth adding before merge.



2 participants