
feat: add DeepCopy, StructuralCopy, two pass parser #16

Open
jensneuse wants to merge 16 commits into master from feat/two-pass-parser

Conversation

@jensneuse
Member

@jensneuse jensneuse commented Apr 17, 2026

Summary by CodeRabbit

  • New Features

    • Arena-aware structural cloning with transform-driven field rename/filtering; parser-bound deep-copy APIs; value-to-string coercion.
  • Changes

    • Merge APIs simplified (removed returned "changed" flag); parsing and marshaling gain escape-aware fast paths and a two-pass arena parse to reduce allocations; escape-hint tracking added.
  • Documentation

    • README/docs updated to prefer parser-bound deep-copy usage and arena best practices.
  • Tests & Benchmarks

    • Large expansion of tests and benchmarks for parsing, arena flows, copying, transforms, coercion, escape flags, and allocations.
  • Chores

    • .gitignore extended; dependency version updated.

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Walkthrough

Replaces package-level DeepCopy with Parser methods; adds two‑pass arena planner/filler parser; tracks per-value escape hints with a marshal fast-path; implements StructuralCopy and Transform-driven copying; simplifies MergeValues to drop the changed boolean; adds CoerceToString, many tests and benchmarks, and parse/validate refactors.
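The move from a package-level DeepCopy to parser-bound methods can be sketched as follows. The type and method shapes here (Parser, Arena, Value, DeepCopy) are inferred from the summary above, not the library's verbatim API:

```go
package main

import "fmt"

// Illustrative stand-ins for the library's types; the real Parser/Arena/Value
// definitions may differ.
type Arena struct{}

type Value struct{ raw string }

type Parser struct{}

// DeepCopy clones v; in the real library, allocations for the copy are served
// by the arena so the result outlives the parser's next parse.
func (p *Parser) DeepCopy(a *Arena, v *Value) *Value {
	cp := *v // the real code walks the tree and copies strings into the arena
	return &cp
}

func main() {
	p := &Parser{}
	v := &Value{raw: `{"a":1}`}
	cp := p.DeepCopy(&Arena{}, v)
	fmt.Println(cp != v, cp.raw == v.raw) // distinct value, equal content
}
```

Binding the copy to the parser lets repeated copies reuse one arena instead of allocating per call, which is the "arena best practices" angle the docs changes mention.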

Changes

Cohort / File(s) / Summary

  • Docs & packaging (README.md, doc.go, update.go, go.mod, .gitignore): Docs/examples switched to (*Parser).DeepCopy; go-arena bumped to v1.2.0; .gitignore extended.
  • Two‑pass arena parser (parser_arena_twopass.go, parser_arena_twopass_test.go, parser.go, parser_test.go): Add planner+fill two‑pass arena parsing, route arena entrypoints through the two‑pass path, planner accounting, and extensive arena tests.
  • Escape tables & parse errors (chartable.go, parse_errors.go, validate.go, parser.go): Introduce char classification & hex tables; add sentinel parse errors; replace many fmt.Errorf uses; wire escape-detection helpers.
  • Value flags & marshal fast-path (values.go, escape_flag_*.go, escape_flag_test.go, escape_flag_stale_test.go): Add stringNeedsEscape, keyNeedsEscape, noEscapeSubtree; implement fast-path MarshalTo, RecomputeEscapeHint, debug/no-op variants, and tests.
  • Deep copy / structural copy / transforms (util.go, util_test.go, transform.go, transform_test.go, copy_api_test.go): Remove package-level DeepCopy; add (*Parser).DeepCopy, DeepCopyWithTransform, and StructuralCopy* with planner/fill machinery; introduce Transform types and comprehensive tests.
  • Merge logic & callsites (mergevalues.go, mergevalues_test.go, arena_gc_test.go): Change MergeValues/MergeValuesWithPath to return (*Value, error), dropping changed; simplify merge logic and update tests/callsites.
  • Marshal / unescape refactor (parser.go, parser_arena_twopass.go, parser_test.go): Replace unescape/escape helpers with info-returning variants; add appendQuotedString, clean marshal paths, and update callers/tests.
  • Coercion & value helpers (values.go, coerce_test.go): Add Value.CoerceToString and newStringValue; add thorough coercion tests covering allocation/aliasing/regressions.
  • Benchmarks & performance tests (parser_timing_test.go, benchmark_improvements_test.go, escape_flag_bench_test.go, hasspecial_threshold_bench_test.go): Add many benchmarks (DeepCopy, StructuralCopy, transforms, parse helpers, merge scenarios, marshal fast/slow) and shared sinks to avoid dead-code elimination.
  • Misc tests & small updates (mergevalues_test.go, parser_timing_test.go, parser_test.go, update_test.go, util_test.go, escape_flag_debug_test.go, escape_flag_debug_noop.go, arena_gc_test.go): Update tests to new APIs (parser.DeepCopy, parseRawStringInfo, unescapeStringBestEffortInfo), add debug/no-op build variations, remove obsolete tests, and extend coverage for escape hints, allocations, and transform behavior.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 47.98%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title clearly and specifically describes the main features added in the changeset: DeepCopy, StructuralCopy, and the two-pass parser implementation.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
util.go (1)

595-628: Passthrough duplicate-key detection has O(n×m) complexity.

The nested loops checking for already-handled fields and OutputKey collisions run in O(Entries × source_fields) time. This is acceptable for typical JSON objects but could become slow for unusually large objects with many transform entries.

Consider documenting this as a known limitation or using a map-based lookup if performance becomes a concern for large transforms.
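The map-based alternative suggested here can be sketched as two set lookups built once per call. The Entry type and field names below are illustrative stand-ins for the library's Transform/kv structures:

```go
package main

import "fmt"

// Entry mirrors the transform entries described above: InputKey is consumed
// from the source object, OutputKey is the (possibly renamed) emitted key.
type Entry struct{ InputKey, OutputKey string }

// passthroughKeys returns the source keys that should be copied through:
// keys not handled by an entry and not colliding with an emitted OutputKey.
// Two O(1) map lookups replace the O(Entries × source_fields) nested scans.
func passthroughKeys(source []string, entries []Entry) []string {
	handled := make(map[string]struct{}, len(entries)) // keys consumed by entries
	emitted := make(map[string]struct{}, len(entries)) // output keys already produced
	for _, e := range entries {
		handled[e.InputKey] = struct{}{}
		emitted[e.OutputKey] = struct{}{}
	}
	var out []string
	for _, k := range source {
		if _, ok := handled[k]; ok {
			continue // transformed explicitly; skip passthrough
		}
		if _, ok := emitted[k]; ok {
			continue // would collide with a renamed output key
		}
		out = append(out, k)
	}
	return out
}

func main() {
	src := []string{"id", "name", "title"}
	entries := []Entry{{InputKey: "name", OutputKey: "title"}}
	fmt.Println(passthroughKeys(src, entries)) // [id]
}
```

Note the trade-off the maintainers later documented: the two maps cost heap allocations per call, so the linear scan can still win for the small objects this code path expects.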

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@util.go` around lines 595 - 628, The passthrough duplicate-key detection
inside the t.Passthrough block (iterating over v.o.kvs and nested loops over
t.Entries and refs[0:n]) is O(Entries × source_fields); replace the nested scans
with two hash lookups: build a map[string]struct{} of InputKey values from
t.Entries (to test "handled" quickly) and a second map[string]struct{} of
already-emitted keys for refs (to test rename collisions), then use those maps
when iterating v.o.kvs to decide whether to copy a field; update uses of refs,
n, and cp.o.kvs to behave the same way and keep calls to f.allocKV() and
f.structuralCopyValue(entry.v) unchanged.
parser_test.go (2)

111-123: Assert hasEscape in raw parse tests.

Line 111/Line 123 and Line 156/Line 165 drop the new escape-tracking return value. That leaves the new parser contract mostly untested in these paths.

✅ Suggested test tightening
- f := func(s, expectedRS, expectedTail string) {
+ f := func(s, expectedRS, expectedTail string, expectedHasEscape bool) {
     t.Helper()

-    rs, tail, _, err := parseRawStringInfo(s[1:])
+    rs, tail, hasEscape, err := parseRawStringInfo(s[1:])
     if err != nil {
         t.Fatalf("unexpected error on parseRawString: %s", err)
     }
+    if hasEscape != expectedHasEscape {
+        t.Fatalf("unexpected hasEscape on parseRawString; got %v; want %v", hasEscape, expectedHasEscape)
+    }

     // parseRawKey results must be identical to parseRawString.
-    rs, tail, _, err = parseRawKey(s[1:])
+    rs, tail, keyHasEscape, err := parseRawKey(s[1:])
     if err != nil {
         t.Fatalf("unexpected error on parseRawKey: %s", err)
     }
+    if keyHasEscape != expectedHasEscape {
+        t.Fatalf("unexpected hasEscape on parseRawKey; got %v; want %v", keyHasEscape, expectedHasEscape)
+    }
 }

Also applies to: 156-165

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@parser_test.go` around lines 111 - 123, The tests for raw parsing ignore the
new escape-tracking return value, so capture the hasEscape boolean from
parseRawStringInfo and parseRawKey (e.g., rs, tail, hasEscape, err :=
parseRawStringInfo(...) and similarly for parseRawKey) and add assertions that
hasEscape matches the expected value (use the existing expectedRS/expectedTail
context or add expectedHasEscape variables). Update both locations where
parseRawStringInfo and parseRawKey are called so the escape flag is checked,
keeping the existing error/rs/tail assertions unchanged.

100-100: Cover the new needsEscape signal in unescape tests.

These call sites ignore the additional boolean from unescapeStringBestEffortInfo(...). Since this PR introduces escape tracking, tests should assert the flag where practical to prevent regressions.

✅ Suggested helper upgrade
-func testUnescapeStringBestEffort(t *testing.T, s, expectedS string) {
+func testUnescapeStringBestEffort(t *testing.T, s, expectedS string, expectedNeedsEscape bool) {
     t.Helper()
     b := append([]byte{}, s...)
-    us, _ := unescapeStringBestEffortInfo(nil, b2s(b))
+    us, needsEscape := unescapeStringBestEffortInfo(nil, b2s(b))
     if us != expectedS {
         t.Fatalf("unexpected unescaped string; got %q; want %q", us, expectedS)
     }
+    if needsEscape != expectedNeedsEscape {
+        t.Fatalf("unexpected needsEscape; got %v; want %v", needsEscape, expectedNeedsEscape)
+    }
}

Also applies to: 1572-1572, 1579-1579, 1586-1586, 1593-1593, 1601-1601, 1853-1853, 2054-2054, 2061-2061, 2068-2068

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@parser_test.go` at line 100, The tests currently ignore the new boolean
"needsEscape" returned by unescapeStringBestEffortInfo; update each call site
(e.g., change "us, _ := unescapeStringBestEffortInfo(...)" to capture the
boolean like "us, needsEscape := unescapeStringBestEffortInfo(...)" and add an
assertion that needsEscape matches the expected value for that input (true when
escapes were present, false otherwise). Locate usages of
unescapeStringBestEffortInfo in the test file (including the similar spots the
reviewer listed) and for each, capture the second return value and add a concise
assert (e.g., using the test framework's assert/require or t.Fatalf) verifying
needsEscape is correct for the test case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@parser_arena_twopass.go`:
- Around line 555-559: In the object parsing branch (the code using variables i,
count, s, v where it checks if i == count-1), change the error message produced
when the next rune isn't '}' to accurately reflect the missing closing brace;
replace the fmt.Errorf("missing ',' after object value") call with an error that
says "missing '}' after object value" so the parser returns the correct
diagnostic when the final '}' is absent.
- Around line 492-496: The error text is wrong when the parser reaches the last
expected array element: inside the array parsing branch that checks "if i ==
count-1" (the code that currently tests "if len(s) == 0 || s[0] != ']'" and
returns "missing ',' after array value"), update the fmt.Errorf message to
indicate the missing closing bracket instead (e.g., "missing ']' after array
value" or "missing closing ']' after array value") so the returned error matches
the actual check performed in this block.

In `@util_test.go`:
- Around line 364-390: The test TestDeepCopy_MultipleSequentialCopies builds
JSON using string(rune('0'+i)) which produces incorrect characters for i >= 10;
update the JSON construction to use a proper decimal conversion (e.g.,
strconv.Itoa(i) or fmt.Sprintf("%d", i)) for the "value" field so
ParseBytesWithArena receives valid JSON, and add the necessary import (strconv)
at the top of the test file; keep the rest of the test (entities slice creation,
Parser.ParseBytesWithArena, and p.DeepCopy usage) unchanged.

---

Nitpick comments:
In `@parser_test.go`:
- Around line 111-123: The tests for raw parsing ignore the new escape-tracking
return value, so capture the hasEscape boolean from parseRawStringInfo and
parseRawKey (e.g., rs, tail, hasEscape, err := parseRawStringInfo(...) and
similarly for parseRawKey) and add assertions that hasEscape matches the
expected value (use the existing expectedRS/expectedTail context or add
expectedHasEscape variables). Update both locations where parseRawStringInfo and
parseRawKey are called so the escape flag is checked, keeping the existing
error/rs/tail assertions unchanged.
- Line 100: The tests currently ignore the new boolean "needsEscape" returned by
unescapeStringBestEffortInfo; update each call site (e.g., change "us, _ :=
unescapeStringBestEffortInfo(...)" to capture the boolean like "us, needsEscape
:= unescapeStringBestEffortInfo(...)" and add an assertion that needsEscape
matches the expected value for that input (true when escapes were present, false
otherwise). Locate usages of unescapeStringBestEffortInfo in the test file
(including the similar spots the reviewer listed) and for each, capture the
second return value and add a concise assert (e.g., using the test framework's
assert/require or t.Fatalf) verifying needsEscape is correct for the test case.

In `@util.go`:
- Around line 595-628: The passthrough duplicate-key detection inside the
t.Passthrough block (iterating over v.o.kvs and nested loops over t.Entries and
refs[0:n]) is O(Entries × source_fields); replace the nested scans with two hash
lookups: build a map[string]struct{} of InputKey values from t.Entries (to test
"handled" quickly) and a second map[string]struct{} of already-emitted keys for
refs (to test rename collisions), then use those maps when iterating v.o.kvs to
decide whether to copy a field; update uses of refs, n, and cp.o.kvs to behave
the same way and keep calls to f.allocKV() and f.structuralCopyValue(entry.v)
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4af03790-f835-4d44-aaa2-0b16b9d2c49b

📥 Commits

Reviewing files that changed from the base of the PR and between 4ae4389 and cb04b8d.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (18)
  • .gitignore
  • README.md
  • arena_gc_test.go
  • coerce_test.go
  • doc.go
  • go.mod
  • parser.go
  • parser_arena_twopass.go
  • parser_arena_twopass_test.go
  • parser_test.go
  • parser_timing_test.go
  • transform.go
  • transform_test.go
  • update.go
  • util.go
  • util_test.go
  • validate.go
  • values.go

jensneuse and others added 6 commits April 17, 2026 16:57
For scalars (numbers, strings, booleans, null), b now replaces a
unconditionally — no comparison, no fastfloat.Parse on the merge hot path.
For object merges, the kv pointer captured during lookup is mutated
directly, skipping the O(|A|) linear scan inside Object.Set.

BREAKING CHANGE: MergeValues and MergeValuesWithPath drop the `changed bool`
return. MergeValues("1.0", "1.00") now returns "1.00" (the b value).

Cumulative benchmark deltas vs baseline (geomean -45%):

  MergeValuesOnly/numbers_equal    12.58n -> 1.55n  (-87.7%)
  MergeValuesOnly/strings_equal     4.53n -> 1.56n  (-65.5%)
  MergeValuesOnly/identical_medium  887n  -> 553n   (-37.6%)
  MergeValuesOnly/identical_twitter 211us -> 140us  (-33.4%)
  MergeValuesWithChanges/50keys    ~9.7us -> 6.6us  (-32%)

Also adds benchmark_improvements_test.go covering merge/parse/unescape/
skipWS/parseRawKey/hasSpecialChars/escapeStringSlowPath/ObjectGet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
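The scalar merge rule from this commit ("b replaces a unconditionally") can be sketched as below. The value model and the two-argument shape are simplified stand-ins; the real MergeValues takes an arena and returns (*Value, error) per the PR summary:

```go
package main

import "fmt"

type Kind int

const (
	KindScalar Kind = iota
	KindObject
)

type Value struct {
	kind Kind
	s    string
	obj  map[string]*Value
}

// MergeValues merges b into a. For scalars, b wins unconditionally: no
// equality comparison and no number parse on the hot path, which is the
// optimization this commit describes.
func MergeValues(a, b *Value) (*Value, error) {
	if a == nil {
		return b, nil
	}
	if b == nil {
		return a, nil
	}
	if a.kind != KindObject || b.kind != KindObject {
		return b, nil // scalar rule: b replaces a
	}
	for k, bv := range b.obj {
		merged, err := MergeValues(a.obj[k], bv)
		if err != nil {
			return nil, err
		}
		a.obj[k] = merged
	}
	return a, nil
}

func main() {
	a := &Value{kind: KindScalar, s: "1.0"}
	b := &Value{kind: KindScalar, s: "1.00"}
	m, _ := MergeValues(a, b)
	fmt.Println(m.s) // "1.00": the b value, matching the BREAKING CHANGE note
}
```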
Replace fmt.Errorf with package-level errors.New sentinels for shared
static error messages, and errors.New + strconv.Quote for dynamic
interpolated messages. parser.go and validate.go drop the fmt import
entirely; parser_arena_twopass.go keeps fmt only for rare BUG paths in
finish().

Tightens inliner budgets on planObject/parseObject/parseValue; allocation
profile unchanged (success paths didn't allocate errors anyway).

Benchmark deltas (count=5, benchstat):

  ParseArenaAll/small-16                                -7.30%
  ParseArenaAll/medium-16                               -7.28%
  UnescapeStringBestEffortInfo/arena/unicode_escape-16  -8.49%
  UnescapeStringBestEffortInfo/heap/unicode_escape-16   -7.81%
  UnescapeStringBestEffortInfo/*/no_escape_long-16      -5% to -6%

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce a 256-byte charFlags table with three bitflags (whitespace,
number character, needs-escape). skipWSSlow, hasSpecialChars, and
parseRawNumber now do a single table lookup + AND instead of 2-5 byte
comparisons per iteration.

Benchmark deltas (count=5, vs prior commit):

  HasSpecialChars/plain_long-16                        -35.8%
  HasSpecialChars/has_control-16                       -32.3%
  SkipWS/none-16                                       -30.2%
  UnescapeStringBestEffortInfo/*/no_escape_long-16     -27%
  ParseArenaAll/medium-16                               -4.3%
  ParseArenaAll/small-16                                -3.0%
  geomean across parse suite                            -5.6%

Cumulative from baseline (this branch):

  ParseArenaAll/small-16                               -10.0%
  ParseArenaAll/medium-16                              -11.2%
  ParseArenaAll/large-16                                -7.7%
  ParseArenaAll/twitter-16                              -5.4%

Known regression: SkipWS/long (256 contiguous spaces) +13% — the previous
implementation had a near-perfect branch predictor for always-space input;
the table lookup pays a per-byte memory load. JSON rarely contains such
long whitespace runs, and mild cases (short/mixed/none) all win.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
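The table-driven classification described above can be sketched like this. The flag values and helper names (charFlags, skipWS, hasSpecialChars) follow the commit message; the exact bit assignments and escape set are assumptions:

```go
package main

import "fmt"

const (
	flagWhitespace  = 1 << 0
	flagNumberChar  = 1 << 1
	flagNeedsEscape = 1 << 2 // control chars, '"' and '\\'
)

// charFlags is the 256-byte classification table: one load + AND per byte
// replaces the 2-5 comparisons of the branchy version.
var charFlags [256]byte

func init() {
	for _, c := range []byte{' ', '\t', '\n', '\r'} {
		charFlags[c] |= flagWhitespace
	}
	for c := byte('0'); c <= '9'; c++ {
		charFlags[c] |= flagNumberChar
	}
	for _, c := range []byte{'-', '+', '.', 'e', 'E'} {
		charFlags[c] |= flagNumberChar
	}
	for c := 0; c < 0x20; c++ {
		charFlags[c] |= flagNeedsEscape
	}
	charFlags['"'] |= flagNeedsEscape
	charFlags['\\'] |= flagNeedsEscape
}

// skipWS advances past whitespace with one table lookup per byte.
func skipWS(s string) string {
	i := 0
	for i < len(s) && charFlags[s[i]]&flagWhitespace != 0 {
		i++
	}
	return s[i:]
}

// hasSpecialChars reports whether s would need escaping when marshaled.
func hasSpecialChars(s string) bool {
	for i := 0; i < len(s); i++ {
		if charFlags[s[i]]&flagNeedsEscape != 0 {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(skipWS("  \t{\"a\":1}")) // {"a":1}
	fmt.Println(hasSpecialChars("plain")) // false
	fmt.Println(hasSpecialChars("a\"b"))  // true
}
```

The SkipWS/long regression noted above follows directly from this shape: the old all-comparison loop was perfectly branch-predicted on uniform input, while the table pays a memory load per byte.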
Add a 256-byte hexDigit table (0xFF for invalid) and a branchless parseHex4
that decodes four ASCII hex digits via four table lookups OR'd together,
validating all four in a single AND against 0xF0. Replaces parseUint16Hex
at both call sites in decodeStringBestEffort.

Benchmark deltas (count=5, vs prior commit):

  UnescapeStringBestEffortInfo/heap/surrogate_pair-16    -31.1%
  UnescapeStringBestEffortInfo/arena/surrogate_pair-16   -30.4%
  UnescapeStringBestEffortInfo/arena/unicode_escape-16   -14.1%
  UnescapeStringBestEffortInfo/heap/unicode_escape-16    -11.9%
  UnescapeStringBestEffortInfo/heap/simple_escape-16      -4.0%

ParseArenaAll unchanged (hexDigit only matters for \uXXXX escapes,
which are rare in the benchmark fixtures).

Note: a companion parseRawKey-uses-strings.IndexByte change was also
tried but rejected — it regressed ParseArenaAll/small by +12.8%
because real JSON keys are typically 2-4 bytes long, below the
threshold where SIMD IndexByte beats a tight byte loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
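The branchless decode described above can be sketched as follows. The table layout (0xFF sentinel) and name parseHex4 follow the commit message; the in-tree signature is an assumption:

```go
package main

import "fmt"

// hexDigit maps an ASCII byte to its hex value, 0xFF for invalid.
var hexDigit [256]byte

func init() {
	for i := range hexDigit {
		hexDigit[i] = 0xFF
	}
	for c := byte('0'); c <= '9'; c++ {
		hexDigit[c] = c - '0'
	}
	for c := byte('a'); c <= 'f'; c++ {
		hexDigit[c] = c - 'a' + 10
	}
	for c := byte('A'); c <= 'F'; c++ {
		hexDigit[c] = c - 'A' + 10
	}
}

// parseHex4 decodes four ASCII hex digits (as in a \uXXXX escape) with four
// table lookups and a single validity check.
func parseHex4(s []byte) (uint16, bool) {
	d0, d1, d2, d3 := hexDigit[s[0]], hexDigit[s[1]], hexDigit[s[2]], hexDigit[s[3]]
	// Valid digits are 0..15; any 0xFF sentinel sets the high nibble, so one
	// OR + AND validates all four digits at once.
	if (d0|d1|d2|d3)&0xF0 != 0 {
		return 0, false
	}
	return uint16(d0)<<12 | uint16(d1)<<8 | uint16(d2)<<4 | uint16(d3), true
}

func main() {
	v, ok := parseHex4([]byte("20AC"))
	fmt.Printf("%04x %v\n", v, ok) // 20ac true (U+20AC)
	_, ok = parseHex4([]byte("20AG"))
	fmt.Println(ok) // false
}
```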
- parser_arena_twopass.go: return precise close-bracket/brace errors
  when the final element is parsed but the closing char is missing,
  instead of the misleading "missing ','" message.
- util_test.go: TestDeepCopy_MultipleSequentialCopies now uses
  strconv.Itoa(i) so the generated JSON stays valid for i >= 10
  (previously produced ':' and similar non-digit runes).
- util.go: document the passthrough duplicate-key linear scan as
  intentional — map lookups would cost two heap allocations per
  call and only pay off at sizes we do not expect here.
- parser_test.go: TestParseRawString and TestUnescapeStringBestEffort
  now assert the new hasEscape / needsEscape return values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sinkErr variable was declared but never referenced,
failing the golangci-lint `unused` check on CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mergevalues.go (1)

57-79: ⚠️ Potential issue | 🟠 Major

Unescape left-hand keys before the inline lookup.

bo is normalized first, but ao.kvs[j].k == k still compares against possibly raw escaped keys in a. If the left object still has keyUnescaped == false, this misses the existing entry and appends a duplicate key instead of merging into it. Please normalize ao first as well, or fall back to the existing Object.Set path for that case.

🔧 Proposed fix
 		for i := range bo.kvs {
 			if !bo.kvs[i].keyUnescaped {
 				bo.unescapeKey(ar, bo.kvs[i])
 			}
 		}
+		for i := range ao.kvs {
+			if !ao.kvs[i].keyUnescaped {
+				ao.unescapeKey(ar, ao.kvs[i])
+			}
+		}
 		for i := range bo.kvs {
 			k := bo.kvs[i].k
 			r := bo.kvs[i].v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mergevalues.go` around lines 57 - 79, The inline lookup misses matches when
ao contains escaped keys; before using the inline scan that compares ao.kvs[j].k
== k, normalize the left object similarly to bo by unescaping ao's keys: iterate
ao.kvs and call ao.unescapeKey(ar, ao.kvs[i]) for entries with keyUnescaped ==
false (or alternatively, if any ao.kvs[j].keyUnescaped is false, skip the inline
scan and call ao.Set(ar, k, r) to preserve correct behavior); update the code
around the inline lookup using symbols bo.kvs, ao.kvs, keyUnescaped,
unescapeKey, ao.Set, and MergeValues.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benchmark_improvements_test.go`:
- Around line 22-29: The test declares an unused benchmark sink variable sinkErr
which triggers lint failures; either remove the sinkErr declaration from the
benchFixture group or actually use it in a benchmark (e.g., assign benchmark
error results to sinkErr inside Benchmark* functions). Update the var block
(benchFixtureBunch, sinkString, sinkBytes, sinkBool, sinkErr, sinkValue) to drop
sinkErr if unused, or modify the relevant Benchmark functions to assign their
error results to sinkErr so the variable is referenced.

In `@parser_arena_twopass.go`:
- Around line 422-426: The current branches set v.s = raw when span.hasEscape is
false but do not compute stringNeedsEscape (and likewise keyNeedsEscape for
object keys), so control characters in raw segments go unnoticed and MarshalTo
can emit invalid JSON; update the branches in which you assign v.s = raw (the
span.hasEscape false path and the analogous key path) to also compute the escape
flag from the stored string (either by calling unescapeStringBestEffortInfo on
f.a/raw or by using the same checker used for escaped paths) and assign
v.stringNeedsEscape (and keyNeedsEscape for keys) accordingly; apply the same
change at the other occurrences noted (the blocks around lines 549-550 and
581-594) so all code paths consistently compute escape requirements from the
final stored string.
- Around line 667-694: DecodeRune may return the replacement rune for an invalid
UTF-16 surrogate pair, collapsing two escapes into a single U+FFFD; after
calling utf16.DecodeRune(rune(x), rune(x1)) inside the block handling
surrogates, detect this case (r == utf8.RuneError or r ==
unicode.ReplacementChar) and that the second code unit is not a valid low
surrogate, and in that case preserve both original escape sequences by writing
the literal "\uXXXX\uYYYY" (use write('\\','u'), writeString(xs) for the first
and write('\\','u'), writeString(s[2:6]) for the second), set needsEscape =
true, advance s appropriately, and skip encoding the replacement rune; otherwise
continue encoding r and applying runeNeedsEscaping as before (affecting
utf16.DecodeRune, parseHex4, runeNeedsEscaping, write, writeString, and s).

In `@parser.go`:
- Around line 181-185: The fast-path that sets v.s = ss when hasEscape is false
fails to preserve whether the original raw bytes contained control characters,
causing v.stringNeedsEscape / v.keyNeedsEscape to remain false and produce
invalid JSON when re-marshaled; update the parseRawStringInfo / parseRawKey call
sites (the branches around hasEscape where v.s or v.key is assigned) to detect
control/unescaped bytes even on the no-backslash path and set
v.stringNeedsEscape or v.keyNeedsEscape accordingly (or call a small validator
similar to unescapeStringBestEffortInfo that returns the same "needs escape"
flag) so appendQuotedString will re-escape those raw control bytes; ensure the
same change is applied at the other locations noted (around lines referenced:
the other branches using hasEscape and assignments to v.s/v.key).
- Around line 46-57: The parser currently retains oversized planner buffers
because Parser.arenaScratch is kept across parses and only truncated with [:0],
so add a capacity bound: introduce a MaxArenaScratchCap constant and modify
Parser.ensureArenaScratch to, when p.arenaScratch exists, inspect each planner
buffer field on arenaPlanScratch (the same buffers used in
parser_arena_twopass.go) and if any buffer's capacity exceeds MaxArenaScratchCap
reallocate or reslice it to a new slice with capacity limited to
MaxArenaScratchCap (preserving length/contents as needed); if p.arenaScratch is
nil, allocate it with those bounded-capacity buffers. This ensures
Parser.arenaScratch and the code paths using it (Parser.ensureArenaScratch and
the buffers referenced in parser_arena_twopass.go) cannot retain unbounded peak
capacity.

---

Outside diff comments:
In `@mergevalues.go`:
- Around line 57-79: The inline lookup misses matches when ao contains escaped
keys; before using the inline scan that compares ao.kvs[j].k == k, normalize the
left object similarly to bo by unescaping ao's keys: iterate ao.kvs and call
ao.unescapeKey(ar, ao.kvs[i]) for entries with keyUnescaped == false (or
alternatively, if any ao.kvs[j].keyUnescaped is false, skip the inline scan and
call ao.Set(ar, k, r) to preserve correct behavior); update the code around the
inline lookup using symbols bo.kvs, ao.kvs, keyUnescaped, unescapeKey, ao.Set,
and MergeValues.
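The invalid-surrogate handling requested in the comments above relies on utf16.DecodeRune returning U+FFFD for an ill-formed pair. A sketch of the fix, with buffer handling simplified relative to the real decoder:

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf16"
)

// decodeSurrogatePair appends the decoded rune for a \uXXXX\uYYYY pair, or,
// when the pair is ill-formed, preserves both original escape sequences
// verbatim instead of collapsing them to a single U+FFFD.
func decodeSurrogatePair(dst []byte, hi, lo uint16, hiHex, loHex string) []byte {
	r := utf16.DecodeRune(rune(hi), rune(lo))
	if r == unicode.ReplacementChar {
		// Ill-formed pair: keep both escapes literally so marshaling
		// round-trips the original input.
		dst = append(dst, '\\', 'u')
		dst = append(dst, hiHex...)
		dst = append(dst, '\\', 'u')
		dst = append(dst, loHex...)
		return dst
	}
	return append(dst, string(r)...)
}

func main() {
	// Valid pair: U+1F600 is encoded as the surrogates D83D DE00.
	fmt.Println(string(decodeSurrogatePair(nil, 0xD83D, 0xDE00, "d83d", "de00")))
	// Invalid pair: a high surrogate followed by a non-surrogate.
	fmt.Println(string(decodeSurrogatePair(nil, 0xD83D, 0x0041, "d83d", "0041")))
}
```

The sentinel check is safe because U+FFFD is a BMP code point and is never itself produced by decoding a well-formed surrogate pair.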

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5d2abb11-9b48-4863-bda1-6f8f14332251

📥 Commits

Reviewing files that changed from the base of the PR and between cb04b8d and 5d60956.

📒 Files selected for processing (14)
  • .gitignore
  • README.md
  • arena_gc_test.go
  • benchmark_improvements_test.go
  • chartable.go
  • mergevalues.go
  • mergevalues_test.go
  • parse_errors.go
  • parser.go
  • parser_arena_twopass.go
  • parser_test.go
  • util.go
  • util_test.go
  • validate.go
✅ Files skipped from review due to trivial changes (2)
  • .gitignore
  • parse_errors.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • README.md
  • arena_gc_test.go
  • util.go


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benchmark_improvements_test.go`:
- Around line 397-399: The test assumes o.kvs is non-empty when computing
middleKey/lastKey/firstKey and will panic if the fixture is empty; add a guard
before those lines that checks len(o.kvs) > 0 and fails the test or skips with a
clear message (e.g., t.Fatalf or t.Skip) if it's zero, so the subsequent use of
middleKey, lastKey, and firstKey is safe.
- Around line 367-373: The benchmark is reusing mutable parsed inputs (av, bv)
across iterations which biases results because MergeValues mutates its inputs;
to fix, ensure each iteration gets fresh inputs by moving MustParse(tc.a) and
MustParse(tc.b) into the loop or by creating immutable copies before calling
MergeValues (e.g., parse into av0/bv0 once and inside the loop clone them into
av/bv), then call MergeValues(nil, av, bv) so every iteration operates on
unmodified inputs; update the BenchmarkMergeValuesOnly loop to use MustParse or
a clone per-iteration to isolate timing.
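The per-iteration re-parse suggested above can be sketched like this. The types are stand-ins: the point is that because MergeValues mutates its left operand, reusing a parsed value across iterations measures a different input every time:

```go
package main

import (
	"fmt"
	"testing"
)

type Value struct{ n int }

// MustParse is a stand-in for the library's parse entrypoint.
func MustParse(s string) *Value { return &Value{n: len(s)} }

// MergeValues mutates a in place, mirroring the behavior called out above.
func MergeValues(a, b *Value) *Value {
	a.n += b.n
	return a
}

// BenchmarkMergeValuesOnly parses fresh inputs each iteration and excludes
// the setup from the timed region, so every iteration merges the same data.
func BenchmarkMergeValuesOnly(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		av := MustParse(`{"a":1}`)
		bv := MustParse(`{"a":2}`)
		b.StartTimer()
		MergeValues(av, bv)
	}
}

func main() {
	av := MustParse(`{"a":1}`)
	bv := MustParse(`{"a":2}`)
	MergeValues(av, bv)
	// av was mutated; a second iteration reusing it would merge a different,
	// already-grown input and bias the timing.
	fmt.Println(av.n)
}
```

Per-iteration StopTimer/StartTimer adds overhead of its own; an alternative is cloning a pre-parsed value inside the loop if the library's copy is cheaper than a re-parse.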

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 193a5a87-b701-4ed5-81d0-5e9e119d3fa7

📥 Commits

Reviewing files that changed from the base of the PR and between 5d60956 and 4363801.

📒 Files selected for processing (1)
  • benchmark_improvements_test.go

Adds a subtree-is-escape-free advisory hint on Value that lets
MarshalTo emit escape-free payloads via a tight fast-path loop,
computed bottom-up at parse, through copies, and after mutations.
Flag fits in existing Value padding (no struct growth). Fast path
shows ~12% gain on clean GraphQL-style payloads vs. the slow path.

Mutation model: Value.Set / SetArrayItem / AppendArrayItems clear
the flag on the directly-mutated node; MergeValues recomputes post
merge. Mutations through a sub-handle can leave ancestors with a
stale-true flag; callers recover with Value.RecomputeEscapeHint().
An astjson_debug build-tag verifier panics on stale-true flags in
the marshal fast path to catch this in tests.

Fixes latent bug in Parser.StructuralCopyWithTransform where the
emitted keyNeedsEscape was hardcoded false — a user-supplied
OutputKey containing " or \ would round-trip as invalid JSON.

Removes dead !keyUnescaped defensive loops in mergevalues.go and
update.go: every in-tree kv producer sets keyUnescaped=true, so
hand-constructed kvs with keyUnescaped=false are not supported.

Also rolls in PR#16 review follow-ups:
- Parser fast-path now sets stringNeedsEscape/keyNeedsEscape via
  hasSpecialChars so raw control bytes round-trip as valid JSON
  (parser.go, parser_arena_twopass.go).
- decodeStringBestEffort detects utf16.DecodeRune returning
  unicode.ReplacementChar and preserves both original \uXXXX
  escapes instead of collapsing invalid surrogate pairs to U+FFFD.
- Parser.arenaScratch is capacity-bounded (maxArenaScratchCap=4096)
  so a single outlier parse doesn't permanently bloat a pooled
  Parser.
- BenchmarkMergeValuesOnly re-parses fresh inputs per iteration;
  MergeValues mutates its left operand and was biasing results.
- BenchmarkObjectGetBunchFields guards against empty fixture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
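The advisory-hint model this commit describes can be sketched as follows. Field and method names (noEscapeSubtree, MarshalTo, RecomputeEscapeHint) follow the commit message; the single-string value and the escape set are simplifications:

```go
package main

import "fmt"

type Value struct {
	s               string
	noEscapeSubtree bool // advisory: true means nothing in the subtree needs escaping
}

func needsEscape(c byte) bool { return c < 0x20 || c == '"' || c == '\\' }

// MarshalTo appends the quoted value. When the hint is set, a single append
// replaces the per-byte slow path.
func (v *Value) MarshalTo(dst []byte) []byte {
	dst = append(dst, '"')
	if v.noEscapeSubtree {
		dst = append(dst, v.s...) // fast path: one copy, no per-byte checks
	} else {
		for i := 0; i < len(v.s); i++ { // slow path, simplified: only '"' and '\\'
			if c := v.s[i]; c == '"' || c == '\\' {
				dst = append(dst, '\\', c)
			} else {
				dst = append(dst, c)
			}
		}
	}
	return append(dst, '"')
}

// RecomputeEscapeHint rescans the value, the recovery the commit prescribes
// for callers after mutations may have left the flag stale-true.
func (v *Value) RecomputeEscapeHint() {
	v.noEscapeSubtree = true
	for i := 0; i < len(v.s); i++ {
		if needsEscape(v.s[i]) {
			v.noEscapeSubtree = false
			return
		}
	}
}

func main() {
	v := &Value{s: "clean"}
	v.RecomputeEscapeHint()
	fmt.Println(string(v.MarshalTo(nil))) // "clean" via the fast path
	v.s = `has"quote`                     // mutation would leave the flag stale-true
	v.RecomputeEscapeHint()               // caller recovers, as described above
	fmt.Println(string(v.MarshalTo(nil)))
}
```

The stale-true hazard is visible here: skipping RecomputeEscapeHint after the mutation would let the fast path emit the raw quote and produce invalid JSON, which is exactly what the astjson_debug verifier is meant to catch.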

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
util.go (1)

278-289: Minor over-allocation of string bytes for missing entries.

String bytes for OutputKey are counted (line 279) before checking whether child exists (line 280-282). If child == nil, the fill phase skips this entry but the arena space for the key string is already allocated.

This is consistent with the documented over-counting philosophy for kvs (line 268 comment), so it's acceptable. The wasted arena space is bounded by the number of missing transform entries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@util.go` around lines 278 - 289, The code preemptively increments
plan.stringBytes by len(t.Entries[i].OutputKey) before verifying the source
value exists, causing minor over-allocation when child == nil; fix by moving the
plan.stringBytes += len(...) statement to after the child nil-check (i.e.,
perform it only when child != nil) inside the loop that iterates t.Entries so
that only entries actually processed by countStructuralCopyWithTransformValue or
countStructuralCopyValue contribute to string byte accounting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@escape_flag_bench_test.go`:
- Around line 34-43: The "slow" benchmark only clears the root hint
(v.noEscapeSubtree) so nested nodes with noEscapeSubtree=true still take fast
paths; update the benchmark to disable the hint for the whole subtree before
running the loop—either traverse v and set noEscapeSubtree=false on every child
node (recursive clear), or create a deep clone of v and force
noEscapeSubtree=false for every node in the clone, then call
Clone.MarshalTo(dst[:0]) inside the loop and restore/keep the original v and its
RecomputeEscapeHint() for other benchmarks; reference v.noEscapeSubtree,
v.MarshalTo, and v.RecomputeEscapeHint() when making the change.

In `@parser.go`:
- Around line 635-646: Object.marshalToClean is recursing into children without
re-checking their escape state, so stale ancestor noEscapeSubtree=true flags can
cause unescaped/invalid JSON to be emitted after nested mutations; update
Object.marshalToClean (and the analogous array clean path) to not directly call
child.marshalToClean for each kv/element but instead call the child's MarshalTo
(or first re-evaluate the child's fast-path eligibility) so that the child's own
escape checks are respected after mutations (refer to Object.marshalToClean,
Value.marshalToClean/marshalTo, and mutation sites Value.Set / SetArrayItem to
locate the related code).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d5629c0-0335-4d6d-98bf-2a4ad7898075

📥 Commits

Reviewing files that changed from the base of the PR and between 4363801 and 2631781.

📒 Files selected for processing (16)
  • benchmark_improvements_test.go
  • escape_flag_bench_test.go
  • escape_flag_debug.go
  • escape_flag_debug_noop.go
  • escape_flag_debug_test.go
  • escape_flag_test.go
  • mergevalues.go
  • mergevalues_test.go
  • parser.go
  • parser_arena_twopass.go
  • parser_arena_twopass_test.go
  • parser_test.go
  • update.go
  • update_test.go
  • util.go
  • values.go
💤 Files with no reviewable changes (1)
  • update_test.go
✅ Files skipped from review due to trivial changes (2)
  • escape_flag_debug_noop.go
  • escape_flag_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • values.go
  • parser_arena_twopass.go
  • benchmark_improvements_test.go

Comment thread escape_flag_bench_test.go
Comment thread parser.go
- Remove unused (*Object).unescapeKey method; Phase 3 of the prior
  commit dropped its only callers, so CI lint flagged it.

- Fix stale-ancestor correctness bug in MarshalTo fast path: when a
  caller mutates through a sub-handle, ancestors can stay
  noEscapeSubtree=true while descendants are dirty. The old
  marshalToClean recursed via child.marshalToClean unconditionally,
  bypassing the child's own hint and producing invalid JSON. Now it
  recurses via child.MarshalTo so each node re-checks its own flag.
  Add regression test TestMarshalFastPathStaleAncestorSafety (gated
  behind !astjson_debug since the debug verifier panics on exactly
  this scenario by design).

- BenchmarkMarshalCleanFastPath/slow now recursively clears
  noEscapeSubtree on every node, so the slow branch no longer benefits
  from child-level fast paths. Honest fast vs. slow: ~66 vs. ~73 ns/op
  (~10% win on small.json).

- Collapse two sequential nil+length checks in BenchmarkObjectGetBunchFields
  into one expression; avoids a staticcheck false-positive on the
  b.Fatal return path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
escape_flag_stale_test.go (1)

19-41: Add the analogous stale-array regression case too.

This protects the object clean path well, but the same stale-ancestor recursion exists for arrays in parser.go (Value.marshalToClean, Lines 776-784). A sibling case like [{"inner":1}] mutated through a sub-handle would keep that path covered too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@escape_flag_stale_test.go` around lines 19 - 41, Add a new test mirroring
TestMarshalFastPathStaleAncestorSafety but for arrays to cover the
stale-ancestor recursion in Value.marshalToClean: parse an array like
`[{"inner":1}]` with Parser.ParseWithArena, obtain the sub-handle for the first
element (the object inside the array) using the same handle API you use for
arrays in tests, mutate that sub-handle (call Set to add an escaped key) and
assert the sub-handle’s noEscapeSubtree is invalidated while the root value’s
noEscapeSubtree remains stale-true, then call MarshalTo and compare the emitted
JSON to the expected string; name it e.g.
TestMarshalFastPathStaleAncestorSafetyArray and follow the same assertion
pattern and checks (inner.noEscapeSubtree, v.noEscapeSubtree, v.MarshalTo) as
the existing object test.

ℹ️ Review info

Run ID: 2ed69aa1-ffc2-45ab-af1d-9662d394f965

📥 Commits

Reviewing files that changed from the base of the PR and between 2631781 and b5f96ea.

📒 Files selected for processing (4)
  • benchmark_improvements_test.go
  • escape_flag_bench_test.go
  • escape_flag_stale_test.go
  • parser.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • escape_flag_bench_test.go
  • benchmark_improvements_test.go

@jensneuse
Member Author

@coderabbitai resume

@coderabbitai

coderabbitai Bot commented Apr 18, 2026

✅ Actions performed

Reviews resumed.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
benchmark_improvements_test.go (1)

335-347: Minor: intToStr only handles non-negative integers.

The helper will infinite-loop on negative inputs. This is safe in the current usage context (indices i and i+1000 are always non-negative), but consider adding a brief comment or a panic guard if this helper might be reused.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark_improvements_test.go` around lines 335 - 347, The intToStr helper
(function intToStr) currently only handles non-negative integers and will loop
forever for negative inputs; add an explicit guard or handling for negatives —
e.g., if n < 0 either panic with a clear message or support the sign by
converting -n and prefixing '-' — and/or add a short comment stating the
function expects non-negative values so future callers don't misuse it.

ℹ️ Review info

Run ID: b4e88609-19cb-4f21-bc45-ad2bb527a14d

📥 Commits

Reviewing files that changed from the base of the PR and between 2631781 and b5f96ea.

📒 Files selected for processing (4)
  • benchmark_improvements_test.go
  • escape_flag_bench_test.go
  • escape_flag_stale_test.go
  • parser.go
✅ Files skipped from review due to trivial changes (1)
  • escape_flag_stale_test.go

jensneuse and others added 2 commits April 18, 2026 14:42
Replace the per-byte charFlags table loop with an 8-byte SWAR scan that
detects '"', '\\' and any control byte (< 0x20) in parallel using the
classic Mycroft haszero primitive. The tail (< 8 leftover bytes) keeps
the per-byte path. Crossover with the byte loop is at len ~= 8.

Measured on M4 Max with the new BenchmarkHasSpecialSweep:

  len     Loop ns   SWAR ns   speedup
     8       3.15      2.31     1.36x
    16       5.98      3.51     1.70x
    64      18.5      11.9      1.55x
   128      42.7      20.9      2.04x
  1024     277       167        1.66x
  4096    1063       658        1.62x

Smaller-than-8 strings pay a ~0.2 ns penalty (one branch + same tail
loop), well within noise on hot callers.

Also adds hasspecial_threshold_bench_test.go which exercises four
variants (Loop, IndexAny, SWAR, Hybrid) across a length sweep so
future changes can be checked against the same baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add TestMarshalFastPathStaleAncestorSafetyArray mirroring the
  existing object test, to cover Value.marshalToClean's array
  recursion (CodeRabbit nitpick on b5f96ea).
- Document intToStr's non-negative-input precondition — it loops
  forever on negatives; safe in current usage but worth flagging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
hasspecial_threshold_bench_test.go (1)

59-63: Use strings.ContainsAny for clearer intent.

The static analysis tool correctly flags that strings.IndexAny(s, escapeSetString) >= 0 should be replaced with strings.ContainsAny(s, escapeSetString) for readability and idiomatic Go.

♻️ Proposed fix
 // hasSpecialCharsIndexAny relies on strings.IndexAny. For ASCII-only
 // charsets this uses an asciiSet bitmap with a tight inner loop.
 func hasSpecialCharsIndexAny(s string) bool {
-	return strings.IndexAny(s, escapeSetString) >= 0
+	return strings.ContainsAny(s, escapeSetString)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hasspecial_threshold_bench_test.go` around lines 59 - 63, Replace the use of
strings.IndexAny in hasSpecialCharsIndexAny with strings.ContainsAny to express
intent more clearly: change the predicate strings.IndexAny(s, escapeSetString)
>= 0 to strings.ContainsAny(s, escapeSetString) while keeping the function
signature hasSpecialCharsIndexAny(s string) bool and using the same
escapeSetString constant.

ℹ️ Review info

Run ID: ab62fe74-0632-45f8-8e63-8b1f779a26ae

📥 Commits

Reviewing files that changed from the base of the PR and between b5f96ea and 92d5fa3.

📒 Files selected for processing (4)
  • benchmark_improvements_test.go
  • escape_flag_stale_test.go
  • hasspecial_threshold_bench_test.go
  • parser.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • escape_flag_stale_test.go

CI golangci-lint v2.10.1 staticcheck S1003 flags the
strings.IndexAny(...) >= 0 idiom. Pre-existing issue that slipped
through earlier lint runs (cached result); now blocking PR#16.

Semantically identical; strings.ContainsAny returns exactly that
predicate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
hasspecial_threshold_bench_test.go (2)

165-173: Assert the real hasSpecialChars here too.

This test currently proves that the local benchmark helpers agree with each other, but it would still pass if parser.go's production hasSpecialChars drifted from these copies. Adding the production function here keeps the benchmark harness anchored to the real implementation.

♻️ Proposed refactor
 	for _, in := range inputs {
 		want := hasSpecialCharsLoop(in)
+		if got := hasSpecialChars(in); got != want {
+			t.Errorf("prod(%q) = %v, want %v", in, got, want)
+		}
 		if got := hasSpecialCharsIndexAny(in); got != want {
 			t.Errorf("IndexAny(%q) = %v, want %v", in, got, want)
 		}
 		if got := hasSpecialCharsSWAR(in); got != want {
 			t.Errorf("SWAR(%q) = %v, want %v", in, got, want)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hasspecial_threshold_bench_test.go` around lines 165 - 173, Add an assertion
that calls the real production hasSpecialChars (from parser.go) and compare its
result to want so the benchmark helpers are validated against the actual
implementation; specifically, in the test alongside hasSpecialCharsLoop,
hasSpecialCharsIndexAny, hasSpecialCharsSWAR, and hasSpecialCharsHybrid add a
check like `if got := hasSpecialChars(in); got != want { t.Errorf(...) }` to
ensure the production function matches the expected behavior returned by
hasSpecialCharsLoop.

34-40: Derive escapeSetString from charFlags instead of hard-coding it.

chartable.go already owns the escape definition. Keeping a second manual list here means the ContainsAny benchmark can drift from production semantics if the escape table changes later.

♻️ Proposed refactor
 var escapeSetString = func() string {
-	b := make([]byte, 0, 32)
-	for c := 0; c < 0x20; c++ {
-		b = append(b, byte(c))
-	}
-	b = append(b, '"', '\\')
+	b := make([]byte, 0, 34)
+	for c := 0; c <= 0xFF; c++ {
+		if charFlags[byte(c)]&charEscape != 0 {
+			b = append(b, byte(c))
+		}
+	}
 	return string(b)
 }()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hasspecial_threshold_bench_test.go` around lines 34 - 40, Replace the
hard-coded escapeSetString construction with a builder that derives its bytes
from the shared charFlags table in chartable.go so the benchmark stays in sync
with production. Specifically, in the escapeSetString initializer use the
chartable.charFlags array and the escape bit (e.g., cfEscape or the constant
used in chartable.go) to append every byte c where charFlags[c] has the escape
flag set (and ensure '"' and '\\' are included if chartable doesn't mark them),
keeping the symbol name escapeSetString and preserving its use in the
ContainsAny benchmark.

ℹ️ Review info

Run ID: 925afbb8-7f28-46a9-b4db-cf9b21c2121f

📥 Commits

Reviewing files that changed from the base of the PR and between 92d5fa3 and 3451331.

📒 Files selected for processing (1)
  • hasspecial_threshold_bench_test.go

jensneuse and others added 3 commits April 19, 2026 12:13
Adds four package-level copy functions mirroring the existing Parser
methods, each with a pooled planner scratch (sync.Pool) so ad-hoc calls
don't pay per-invocation allocation cost:

  DeepCopy(a, v)
  DeepCopyWithTransform(a, v, t)   // new
  StructuralCopy(a, v)
  StructuralCopyWithTransform(a, v, t)

Fills the matrix gap: DeepCopyWithTransform is new (both as package
func and Parser.DeepCopyWithTransform method).

Semantic change (breaking): DeepCopy / StructuralCopy / StructuralCopyWithTransform
no longer short-circuit to `return v` when the arena is nil. They now
always produce an independent copy:

  - DeepCopy in heap mode: scalar string payloads cloned via
    strings.Clone; containers freshly heap-allocated.
  - StructuralCopy in heap mode: containers freshly allocated;
    scalar leaves aliased from source (matches arena semantics).
  - Singletons (valueTrue / valueFalse / valueNull) remain shared
    (immutable).

Heap-mode implementation via four new primitives — copyValueHeap,
structuralCopyValueHeap, and their *WithTransform variants — instead
of threading a nil-arena code path through the slab-based fill state.
Keeps both paths simple and makes heap-mode behavior trivially verifiable.

Arena-mode DeepCopyWithTransform gets its own planner
(planDeepCopyWithTransformScratch + countDeepCopyWithTransformValue)
and fill method (deepCopyWithTransformValue), mirroring the existing
structural-copy-with-transform pipeline but cloning scalar payloads
instead of aliasing.

Tests in copy_api_test.go cover: heap independence, arena independence,
singletons shared, nil input, transform matrix (rename, passthrough,
nested Child, ArrayItem, both arena and heap), transform independence,
structural-heap aliasing invariant, and parity between package funcs
and Parser methods.

Updates TestArenaGCSafety_DeepCopy_NilArena to assert independent copy
rather than pointer identity. README "no-op when a is nil" note removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review [P1] — the DeepCopyWithTransform (and the pre-existing
StructuralCopyWithTransform) planner over-counted source fields in the
Passthrough pass. When an emitted OutputKey matched an existing source
key, the fill phase drops that source field (rename wins), but the
planner still appended its objectSizes/arraySizes. Subsequent sibling
subtrees then read misaligned slab positions and panicked.

Extract the shared "would this passthrough field be dropped?" predicate
as passthroughSkipped and call it from both planners. The fill already
implements the same rule; the planner and fill must agree on the skip
set for slab positions to line up.

Adds TestDeepCopyWithTransformPassthroughCollision (simple + nested-
with-sibling, heap + arena) and TestStructuralCopyWithTransformPassthroughCollision
as regression guards. The nested-with-sibling case is the exact
reproduction codex flagged: a child transform renames a->b while the
source also has a literal b, plus a sibling y under the outer object
that would otherwise read the drifted plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review [P2] — when a source object contains two kvs with the same
key and Passthrough is on, the fill emits only the first occurrence and
refs-skips the rest. The planner counted every duplicate, leaving extra
objectSizes / arraySizes entries that misaligned subsequent sibling
reads and panicked during marshal.

Both DeepCopyWithTransform and StructuralCopyWithTransform planners
now match the fill by skipping any passthrough key that already appeared
earlier in the source kvs. Linear scan per field, consistent with the
existing O(n*m) Entries matching.

Regression test covers heap + arena (deep) and arena (structural).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
copy_api_test.go (2)

361-381: Extend parity coverage to the transform wrappers too.

TestPackageLevelParity currently skips DeepCopyWithTransform and StructuralCopyWithTransform, which are part of the new package-level API surface in this PR. Adding one transformed heap case and one transformed arena case here would make wrapper regressions much harder to miss.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@copy_api_test.go` around lines 361 - 381, TestPackageLevelParity currently
omits the transform-wrapper variants; add assertions that
Parser.DeepCopyWithTransform(nil, src, transform) equals
DeepCopyWithTransform(nil, src, transform) and
Parser.DeepCopyWithTransform(arena, src, transform) equals
DeepCopyWithTransform(arena, src, transform) (use a simple identity or noop
transform), and do the same for Parser.StructuralCopyWithTransform(nil/ar, src,
transform) vs StructuralCopyWithTransform(nil/ar, src, transform); place these
as a transformed heap case and a transformed arena case within
TestPackageLevelParity to ensure parity for DeepCopyWithTransform and
StructuralCopyWithTransform.

147-154: Add an independence assertion for the arena transform path.

This subtest only checks marshaled output. The new arena-specific DeepCopyWithTransform planner/fill path could still alias source subtrees and pass here, so I'd add a post-copy mutation check similar to TestDeepCopyArenaIndependence / TestDeepCopyWithTransformIndependence.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@copy_api_test.go` around lines 147 - 154, The arena-subtest only asserts
marshaled output and can miss aliasing; after creating cp :=
DeepCopyWithTransform(a, src, tc.xform) add an independence assertion: mutate
the original src (parsed via Parser.ParseWithArena/variable src) in a way used
by existing tests (see TestDeepCopyArenaIndependence /
TestDeepCopyWithTransformIndependence for examples), then re-marshal cp
(cp.MarshalTo) and assert it is unchanged compared to the pre-mutation marshal
result to ensure DeepCopyWithTransform produced an independent copy from the
arena path.
util.go (1)

385-399: Reuse the shared passthrough predicate in the fill paths as well.

passthroughSkipped now defines the planner-side collision rules, but the fill paths still re-implement the same InputKey/OutputKey skip logic inline four times. Since the recent panic was caused by planner/fill drift, I'd factor the fill-side check through a shared helper too, with duplicate-key dedupe layered on top.

Also applies to: 789-819, 908-935, 1201-1229, 1308-1336

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@util.go` around lines 385 - 399, The fill-side code duplicates the
InputKey/OutputKey skip logic that passthroughSkipped(src *Value, key string, t
*Transform) implements; refactor by extracting/reusing that shared predicate for
the fill paths (call passthroughSkipped or a thin wrapper) and replace the four
inline checks with calls to it, then add the duplicate-key de-duplication layer
on top in the fill logic (i.e., after you call the shared predicate, ensure any
repeated OutputKey entries are filtered once) so planner and fill use the same
collision rules; search for the inline checks that compare InputKey/OutputKey
and src.Get(...) and replace them with the shared helper plus dedupe.

ℹ️ Review info

Run ID: d2d94f07-b62d-4e44-b715-2f57f071ba72

📥 Commits

Reviewing files that changed from the base of the PR and between 3451331 and f600d16.

📒 Files selected for processing (4)
  • README.md
  • arena_gc_test.go
  • copy_api_test.go
  • util.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • README.md
  • arena_gc_test.go

Member

@devsergiy devsergiy left a comment


Changes are hard to understand without additional guidance

I recommend adding more documentation to explain:

  • How the different modes work.
  • The rationale for each change.
  • A separate document covering tradeoffs and assumptions.

More inline comments would probably also help.

Comment thread transform_test.go
Member


Looks like this file belongs to the parser; it could be named parser_transform_test.go.

Comment thread util.go
//
// Returns nil when v is nil. The immutable singletons (valueTrue, valueFalse,
// valueNull) are shared rather than cloned.
func (p *Parser) DeepCopy(a arena.Arena, v *Value) *Value {
Member


Why are these Parser methods in the util file?

Member


They could be moved to parser_deep_copy.go.


2 participants