
fix: buffer ReplacementStream chunks before applying replacements #1756

Open
ZLeventer wants to merge 1 commit into forcedotcom:main from ZLeventer:fix/replacement-stream-chunk-boundary-3461


@ZLeventer

What

ReplacementStream._transform ran the replacement regex against each chunk independently. Any token split across a chunk boundary (e.g. #SOME + _REPLACEMENT#) failed to match and survived into the converted output unmodified. Because the warning bookkeeping was also per-chunk, the singleFile "not found" warning could misfire as well.
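The failure mode can be reproduced without any stream machinery. The following is a minimal sketch, not the library's code: the token, the replacement value, and both helper names are hypothetical and chosen only to mirror the example above.

```typescript
// Hypothetical token and replacement value, used only for illustration.
const TOKEN = /#SOME_REPLACEMENT#/g;
const VALUE = 'replaced';

// Per-chunk replacement: the regex sees each chunk in isolation,
// so a token straddling two chunks never matches.
const replacePerChunk = (chunks: string[]): string =>
  chunks.map((chunk) => chunk.replace(TOKEN, VALUE)).join('');

// Buffered replacement: join everything first, then replace once.
const replaceBuffered = (chunks: string[]): string =>
  chunks.join('').replace(TOKEN, VALUE);

// The token is split across the chunk boundary.
const splitChunks = ['<name>#SOME', '_REPLACEMENT#</name>'];

console.log(replacePerChunk(splitChunks)); // <name>#SOME_REPLACEMENT#</name>
console.log(replaceBuffered(splitChunks)); // <name>replaced</name>
```

Neither chunk contains the full token on its own, so the per-chunk version leaves it untouched while the buffered version replaces it.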

This change buffers all chunks in _transform and runs replacementIterations once in _flush. Metadata files are bounded in size, so the memory cost is negligible compared to the correctness win. The per-chunk warning bookkeeping collapses to a single pass.
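As a rough illustration of the buffer-then-flush shape described above, here is a minimal sketch built on Node's `Transform`. It is not the code in this PR: `BufferedReplacementStream`, the `Replacement` type, and the `collect` helper are hypothetical names, and the real stream runs `replacementIterations` rather than the plain loop shown here.

```typescript
import { Readable, Transform, type TransformCallback } from 'node:stream';

// Hypothetical shape; the library's real replacement config may differ.
type Replacement = { toReplace: RegExp; replaceWith: string };

class BufferedReplacementStream extends Transform {
  private readonly chunks: Buffer[] = [];
  private readonly replacements: Replacement[];

  public constructor(replacements: Replacement[]) {
    super();
    this.replacements = replacements;
  }

  public _transform(chunk: Buffer | string, encoding: BufferEncoding, callback: TransformCallback): void {
    // Only accumulate here; nothing is emitted until the whole input is seen.
    this.chunks.push(typeof chunk === 'string' ? Buffer.from(chunk, encoding) : chunk);
    callback();
  }

  public _flush(callback: TransformCallback): void {
    // Decode once, after concatenation, then apply every replacement once.
    let text = Buffer.concat(this.chunks).toString('utf8');
    for (const { toReplace, replaceWith } of this.replacements) {
      text = text.replace(toReplace, replaceWith);
    }
    callback(null, text);
  }
}

// Helper for trying the stream out: pipes string chunks through and collects the output.
const collect = async (chunks: string[], replacements: Replacement[]): Promise<string> => {
  const out = Readable.from(chunks).pipe(new BufferedReplacementStream(replacements));
  let result = '';
  for await (const piece of out) result += piece.toString();
  return result;
};
```

One design note on the sketch: it concatenates the raw Buffers before decoding, so a multi-byte character split across chunks is also decoded correctly. Whether the PR handles that case the same way is not stated in the description.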

Why

Refs forcedotcom/cli#3461: users report tokens silently surviving conversion. The bug is reproducible whenever Node's stream layer splits a file mid-token (large XML files, slow disks, mixed encodings).

Test plan

  • Added a regression test in test/convert/replacements.test.ts that splits #SOME_REPLACEMENT# at its midpoint across two chunks via Readable.from([chunk1, chunk2]) and asserts both that the token is replaced and that no "not found" warning is emitted.
  • All existing ReplacementStream tests still pass, including the warning-emission tests for singleFile replacements.
  • yarn compile, yarn lint, yarn test:only clean locally on Node 22.
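The regression case in the first bullet can be approximated as follows. This is a standalone sketch, not the test added in test/convert/replacements.test.ts: `StandInStream` is a hypothetical stand-in for the real `ReplacementStream`, its `warn` callback is an assumed substitute for the library's warning mechanism, and plain `node:assert` is used instead of the repo's test framework.

```typescript
import { strict as assert } from 'node:assert';
import { Readable, Transform, type TransformCallback } from 'node:stream';

// Hypothetical stand-in: buffers all chunks, replaces once in _flush,
// and reports a warning only if the token never appears in the full text.
class StandInStream extends Transform {
  private readonly chunks: Buffer[] = [];
  private readonly token: string;
  private readonly value: string;
  private readonly warn: (message: string) => void;

  public constructor(token: string, value: string, warn: (message: string) => void) {
    super();
    this.token = token;
    this.value = value;
    this.warn = warn;
  }

  public _transform(chunk: Buffer, _encoding: BufferEncoding, callback: TransformCallback): void {
    this.chunks.push(chunk);
    callback();
  }

  public _flush(callback: TransformCallback): void {
    const text = Buffer.concat(this.chunks).toString('utf8');
    if (!text.includes(this.token)) this.warn(`"${this.token}" not found`);
    callback(null, text.split(this.token).join(this.value));
  }
}

const runSplitTokenCase = async (): Promise<{ output: string; warnings: string[] }> => {
  const token = '#SOME_REPLACEMENT#';
  const mid = Math.floor(token.length / 2);
  // Split the token at its midpoint so neither chunk contains it whole.
  const chunk1 = `<name>${token.slice(0, mid)}`;
  const chunk2 = `${token.slice(mid)}</name>`;
  const warnings: string[] = [];
  const stream = Readable.from([chunk1, chunk2]).pipe(
    new StandInStream(token, 'replaced', (message) => warnings.push(message))
  );
  let output = '';
  for await (const piece of stream) output += piece.toString();
  return { output, warnings };
};

void runSplitTokenCase().then(({ output, warnings }) => {
  assert.equal(output, '<name>replaced</name>'); // token replaced despite the split
  assert.deepEqual(warnings, []); // no "not found" warning
});
```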

ReplacementStream ran the replacement regex against each chunk
independently, so any token that was split across a chunk boundary
(e.g. `#SOME` + `_REPLACEMENT#`) would fail to match and survive
into the converted output unmodified.

Buffer all chunks and run the replacement once in `_flush`. Metadata
files are bounded in size so the memory cost is negligible compared
to the correctness win, and the warning bookkeeping for `singleFile`
replacements collapses naturally to a single pass.

Adds a regression test that splits a token at its midpoint across two
chunks via `Readable.from` and asserts the token is replaced and no
"not found" warning is emitted.

Refs forcedotcom/cli#3461