Add MergeDataSegments pass by LegionMammal978 · Pull Request #8647 · WebAssembly/binaryen

LegionMammal978 · 2026-04-27T16:30:24Z

Recently, I was writing a WASM module by hand, and I used a number of individual small data segments to store strings. I tried seeing if wasm-opt could combine these adjacent data segments, but found that it did not have any such pass. Thus, I've implemented a new MergeDataSegments pass to merge active data segments that are overlapping, adjacent, or near-adjacent, in order to save the space of storing multiple data-segment headers, and the time of processing them during instantiation.
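As a rough illustration of the merging idea (not the pass's actual code), the following Python sketch models constant-offset active segments in a single memory as `(offset, data)` pairs. It replays them in declaration order so that later segments overwrite earlier ones, then groups the written bytes into runs, bridging gaps of at most `max_gap` bytes with zeros. The names `merge_segments` and `max_gap` are hypothetical. Zero-filling a gap assumes the memory is zero-initialized, which holds only when it is not imported. The sketch also ignores out-of-bounds traps and empty segments.

```python
def merge_segments(segments, max_gap=0):
    """Merge constant-offset active segments, given as (offset, data) pairs.

    Hypothetical sketch: max_gap is the largest run of untouched bytes that
    may be bridged with zeros (assumes a zero-initialized memory).
    """
    # Replay the segments in order so that later writes win on overlap.
    written = {}
    for offset, data in segments:
        for i, byte in enumerate(data):
            written[offset + i] = byte

    merged = []
    run_start = None
    run = bytearray()
    for addr in sorted(written):
        if run_start is not None:
            gap = addr - (run_start + len(run))
            if gap <= max_gap:
                run.extend(bytes(gap))  # zero filler for the bridged gap
            else:
                merged.append((run_start, bytes(run)))
                run_start = None
        if run_start is None:
            # Start a new run at this address.
            run_start = addr
            run = bytearray()
        run.append(written[addr])
    if run_start is not None:
        merged.append((run_start, bytes(run)))
    return merged
```

For example, `merge_segments([(0, b"ab"), (2, b"cd")])` returns `[(0, b"abcd")]`, while two segments separated by a gap are merged only if `max_gap` is at least the gap size.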

It is designed to be as aggressive as possible, fully supporting multiple memories and accounting for non-constant-offset segments. Meanwhile, unless TNH (traps-never-happen) is enabled, it is also designed to carefully replicate the original module's behavior w.r.t. out-of-bounds traps during instantiation: the goal is that there should be no observable difference in the output, short of unreliable tricks like reading a partially-instantiated SharedArrayBuffer.

In principle, this functionality might have been included in the existing MemoryPacking pass, but I believe that it makes sense to separate the primary functionalities of splitting vs. merging data segments, which require different forms of tracking. In that sense, I see MergeDataSegments as complementing the MemoryPacking pass. For instance, MemoryPacking requires that its input has no overlapping data segments, a property that MergeDataSegments is often able to ensure in its output.

Some implementation notes:

Following the behavior of MemoryPacking, it reads TNH as asserting traps never happen during instantiation, so in particular data segments are never out of bounds. Also, it only considers TNH and not --ignore-implicit-traps, following MemoryPacking.
The pass detects certain cases when a data segment is necessarily out-of-bounds: in that case, it simply emits the offending data segment last, and drops all remaining data segments. In principle, since the module cannot be fully instantiated, it could be even more aggressive with replacing every function body with unreachable, etc., but this is a small edge case.
Following MemoryPacking, this pass assumes that unless a memory is imported, it is zero-initialized, and its initial and maximum sizes are exactly as declared. It also assumes that during data initialization, the memory cannot be modified by anything other than the declared data segments. It is my understanding that these assumptions are not affected by open-world vs. closed-world.
The pass tries to be careful about integer limits, with only one small edge case remaining: as far as I can read the WASM spec, it permits a 64-bit memory to be exactly 2^64 bytes long, yet the pass only handles memories up to 2^64-1 bytes long. But in general, binaryen doesn't seem to be designed in a way that would allow that last byte.
Overall, the pass modifies the module by rewriting its data segments, then modifying data-segment indices in function bodies. In the latter case, I assume that ReFinalize() is not needed, since the instructions and stack arguments are unchanged, only their indices are modified. Similarly, I mark requiresNonNullableLocalFixups() as false. In principle, the behavior of the memory instructions on active data segments can be simplified, but I figure that it's better to leave that to the more extensive modifications of MemoryPacking.
The active-segment threshold of MemoryPacking does not always match the size heuristic of MergeDataSegments, so if run in alternation in certain cases, they could fight over splitting vs. merging the same two segments.
Obviously, this pass has a lot of edge cases that need testing, but I'm not sure where the tests should be placed (test/passes/? test/lit/passes/?), nor how exactly they are formatted. I'm especially unsure how to test behavior around the MAX_SEG_SIZE.
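The out-of-bounds handling described in the notes above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: constant offsets, a single non-imported memory whose size in bytes is known exactly, and hypothetical names (`truncate_at_first_trap`, `memory_size`). Python integers do not overflow, so the end-offset comparison is exact here. An implementation using fixed-width integers would need an explicit overflow check.

```python
def truncate_at_first_trap(segments, memory_size):
    """Keep segments up to and including the first one that must trap.

    segments: list of (offset, data) pairs, applied in order.
    memory_size: exact size of the memory in bytes (assumed known).
    Returns (kept_segments, traps).
    """
    kept = []
    for offset, data in segments:
        kept.append((offset, data))
        # A segment is out of bounds when its end lies past the memory;
        # this also covers an empty segment with a too-large offset.
        if offset + len(data) > memory_size:
            # Instantiation traps here, so later segments are never applied.
            return kept, True
    return kept, False
```

With a one-page memory (65536 bytes), `truncate_at_first_trap([(0, b"ab"), (65535, b"xy"), (4, b"cd")], 65536)` keeps the first two segments and reports a trap, since the second segment ends one byte past the memory.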

MaxGraey · 2026-04-27T19:21:52Z

It looks like all of this was created using an LLM? You should add tests that demonstrate how the data segments are merged and that check edge cases, using lit tests (test/lit/passes/<some-pass-name>.wast). You also need to run fuzz tests for at least 5-7 hours.

LegionMammal978 · 2026-04-27T19:37:09Z

No, I wrote everything in this PR myself; I'm not a big fan of LLMs' coding style. (Indeed, I asked an LLM to review my merge functions, and it kept wanting to add all sorts of extraneous steps.)

I think I see how to write the lit tests (create the file with the inputs, run scripts/update_lit_tests.py, and double-check that the outputs are correct). But how do I run the fuzz tests?

MaxGraey · 2026-04-27T19:41:25Z

> But how do I run the fuzz tests?

Run from the root directory:

./scripts/fuzz_opt.py

LegionMammal978 added 2 commits April 27, 2026 11:38

Add MergeDataSegments pass

ced6d60

Add MergeDataSegments to README.md

d7d54d5

LegionMammal978 requested a review from a team as a code owner April 27, 2026 16:30

LegionMammal978 requested review from kripken and removed request for a team April 27, 2026 16:30

LegionMammal978 added 2 commits April 27, 2026 12:59

Fix signedness issues

ec9957c

Update lit/help tests

64952b5

LegionMammal978 force-pushed the merge-data-segments branch from 4a120d6 to 64952b5 April 27, 2026 17:40

LegionMammal978 marked this pull request as draft April 27, 2026 20:27

LegionMammal978 wants to merge 4 commits into WebAssembly:main from LegionMammal978:merge-data-segments
