MemoryPacking: Optimize trampled data instead of giving up#8834
Conversation
When active segments overlap, a later segment overwrites ("tramples")
the data of an earlier one. Since active segments are applied in order
during instantiation, before any code can run, only the final contents
of memory are observable, so we can zero out all trampled bytes and let
the normal optimization of zeros remove them.
We only do this when the memory is defined in the module itself: with an
imported memory, an out-of-bounds segment traps mid-instantiation and
the partially-applied state remains visible in the importing module, so
there we keep the existing behavior of not optimizing.
Fixes WebAssembly#3244
Update the existing trampling tests, which asserted that we give up on any overlap, to the new optimized output, and add coverage for: full trampling by a non-zero byte, partial trampling in the middle of a segment, chained trampling, one segment trampling multiple earlier ones, passive segments being unaffected, and the imported-memory case where we still do not optimize.
|
Hi @kripken — this is my first contribution to Binaryen, following up on the |
tlively
left a comment
There was a problem hiding this comment.
Very nice! Thanks for the contribution.
| // Some segments overlap, that is, a later segment tramples the data of | ||
| // an earlier one. If the memory is imported then we cannot optimize | ||
| // here: if a later segment is out of bounds then instantiation traps | ||
| // partway, leaving the data written so far visible in the imported | ||
| // memory (which outlives the failed instantiation), so even trampled | ||
| // data matters. |
There was a problem hiding this comment.
Maybe worth adding a TODO about optimizing anyway if we can check that all the segments after the trampled segment up to and including the trampling segment will be in-bounds for the imported memory.
| ;; CHECK: (data $0 (i32.const 1024) "x") | ||
|
|
||
| ;; CHECK: (data $1 (i32.const 1024) "\00") | ||
| (module |
There was a problem hiding this comment.
Please leave a blank line between each module to help readability.
There was a problem hiding this comment.
Done — added blank lines between the modules here and in memory-packing_zero-filled-memory.wast (2738c38).
| (data (i32.const 1024) "ab") | ||
| (data (i32.const 1026) "cd") | ||
| (data (i32.const 1023) "WXYZ") | ||
| ) |
There was a problem hiding this comment.
Let's also add a test that depends on the merging of the covered regions. For example, with segments covering the following intervals:
A: [3, 8)
B: [1, 2)
C: [0, 5)
When we visit segment A (after visiting C and B), if we didn't correctly merge the covering information for B into the covering information for C, then we would fail to detect the overlap between C and A because the map lookup would find B instead.
There was a problem hiding this comment.
Good idea — added in 2738c38, with segments "abcde"@1027, "B"@1025, "fghij"@1024 matching your A: [3, 8), B: [1, 2), C: [0, 5) shape. Without the merge, the lookup when visiting "abcde" would find the "B" region and miss the overlap with "fghij". The output checks confirm the trampled "ab" is dropped ("cde" remains at 1029).
|
The earlier CI failure was an infrastructure flake, not related to the change: the macos-14 job built and installed successfully but timed out uploading the build artifact ( I pushed an empty commit to retrigger CI, but the new run is waiting on workflow approval. Could a maintainer approve the workflow run when they get a chance? Thanks! |
- Add a TODO about optimizing trampled segments on imported memories when the relevant segments can be proven in-bounds. - Add blank lines between modules in the memory-packing lit tests. - Add a test that depends on merging the covered regions: without merging, looking up the region covering a segment could find a small later segment and miss a larger overlapping one.
What does this PR do?
When active segments overlap, a later segment overwrites ("tramples") the data of an earlier one. #3222 made MemoryPacking check for trampling and give up optimizing on any overlap. Since active segments are applied in order during instantiation, before any code can run, only the final contents of memory are observable — so instead of giving up, we can zero out all trampled bytes and let the pass's normal optimization of zeros remove them. This leaves segment count, offsets, and order untouched, so segment referrers and trap behavior are unaffected.
The one case we still skip (with the existing warning) is an imported memory: there, a later out-of-bounds segment traps mid-instantiation and the partially-written state remains visible in the importing module, which outlives the failed instantiation, so even trampled data matters. A possible follow-up could optimize imported memories too when all segments are provably within the declared minimum size.
Why was this PR needed?
#3244 (filed after the conservative fix in #3222) left a
TODO: optimize in the trampling caseincanOptimize(). Before this change, a module likewas emitted entirely unchanged; now both segments are removed (the final memory contents are all zeros).
Testing
memory-packing_all-features.wastand added cases for: full trampling by a non-zero byte, partial trampling in the middle of a segment, chained trampling, one segment trampling multiple earlier ones, passive segments being unaffected, and the imported-memory case inmemory-packing_zero-filled-memory.wast.check.py, and gtest unit tests pass.wasm-opt --fuzz-exec --memory-packingon 300 randomly generated modules with 2–6 overlapping segments produces identical memory contents before and after the pass.Fixes #3244