[wasm-reduce] Empty functions with delta debugging#8640
Conversation
Delta debugging is an algorithm for finding the minimal set of items necessary to preserve a condition. It generally works by using increasingly fine partitions of the original set of items, alternating between trying to keep just one of the partitions, to make rapid progress, and trying to keep the complement of one of the partitions, to make smaller changes that are more likely to work.

Add a header containing a templatized delta debugging implementation, then use it in wasm-reduce to preserve the minimal number of function bodies necessary to reproduce the reduction condition. This should allow wasm-reduce to make much faster progress on emptying out functions in the common case and leave it much less work to do afterwards.

Using delta debugging for deleting functions and performing other reduction operations is left as future work. Deleting functions in particular is challenging because it can involve reloading the module from the working file, potentially changing function names and invalidating the function names that would be stored in the delta debugging partitions.
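The partition/complement strategy described above can be sketched roughly as follows. This is a minimal illustration of the general ddmin idea, not the actual templatized header added in this PR; the `deltaDebug` name and `interesting` callback are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sketch of delta debugging: shrink `items` to a small subset that still
// satisfies `interesting`. Start with coarse partitions and refine them,
// alternately trying to keep just one partition (rapid progress) and the
// complement of one partition (smaller, likelier-to-work changes).
template <typename T>
std::vector<T>
deltaDebug(std::vector<T> items,
           std::function<bool(const std::vector<T>&)> interesting) {
  size_t numPartitions = 2;
  while (items.size() >= 2) {
    size_t size = (items.size() + numPartitions - 1) / numPartitions;
    bool reduced = false;
    for (size_t i = 0; i < numPartitions; ++i) {
      size_t begin = i * size;
      size_t end = std::min(items.size(), begin + size);
      if (begin >= end) {
        continue;
      }
      // Try keeping just this partition.
      std::vector<T> kept(items.begin() + begin, items.begin() + end);
      if (interesting(kept)) {
        items = kept;
        numPartitions = 2;
        reduced = true;
        break;
      }
      // Try keeping everything except this partition.
      std::vector<T> complement(items.begin(), items.begin() + begin);
      complement.insert(complement.end(), items.begin() + end, items.end());
      if (interesting(complement)) {
        items = complement;
        numPartitions = std::max<size_t>(2, numPartitions - 1);
        reduced = true;
        break;
      }
    }
    if (!reduced) {
      if (numPartitions >= items.size()) {
        break; // Already at single-item granularity; done.
      }
      numPartitions = std::min(items.size(), numPartitions * 2);
    }
  }
  return items;
}
```

For wasm-reduce, the items would be function names and `interesting` would empty all functions outside the kept set and rerun the reduction script; when most function bodies are removable, the very first coarse attempts succeed, which is why this makes fast progress up front.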
Currently validating this approach overnight by reducing a 200MB file with a reduction script that takes over four minutes to crash. Let's see how far it gets by the morning!
```cpp
[&](Index partitionIndex,
    Index numPartitions,
    const std::vector<Index>& partition) {
  std::cerr << "| try partition " << partitionIndex + 1 << " / "
```
Printing 1-based indices is slightly more intuitive than 0-based indices.
kripken
left a comment
Nice! lgtm % you find it is faster
In practice I had to do additional hacks to make sure this ran before the usual destructive reductions. I'd like to make two changes here before landing this:
@kripken PTAL at the latest changes. I would be happy landing this version. On the 200MB binary, this quickly reduces it to just 2MB (by removing all of the function bodies), but the reducer fails to make quick progress after that. I will add more uses of delta debugging as follow-on work to hopefully make more progress.
How does the speed of reducing functions compare to before this PR? (previous code uses exponential growth, so I'm curious)
kripken
left a comment
lgtm %
- With the ordering part landed separately (see comment)
- If it is significantly faster (seems worth the TODOs in the code, in that case)
```cpp
reducer.loadWorking();
reducer.reduceFunctionBodies();
first = false;
}
```
How about landing this part first? (also, measurement should be independent of it)
I updated the old code to also remove function bodies before doing anything else, using the existing algorithm, modified only so that it would not try removing whole functions as well in that first pass. That initial function-body removal step took 2 hours, 7 minutes. It started out by removing 1, 2, 4, ... function bodies at a time, but the step size capped out at about 31k for some reason. After it had removed almost all the function bodies that way, it spent a huge amount of time continuing to try to remove one function at a time, and at one point found some work to do and worked back up to a step size of 8192.

In contrast, the new code does this in under 5 minutes in a single step that removes all the function bodies at once. If I remove the code that tries to remove everything at once, it instead uses a logarithmic number of steps and takes 1 hour, 21 minutes (which would be more realistic for a crash that required keeping some function bodies).

I think the new code being so much faster (and making so much more progress up front) is a large part of the reason why it now makes sense to remove function bodies before doing anything else, so I would suggest not splitting that part out of this PR.
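The 1, 2, 4, ... behavior described above is the usual exponential ramp. A hypothetical sketch of that step-size policy (not the actual wasm-reduce loop; `nextStepSize` is an illustrative name) is:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical exponential ramp: double the number of items attempted per
// step after a success, back off after a failure, never dropping below one.
// With this policy, a long run of failures forces the reducer back to
// one-item steps, which is where the old code spent most of its time.
size_t nextStepSize(size_t step, bool lastAttemptWorked) {
  if (lastAttemptWorked) {
    return step * 2; // 1, 2, 4, ... on a run of successes
  }
  return step > 1 ? step / 2 : 1; // back off, but keep trying single items
}
```

Delta debugging avoids this trap because its first, coarsest attempt already tries removing everything at once, so the all-bodies-removable common case succeeds in a single step.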