[NFC] RemoveUnusedModuleElements: Optimize repeated table lookups #8048
Fascinating! We could absolutely make this faster by using the same constant-time cast implementation used in real engines.
tlively left a comment:
Would it be even faster to build maps from table and type to the referenced functions? The only trick would be handling subtyping, by either adding each function to the maps for all of its supertypes or by linking supertypes to their present subtypes in each table.
We also need to know which segment is referred to, see the
That doesn't seem intractable, but let's not block landing this on further experimentation.
We can experiment with more stuff here, but the by-far largest part of the pass after this PR is already something else: the HeapType subtyping checks. We could either optimize those as VMs do, as you said, or perhaps build a subtyping tree (so to find out which functions a call to type T can reach, we'd traverse node T and all its children, and process all functions listed there). That would be a bit of work, but if it makes the pass 30% faster (my general guess) it could save up to 3-4% of total time.
Right, I was thinking about mapping the type to just the names. That would avoid the linear scan to find subtypes. But point taken about this no longer being a bottleneck.
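The subtyping-tree idea discussed above could look roughly like this. It is a sketch under the same simplifying assumptions as before (integer `TypeId`s, a hypothetical `TypeNode` rather than Binaryen's real structures): to answer "which functions can a call to type T reach", walk node T and all of its subtypes, collecting the functions listed at each node, instead of scanning every function and running a subtype check per function.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using TypeId = int;

// Hypothetical node in a subtyping tree.
struct TypeNode {
  std::vector<TypeId> subtypes;          // immediate subtypes (children)
  std::set<std::string> funcsAtThisType; // functions declared exactly at this type
};

// Collect every function a call with type `root` could reach by walking the
// subtype tree, avoiding a linear scan with per-function subtype checks.
std::set<std::string> collectTargets(const std::map<TypeId, TypeNode>& tree,
                                     TypeId root) {
  std::set<std::string> out;
  std::vector<TypeId> work{root};
  while (!work.empty()) {
    TypeId t = work.back();
    work.pop_back();
    auto it = tree.find(t);
    if (it == tree.end()) {
      continue; // type has no node: nothing declared there
    }
    out.insert(it->second.funcsAtThisType.begin(),
               it->second.funcsAtThisType.end());
    for (TypeId sub : it->second.subtypes) {
      work.push_back(sub);
    }
  }
  return out;
}
```

The work done per query is proportional to the size of T's subtree, which is typically far smaller than the whole type list.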
Each time we see a new call_indirect form (a new combo of table + type that is called), we scan the table to see which functions might be called. In a large Dart testcase I am looking at, we have 500K items in a table (!), and the constant casting of items in the segment into RefFuncs and then looking up their functions on the module is by far the slowest part of the pass. Reduce that overhead by precomputing a "flat", friendly form of segments, which has the function name and type on each element segment item.
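The precomputation described above can be sketched as follows. The types here (`RefFuncExpr`, `Function`, `FlatItem`) are simplified hypothetical stand-ins for Binaryen's real IR classes; the point is only to show where the per-item cast and module lookup move to. Each segment is flattened once, so every later table scan reads the name and type directly off the flat item.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified stand-ins for Binaryen's real IR classes.
struct RefFuncExpr { std::string funcName; };
struct Function    { std::string name; int type; };

// Flat, precomputed view of one element segment item: the function name and
// type are stored directly, so scans do no casts and no module lookups.
struct FlatItem { std::string name; int type; };

std::vector<FlatItem>
flattenSegment(const std::vector<RefFuncExpr>& segment,
               const std::map<std::string, Function>& moduleFuncs) {
  std::vector<FlatItem> flat;
  flat.reserve(segment.size());
  for (auto& item : segment) {
    // Pay the module lookup once here, instead of once per table scan.
    flat.push_back({item.funcName, moduleFuncs.at(item.funcName).type});
  }
  return flat;
}
```

With 500K items and many distinct table+type combinations, amortizing the lookup once per item rather than once per item per scan is where the speedup comes from.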
This makes the slowest pass on that testcase over 2x faster, from 16.05 to 6.85 seconds. The pass runs 3 times in -O3, so this lets us save almost 30 seconds from the total time of 184, making us overall 15% faster.

After this, the slowest thing is HeapType::isSubType, which might also be worth optimizing (but harder).