
Conversation

kripken (Member) commented Nov 13, 2025

Each time we see a new call_indirect form (a new combination of table + type being called), we scan the table to see which functions might be called. In a large Dart testcase I am looking at, we have 500K items in a table (!), and the constant re-scanning of segment items into RefFuncs, then looking up their functions on the module, is by far the slowest part of the pass. Reduce that overhead by precomputing a "flat", friendly form of the segments, which stores the function name and type for each element segment item.
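The idea can be sketched roughly as follows. This is a simplified illustration, not Binaryen's actual code: `FlatItem`, `FlatSegment`, and `flatten` are hypothetical names, and types are opaque ints rather than `wasm::HeapType`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for the IR: a segment is a list of
// function names, and each function has a type (an opaque id here).
struct FlatItem {
  std::string funcName; // name of the referenced function
  int type;             // its type, precomputed so queries need no lookup
};

using FlatSegment = std::vector<FlatItem>;

// Precompute the flat form once per segment, so that later queries for
// "which functions in this table match type T?" avoid re-parsing RefFuncs
// and re-looking functions up on the module every time.
FlatSegment flatten(const std::vector<std::string>& segment,
                    const std::unordered_map<std::string, int>& funcTypes) {
  FlatSegment flat;
  flat.reserve(segment.size());
  for (auto& name : segment) {
    flat.push_back({name, funcTypes.at(name)});
  }
  return flat;
}
```

Each new call_indirect form then scans the cheap precomputed vector instead of repeating the expensive per-item work.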

This makes the slowest pass on that testcase over 2x faster, from 16.05 to 6.85
seconds. The pass runs 3 times in -O3 so this lets us save almost 30
seconds from the total time of 184, making us overall 15% faster.

After this, the slowest thing is `HeapType::isSubType`, which might also be worth optimizing (but harder).

kripken requested a review from tlively November 13, 2025 22:48
tlively (Member) commented Nov 13, 2025

> After this, the slowest thing is `HeapType::isSubType`, which might also be worth optimizing (but harder).

Fascinating! We could absolutely make this faster by using the same constant-time cast implementation used in real engines.
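For context, the standard constant-time technique is a "supertype display": each type stores the chain of its ancestors indexed by depth, so a subtype check is a bounds check plus one array load. A minimal sketch, with hypothetical names and opaque int type ids (not engine or Binaryen code):

```cpp
#include <cassert>
#include <vector>

// Supertype display: supers[d] holds this type's ancestor at depth d,
// with the type itself as the last entry. Built once per type.
struct TypeInfo {
  std::vector<int> supers;
};

// A <: B iff B appears in A's display at B's own depth. O(1), no loop.
bool isSubType(const TypeInfo& a, int b, int bDepth) {
  return bDepth < (int)a.supers.size() && a.supers[bDepth] == b;
}
```

The cost is moved to type-construction time, which fits Binaryen's immutable type system.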

tlively (Member) left a review comment

Would it be even faster to build maps from table and type to referenced function? The only trick would be handling subtyping, by either adding each function to the maps for all of its supertypes or by linking supertypes to their present subtypes in each table.

kripken (Member, Author) commented Nov 14, 2025

> Would it be even faster to build maps from table and type to referenced function?

We also need to know which segment is referred to; see the `elemReferenced` variable. That means we need to track elems too.

tlively (Member) commented Nov 14, 2025

That doesn't seem intractable, but let's not block landing this on further experimentation.

kripken (Member, Author) commented Nov 14, 2025

Oh, about maps from table and type to referenced function: all we need is the function name and type, so that is what this PR builds. If we just referred to the function, we'd need another lookup, which I think would be slower.

We can experiment with more stuff here, but the by-far largest part of the pass after this PR is already something else: the HeapType subtyping checks. We could either optimize those as VMs do, as you said, or perhaps build a subtyping tree (so to find out which functions a call to type T can reach, we'd traverse node T and all its children, and process all functions listed there). That would be a bit of work, but if it makes the pass 30% faster (my rough guess) it could save up to 3-4% of total time.
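The subtyping-tree traversal mentioned here can be sketched briefly. This is an illustration of the idea only, with hypothetical names: each node lists the functions whose type is exactly that type, plus its direct subtypes, and a query collects the whole subtree.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One node per type in the subtype tree. Functions are attached at the node
// for their exact type; children are the direct subtypes.
struct TypeNode {
  std::vector<std::string> funcs;
  std::vector<const TypeNode*> children;
};

// A call to type T can reach any function whose type is T or a subtype of T,
// i.e. everything in T's subtree. Depth-first collection.
void collectReachable(const TypeNode& node, std::vector<std::string>& out) {
  out.insert(out.end(), node.funcs.begin(), node.funcs.end());
  for (auto* child : node.children) {
    collectReachable(*child, out);
  }
}
```

This replaces a per-function subtype check over all functions with a walk proportional to the size of the relevant subtree.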

tlively (Member) commented Nov 14, 2025

> Oh, about maps from table and type to referenced function - all we need is the function name and type, so that is what this PR builds. If we just referred to the function, we'd need another lookup, which would be slower I think.

Right, I was thinking about mapping the type to just the names. That would avoid the linear scan to find subtypes. But point taken about this no longer being a bottleneck.

kripken merged commit 253aab4 into WebAssembly:main Nov 14, 2025 (17 checks passed).
kripken deleted the rume.elem.flat.fast branch November 14, 2025 17:28.