Add MSVC demangler correctness fixes and generated test corpus#8140
Add MSVC demangler correctness fixes and generated test corpus#8140plafosse wants to merge 2 commits into
Conversation
4b906ea to
15dd98a
Compare
1cd85f5 to
02cebd9
Compare
Split MSVC symbol parsing from finalization by carrying the parsed qualified name in DemangleContext instead of using mutable demangler-wide name state. Function parsing now returns a DemangledFunction with the parsed DemangledTypeNode plus optional decoded thunk adjustor metadata, and the symbol-context callers finalize name-dependent work such as thunk suffixes and implicit this parameter synthesis. Expand DemangledTypeNode into the shared representation used by the MSVC path for delayed type construction, platform-aware widths, calling convention resolution, member pointers, postfix forms, and explicit implicit-this parameters. This lets MSVC parsing produce structured type/name data first and defer Binary Ninja Type construction until finalization has the platform and view context. Also route MSVC string literals, raw type-info names, vtables, RTTI, dynamic init/fini stubs, local guards, and referenced-symbol template values through explicit DemangledTypeNode/name flow. Extensively tested with the MSVC demangler unit suite and a 199,500-symbol regression corpus with zero output changes or failures.
Update the GNU3 parser to use shared DemangledTypeNode references for substitutions, template substitutions, and nested type/name construction so backrefs preserve structure instead of copying stale formatted strings. Fix template argument parsing for non-type template parameter declarations, expression arguments, argument packs, and generic lambda auto parameters while preserving enclosing template substitution state. Carry platform context through GNU3 demangling/finalization so rendered and finalized types use the same delayed DemangledTypeNode representation as the other demangler paths.
02cebd9 to
7862d28
Compare
| #endif | ||
|
|
||
| #ifdef BINARYNINJACORE_LIBRARY | ||
| #include "demangler/gnu3/demangled_type_node.h" |
There was a problem hiding this comment.
The MS demangler plugin shouldn't be importing these headers from the GNU plugin, but instead these headers should live in a common location such that both plugins (and any future/user plugins) can load them. These headers and classes should probably be moved to api/base or some other central location in the repo
There was a problem hiding this comment.
Agreed we didn't catch that
| #endif | ||
|
|
||
|
|
||
| #define MAX_DEMANGLE_LENGTH 262144 |
There was a problem hiding this comment.
Any reason to remove the max length guard? I know the max nesting depth guard was added but that doesn't seem like it replaces this, and maximum length is a useful guardrail to have.
There was a problem hiding this comment.
We would need to stringify just to check the length this was a huge source of slow down in the old version
There was a problem hiding this comment.
Good point. It may still be worth having a length size check on individual nodes though. I'm not sure if the mangle format lets you specify token length in a way that would cause problems, but I wouldn't put it past it.
There was a problem hiding this comment.
Yeah, it might be worth it having a maximum node length
This should be considered a ground up re-write and so you probably won't get any value out of looking at diffs of the actual demangler. It pretty much solves all known demangler accuracy issues. And lays the ground work for more performance unlocks when we get the new simplifier integrated.
Major changes:
to provide an abstraction layer between c++ features and binary
ninja's type objects.
this is due to cutting down on extraneous string copies and type object allocations
of back references.
I didn't feel that adding all those commits would be helpful in understanding what's
actually going on here.
Fixes: