-
-
Notifications
You must be signed in to change notification settings - Fork 34.1k
Description
Proposal:
There's room for tiny refactorings in the Objects/dictobject.c that may allow speedups, for example via better CPU cache behavior, compiler hints, cheap checks & fast exits for common cases. In my guess-timation, they may bring about upto 10% overall lookup/insert performance gains. That's more of an un-scientific intuition at best right now, but I'm laying some ideas out here:
-
compare_unicode_unicode(): can benefit from pointer-equality and branch prediction, and reducing unnecessary function calls -
compare_generic(): can use early rejections on hash-mismatch, avoidingRichCompareBoolcalls -
unicodekeys_lookup_unicode(): pointer-equality for the common case can precededo_lookupwhich today is invoked everytime for all cases -
find_empty_slot(): early termination / separate path for first iteration can speed up insertions, especially for lightly-populated dicts -
build_indices_generic / build_indices_unicode (): breaking out the first-probe success case may speed up dict resize operations
I welcome feedback on how to raise this proposal differently, but otherwise I'd like to work on these ideas and quantify them. I feel modern CPUs have deep pipelines and these optimizations may help us avoid some cache-thrashing. These changes are purely performance oriented and don't change functionality / API / ABI and should be thread-safe too.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response