Skip to content

gh-93714 Create fast version of match_keys for exact dict#93752

Open
da-woods wants to merge 2 commits into
python:mainfrom
da-woods:fast-match-keys
Open

gh-93714 Create fast version of match_keys for exact dict#93752
da-woods wants to merge 2 commits into
python:mainfrom
da-woods:fast-match-keys

Conversation

@da-woods

Copy link
Copy Markdown
Contributor

When the type is an exact dict, there's a real speed-up to be gained by skipping the call to "get" and using PyDict_GetItemWithError instead. The implementation is just a slightly modified copy of the existing match_keys function.

When the type is an exact dict, there's a real speed-up to be
gained by skipping the call to "get" and using PyDict_GetItemWithError
instead.
@AA-Turner AA-Turner added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jun 12, 2022
@sweeneyde

Copy link
Copy Markdown
Member

Do you have any benchmark results to prove whether this is worth it?

@da-woods

da-woods commented Jun 13, 2022

Copy link
Copy Markdown
Contributor Author

I'd made a set of microbenchmarks for pattern matching. This change obviously only affects the first test subject (the dict). I've also included the second subject (an inexact dict) below for comparison.

  • Without:
subject {'a': 'xxx', 1: 500} - time (μs): 1.29 1.28 1.28 1.29 1.28
subject D({'a': 'xxx', 1: 500}) - time (μs): 1.29 1.28 1.29 1.28 1.29
...
  • With:
subject {'a': 'xxx', 1: 500} - time (μs): 1.05 1.04 1.05 1.04 1.06
subject D({'a': 'xxx', 1: 500}) - time (μs): 1.33 1.27 1.27 1.27 1.27
...

It's about a 20% speed up here. If I increase the number of keys to be tested so that the subject is {'a': 'xxx', 1: 500, 'b': 1, 'c': 2, 'd': 3} and the pattern is {"a": "xxx", A.b: x, "b": _, "c": _, "d": _, **extra} then the speed-up remains about 0.2 μs per call, suggesting that it's the constant work of getting subject.get and setting up the dummy object that's being eliminated.

So the benefit is probably "real, but not huge".

(I should add: I'm just compiling Python with make and no special C flags. The Python 3.10 version pre-installed on my system is notably quicker than both versions here, so I suspect I'm not getting the right compiler flags to really test it)

@da-woods

Copy link
Copy Markdown
Contributor Author

With --enable-optimizations --with-ltoI get:

With this change

subject {'a': 'xxx', 1: 500} - time (μs): 0.506 0.505 0.506 0.504 0.506
subject D({'a': 'xxx', 1: 500}) - time (μs): 0.593 0.596 0.594 0.594 0.597

and without

subject {'a': 'xxx', 1: 500} - time (μs): 0.569 0.574 0.574 0.577 0.578
subject D({'a': 'xxx', 1: 500}) - time (μs): 0.578 0.58 0.585 0.583 0.584

So a fairly similar level of speed-up.

@markshannon

Copy link
Copy Markdown
Member

I think this is heading in the wrong direction.
We should be improving the performance of pattern matching by producing better bytecode, not by making the interpreter more complex.

@da-woods

Copy link
Copy Markdown
Contributor Author

I think this is heading in the wrong direction. We should be improving the performance of pattern matching by producing better bytecode, not by making the interpreter more complex.

I agree at least in principle - I known @brandtbucher said he'd done some work on that. I was mainly proposing this PR as a fairly simple short-term improvement. But I'd ultimately expect it to be replaced with bytecode.

Please do reject it if you don't think the extra code is worth the improvement though.

@markshannon

Copy link
Copy Markdown
Member

I won't close this just yet.
We might choose to use it as a temporary measure, until the compiler generates better code for matching mappings.
@brandtbucher thoughts?

@github-actions

Copy link
Copy Markdown

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions Bot added the stale Stale PR or inactive for long period of time. label Apr 11, 2026
@eendebakpt

Copy link
Copy Markdown
Contributor

I think this is heading in the wrong direction. We should be improving the performance of pattern matching by producing better bytecode, not by making the interpreter more complex.

There PR has been inactive for almost 3 years now, so maybe time to consider closing or picking up again. I explored a bit, here is a summary:

  • The current PR (once rebased onto main) still gives a 1.20x speedup for matching dicts with few keys (1 to 6)
  • Another optimization is to detect at compile time that the keys are not duplicates (this works for literal keys) . With a dedicated opcode MATCH_KEYS_UNIQUE we can skip the duplicate key check. Gives a 1.22x speedup. In tier2 the changes for this are quite small.
  • Combination of the above gives a 1.6x speedup (that seems large, probably there is some noise in the benchmarks)
  • I implemented Marks idea (or my interpretation of it). When compiling
def f(d):
    match d:
        case {"a": a, "b": b}:
            return a + b
        case _:
            return -1

we can optimize

MATCH_MAPPING
POP_JUMP_IF_FALSE   -> L2
GET_LEN; LOAD_SMALL_INT 2; COMPARE_OP >=; POP_JUMP_IF_FALSE -> L2
LOAD_CONST 1 (('a', 'b'))      ← the keys as a constant tuple
MATCH_KEYS                     ← one C call: builds a *values tuple* (or None)
COPY 1; POP_JUMP_IF_NONE -> L1 ← "did the whole thing match?"
UNPACK_SEQUENCE 2              ← explode the values tuple back onto the stack
STORE_FAST_STORE_FAST (a, b)
POP_TOP; POP_TOP              ← pop keys-tuple + subject

to

MATCH_MAPPING
POP_JUMP_IF_FALSE   -> L3
GET_LEN; LOAD_SMALL_INT 2; COMPARE_OP >=; POP_JUMP_IF_FALSE -> L3
COPY 1; LOAD_CONST 1 ('b'); MATCH_KEY; POP_JUMP_IF_FALSE -> L2   ← key 'b' → value on stack
COPY 2; LOAD_CONST 2 ('a'); MATCH_KEY; POP_JUMP_IF_FALSE -> L1   ← key 'a' → value on stack
STORE_FAST_STORE_FAST (a, b)
POP_TOP                       ← pop subject only

This requires a new tier1 opcode (the MATCH_KEY), but skips the LOAD_CONST, UNPACK_SEQUENCE. This approach is fast for few keys (factor 1.9x), but scales not so good for dozens of keys (but I suspect that case is rare). Branch eendebakpt#63

@markshannon @da-woods Any opinion on which way to go, and on whether this is worth the changes?

@da-woods

da-woods commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

@eendebakpt It sounds like you have up-to-date rebased versions of this + some other worthwhile optimizations. So even if we pick this up we should probably do it from your branch and close this one.

I don't have a view on MATCH_KEY vs MATCH_KEYS.

My personal view is the same as it was 3 years ago - that there are some fairly cheap optimizations here so it's worth doing something. But I'm not sure how much weight that opinion has.

@github-actions github-actions Bot removed the stale Stale PR or inactive for long period of time. label Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting review interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants