[mypyc] Specialize `s[i] == 'x'` to a codepoint int compare by VaggelisD · Pull Request #21579 · python/mypy

VaggelisD · 2026-06-02T16:41:21Z

7th PR of #21418

Lowers s[i] == 'x' (and the symmetric == / != forms) down to a bounds-checked codepoint read + int compare, instead of CPyStr_GetItem + CPyStr_EqualLiteral which (may) allocate a 1-character PyUnicode per iteration. No annotations are required for this optimization.

On microbenchmarks (1-compare-per-iter hot loop, ~2.5M-codepoint SQL-like string) the comparison is ~3.6x faster.
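For readers who want to reproduce the shape of that hot loop, here is a minimal sketch. The function names and the input text are assumptions made for illustration; the actual benchmark script and its input are not shown in this PR.

```python
import timeit


def count_commas(s: str) -> int:
    """One `s[i] == ','` comparison per iteration."""
    n = 0
    for i in range(len(s)):
        if s[i] == ",":
            n += 1
    return n


def make_input(repeats: int) -> str:
    # Hypothetical SQL-like text (35 codepoints per repeat); not the PR's real input.
    return "SELECT a, b, c FROM t WHERE x = 1; " * repeats


if __name__ == "__main__":
    data = make_input(70_000)  # roughly 2.5M codepoints
    print(timeit.timeit(lambda: count_commas(data), number=3))
```

Timing this module once interpreted and once compiled with mypyc, before and after the change, should show whether the speedup holds on a given machine.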

Some follow-up optimizations that might be worthwhile and that I can work on:

- `in` operator, e.g. `s[i] in ('a', 'b', 'c')` --> Fuse into one check with N int comparisons
- Ordering comparisons, e.g. `s[i] < 'x'` --> Need to expand the op set
- `s[i] == s[j]`, with both sides `IndexExpr(str)` --> Need to drop the literal-only guard
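The three follow-ups can be pictured in plain Python as codepoint comparisons. This is only a model of the intended semantics, with hypothetical function names; it says nothing about how mypyc would actually emit the IR.

```python
def in_tuple_generic(s: str, i: int) -> bool:
    return s[i] in ("a", "b", "c")


def in_tuple_fused(s: str, i: int) -> bool:
    # One codepoint read, then N int comparisons.
    c = ord(s[i])
    return c == ord("a") or c == ord("b") or c == ord("c")


def less_than_fused(s: str, i: int) -> bool:
    # For 1-character strings, `<` orders by codepoint.
    return ord(s[i]) < ord("x")


def index_eq_fused(s: str, i: int, j: int) -> bool:
    # Both sides are index expressions, so no literal is involved.
    return ord(s[i]) == ord(s[j])
```

In each case `s[i]` still raises `IndexError` for an out-of-range index, so the model keeps the same error behavior as the generic forms.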

Recognizes the AST shape `IndexExpr(str) == StrLiteral` (and the symmetric `StrLiteral == IndexExpr(str)`, plus the `!=` variants) and lowers it to an int compare of codepoints reusing the existing CPyStr_GetItemUnsafeAsInt primitive.

Today the pattern lowers to CPyStr_GetItem + CPyStr_EqualLiteral, which allocates or looks up a 1-character PyUnicode object per iteration and goes through a generic string-equality call. After specialization it becomes an inlined PyUnicode_READ plus an int compare -- about 4x faster on bench_str_compare with a 3-compares-per-iteration workload, and closer to ~9x with the more typical 1-compare-per-iteration shape.

No annotations required; benefits any code that compares a string index against a 1-character literal. Multi-character / empty literals fall through to the generic path (which still correctly returns False). Bounds checking is preserved -- the helper raises IndexError for out-of-range indices, same as the unspecialized path.

Stack: builds on the `ord(s[i])` primitive (python#20578) and the librt.strings codepoint helpers (python#21462, python#21504, python#21509, python#21521, python#21522, python#21553).
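A pure-Python model of the before/after behavior may help when reading the description above. It is a sketch of the semantics only, not the generated C: the function names are hypothetical, and the explicit bounds check stands in for whatever the compiled code does before reading the codepoint.

```python
def eq_generic(s: str, i: int) -> bool:
    # Unspecialized form: build (or look up) a 1-character str, then compare strings.
    return s[i] == ","


def eq_specialized(s: str, i: int) -> bool:
    # Modeled specialized form: bounds check, codepoint read, int compare.
    n = len(s)
    if i < 0:
        i += n  # negative indices count from the end
    if not 0 <= i < n:
        raise IndexError("string index out of range")
    return ord(s[i]) == 0x2C  # 0x2C is the codepoint of ','


def eq_multichar(s: str, i: int) -> bool:
    # A multi-character literal can never equal a 1-character s[i],
    # so leaving it on the generic path still yields False.
    return s[i] == ",,"
```

The two single-character forms agree on every valid index, including negative ones, and both raise `IndexError` outside the string.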

p-sawicki · 2026-06-03T16:46:59Z

+    # Going through `Any` routes through the interpreted wrapper, which
+    # uses the unspecialized lowering. Confirms the str surface still
+    # works for callers that bypass the specializer.
+    f: Any = eq_comma
+    assert f("hello,world", 5) is True
+    assert f("hello", 0) is False


i don't think the comment is true, the interpreted wrapper still calls the same generated C function for eq_comma that has the optimization.

to test unspecialized lowering you could add a test case that compares against a one-char str passed as a parameter instead of a literal. i'd imagine we have tests like that already though so i think you could just remove this test case.
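A sketch of the kind of test described here, assuming hypothetical names `eq_param` and `test_eq_param`: because the right-hand side is a parameter rather than a literal, the `IndexExpr(str) == StrLiteral` pattern should not match and the generic lowering is exercised.

```python
def eq_param(s: str, i: int, ch: str) -> bool:
    # `ch` is not a literal, so the specializer's pattern does not apply.
    return s[i] == ch


def test_eq_param() -> None:
    assert eq_param("hello,world", 5, ",") is True
    assert eq_param("hello", 0, ",") is False
    # A multi-character right-hand side never equals a 1-character s[i].
    assert eq_param("hello", 0, "he") is False
```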

p-sawicki · 2026-06-03T17:31:44Z

+    if isinstance(rhs, IndexExpr) and not isinstance(lhs, IndexExpr):
+        lhs, rhs = rhs, lhs


i think the errors in the run tests are because of a mypy issue #21586 as it seems rhs is typed as IndexExpr after the swap and assigning lhs to it raises a type error.

you might need to use a temp variable as a work-around as this way it seems to work correctly.

    tmp = lhs
    lhs, rhs = rhs, tmp
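Put in context, the work-around might look like the sketch below. The stub classes and the `normalize_operands` helper are stand-ins invented for this example so that it runs on its own; the real code uses mypy's node classes and lives inside the specializer.

```python
class Expression:
    """Stand-in for mypy's expression base class."""


class IndexExpr(Expression):
    """Stand-in for an `s[i]` node."""


class StrExpr(Expression):
    """Stand-in for a string literal node."""


def normalize_operands(
    lhs: Expression, rhs: Expression
) -> tuple[Expression, Expression]:
    # Put the index expression on the left so later code handles one shape.
    if isinstance(rhs, IndexExpr) and not isinstance(lhs, IndexExpr):
        # Swap through a temporary instead of `lhs, rhs = rhs, lhs`.
        tmp = lhs
        lhs, rhs = rhs, tmp
    return lhs, rhs
```

At runtime both spellings of the swap behave identically; the temporary only changes how the assignment is type-checked.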

VaggelisD wants to merge 1 commit into python:master from VaggelisD:str-index-compare-specialize
