Skip to content

[mypyc] Specialize s[i] == 'x' to a codepoint int compare#21579

Open
VaggelisD wants to merge 1 commit into
python:masterfrom
VaggelisD:str-index-compare-specialize
Open

[mypyc] Specialize s[i] == 'x' to a codepoint int compare#21579
VaggelisD wants to merge 1 commit into
python:masterfrom
VaggelisD:str-index-compare-specialize

Conversation

@VaggelisD
Copy link
Copy Markdown
Contributor

7th PR of #21418

Lowers s[i] == 'x' (and the symmetric == / != forms) down to a bounds-checked codepoint read + int compare, instead of CPyStr_GetItem + CPyStr_EqualLiteral which (may) allocate a 1-character PyUnicode per iteration. No annotations are required for this optimization.

On microbenchmarks (1-compare-per-iter hot loop, ~2.5M-codepoint SQL-like string) the comparison is ~3.6x times faster.


Some follow up optimizations that might be worth it I can work on:

  • In operator e.g s[i] in ('a', 'b', 'c') --> Fuse to one check with N int comparisons
  • Comparison operators e.g s[i] < 'x' --> Need to expand the op set
  • s[i] == s[j] — both sides IndexExpr(str) --> Need drop the literal-only guard

Recognizes the AST shape `IndexExpr(str) == StrLiteral` (and the symmetric
`StrLiteral == IndexExpr(str)`, plus the `!=` variants) and lowers it to
an int compare of codepoints reusing the existing CPyStr_GetItemUnsafeAsInt
primitive.

Today the pattern lowers to CPyStr_GetItem + CPyStr_EqualLiteral, which
allocates or looks up a 1-character PyUnicode object per iteration and
goes through a generic string-equality call. After specialization it
becomes an inlined PyUnicode_READ plus an int compare -- about 4x faster
on bench_str_compare with a 3-compares-per-iteration workload, and closer
to ~9x with the more typical 1-compare-per-iteration shape.

No annotations required; benefits any code that compares a string index
against a 1-character literal. Multi-character / empty literals fall
through to the generic path (which still correctly returns False).
Bounds checking is preserved -- the helper raises IndexError for
out-of-range indices, same as the unspecialized path.

Stack: builds on the `ord(s[i])` primitive (python#20578) and the librt.strings
codepoint helpers (python#21462, python#21504, python#21509, python#21521, python#21522, python#21553).
Comment on lines +1480 to +1485
# Going through `Any` routes through the interpreted wrapper, which
# uses the unspecialized lowering. Confirms the str surface still
# works for callers that bypass the specializer.
f: Any = eq_comma
assert f("hello,world", 5) is True
assert f("hello", 0) is False
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think the comment is true, the interpreted wrapper still calls the same generated C function for eq_comma that has the optimization.

to test unspecialized lowering you could add a test case that compares against a one-char str passed as a parameter instead of a literal. i'd imagine we have tests like that already though so i think you could just remove this test case.

Comment on lines +994 to +995
if isinstance(rhs, IndexExpr) and not isinstance(lhs, IndexExpr):
lhs, rhs = rhs, lhs
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the errors in the run tests are because of a mypy issue #21586 as it seems rhs is typed as IndexExpr after the swap and assigning lhs to it raises a type error.

you might need to use a temp variable as a work-around as this way it seems to work correctly.

tmp = lhs
lhs, rhs = rhs, tmp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants