Reduce TypeForm recognition slowdown by davidfstr · Pull Request #21585 · python/mypy

davidfstr · 2026-06-03T17:14:42Z

References #21262

Summary

Enabling TypeForm by default (#21262) regressed
mypy's self-check by ~2% on mypyc-compiled builds.

Details

The regression came entirely
from SemanticAnalyzer.try_parse_as_type_expression, which is now invoked eagerly
on ~2.84M expressions per self-check. Of those, ~3.3% reached the expensive
full-parse block (expr_to_analyzed_type + isolated_error_analysis), and ~91% of
those full parses failed — pure wasted work.

This branch adds a series of fast-rejection filters to the offending function (SemanticAnalyzer.try_parse_as_type_expression), eliminating
~83% of the full-parse attempts (3,144 → 542) on mypy's self-check, with zero
correctness regressions. End-to-end this recovers ~43% of the regression by paired
median (21.6 ms of a 50.2 ms CPU-time regression, n=100, CI ±5 ms; consistent with
~34–40% on wall-clock).

This branch also adds instrumentation to analyze what kinds of
expressions pass through the fast-rejection filters and reach a full parse.
This instrumentation is enabled via the MYPY_TYPEFORM_PROFILE_FULL_PARSE
environment variable and outputs a .tsv file that can be processed using the new
misc/analyze_typeform_full_parse_profile.py script to classify and aggregate
expressions.

Finally, this branch extends misc/perf_compare.py with additional reporting
options to reduce measured variance.

Each commit on the PR branch has a detailed message giving more information about the specific changes made.
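
The fast-rejection idea can be illustrated in isolation. The following is a minimal sketch, assuming the standard-library `ast` module instead of mypy's own node classes; `may_be_type_expression`, `count_candidates`, and the node kinds rejected here are hypothetical and are not the filters implemented in this PR. It shows only the general pattern: a cheap syntactic check that returns early for expressions that can never spell a type, so the expensive parse runs only on plausible candidates.

```python
import ast

# Hypothetical reject list: node kinds that can never be a type expression.
_NEVER_A_TYPE = (
    ast.Lambda, ast.Compare, ast.BoolOp, ast.UnaryOp, ast.Call,
    ast.Dict, ast.Set, ast.ListComp, ast.JoinedStr, ast.Await,
)


def may_be_type_expression(node: ast.expr) -> bool:
    """Return False only when the node is certainly not a type expression."""
    if isinstance(node, _NEVER_A_TYPE):
        return False
    if isinstance(node, ast.Constant):
        # Only None and strings (forward references) can stand for a type.
        return node.value is None or isinstance(node.value, str)
    if isinstance(node, ast.BinOp):
        # Only `X | Y` unions qualify, and both operands must pass too.
        return (
            isinstance(node.op, ast.BitOr)
            and may_be_type_expression(node.left)
            and may_be_type_expression(node.right)
        )
    # Names, attributes, subscripts, ...: undecided, needs the full parse.
    return True


def count_candidates(source: str) -> tuple[int, int]:
    """Return (all expressions, expressions that survive the filter)."""
    exprs = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.expr)]
    return len(exprs), sum(may_be_type_expression(n) for n in exprs)
```

The filter is deliberately one-sided: it may let a non-type through to the full parse, but it must never reject something the full parse would accept, which is what keeps this kind of optimization free of correctness regressions.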

Optimization Results

CPU time — canonical; lowest-variance metric

python misc/perf_compare.py --warmup-runs 3 --num-runs 100 -j 3 \
    --metric cpu --workers1 \
    022d9bc96 16fef2515 HEAD

| Commit | Mean | Median | Δ vs baseline (paired median ±95% CI) |
|---|---|---|---|
| `022d9bc96` baseline | 2.768 s | 2.757 s | - |
| `16fef2515` regression | 2.817 s | 2.809 s | +50.2 ms ±5.2 (+1.82%) |
| `3f39cd753/HEAD` branch (all filters) | 2.797 s | 2.789 s | +28.6 ms ±5.5 (+1.04%) |

Details

CPU time (user+sys) on a single worker is the lowest-variance estimator here (CV ≈ 3.3%, vs ≈ 7.5% for wall-clock at the same n) and the truest measure of the per-call work being attacked: with one worker the whole self-check runs serially, so all ~2.84M calls land in one CPU-time figure rather than being spread across workers and hidden behind the slowest one.
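
For readers unfamiliar with the user+sys figure, here is a minimal sketch of how a child process's CPU time can be measured on Unix, assuming the standard-library `resource` module (which is not available on Windows). The helper name `child_cpu_seconds` is hypothetical, and this is not necessarily how misc/perf_compare.py collects the metric.

```python
import resource
import subprocess
import sys


def child_cpu_seconds(cmd: list[str]) -> float:
    """Run cmd to completion and return the user+sys CPU seconds it used."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True)
    # RUSAGE_CHILDREN covers children that have terminated and been waited
    # for, so the difference isolates the process that just ran.
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return user + system


if __name__ == "__main__":
    # Hypothetical workload: any command could be timed the same way.
    seconds = child_cpu_seconds([sys.executable, "-c", "sum(range(10**6))"])
    print(f"child CPU time: {seconds:.3f} s")
```

Unlike wall-clock time, this figure does not include time the process spent waiting to be scheduled, which is one reason it tends to vary less between runs.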

The branch recovers 21.6 ms of the 50.2 ms regression — ~43% by paired median
(~40% by trimmed mean), leaving +28.6 ms (~57%, still well outside the ±5 ms CI).
The recovered fraction is the difference of two deltas measured against the same baseline
within one interleaved run, so it inherits the run's low noise (a conservative
independent-error bound is ≈ ±7.6 ms on the 21.6 ms numerator; the true band is tighter
because the two deltas are positively correlated — same baseline, same per-round machine
state).
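
The arithmetic behind these figures can be reproduced from the table values. This is a sketch under one assumption: that the "independent-error bound" combines the two CI half-widths in quadrature, which matches the quoted ±7.6 ms but is not stated explicitly above.

```python
import math

# Paired-median deltas vs the baseline and their 95% CI half-widths,
# copied from the CPU-time table (milliseconds).
REGRESSION_MS, REGRESSION_CI_MS = 50.2, 5.2
BRANCH_MS, BRANCH_CI_MS = 28.6, 5.5

recovered_ms = REGRESSION_MS - BRANCH_MS
recovered_fraction = recovered_ms / REGRESSION_MS

# Assumption: treat the two deltas as independent and add their errors in
# quadrature. Positive correlation between them would shrink the true error.
independent_bound_ms = math.hypot(REGRESSION_CI_MS, BRANCH_CI_MS)

print(f"recovered: {recovered_ms:.1f} ms ({recovered_fraction:.0%})")
print(f"independent-error bound: ±{independent_bound_ms:.1f} ms")
```

This prints a recovery of 21.6 ms (43%) and a bound of ±7.6 ms. The bound is conservative because the variance of a difference is Var(A) + Var(B) - 2·Cov(A, B), and the covariance term is positive when both deltas share a baseline.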

Wall-clock — default user-facing metric

python misc/perf_compare.py --warmup-runs 3 --num-runs 100 -j 3 \
    022d9bc96 16fef2515 HEAD
python misc/perf_compare.py --warmup-runs 3 --num-runs 400 -j 3 \
    022d9bc96 16fef2515 HEAD

| Run | Commit | Median | Δ vs baseline (paired median ±95% CI) |
|---|---|---|---|
| n=100 | `022d9bc96` baseline | 1.342 s | - |
| n=100 | `16fef2515` regression | 1.369 s | +27.2 ms ±6.7 (+2.03%) |
| n=100 | `3f39cd753` branch | 1.357 s | +16.3 ms ±6.0 (+1.21%) |
| n=400 | `022d9bc96` baseline | 1.351 s | - |
| n=400 | `16fef2515` regression | 1.369 s | +17.6 ms ±3.3 (+1.30%) |
| n=400 | `3f39cd753` branch | 1.362 s | +11.6 ms ±3.0 (+0.86%) |

Details

Wall-clock recovery is **~40% at n=100** (10.9 ms of 27.2 ms) and **~34% at n=400** (6.0 ms of 17.6 ms) by paired median — consistent with the CPU figure once you allow for wall-clock's higher variance and the fact that it is bounded by the slowest worker. The recovered fraction (~34–43% across metrics and run lengths) stays stable even though the absolute regression differs a lot between metrics (≈50 ms CPU vs ≈18–27 ms wall) — the reassuring sign that the recovery is real rather than a noise artifact.

Using the new full-parse instrumentation

Details

# 1. Run instrumented self-check on this branch:
MYPY_TYPEFORM_PROFILE_FULL_PARSE=/tmp/tf.log \
    python -m mypy --config-file mypy_self_check.ini -p mypy --no-incremental

# 2. Aggregate the profile:
python misc/analyze_typeform_full_parse_profile.py --top 20 /tmp/tf.log.*

Open Questions

* Is the MYPY_TYPEFORM_PROFILE_FULL_PARSE env-var name acceptable, or should
  it follow an existing naming convention?
* I did not run performance numbers against non-mypy repositories
  (like PyTorch and Black) as I originally planned. Would you like me to?
* Should the misc/perf_compare.py changes ship in this PR, or land separately?
  They are a general benchmarking-harness improvement (CPU metric, single-worker
  mode, median-based reporting, opt-in binary cache), independent of the TypeForm
  filters, so splitting them into their own PR may be cleaner for review.

ilevkivskyi · 2026-06-03T17:23:57Z

@davidfstr Your original PR was never reverted in master, only in the release branch. Hopefully you can still cherry-pick the 4 relevant commits starting from current master.

Specifically:

* Median is reported in addition to the existing mean+stdev; it is significantly more resistant to skew by outliers.
* --metric {wall,cpu} (default wall): Enables profiling using CPU time rather than wall-clock time. CPU profiling has roughly half the coefficient of variation of wall-clock profiling at equal run count.
* --workers1: Forces MYPY_NUM_WORKERS=1 (rather than the default 4) to cut CPU scheduling variance. Strongly recommended when using --metric cpu.
* --warmup-runs N (default 1): Configurable number of leading cold runs to discard. Previously this was always 1. Higher run counts decrease outliers that skew the reported mean.
* A new "Paired deltas vs <first commit>" section is added to the report, showing per-round paired differencing against the first commit to cancel round-level common-mode noise, reducing variance. Reported as median +/-95% CI.

Also:

* --cache-binaries (default false): Caches each commit's compiled clone to avoid a ~5min recompile whenever comparing the same commit multiple times.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
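
To see why per-round paired differencing cancels common-mode noise, consider the following minimal sketch. The timings are hypothetical, `paired_deltas` is an illustrative helper rather than a function from misc/perf_compare.py, and the confidence-interval step is omitted because the method used by the report is not described here.

```python
import statistics


def paired_deltas(baseline: list[float], candidate: list[float]) -> list[float]:
    """Subtract the baseline timing from the candidate timing, round by round."""
    if len(baseline) != len(candidate):
        raise ValueError("both commits need one timing per round")
    return [c - b for b, c in zip(baseline, candidate)]


# Hypothetical timings in seconds. Round 3 hit a noisy machine state that
# slowed both commits by about the same amount.
baseline = [1.00, 1.01, 1.30, 0.99, 1.02]
candidate = [1.02, 1.03, 1.32, 1.01, 1.04]

deltas = paired_deltas(baseline, candidate)
median_delta = statistics.median(deltas)

# Spread of the raw timings vs spread of the per-round differences.
raw_spread = statistics.pstdev(baseline)
paired_spread = statistics.pstdev(deltas)

print(f"median paired delta: {median_delta * 1000:.1f} ms")
print(f"raw spread: {raw_spread * 1000:.1f} ms, paired spread: {paired_spread * 1000:.3f} ms")
```

Because the slow round affects both commits equally, it inflates the spread of the raw timings but almost vanishes from the per-round differences, so a small true delta stays visible.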

…_parse_as_type_expression()

Specifically:

- If you set the MYPY_TYPEFORM_PROFILE_FULL_PARSE environment variable, mypy will output a .tsv to that filepath which characterizes the kinds of Expressions that try_parse_as_type_expression() in semanal.py was forced to fully parse because they were not rejected early.
- A misc/analyze_typeform_full_parse_profile.py script is added which takes those .tsvs and prints an expression-time summary (by total time) plus top-N descriptors per FAIL class.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…s_type_expression()

These filters reduce mypy's wall-clock slowdown when checking the mypy codebase after the introduction of TypeForm from +2.03% to +1.21%, as measured with `misc/perf_compare.py`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

davidfstr · 2026-06-03T17:40:38Z

> Hopefully you can still cherry-pick the 4 relevant commits starting from current master.

Done. Now there are only 3 commits. The 4th one reapplied enabling TypeForm by default, which is already on master.

github-actions · 2026-06-03T23:39:10Z

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

davidfstr mentioned this pull request Jun 3, 2026

TypeForm: Enable by default #21262

Merged

davidfstr and others added 3 commits June 3, 2026 13:30

davidfstr force-pushed the f/typeform_complete--take2 branch from a08cc55 to 6fba903 Compare June 3, 2026 17:37

davidfstr changed the title ~~Reduce TypeForm recognition slowdown. Enable TypeForm by default.~~ Reduce TypeForm recognition slowdown Jun 3, 2026

SQUISH -> misc/perf_compare.py -- Workaround inability to return Any …

978e9b4

…type

davidfstr and others added 4 commits June 3, 2026 18:58

SQUISH -> misc/perf_compare.py -- Workaround missing 'resource' modul…

5a03b13

…e on Windows

[pre-commit.ci] auto fixes from pre-commit.com hooks

91052ef

for more information, see https://pre-commit.ci

SQUISH -> misc/perf_compare.py -- typechecker fixes

2b804d6

SQUISH -> misc/perf_compare.py -- typechecker fixes, take 2

12f9335

davidfstr wants to merge 8 commits into python:master from davidfstr:f/typeform_complete--take2
