Reduce TypeForm recognition slowdown #21585
Open
davidfstr wants to merge 8 commits into
Member
@davidfstr Your original PR was never reverted on master, only on the release branch. Hopefully you can still cherry-pick the 4 relevant commits starting from current master.
Specifically:
* Median is reported in addition to the existing mean+stdev; the median is
significantly more resistant to skew by outliers.
* --metric {wall,cpu} (default wall): Enables profiling using CPU time
rather than wall-clock time. CPU profiling has roughly half the coefficient
of variation of wall-clock profiling at equal run count.
* --workers1: Forces MYPY_NUM_WORKERS=1 (rather than the default 4) to
cut CPU scheduling variance. Strongly recommended when using --metric cpu.
* --warmup-runs N (default 1): Configurable number of leading cold runs to discard.
Previously was always 1. Higher run counts decrease outliers that skew
the reported mean.
* A new "Paired deltas vs <first commit>" section is added to the report,
showing per-round paired differencing against the first commit
to cancel round-level common-mode noise, reducing variance.
Reported as median +/-95% CI.
Also:
* --cache-binaries (default false): Caches each commit's compiled clone
to avoid ~5min recompile whenever comparing the same commit multiple times.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…_parse_as_type_expression()
Specifically:
- If you set the MYPY_TYPEFORM_PROFILE_FULL_PARSE environment variable, mypy writes a .tsv to that filepath characterizing the kinds of Expressions for which try_parse_as_type_expression() in semanal.py had to do a full parse because they were not rejected early.
- A misc/analyze_typeform_full_parse_profile.py script is added which takes those .tsvs and prints an expression-time summary (by total time) plus top-N descriptors per FAIL class.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…s_type_expression()
These filters reduce mypy's wall-clock slowdown when checking the mypy codebase after the introduction of TypeForm from +2.03% to +1.21%, as measured with `misc/perf_compare.py`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
a08cc55 to 6fba903
Contributor
Author
Done. Now there are only 3 commits. The 4th one reapplied enabling TypeForm by default, which is already on master.
Contributor
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
References #21262
Summary
Enabling `TypeForm` by default (#21262) regressed mypy's self-check by ~2% on mypyc-compiled builds.
Details
The regression came entirely from `SemanticAnalyzer.try_parse_as_type_expression`, which is now invoked eagerly on ~2.84M expressions per self-check. Of those, ~3.3% reached the expensive full-parse block (`expr_to_analyzed_type` + `isolated_error_analysis`), and ~91% of those full parses failed, which is pure wasted work.
This branch adds a series of fast-rejection filters to the offending function (`SemanticAnalyzer.try_parse_as_type_expression`), eliminating ~83% of the full-parse attempts (3,144 → 542) on mypy's self-check, with zero correctness regressions. End-to-end this recovers ~43% of the regression by paired median (21.6 ms of a 50.2 ms CPU-time regression, n=100, CI ±5 ms; consistent with ~34–40% on wall-clock).
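The general shape of a fast-rejection filter can be sketched as follows. This is only an illustration built on the standard-library `ast` module rather than mypy's own node classes; the names `could_be_type_expression`, `expensive_full_parse`, and `try_parse`, and the specific rejection rules, are assumptions and not the filters added in this branch.

```python
import ast
from typing import Optional

# Counts how often the (stand-in) expensive path is taken.
stats = {"full_parses": 0}


def could_be_type_expression(node: ast.expr) -> bool:
    """Cheap syntactic pre-check (hypothetical rules); False skips the full parse."""
    if isinstance(node, (ast.Name, ast.Attribute, ast.Subscript)):
        return True  # e.g. int, typing.Any, list[int]
    if isinstance(node, ast.BinOp):
        return isinstance(node.op, ast.BitOr)  # only X | Y can be a union
    if isinstance(node, ast.Constant):
        # None, or a string that may be a forward reference.
        return node.value is None or isinstance(node.value, str)
    return False  # calls, numbers, lambdas, comparisons, ...


def expensive_full_parse(node: ast.expr) -> str:
    """Stand-in for the costly analysis; here it only records that it ran."""
    stats["full_parses"] += 1
    return ast.dump(node)


def try_parse(source: str) -> Optional[str]:
    node = ast.parse(source, mode="eval").body
    if not could_be_type_expression(node):
        return None  # rejected cheaply, no full parse attempted
    return expensive_full_parse(node)
```

The point of the pattern is that the pre-check only inspects the top-level node kind, so its cost is a few `isinstance` calls, while every expression it rejects avoids the full parse entirely.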
This branch also adds instrumentation to analyze what kinds of expressions pass through the fast-rejection filters and make it to a full parse. This instrumentation is enabled via the `MYPY_TYPEFORM_PROFILE_FULL_PARSE` environment variable and outputs a .tsv file that can be processed using the new `misc/analyze_typeform_full_parse_profile.py` script to classify & aggregate expressions.
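A minimal sketch of enabling the instrumentation from Python, assuming a mypy build from this branch is installed in the current interpreter. The helper name `build_profile_run`, the output path, and the check target are hypothetical; the command-line arguments of the analysis script are not described here, so it is not invoked.

```python
import os
import subprocess
import sys


def build_profile_run(tsv_path, targets):
    """Return (cmd, env) for a mypy run that writes the full-parse profile."""
    env = dict(os.environ)
    # Environment variable introduced by this branch; mypy writes a .tsv here.
    env["MYPY_TYPEFORM_PROFILE_FULL_PARSE"] = tsv_path
    cmd = [sys.executable, "-m", "mypy", *targets]
    return cmd, env


if __name__ == "__main__":
    # Hypothetical output path and target, for illustration only.
    cmd, env = build_profile_run("typeform_profile.tsv", ["mypy"])
    subprocess.run(cmd, env=env, check=False)
```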
Finally, this branch extends `misc/perf_compare.py` with additional reporting options to reduce measured variance.
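The difference between the two metrics can be sketched with the standard library alone. This is not the code in `misc/perf_compare.py`; the helper names `time_command` and `summarize` are hypothetical, and the child CPU figures come from `os.times()`, which reports zeros for child processes on Windows.

```python
from __future__ import annotations

import os
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str]) -> tuple[float, float]:
    """Run cmd once and return (wall_seconds, child_cpu_seconds)."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = os.times()
    # user+sys CPU time of waited-for child processes.
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu


def summarize(samples: list[float], warmup: int = 1) -> dict[str, float]:
    """Drop leading warmup samples, then report mean, median, stdev and CV."""
    kept = samples[warmup:]
    mean = statistics.mean(kept)
    stdev = statistics.stdev(kept)  # needs at least two kept samples
    cv = stdev / mean if mean else float("nan")
    return {"mean": mean, "median": statistics.median(kept), "stdev": stdev, "cv": cv}


if __name__ == "__main__":
    runs = [time_command([sys.executable, "-c", "pass"]) for _ in range(6)]
    print("wall:", summarize([w for w, _ in runs]))
    print("cpu: ", summarize([c for _, c in runs]))
```

A single slow run moves the mean far more than the median, which is why reporting both makes outliers easy to spot.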
Each commit on the PR branch has a detailed message giving more information about the specific changes made.
Optimization Results
CPU time — canonical; lowest-variance metric
| Commit | Role |
| --- | --- |
| 022d9bc96 | baseline |
| 16fef2515 | regression |
| 3f39cd753 / HEAD | branch (all filters) |

Details
CPU time (user+sys) on a single worker is the lowest-variance estimator here (CV ≈ 3.3%, vs ≈ 7.5% for wall-clock at the same n) and the truest measure of the per-call work being attacked: with one worker the whole self-check runs serially, so all ~2.84M calls land in one CPU-time figure rather than being spread across workers and hidden behind the slowest one.

The branch recovers 21.6 ms of the 50.2 ms regression, which is ~43% by paired median (~40% by trimmed mean), leaving +28.6 ms (~57%, still well outside the ±5 ms CI). The recovered amount is the difference of two deltas measured against the same baseline within one interleaved run, so it inherits the run's low noise. A conservative independent-error bound is ≈ ±7.6 ms on the 21.6 ms numerator; the true band is tighter because the two deltas are positively correlated (same baseline, same per-round machine state).
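The paired-differencing idea and the recovery arithmetic can be checked with a short sketch. The per-round timings below are synthetic and chosen only to show how a shared per-round offset cancels; the helper names `paired_deltas` and `recovered_fraction` are hypothetical. Only the final line uses figures quoted above.

```python
import statistics


def paired_deltas(baseline, candidate):
    """Per-round differences; index i of both lists must come from the same round."""
    return [c - b for b, c in zip(baseline, candidate)]


def recovered_fraction(regression_delta, branch_delta):
    """Share of the regression removed by the branch (both deltas vs. one baseline)."""
    return (regression_delta - branch_delta) / regression_delta


# Synthetic rounds (ms): each round shares one machine-state offset across commits.
round_noise = [0, 30, -20, 10, 5]
baseline = [1000 + n for n in round_noise]
regression = [1050 + n for n in round_noise]

deltas = paired_deltas(baseline, regression)
print(statistics.median(deltas))    # 50: the shared offset cancels in every round
print(statistics.pstdev(deltas))    # 0.0: the deltas are identical
print(statistics.pstdev(baseline))  # unpaired samples still carry the round noise

# Paired medians quoted above: +50.2 ms regression, +28.6 ms remaining on the branch.
print(round(recovered_fraction(50.2, 28.6), 2))  # 0.43
```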
Wall-clock — default user-facing metric
| Commit | Role |
| --- | --- |
| 022d9bc96 | baseline |
| 16fef2515 | regression |
| 3f39cd753 | branch |

Details
Wall-clock recovery is **~40% at n=100** (10.9 ms of 27.2 ms) and **~34% at n=400** (6.0 ms of 17.6 ms) by paired median. This is consistent with the CPU figure once you allow for wall-clock's higher variance and the fact that it is bounded by the slowest worker. The recovered fraction (~34–43% across metrics and run lengths) stays stable even though the absolute regression differs a lot between metrics (≈50 ms CPU vs ≈18–27 ms wall), which is a reassuring sign that the recovery is real rather than a noise artifact.
Using the new full-parse instrumentation
Details
Open Questions
Is the `MYPY_TYPEFORM_PROFILE_FULL_PARSE` env-var name acceptable, or should it follow an existing naming convention?
I did not run performance numbers against non-mypy repositories
(like PyTorch and Black) as I originally planned. Would you like me to?
Should the `misc/perf_compare.py` changes ship in this PR, or land separately? They are a general benchmarking-harness improvement (CPU metric, single-worker mode, median-based reporting, opt-in binary cache), independent of the TypeForm filters, so splitting them into their own PR may be cleaner for review.