perf(tidy3d): FXC-3721 Speed up test suite #2991
base: develop
Conversation
14 files reviewed, 7 comments
Force-pushed dd55b29 to 0d5292d
14 files reviewed, 1 comment
Force-pushed 700ecb3 to 693676d
Diff Coverage (origin/develop...HEAD, staged and unstaged changes): No lines with coverage information in this diff.
I'm wondering a bit what benefit this provides over just running `pytest --durations > times.txt`?
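For context, one thing raw `--durations` output lacks is per-file aggregation; a minimal sketch of what such a script could do (the log format and the helper name are assumptions for illustration, not taken from the PR's `profile_pytest.py`):

```python
import re
from collections import defaultdict

# Match lines like "1.23s call tests/test_x.py::test_foo[case0]"
# as emitted by `pytest --durations=0` (format assumed from typical output).
DURATION_RE = re.compile(r"^\s*(?P<secs>\d+\.\d+)s\s+call\s+(?P<nodeid>\S+)")

def aggregate_durations(lines):
    """Sum per-test 'call' durations by test file."""
    totals = defaultdict(float)
    for line in lines:
        m = DURATION_RE.match(line)
        if m:
            path = m.group("nodeid").split("::", 1)[0]
            totals[path] += float(m.group("secs"))
    return dict(totals)

report = [
    "0.50s call tests/test_autograd.py::test_a",
    "1.50s call tests/test_autograd.py::test_b",
    "0.25s call tests/test_components.py::test_c",
]
print(aggregate_durations(report))
```

Grouping by file rather than by test is what makes the hot modules (like `test_autograd.py`) stand out.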
```diff
  N_tests = 2  # may be increased temporarily, makes it slow for routine tests
+ max_structures = np.log10(2)  # may be increased temporarily, makes it slow for routine tests

  # adjust as needed, keeping small to speed tests up
- num_structures = np.logspace(0, 2, N_tests).astype(int)
+ num_structures = np.logspace(0, max_structures, N_tests).astype(int)
```
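For a quick sanity check of what the old and new upper bounds evaluate to (numpy assumed, as in the snippet):

```python
import numpy as np

N_tests = 2
max_structures = np.log10(2)  # ~0.301

# Old upper bound of 2 spans 10**0 .. 10**2, i.e. up to 100 structures;
# the new bound of log10(2) caps the largest case at 2 structures.
old = np.logspace(0, 2, N_tests)               # [1., 100.]
new = np.logspace(0, max_structures, N_tests)  # [1., 2.] (approximately)
print(old, new)
```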
I think this essentially turns the test in this file into a smoke test. I believe its purpose was to catch validation-time regressions where serialization scales poorly as the structure count grows; now we basically just assert that both cases work. But to be fair, the old test didn't actually assert anything about regressions, so I'm not really sure what makes sense here. Maybe @momchil-flex has an opinion?
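If the scaling concern is worth keeping, one option is to measure how build time grows with element count and let a perf test assert on the ratio. A generic sketch (hypothetical helper, not the PR's code; a real perf test would need warmup and repetition):

```python
import time

def timing_ratio(build, small=1_000, large=10_000):
    """Time build(n) for a small and a large n and return the slowdown
    factor. For a linear build the ratio should stay near large/small;
    a perf test could assert e.g. ratio < 10 * large / small to flag
    quadratic blowups without pinning absolute times."""
    def measure(n):
        t0 = time.perf_counter()
        build(n)
        return time.perf_counter() - t0
    return measure(large) / max(measure(small), 1e-9)

# Toy linear workload standing in for structure serialization:
ratio = timing_ratio(lambda n: [{"id": i} for i in range(n)])
print(f"slowdown from 1k to 10k items: {ratio:.1f}x")
```

Asserting on a ratio rather than wall-clock time keeps such a test meaningful across machines of different speeds.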
Made a "perf" marker for now and deselected it for regular tests.
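A sketch of how such a marker is typically registered and deselected (the PR's exact configuration isn't shown here, so the file choice is an assumption):

```shell
# Register the marker, e.g. in pytest.ini or pyproject.toml:
#   [pytest]
#   markers =
#       perf: performance tests, deselected in routine runs
#
# Routine runs then deselect them:
pytest -m "not perf"
# and the perf suite can still be run explicitly when needed:
pytest -m perf
```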
Force-pushed 5bbca11 to 3f326b4
Force-pushed 3f326b4 to 9c5ff18
Force-pushed 9c5ff18 to 8e7ff17
14 files reviewed, 1 comment
Force-pushed 9bef6f0 to f310e32
Force-pushed f310e32 to 8c9050b
As our test suite has quite a long runtime, this PR identifies the most time-intensive modules/functions/runs and speeds up the most critical ones.

- Added `profile_pytest.py` in scripts to analyze the most time-consuming modules/functions/runs (see report below)
- Introduced `pytest.mark.slow` for the most time-consuming tests; may be used later for skipping in faster test modes
- Identified `test_autograd.py` as the most critical module (10 min raw runtime!) and cut it down by roughly half. Made sure these tests run at the beginning, so that we avoid non-fully-parallelized work on them at the end.

I'm open to more input on how to speed up the tests.
Analysis before PR:
Analysis after PR:
Greptile Overview
Greptile Summary
This PR successfully optimizes the test suite runtime by ~30% (from 195s to 137s total, with `test_autograd.py` reduced from 635s to 330s) through strategic refactoring.

Major Changes:

- Added `profile_pytest.py` script to identify and track slow tests
- Introduced `@pytest.mark.slow` and `@pytest.mark.perf` markers for better test categorization
- Added `pytest-order` dependency to run heavy autograd tests first (avoiding non-parallelized tail execution)
- Reduced the `MAX_NUM_MEDIUMS` constant from production values to 3 in tests

Trade-offs:

The optimizations reduce some test coverage (e.g., `test_multi_frequency_equivalence` now only tests one structure type instead of 12, and async tests run 2 structure/monitor combinations instead of 16×2=32), but this is reasonable since the core functionality is validated through equivalence checks and the reduced coverage concerns structural diversity rather than functional correctness.

Confidence Score: 4/5
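The 16×2 → 2 reduction mentioned above is an instance of sampling a parameter grid instead of exhausting its cross-product. A generic sketch (hypothetical helper, not the PR's actual code):

```python
import itertools

def diagonal_sample(*param_lists):
    """Yield one combination per index instead of the full cross-product,
    cycling shorter lists. Every value of every parameter is covered at
    least once, but the case count grows linearly, not multiplicatively."""
    n = max(len(p) for p in param_lists)
    for i in range(n):
        yield tuple(p[i % len(p)] for p in param_lists)

structures = ["box", "sphere", "cylinder", "polyslab"]
monitors = ["field", "flux"]

full = list(itertools.product(structures, monitors))   # 4 * 2 = 8 cases
sampled = list(diagonal_sample(structures, monitors))  # 4 cases
print(len(full), len(sampled))
```

This keeps per-parameter coverage while dropping combinatorial coverage, matching the trade-off described above.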
Important Files Changed
File Analysis
- Registered new markers (`perf`, `slow`): proper pytest configuration updates
- Applied `@pytest.mark.perf` and `@pytest.mark.slow` to the relevant tests

Sequence Diagram
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Script as profile_pytest.py
    participant PyTest as pytest
    participant Tests as Test Suite
    Dev->>Script: Run profiling script
    Script->>PyTest: Execute with --durations flag
    PyTest->>Tests: Run test collection
    Note over Tests: Tests run with optimizations:<br/>- Reduced parameterizations<br/>- Batched async operations<br/>- Marked slow tests<br/>- Reduced iteration counts
    Tests-->>PyTest: Test results + timing data
    PyTest-->>Script: Duration statistics
    Script->>Script: Parse & aggregate durations
    Script->>Script: Generate report by file & test
    Script-->>Dev: Performance report (pytest_profile_stats.txt)
    Note over Dev: 30% faster test suite<br/>635s → 330s for autograd tests
```