numba: inline="always" to speedup trivial op compilation by ricardoV94 · Pull Request #2111 · pymc-devs/pytensor

ricardoV94 · 2026-05-03T11:50:29Z

This speeds up first-time compilation. Numba prefers to compile a single function containing many operations, rather than our design of one function per Op. This PR adds inline="always" to trivial, cheap Ops (scalar, dimshuffle, reshape, ...).
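A minimal sketch of the idea in plain Numba, not PyTensor's actual dispatch code. The function names `add_op`, `mul_op`, `graph_separate` and `graph_inlined` are hypothetical. Each "Op" is a tiny jitted function that the graph function calls. With `inline="always"`, Numba inlines the callee at the Numba IR level, so the caller can be typed and lowered as one function instead of calling into a separately compiled function per Op. The fallback decorator exists only so the sketch still runs without Numba installed. It performs no inlining and is not part of the technique.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba: a no-op decorator that
    # supports both @njit and @njit(...) usage. No compilation or inlining.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


# "One function per Op": each trivial Op is its own jitted function.
@njit
def add_op(x, y):
    return x + y


@njit
def mul_op(x, y):
    return x * y


@njit
def graph_separate(a, b, c):
    # Calls into separately compiled Op functions.
    return mul_op(add_op(a, b), c)


# Same Ops, marked for Numba IR-level inlining into their callers.
@njit(inline="always")
def add_op_inline(x, y):
    return x + y


@njit(inline="always")
def mul_op_inline(x, y):
    return x * y


@njit
def graph_inlined(a, b, c):
    # The callee bodies are inlined here, so the graph is compiled as one unit.
    return mul_op_inline(add_op_inline(a, b), c)


a = np.arange(4.0)
b = np.ones(4)
c = np.full(4, 2.0)
print(graph_separate(a, b, c))
print(graph_inlined(a, b, c))
```

Both versions compute the same result. Only the compilation strategy differs, which is why the change targets compile time rather than runtime.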

Note the compile-time speedup with cache=False (the False flag in the benchmark names). With cache=True, the timings are dominated by the effect of compiling something we have already seen before, which this PR does not change. First-time compilation should nonetheless be faster as well.

Before
------------------------------------------------------------------------------------------------------------------------------- benchmark: 13 tests -------------------------------------------------------------------------------------------------------------------------------
Name (time in us)                                                          Min                        Max                       Mean                    StdDev                     Median                       IQR            Outliers           OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_radon_model_call_benchmark[NUMBA-False]                            7.1030 (1.0)             508.6630 (14.56)             7.4571 (1.0)              2.1671 (inf)               7.3640 (1.0)              0.1310 (inf)      573;3968  134,100.2381 (1.0)       57003           1
test_radon_model_call_benchmark[NUMBA-True]                             7.2030 (1.01)             34.9450 (1.0)               7.5947 (1.02)             0.9135 (inf)               7.4540 (1.01)             0.0900 (inf)     1984;2964  131,670.2561 (0.98)      55605           1
test_radon_model_compile_repeatedly_benchmark[NUMBA-True]         254,713.1150 (>1000.0)   5,534,446.4740 (>1000.0)   1,396,182.6770 (>1000.0)  2,320,576.2290 (inf)         257,882.6410 (>1000.0)  1,637,049.1070 (inf)           1;1        0.7162 (0.00)          5           1
test_radon_model_compile_repeatedly_benchmark[NUMBA-False]      5,671,687.5410 (>1000.0)   8,076,519.9060 (>1000.0)   6,710,889.8018 (>1000.0)  1,012,906.0587 (inf)       6,619,624.3880 (>1000.0)  1,725,231.5448 (inf)           2;0        0.1490 (0.00)          5           1
test_radon_model_compile_variants_benchmark[NUMBA-True]        20,028,947.0980 (>1000.0)  20,028,947.0980 (>1000.0)  20,028,947.0980 (>1000.0)          0.0000 (1.0)      20,028,947.0980 (>1000.0)          0.0000 (1.0)           0;0        0.0499 (0.00)          1           1
test_radon_model_compile_variants_benchmark[NUMBA-False]       50,546,677.2310 (>1000.0)  50,546,677.2310 (>1000.0)  50,546,677.2310 (>1000.0)          0.0000 (1.0)      50,546,677.2310 (>1000.0)          0.0000 (1.0)           0;0        0.0198 (0.00)          1           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

After
------------------------------------------------------------------------------------------------------------------------------- benchmark: 13 tests -------------------------------------------------------------------------------------------------------------------------------
Name (time in us)                                                          Min                        Max                       Mean                    StdDev                     Median                       IQR            Outliers           OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_radon_model_call_benchmark[NUMBA-False]                            6.6530 (1.0)              99.5560 (1.15)              7.3356 (1.0)              1.4372 (inf)               7.2330 (1.0)              0.1500 (inf)      862;3896  136,322.0726 (1.0)       49856           1
test_radon_model_call_benchmark[NUMBA-True]                             7.2030 (1.08)          1,034.4090 (11.91)             7.5908 (1.03)             4.4430 (inf)               7.4740 (1.03)             0.1000 (inf)      286;2677  131,738.6749 (0.97)      54813           1
test_radon_model_compile_repeatedly_benchmark[NUMBA-True]         258,900.6500 (>1000.0)   5,994,141.2680 (>1000.0)   1,408,531.9982 (>1000.0)  2,563,436.1926 (inf)         261,233.4620 (>1000.0)  1,440,107.1928 (inf)           1;1        0.7100 (0.00)          5           1
test_radon_model_compile_repeatedly_benchmark[NUMBA-False]      5,084,492.1370 (>1000.0)   7,516,582.0820 (>1000.0)   5,725,177.9268 (>1000.0)  1,010,872.8182 (inf)       5,326,797.6500 (>1000.0)    779,492.7115 (inf)           1;1        0.1747 (0.00)          5           1
test_radon_model_compile_variants_benchmark[NUMBA-True]        20,643,653.0320 (>1000.0)  20,643,653.0320 (>1000.0)  20,643,653.0320 (>1000.0)          0.0000 (1.0)      20,643,653.0320 (>1000.0)          0.0000 (1.0)           0;0        0.0484 (0.00)          1           1
test_radon_model_compile_variants_benchmark[NUMBA-False]       42,041,049.0130 (>1000.0)  42,041,049.0130 (>1000.0)  42,041,049.0130 (>1000.0)          0.0000 (1.0)      42,041,049.0130 (>1000.0)          0.0000 (1.0)           0;0        0.0238 (0.00)          1           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
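To put numbers on the compile benchmarks, here is a small calculation over the Mean column (in microseconds) copied from the Before and After tables. The dictionary layout and the `relative_change` helper are illustrative only.

```python
# Mean timings in microseconds, copied from the Before/After benchmark tables.
before = {
    "compile_repeatedly[NUMBA-False]": 6_710_889.8018,
    "compile_variants[NUMBA-False]": 50_546_677.2310,
    "compile_repeatedly[NUMBA-True]": 1_396_182.6770,
    "compile_variants[NUMBA-True]": 20_028_947.0980,
}
after = {
    "compile_repeatedly[NUMBA-False]": 5_725_177.9268,
    "compile_variants[NUMBA-False]": 42_041_049.0130,
    "compile_repeatedly[NUMBA-True]": 1_408_531.9982,
    "compile_variants[NUMBA-True]": 20_643_653.0320,
}


def relative_change(old, new):
    """Percentage change from old to new (negative means faster)."""
    return 100.0 * (new - old) / old


for name in before:
    change = relative_change(before[name], after[name])
    # Convert microseconds to seconds for readability.
    print(f"{name:35s} {before[name] / 1e6:7.2f} s -> {after[name] / 1e6:7.2f} s ({change:+.1f}%)")
```

On these numbers the cache=False compile benchmarks drop by about 14.7% (compile_repeatedly) and 16.8% (compile_variants) in mean time. The cache=True rows are slightly slower (about +0.9% and +3.1%), but compile_variants ran a single round and compile_repeatedly has a standard deviation larger than its mean, so those differences look like noise.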

jessegrabowski · 2026-05-04T13:40:23Z

nice and straight-forward. Win seems marginal but we take those.

ricardoV94 · 2026-05-04T13:50:44Z

gotta wait, we had a big regression in scan... without #2098

ricardoV94 force-pushed the numba_compile_faster branch 2 times, most recently from ad5bc53 to 21c8e3b, May 4, 2026 04:54

Commit 60c0ae3: numba: inline="always" to speedup trivial op compilation

ricardoV94 force-pushed the numba_compile_faster branch from 21c8e3b to 60c0ae3, May 4, 2026 06:04

ricardoV94 marked this pull request as ready for review May 4, 2026 06:46

ricardoV94 added the maintenance, numba, and performance labels May 4, 2026

ricardoV94 requested a review from jessegrabowski May 4, 2026 06:47

jessegrabowski approved these changes May 4, 2026


ricardoV94 marked this pull request as draft May 4, 2026 13:51

ricardoV94 wants to merge 1 commit into pymc-devs:main from ricardoV94:numba_compile_faster
