numba: inline="always" to speedup trivial op compilation #2111

Draft
ricardoV94 wants to merge 1 commit into pymc-devs:main from ricardoV94:numba_compile_faster

@ricardoV94
Member

@ricardoV94 ricardoV94 commented May 3, 2026

This speeds up first-time compilation. Numba prefers to compile a single function with many operations rather than our one-function-per-Op design. Added inline="always" to trivial, cheap Ops (scalar, dimshuffle, reshape, ...).

Note the speedup in compile time with cache=False (the NUMBA-False rows). With cache=True, the timings are dominated by the effect of compiling something we have already seen before, and that part isn't changed. But first-time compilation should now be faster there as well.
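
The effect of `inline="always"` can be illustrated outside PyTensor with a minimal sketch. The function names `add_one` and `outer` are hypothetical and are not the code in this PR, and the `try/except` fallback is only there so the snippet still runs where Numba is not installed. With Numba available, `inline="always"` asks Numba to inline the callee into its caller at the Numba IR level, so the caller is compiled as one larger function.

```python
try:
    from numba import njit
except ImportError:
    # Assumption: fallback so the sketch still runs without Numba installed.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]  # used as a bare @njit decorator
        return lambda fn: fn  # used as @njit(...) with options


# Hypothetical "trivial op": cheap enough that inlining it is worthwhile.
@njit(inline="always")
def add_one(x):
    return x + 1.0


# Hypothetical caller: with Numba, add_one is inlined here before compilation.
@njit
def outer(x):
    return add_one(x) * 2.0


result = outer(3.0)
print(result)
```

The numerical result is the same with or without inlining; only how the work is split across compiled functions changes.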

Before
------------------------------------------------------------------------------------------------------------------------------- benchmark: 13 tests -------------------------------------------------------------------------------------------------------------------------------
Name (time in us)                                                          Min                        Max                       Mean                    StdDev                     Median                       IQR            Outliers           OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_radon_model_call_benchmark[NUMBA-False]                            7.1030 (1.0)             508.6630 (14.56)             7.4571 (1.0)              2.1671 (inf)               7.3640 (1.0)              0.1310 (inf)      573;3968  134,100.2381 (1.0)       57003           1
test_radon_model_call_benchmark[NUMBA-True]                             7.2030 (1.01)             34.9450 (1.0)               7.5947 (1.02)             0.9135 (inf)               7.4540 (1.01)             0.0900 (inf)     1984;2964  131,670.2561 (0.98)      55605           1
test_radon_model_compile_repeatedly_benchmark[NUMBA-True]         254,713.1150 (>1000.0)   5,534,446.4740 (>1000.0)   1,396,182.6770 (>1000.0)  2,320,576.2290 (inf)         257,882.6410 (>1000.0)  1,637,049.1070 (inf)           1;1        0.7162 (0.00)          5           1
test_radon_model_compile_repeatedly_benchmark[NUMBA-False]      5,671,687.5410 (>1000.0)   8,076,519.9060 (>1000.0)   6,710,889.8018 (>1000.0)  1,012,906.0587 (inf)       6,619,624.3880 (>1000.0)  1,725,231.5448 (inf)           2;0        0.1490 (0.00)          5           1
test_radon_model_compile_variants_benchmark[NUMBA-True]        20,028,947.0980 (>1000.0)  20,028,947.0980 (>1000.0)  20,028,947.0980 (>1000.0)          0.0000 (1.0)      20,028,947.0980 (>1000.0)          0.0000 (1.0)           0;0        0.0499 (0.00)          1           1
test_radon_model_compile_variants_benchmark[NUMBA-False]       50,546,677.2310 (>1000.0)  50,546,677.2310 (>1000.0)  50,546,677.2310 (>1000.0)          0.0000 (1.0)      50,546,677.2310 (>1000.0)          0.0000 (1.0)           0;0        0.0198 (0.00)          1           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

After
------------------------------------------------------------------------------------------------------------------------------- benchmark: 13 tests -------------------------------------------------------------------------------------------------------------------------------
Name (time in us)                                                          Min                        Max                       Mean                    StdDev                     Median                       IQR            Outliers           OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_radon_model_call_benchmark[NUMBA-False]                            6.6530 (1.0)              99.5560 (1.15)              7.3356 (1.0)              1.4372 (inf)               7.2330 (1.0)              0.1500 (inf)      862;3896  136,322.0726 (1.0)       49856           1
test_radon_model_call_benchmark[NUMBA-True]                             7.2030 (1.08)          1,034.4090 (11.91)             7.5908 (1.03)             4.4430 (inf)               7.4740 (1.03)             0.1000 (inf)      286;2677  131,738.6749 (0.97)      54813           1
test_radon_model_compile_repeatedly_benchmark[NUMBA-True]         258,900.6500 (>1000.0)   5,994,141.2680 (>1000.0)   1,408,531.9982 (>1000.0)  2,563,436.1926 (inf)         261,233.4620 (>1000.0)  1,440,107.1928 (inf)           1;1        0.7100 (0.00)          5           1
test_radon_model_compile_repeatedly_benchmark[NUMBA-False]      5,084,492.1370 (>1000.0)   7,516,582.0820 (>1000.0)   5,725,177.9268 (>1000.0)  1,010,872.8182 (inf)       5,326,797.6500 (>1000.0)    779,492.7115 (inf)           1;1        0.1747 (0.00)          5           1
test_radon_model_compile_variants_benchmark[NUMBA-True]        20,643,653.0320 (>1000.0)  20,643,653.0320 (>1000.0)  20,643,653.0320 (>1000.0)          0.0000 (1.0)      20,643,653.0320 (>1000.0)          0.0000 (1.0)           0;0        0.0484 (0.00)          1           1
test_radon_model_compile_variants_benchmark[NUMBA-False]       42,041,049.0130 (>1000.0)  42,041,049.0130 (>1000.0)  42,041,049.0130 (>1000.0)          0.0000 (1.0)      42,041,049.0130 (>1000.0)          0.0000 (1.0)           0;0        0.0238 (0.00)          1           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
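
For a rough sense of scale, the cache=False rows of the two tables can be compared directly. This is a back-of-the-envelope sketch using the Mean column only; with 1 to 5 rounds per benchmark the figures are noisy, and the helper name `relative_reduction` is hypothetical.

```python
# Mean compile times in microseconds, copied from the NUMBA-False rows
# of the "Before" and "After" tables: (before, after).
mean_us = {
    "compile_repeatedly": (6_710_889.8018, 5_725_177.9268),
    "compile_variants": (50_546_677.2310, 42_041_049.0130),
}


def relative_reduction(before, after):
    """Fraction by which `after` is lower than `before`."""
    return (before - after) / before


reductions = {
    name: relative_reduction(before, after)
    for name, (before, after) in mean_us.items()
}

for name, reduction in reductions.items():
    print(f"{name}: {reduction:.1%} lower mean compile time")
```

This works out to roughly 15% and 17% lower mean compile time for the two cache=False benchmarks. The cache=True rows move by a few percent in the other direction, which is plausibly noise given the round counts.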

@ricardoV94 ricardoV94 force-pushed the numba_compile_faster branch 2 times, most recently from ad5bc53 to 21c8e3b Compare May 4, 2026 04:54
@ricardoV94 ricardoV94 force-pushed the numba_compile_faster branch from 21c8e3b to 60c0ae3 Compare May 4, 2026 06:04
@ricardoV94 ricardoV94 marked this pull request as ready for review May 4, 2026 06:46
@ricardoV94 ricardoV94 requested a review from jessegrabowski May 4, 2026 06:47
@jessegrabowski
Member

jessegrabowski commented May 4, 2026

Nice and straightforward. The win seems marginal, but we take those.

@ricardoV94
Member Author

gotta wait, we had a big regression in scan... without #2098

@ricardoV94 ricardoV94 marked this pull request as draft May 4, 2026 13:51