A NumPy-compatible Python library backed by Rust for performance-critical operations.
Call it exactly like NumPy — no code changes needed on your side. The Rust engine runs underneath.

```python
import rustnum
import numpy as np

x64 = np.random.randn(1000, 1000)   # float64
x32 = x64.astype(np.float32)        # float32

rustnum.sigmoid(x64)   # float64 → float64, 1.9× faster than NumPy
rustnum.sigmoid(x32)   # float32 → float32, output dtype always matches input
rustnum.softmax(x64)   # 3.3× faster, parallelised across CPU cores
rustnum.relu(x32)      # same as np.maximum(x32, 0)
```

NumPy is general-purpose. It doesn't include activation functions (relu, sigmoid,
softmax) because those belong to higher-level libraries. In practice, every DL
project implements them by hand, usually as a chain of NumPy operations — each one
allocating a temporary array, none of them as fast as a single fused Rust pass.
rustnum fills that gap: a thin Python API over a compiled Rust core.
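
For reference, the hand-written NumPy versions described above usually look something like the sketch below. The names `relu_np` and `sigmoid_np` are illustrative and not part of rustnum; the sigmoid is written out step by step only to make the intermediate arrays visible.

```python
import numpy as np

def relu_np(x):
    # A single ufunc call: one pass over the data, one output array.
    return np.maximum(x, 0)

def sigmoid_np(x):
    # The usual one-liner 1 / (1 + np.exp(-x)), split into its ufunc calls.
    neg = -x            # full-size temporary 1
    e = np.exp(neg)     # full-size temporary 2
    denom = 1.0 + e     # full-size temporary 3
    return 1.0 / denom  # result array

x = np.random.randn(4, 5).astype(np.float32)
y = sigmoid_np(x)       # float32 in, float32 out
```

Each step reads and writes a complete array, which is the overhead a fused single-pass implementation avoids.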
Wheels are not yet on PyPI. Build from source for now.
Prerequisites: Rust toolchain (rustup), Python ≥ 3.8, maturin

```bash
git clone https://github.com/asjad2401/rustnum
cd rustnum
pip install maturin
maturin develop --release   # builds and installs into your active venv
```

All functions accept float32 or float64 arrays. Output dtype always matches input, with no silent upcasting.
Element-wise Rectified Linear Unit.

```python
rustnum.relu(x)   # equivalent to np.maximum(x, 0)
```

Element-wise logistic sigmoid: 1 / (1 + exp(-x)).

```python
rustnum.sigmoid(x)   # no single NumPy equivalent; users usually write the formula
```

Numerically stable softmax over any axis.

```python
rustnum.softmax(x)           # over last axis (default)
rustnum.softmax(x, axis=0)   # over first axis
```

Subtracts the per-slice max before exp to prevent overflow, the same approach
used by PyTorch and JAX internally.
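
The effect of the max subtraction can be seen in plain NumPy. This is a sketch of the technique only, not rustnum's Rust code; `softmax_naive` and `softmax_stable` are hypothetical names used for illustration.

```python
import numpy as np

def softmax_naive(x, axis=-1):
    # exp() of large inputs overflows to inf, and inf / inf is nan.
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_stable(x, axis=-1):
    # Shifting by the per-slice max makes the largest exponent exactly 0,
    # so exp() never overflows; the shift cancels in the ratio.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

x = np.array([[1000.0, 1001.0, 1002.0]])

with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(x)   # every entry is nan

stable = softmax_stable(x)     # finite probabilities that sum to 1
```

Because softmax is invariant to adding a constant to every element of a slice, the stable version returns the same values the naive one would give if it did not overflow.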
1000×1000 arrays, 200 runs, AMD x86-64, Python 3.12.

| Function | dtype | NumPy | rustnum | Speedup |
|---|---|---|---|---|
| `relu` | `f64` | 1.41 ms | 1.35 ms | ~1× |
| `relu` | `f32` | 0.50 ms | 0.57 ms | ~1× |
| `sigmoid` | `f64` | 11.66 ms | 6.08 ms | 1.9× |
| `sigmoid` | `f32` | 3.96 ms | 3.44 ms | 1.15× |
| `softmax` | `f64` | 11.94 ms | 3.63 ms | 3.3× |
| `softmax` | `f32` | 4.00 ms | 1.83 ms | 2.2× |

relu matches NumPy because np.maximum is already SIMD-vectorised.
sigmoid wins because NumPy chains multiple ufunc calls; Rust fuses them in one pass.
softmax is parallelised across CPU cores via rayon — each row processed independently.
f32 is faster in absolute terms (half the memory bandwidth); relative speedups are smaller because NumPy's f32 path is also faster.
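
To get comparable numbers on your own machine, a timing loop along these lines should be enough. It is a minimal sketch, not the project's benchmark script; `bench_ms` and `sigmoid_np` are hypothetical helpers, and results will vary with CPU and NumPy build.

```python
import timeit
import numpy as np

def bench_ms(fn, x, runs=200):
    """Return the mean wall-clock time per call, in milliseconds."""
    fn(x)  # warm-up call, excluded from the measurement
    total_s = timeit.timeit(lambda: fn(x), number=runs)
    return total_s / runs * 1e3

def sigmoid_np(x):
    # NumPy baseline: a chain of ufunc calls.
    return 1.0 / (1.0 + np.exp(-x))

x64 = np.random.randn(1000, 1000)

# runs=20 keeps this demo quick; use the default of 200 to match the setup above.
print(f"NumPy sigmoid f64: {bench_ms(sigmoid_np, x64, runs=20):.2f} ms")
```

With rustnum installed, passing `rustnum.sigmoid` as `fn` times the Rust path on the same input.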
| Layer | Tool |
|---|---|
| Python bindings | PyO3 |
| NumPy array bridge | rust-numpy |
| N-dimensional arrays | ndarray |
| Parallelism | rayon |
| Build & packaging | maturin |

- Parallel softmax with `rayon` (3× speedup)
- `f32` support: both dtypes accepted, output matches input
- More activations: `leaky_relu`, `elu`, `gelu`
- PyPI release

```bash
maturin develop --release
pytest
```

38 tests cover correctness, dtype preservation, and error handling for all functions.
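
As an illustration of what a correctness and dtype-preservation check can look like, here is a small sketch. It is not taken from the project's test suite; `check_activation` and `relu_ref` are hypothetical names, and a NumPy function stands in for the implementation under test so the snippet runs without rustnum.

```python
import numpy as np

def check_activation(fn, reference, dtype, shape=(8, 16), seed=0):
    """Check that fn preserves dtype and shape and agrees with a reference."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape).astype(dtype)
    out = fn(x)

    assert out.dtype == x.dtype, f"dtype changed: {x.dtype} -> {out.dtype}"
    assert out.shape == x.shape

    # Compare against the reference evaluated in float64,
    # with a looser tolerance for float32 inputs.
    tol = 1e-5 if x.dtype == np.float32 else 1e-12
    np.testing.assert_allclose(out, reference(x.astype(np.float64)), rtol=tol, atol=tol)
    return True

def relu_ref(x):
    return np.maximum(x, 0)

# Stand-in for the function under test; with rustnum installed,
# pass rustnum.relu as the first argument instead.
for dt in (np.float32, np.float64):
    check_activation(relu_ref, relu_ref, dt)
```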
This is an early-stage open-source project. Issues and PRs welcome.
See DEVLOG.md for a running log of design decisions and benchmarks.
License: MIT