crypto: Refactor mul_amm to separate output from inputs#1471
Conversation
Change mul_amm signature so that the output r is separate from the const inputs x and y (r must not alias either). This removes the internal t buffer parameter — mul_amm now writes directly to r. The caller uses a double-buffer technique: r_cur holds the current value, r_tmp receives the next mul_amm result. Squaring always writes to r_tmp; if a multiply follows, it writes back to r_cur directly, otherwise the spans are swapped. This eliminates the per-call copy from the old implementation. Also trim the result span passed to modexp_odd to match mod size exactly, and remove the trailing-zero fill (the caller pre-zeros the full result buffer).
There was a problem hiding this comment.
Pull request overview
This PR refactors the internal Montgomery multiplication helper used by the modexp precompile to separate outputs from inputs (no aliasing) and updates the odd-modulus exponentiation loop to use double-buffering, avoiding an extra per-call copy. It also tightens the result span passed into modexp_odd() to exactly match the trimmed odd modulus size.
Changes:
- Refactor
mul_amm()to take explicitxandyinputs and write directly into a non-aliasing output span. - Update
modexp_odd()to use a double-buffer technique for squaring/multiplying without the previous temporary/copy step. - Pass
result.first(odd_size)intomodexp_odd()when CRT is not needed (result buffer is pre-zeroed).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Additionally, passing y=1 converts x from Montgomery form back to plain form, | ||
| /// because for x = aR: mul_amm(x, 1) = aR⋅1⋅R⁻¹ % mod = a % mod. | ||
| /// | ||
| /// See "Efficient Software Implementations of Modular Exponentiation": | ||
| /// https://eprint.iacr.org/2011/239.pdf | ||
| void mul_amm(std::span<uint64_t> r, std::span<const uint64_t> y, std::span<const uint64_t> mod, | ||
| uint64_t mod_inv, std::span<uint64_t> t) noexcept | ||
| /// | ||
| /// Computes r = x * y * R^-1 mod m (Almost Montgomery Multiplication). | ||
| /// r must not alias x or y. | ||
| void mul_amm(std::span<uint64_t> r, std::span<const uint64_t> x, std::span<const uint64_t> y, | ||
| std::span<const uint64_t> mod, uint64_t mod_inv) noexcept |
There was a problem hiding this comment.
The function-level documentation above still describes Montgomery-to-plain conversion using the old in-place form (mul_amm(x, 1)), but the refactored API now requires the output r to be distinct from inputs. Please update the surrounding doc comment to reflect the new calling convention (conversion writes into a separate r).
| assert(mod.size() == n); | ||
| assert(t.size() == n); | ||
| assert(mod.back() != 0); | ||
| assert(r.data() != x.data() && r.data() != y.data()); // r must not alias inputs. |
There was a problem hiding this comment.
The non-aliasing assertion only compares data() pointers, which won’t catch overlapping spans with different start addresses (e.g., r = x.subspan(1)), even though the comment/contract says r must not alias inputs. Consider strengthening this debug check to detect any overlap between the [data, data+size) ranges of r vs x/y.
| assert(r.data() != x.data() && r.data() != y.data()); // r must not alias inputs. | |
| const auto ranges_overlap = [](auto a, auto b) noexcept { | |
| using T = std::remove_pointer_t<decltype(a.data())>; | |
| const T* a_begin = a.data(); | |
| const T* a_end = a_begin + a.size(); | |
| const T* b_begin = b.data(); | |
| const T* b_end = b_begin + b.size(); | |
| return a_begin < b_end && b_begin < a_end; | |
| }; | |
| assert(!ranges_overlap(r, x) && !ranges_overlap(r, y)); // r must not alias inputs. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1471 +/- ##
=======================================
Coverage 96.78% 96.78%
=======================================
Files 152 152
Lines 13878 13882 +4
Branches 3258 3259 +1
=======================================
+ Hits 13432 13436 +4
Misses 306 306
Partials 140 140
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Change mul_amm signature so that the output r is separate from the const inputs x and y (r must not alias either). This removes the internal t buffer parameter — mul_amm now writes directly to r.
The caller uses a double-buffer technique: r_cur holds the current value, r_tmp receives the next mul_amm result. Squaring always writes to r_tmp; if a multiply follows, it writes back to r_cur directly, otherwise the spans are swapped. This eliminates the per-call copy from the old implementation.
Also trim the result span passed to modexp_odd to match mod size exactly, and remove the trailing-zero fill (the caller pre-zeros the full result buffer).