[ext/standard] Specialize array_sum()/array_product() for long arrays #21787
mehmetcansahin wants to merge 1 commit into php:master
|
I submitted an RFC to use the same engine operations because the previous implementation diverged from standard PHP binary-operator semantics, so I'm quite wary of implementing any sort of bypass. I also don't understand why the array would need to be packed to benefit from a hypothetical optimization, nor why integers should get special treatment but not floats (especially as that implementation is probably simpler). |
The per-element cost of array_sum() and array_product() is dominated by the add_function/mul_function call dispatch. Add a specialized fast path for the IS_LONG + IS_LONG case that inlines overflow-aware arithmetic via fast_long_add_function() and ZEND_SIGNED_MULTIPLY_LONG() -- the same engine helpers that add_function_fast() dispatches to internally. The fast path applies to both packed and hash arrays.

On overflow, non-IS_LONG entries, objects, resources or strings, execution falls through to the generic path in php_array_binop_apply(), so the PHP 8.3 warning behavior, operator overloading and BC casts for resources/non-numeric strings are preserved.

Benchmarks (Apple M1, -O2 -DNDEBUG release, median of 7 runs, n=10000 with ~20M total elements per case):

  array_sum, packed long                       42.24 ms -> 12.99 ms   3.25x
  array_product, packed small long (0..9)     144.63 ms -> 19.39 ms   7.46x
  array_product, packed range(1, 100)          82.94 ms -> 58.66 ms   1.41x
  array_sum, hash long                         41.95 ms -> 12.90 ms   3.25x
  array_product, hash small long (0..9)        69.90 ms -> 19.40 ms   3.60x
  packed/hash float, mixed IS_LONG/IS_DOUBLE       ~          ~      ~1.00x

Tests cover the overflow transition from fast path to generic, and the integration with array_column() together with the PHP 8.3 nested-array warning behavior.
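The overflow handoff described in the commit message can be illustrated with a small standalone sketch. This is not the patch itself: `sum_result` and `sum_longs` are hypothetical names, and the GCC/Clang builtin `__builtin_add_overflow` is used here as a stand-in for `fast_long_add_function()`. The sketch assumes that, once an addition overflows, the accumulator is promoted to a double and stays a double for the remaining elements.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: either an integer sum or a double sum. */
typedef struct {
    bool is_double;
    union { int64_t lval; double dval; } v;
} sum_result;

/* Hypothetical helper: sums n integers, staying in integer arithmetic
 * until an addition would overflow, then continuing in double. */
static sum_result sum_longs(const int64_t *data, size_t n)
{
    sum_result acc;
    size_t i = 0;

    acc.is_double = false;
    acc.v.lval = 0;

    /* Fast path: overflow-checked integer addition. */
    for (; i < n; i++) {
        int64_t next;
        if (__builtin_add_overflow(acc.v.lval, data[i], &next)) {
            break; /* leave the fast path at the first overflow */
        }
        acc.v.lval = next;
    }

    /* Slow path (assumed promotion rule): finish the sum in double. */
    if (i < n) {
        double d = (double)acc.v.lval;
        for (; i < n; i++) {
            d += (double)data[i];
        }
        acc.is_double = true;
        acc.v.dval = d;
    }
    return acc;
}
```

Summing `{1, 2, 3}` with this sketch stays on the integer path, while summing `{INT64_MAX, 1}` leaves the fast path at the second element and yields a double.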
fce40a5 to c371d94
|
I used the benchmark script below; I'll also keep it inline here so the numbers are reproducible:

<?php
const TOTAL_ELEMENTS = 20_000_000;
const RUNS = 7;
// Middle sample after sorting; with an odd RUNS this is the true median.
function median(array $v): float {
    sort($v);
    return $v[intdiv(count($v), 2)];
}
// Rebuild the array with string keys so it is a hash array, not a packed one.
function hashify(array $v): array {
    $out = [];
    foreach ($v as $i => $x) {
        $out["k$i"] = $x;
    }
    return $out;
}
function bench(string $name, callable $fn, array $data): void {
    // Keep the total number of processed elements roughly constant per case.
    $iters = max(1, intdiv(TOTAL_ELEMENTS, count($data)));
    // Two untimed warm-up passes.
    for ($w = 0; $w < 2; $w++) {
        for ($i = 0; $i < $iters; $i++) $last = $fn($data);
    }
    $samples = [];
    for ($r = 0; $r < RUNS; $r++) {
        $t0 = hrtime(true);
        for ($i = 0; $i < $iters; $i++) $last = $fn($data);
        $samples[] = (hrtime(true) - $t0) / 1_000_000; // ns -> ms
    }
    printf("%-28s n=%5d iters=%7d median=%8.2f ms result=%s\n",
        $name, count($data), $iters, median($samples), get_debug_type($last));
}
bench('sum packed long', 'array_sum', range(1, 10000));
bench('product packed 0..9', 'array_product', array_map(fn($i) => $i % 10, range(0, 9999)));
bench('product packed range(1,100)', 'array_product', range(1, 100));
bench('sum packed float', 'array_sum', array_map(fn($i) => $i + 0.5, range(1, 10000)));
bench('sum packed mixed', 'array_sum', array_map(fn($i) => $i % 2 ? $i : $i + 0.5, range(1, 10000)));
bench('sum hash long', 'array_sum', hashify(range(1, 10000)));
bench('product hash 0..9', 'array_product', hashify(array_map(fn($i) => $i % 10, range(0, 9999))));

Run with:
And you're right about the wording: what I meant was that this follows the same packed/hash specialization pattern as the recent array optimizations, not that |
|
I share the concern about bypassing generic semantics, so the fast path is intentionally restricted to the exact
On the packed-array point: you're right. I tested extending the same
On floats: I prototyped extending the fast path to |
Summary
array_sum() and array_product() currently dispatch through add_function()/mul_function() for every element.

This change adds a specialized fast path for arrays when both the accumulator and current element are IS_LONG, inlining overflow-aware arithmetic via fast_long_add_function() and ZEND_SIGNED_MULTIPLY_LONG() -- the same engine helpers that add_function_fast() dispatches to internally.

The fast path applies to both packed and hash arrays.

When those conditions are not met (overflow, non-IS_LONG entry, object, resource, string, etc.), execution falls back to the existing generic path through php_array_binop_apply(), preserving current behavior and warnings.

This follows the pattern of recent array function optimizations (#18157, #18158, #18180).
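The per-element dispatch described in this summary (take the fast path only while both the accumulator and the current element are integers, otherwise hand the element to the generic routine) might look roughly like the sketch below. Everything in it is illustrative: `value`, `T_LONG`/`T_DOUBLE`, `generic_mul` and `product` are hypothetical stand-ins for the engine's zval, `IS_LONG`/`IS_DOUBLE`, `mul_function()` and the patched loop, and `__builtin_mul_overflow` (GCC/Clang) stands in for `ZEND_SIGNED_MULTIPLY_LONG()`. The generic path is reduced to double multiplication and models none of the warnings, casts or operator overloading that the real one handles.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged value, loosely modelled on a zval holding a long or a double. */
typedef enum { T_LONG, T_DOUBLE } val_type;
typedef struct {
    val_type type;
    union { int64_t l; double d; } u;
} value;

static double to_double(const value *v)
{
    return v->type == T_LONG ? (double)v->u.l : v->u.d;
}

/* Simplified generic path: only reached on overflow or for a non-integer
 * operand in this sketch, so it always produces a double. */
static void generic_mul(value *acc, const value *elem)
{
    double r = to_double(acc) * to_double(elem);
    acc->type = T_DOUBLE;
    acc->u.d = r;
}

/* Hypothetical product loop with an integer-only fast path. */
static value product(const value *items, size_t n)
{
    value acc;
    acc.type = T_LONG;
    acc.u.l = 1;

    for (size_t i = 0; i < n; i++) {
        if (acc.type == T_LONG && items[i].type == T_LONG) {
            int64_t r;
            if (!__builtin_mul_overflow(acc.u.l, items[i].u.l, &r)) {
                acc.u.l = r;   /* fast path: result still fits in an integer */
                continue;
            }
        }
        generic_mul(&acc, &items[i]); /* overflow or non-integer operand */
    }
    return acc;
}
```

In this shape the type check is repeated for every element, so once the accumulator becomes a double the fast path is simply never taken again for that call.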
Benchmark
Apple M1, release build (-O2 -DNDEBUG), median of 7 runs. Each benchmark processes roughly 20M elements total (n * iterations held constant), so the timings are intended to reflect per-element cost. Benchmark script posted as a comment below.

array_sum, packed long (n=10000)
array_product, packed small long (0-9 cycled, n=10000)
array_product(range(1, 100)) (packed)
array_sum, hash long (n=10000)
array_product, hash small long (0-9 cycled, n=10000)

Packed/hash float arrays and mixed IS_LONG/IS_DOUBLE arrays stay approximately neutral.

Broader fast paths (including IS_DOUBLE accumulators) were prototyped but dropped: add_function_fast() already inlines the double + double case, so the wins were marginal, and the extra type-dispatch branch caused regressions on mixed and pure-float workloads. The IS_LONG + IS_LONG variant was the only one with a consistent win and no regressions.
Testing
ext/standard/tests/array/ suite: 856/856 passing on debug, ZTS, and release builds.

array_sum_packed_long_overflow.phpt
array_product_packed_long_overflow.phpt
array_sum_product_integration.phpt

All three new tests also pass on current master without the patch, so they lock in existing semantics rather than implementation details.