
[ext/standard] Specialize array_sum()/array_product() for long arrays #21787

Open
mehmetcansahin wants to merge 1 commit into php:master from mehmetcansahin:array-sum-product-fast-path

Conversation

@mehmetcansahin

@mehmetcansahin mehmetcansahin commented Apr 17, 2026

Summary

array_sum() and array_product() currently dispatch through
add_function() / mul_function() for every element.

This change adds a specialized fast path for arrays when both the
accumulator and current element are IS_LONG, inlining overflow-aware
arithmetic via fast_long_add_function() and ZEND_SIGNED_MULTIPLY_LONG() --
the same engine helpers that add_function_fast() dispatches to internally.
The fast path applies to both packed and hash arrays.

When those conditions are not met (overflow, non-IS_LONG entry, object,
resource, string, etc.), execution falls back to the existing generic path
through php_array_binop_apply(), preserving current behavior and warnings.
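The fallback contract above can be checked from userland. A minimal sketch in plain PHP (illustrative only, not part of the patch) of the behaviors the fast path must preserve:

```php
<?php
// All-IS_LONG input: eligible for the fast path, result stays int.
var_dump(array_sum([1, 2, 3]));                  // int(6)

// Overflow: the accumulator is promoted to float, exactly as the
// generic add_function() path (and the + operator) would do.
var_dump(is_float(array_sum([PHP_INT_MAX, 1]))); // bool(true)

// First non-IS_LONG entry (a float here): falls back to the generic path.
var_dump(array_sum([1, 2, 0.5]));                // float(3.5)
```

Any observable difference between the two paths on these inputs would be a bug in the specialization, which is what the overflow regression tests below pin down.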

This follows the pattern of recent array function optimizations
(#18157, #18158, #18180).

Benchmark

Apple M1, release build (-O2 -DNDEBUG), median of 7 runs. Each benchmark
processes roughly 20M elements total (n * iterations held constant), so the
timings are intended to reflect per-element cost. Benchmark script posted as
a comment below.

| Case | Baseline | Patched | Speedup |
| --- | ---: | ---: | ---: |
| array_sum, packed long (n=10000) | 42.24 ms | 12.99 ms | 3.25x |
| array_product, packed small long (0-9 cycled, n=10000) | 144.63 ms | 19.39 ms | 7.46x |
| array_product(range(1, 100)) (packed) | 82.94 ms | 58.66 ms | 1.41x |
| array_sum, hash long (n=10000) | 41.95 ms | 12.90 ms | 3.25x |
| array_product, hash small long (0-9 cycled, n=10000) | 69.90 ms | 19.40 ms | 3.60x |

Packed/hash float arrays and mixed IS_LONG / IS_DOUBLE arrays stay
approximately neutral.

Broader fast paths (including IS_DOUBLE accumulators) were prototyped
but dropped: add_function_fast() already inlines the double+double case,
so the wins were marginal, and the extra type-dispatch branch caused
regressions on mixed and pure-float workloads. The IS_LONG + IS_LONG
variant was the only one with a consistent win and no regressions.

Testing

  • Full ext/standard/tests/array/ suite: 856/856 passing on debug, ZTS, and release builds.
  • Added regression tests:
    • array_sum_packed_long_overflow.phpt
    • array_product_packed_long_overflow.phpt
    • array_sum_product_integration.phpt

All three new tests also pass on current master without the patch, so they
lock in existing semantics rather than implementation details.
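The packed/hash equivalence those tests rely on can be illustrated in plain PHP (a sketch, not one of the .phpt files above):

```php
<?php
// The fast path applies to both packed (contiguous integer-keyed) and
// hash (string-keyed) arrays; results must be identical either way.
$packed = range(1, 5);
$hash   = [];
foreach ($packed as $i => $v) {
    $hash["k$i"] = $v;
}

var_dump(array_sum($packed) === array_sum($hash));         // bool(true)
var_dump(array_product($packed) === array_product($hash)); // bool(true)
var_dump(array_sum($packed));                              // int(15)
```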

@TimWolla
Member

Please provide your benchmarking script.

> and was suggested by @TimWolla during review of
> #18180.

This is not quite accurate, because array_(sum|product)() is not a callback-based function.

@Girgias
Member

Girgias commented Apr 17, 2026

I submitted an RFC to use the same engine operation as the previous implementation was diverging from standard PHP binary op semantics.

So I'm quite wary of trying to implement some sort of bypass.

The other thing that I don't understand is why would the array need to be packed to benefit from a hypothetical optimization? Nor do I see why integers should get special treatment but not floats (especially as that implementation is probably simpler).

The per-element cost of array_sum() and array_product() is dominated by
the add_function/mul_function call dispatch. Add a specialized fast
path for the IS_LONG + IS_LONG case that inlines overflow-aware
arithmetic via fast_long_add_function() and ZEND_SIGNED_MULTIPLY_LONG()
-- the same engine helpers that add_function_fast() dispatches to
internally. The fast path applies to both packed and hash arrays.

On overflow, non-IS_LONG entries, objects, resources or strings,
execution falls through to the generic path in php_array_binop_apply(),
so the PHP 8.3 warning behavior, operator overloading and BC casts for
resources/non-numeric strings are preserved.

Benchmarks (Apple M1, -O2 -DNDEBUG release, median of 7 runs, n=10000
with ~20M total elements per case):

  array_sum, packed long                     42.24 ms ->  12.99 ms  3.25x
  array_product, packed small long (0..9)   144.63 ms ->  19.39 ms  7.46x
  array_product, packed range(1, 100)        82.94 ms ->  58.66 ms  1.41x
  array_sum, hash long                       41.95 ms ->  12.90 ms  3.25x
  array_product, hash small long (0..9)      69.90 ms ->  19.40 ms  3.60x
  packed/hash float, mixed IS_LONG/IS_DOUBLE ~            ~         ~1.00x

Tests cover the overflow transition from fast path to generic, and
the integration with array_column() together with the PHP 8.3
nested-array warning behavior.
@mehmetcansahin mehmetcansahin force-pushed the array-sum-product-fast-path branch from fce40a5 to c371d94 on April 17, 2026 at 20:20
@mehmetcansahin mehmetcansahin changed the title from "[ext/standard] Specialize array_sum()/array_product() for packed long arrays" to "[ext/standard] Specialize array_sum()/array_product() for long arrays" on Apr 17, 2026
@mehmetcansahin
Author

mehmetcansahin commented Apr 17, 2026

Here is the benchmark script I used, kept inline so the numbers are reproducible:

```php
<?php
const TOTAL_ELEMENTS = 20_000_000;
const RUNS = 7;

function median(array $v): float { sort($v); return $v[intdiv(count($v), 2)]; }
function hashify(array $v): array { $out = []; foreach ($v as $i => $x) $out["k$i"] = $x; return $out; }

function bench(string $name, callable $fn, array $data): void {
    $iters = max(1, intdiv(TOTAL_ELEMENTS, count($data)));
    for ($w = 0; $w < 2; $w++) for ($i = 0; $i < $iters; $i++) $last = $fn($data);

    $samples = [];
    for ($r = 0; $r < RUNS; $r++) {
        $t0 = hrtime(true);
        for ($i = 0; $i < $iters; $i++) $last = $fn($data);
        $samples[] = (hrtime(true) - $t0) / 1_000_000;
    }

    printf("%-28s n=%5d iters=%7d median=%8.2f ms result=%s\n",
        $name, count($data), $iters, median($samples), get_debug_type($last));
}

bench('sum packed long',              'array_sum',     range(1, 10000));
bench('product packed 0..9',          'array_product', array_map(fn($i) => $i % 10, range(0, 9999)));
bench('product packed range(1,100)',  'array_product', range(1, 100));
bench('sum packed float',             'array_sum',     array_map(fn($i) => $i + 0.5, range(1, 10000)));
bench('sum packed mixed',             'array_sum',     array_map(fn($i) => $i % 2 ? $i : $i + 0.5, range(1, 10000)));
bench('sum hash long',                'array_sum',     hashify(range(1, 10000)));
bench('product hash 0..9',            'array_product', hashify(array_map(fn($i) => $i % 10, range(0, 9999))));
```

Run with:

```
sapi/cli/php -n -d opcache.enable_cli=0 /tmp/array_sum_product_bench.php
```

And you're right about the wording: what I meant was that this follows the same packed/hash specialization pattern as the recent array optimizations, not that array_sum() / array_product() are callback-based. I've removed the misleading attribution from the PR description.

@mehmetcansahin
Author

mehmetcansahin commented Apr 17, 2026

I share the concern about bypassing generic semantics, so the fast path is intentionally restricted to the exact IS_LONG / IS_LONG case and uses the same engine helpers (fast_long_add_function() / ZEND_SIGNED_MULTIPLY_LONG()) that add_function_fast() dispatches to internally. On overflow or any non-IS_LONG case it falls back immediately to the generic path, so the current warnings and BC behavior still come from php_array_binop_apply().
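For array_product() the same fallback is observable from userland; the transition point is wherever ZEND_SIGNED_MULTIPLY_LONG() reports that the product spilled into its double result (a sketch, not part of the patch):

```php
<?php
// Within signed-long range: the fast path keeps an integer accumulator.
var_dump(array_product([3, 5, 7]));                  // int(105)

// Once the running product overflows a signed long, the generic path
// takes over and the result is a float, matching the * operator.
var_dump(is_float(array_product([PHP_INT_MAX, 2]))); // bool(true)
```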

On the packed-array point: you're right. I tested extending the same IS_LONG fast path to the hash path as well, and it shows the same kind of win for integer-only hash arrays, so I've updated the patch accordingly instead of keeping it packed-only. The description and the UPGRADING entry no longer say "packed" and the benchmark table now includes hash cases.

On floats: I prototyped extending the fast path to IS_DOUBLE + IS_DOUBLE (and the cross-type pairs), but the wins were marginal (add_function_fast() already inlines the double+double case), and the extra type-dispatch branch caused regressions on mixed IS_LONG / IS_DOUBLE workloads. Keeping the fast path scoped to IS_LONG + IS_LONG was the only variant with a consistent win and no regressions.

