
perf(evm): add constant-shift fast paths #388

Draft
starwarfan wants to merge 1 commit into DTVMStack:main from starwarfan:arith-256


@starwarfan
Contributor

1. Does this PR affect any open issues?(Y/N) and add issue references (e.g. "fix #123", "re #123".):

  • N
  • Y

2. What is the scope of this PR (e.g. component or file name):

3. Provide a description of the PR(e.g. more details, effects, motivations or doc link):

  • Affects user behaviors
  • Contains CI/CD configuration changes
  • Contains documentation changes
  • Contains experimental features
  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Other
    For constant shift amounts, use direct limb logic instead of the O(4²) Select/cmp loops. This brings sha1_shifts -68% and blake2b_shifts -20%.
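
As a rough illustration of what "direct limb logic" means for a shift amount known up front, here is a minimal reference model in plain C++. It works on ordinary integers rather than MIR instructions, and the names `Limbs` and `shlConst` are made up for this sketch; the only thing taken from the PR is the layout of four 64-bit limbs with limb 0 least significant.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 256-bit EVM word: four 64-bit limbs,
// limb 0 is the least significant.
using Limbs = std::array<uint64_t, 4>;

// Reference model of SHL when the shift amount is a known constant.
// Each result limb is built from at most two source limbs, so no
// per-limb selection over all candidates is needed.
Limbs shlConst(const Limbs &value, uint64_t shift) {
  Limbs result{}; // all limbs start at zero
  if (shift >= 256)
    return result;
  const std::size_t limbShift = static_cast<std::size_t>(shift / 64);
  const unsigned bitShift = static_cast<unsigned>(shift % 64);
  for (std::size_t i = limbShift; i < 4; ++i) {
    const std::size_t src = i - limbShift;
    uint64_t r = value[src] << bitShift;
    // Skip the carry when bitShift == 0: a 64-bit shift by 64 is
    // undefined behaviour in C++.
    if (src > 0 && bitShift != 0)
      r |= value[src - 1] >> (64 - bitShift);
    result[i] = r;
  }
  return result;
}
```

With the limb index and bit offset fixed ahead of time, the loop body reduces to one shift, one optional carry shift, and one OR per limb.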

4. Are there any breaking changes?(Y/N) and describe the breaking changes(e.g. more details, motivations or doc link):

  • N
  • Y

5. Are there test cases for these changes?(Y/N) select and add more details, references or doc links:

  • Unit test
  • Integration test
  • Benchmark (add benchmark stats below)
  • Manual test (add detailed scripts or steps below)
  • Other

6. Release note

None

@github-actions

github-actions bot commented Mar 6, 2026

⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter)

Performance Benchmark Results (threshold: 25%)

Benchmark Baseline (us) Current (us) Change Status
total/main/blake2b_huff/8415nulls 1.51 1.54 +2.4% PASS
total/main/blake2b_huff/empty 0.02 0.02 +2.1% PASS
total/main/blake2b_shifts/8415nulls 11.72 11.44 -2.4% PASS
total/main/sha1_divs/5311 5.22 5.05 -3.3% PASS
total/main/sha1_divs/empty 0.07 0.06 -2.7% PASS
total/main/sha1_shifts/5311 2.95 2.91 -1.4% PASS
total/main/sha1_shifts/empty 0.04 0.04 -2.6% PASS
total/main/snailtracer/benchmark 53.69 54.26 +1.1% PASS
total/main/structarray_alloc/nfts_rank 1.02 1.07 +4.9% PASS
total/main/swap_math/insufficient_liquidity 0.00 0.00 -2.0% PASS
total/main/swap_math/received 0.01 0.01 -1.1% PASS
total/main/swap_math/spent 0.00 0.00 -1.4% PASS
total/main/weierstrudel/1 0.29 0.29 +0.1% PASS
total/main/weierstrudel/15 3.16 3.16 +0.3% PASS
total/micro/JUMPDEST_n0/empty 1.30 1.48 +13.4% PASS
total/micro/jump_around/empty 0.10 0.09 -2.9% PASS
total/micro/loop_with_many_jumpdests/empty 19.92 22.43 +12.6% PASS
total/micro/memory_grow_mload/by1 0.09 0.09 -2.1% PASS
total/micro/memory_grow_mload/by16 0.10 0.10 -3.6% PASS
total/micro/memory_grow_mload/by32 0.11 0.12 +2.3% PASS
total/micro/memory_grow_mload/nogrow 0.09 0.10 +7.2% PASS
total/micro/memory_grow_mstore/by1 0.10 0.10 +5.2% PASS
total/micro/memory_grow_mstore/by16 0.11 0.12 +8.1% PASS
total/micro/memory_grow_mstore/by32 0.12 0.13 +7.3% PASS
total/micro/memory_grow_mstore/nogrow 0.09 0.10 +2.7% PASS
total/micro/signextend/one 0.24 0.27 +13.4% PASS
total/micro/signextend/zero 0.24 0.27 +12.7% PASS
total/synth/ADD/b0 1.95 6.30 +223.6% PASS
total/synth/ADD/b1 1.97 1.99 +0.7% PASS
total/synth/ADDRESS/a0 4.75 4.73 -0.6% PASS
total/synth/ADDRESS/a1 5.26 5.16 -1.9% PASS
total/synth/AND/b0 1.63 1.65 +1.5% PASS
total/synth/AND/b1 1.71 1.68 -1.6% PASS
total/synth/BYTE/b0 6.21 6.12 -1.4% PASS
total/synth/BYTE/b1 4.84 4.77 -1.4% PASS
total/synth/CALLDATASIZE/a0 3.10 3.23 +4.2% PASS
total/synth/CALLDATASIZE/a1 3.52 3.38 -4.0% PASS
total/synth/CALLER/a0 4.74 4.72 -0.4% PASS
total/synth/CALLER/a1 5.25 5.17 -1.6% PASS
total/synth/CALLVALUE/a0 3.55 3.10 -12.6% PASS
total/synth/CALLVALUE/a1 3.60 3.45 -4.3% PASS
total/synth/CODESIZE/a0 3.59 3.70 +3.1% PASS
total/synth/CODESIZE/a1 3.94 3.85 -2.3% PASS
total/synth/DUP1/d0 1.23 1.20 -2.0% PASS
total/synth/DUP1/d1 1.31 1.23 -6.2% PASS
total/synth/DUP10/d0 1.23 0.91 -26.2% PASS
total/synth/DUP10/d1 1.14 1.24 +8.6% PASS
total/synth/DUP11/d0 1.23 0.91 -25.7% PASS
total/synth/DUP11/d1 1.31 0.99 -24.4% PASS
total/synth/DUP12/d0 1.21 0.91 -25.1% PASS
total/synth/DUP12/d1 1.14 1.23 +8.4% PASS
total/synth/DUP13/d0 1.23 0.91 -25.9% PASS
total/synth/DUP13/d1 1.08 1.23 +14.7% PASS
total/synth/DUP14/d0 1.23 0.98 -19.7% PASS
total/synth/DUP14/d1 1.29 1.23 -4.0% PASS
total/synth/DUP15/d0 1.23 0.99 -19.9% PASS
total/synth/DUP15/d1 1.22 1.22 -0.0% PASS
total/synth/DUP16/d0 1.23 1.15 -6.3% PASS
total/synth/DUP16/d1 1.31 1.23 -6.1% PASS
total/synth/DUP2/d0 1.23 0.91 -25.9% PASS
total/synth/DUP2/d1 1.07 1.23 +14.7% PASS
total/synth/DUP3/d0 1.23 1.14 -6.7% PASS
total/synth/DUP3/d1 1.31 1.23 -6.4% PASS
total/synth/DUP4/d0 1.08 1.12 +3.6% PASS
total/synth/DUP4/d1 1.08 1.23 +13.8% PASS
total/synth/DUP5/d0 1.23 1.14 -6.8% PASS
total/synth/DUP5/d1 1.29 1.23 -4.0% PASS
total/synth/DUP6/d0 1.19 1.15 -4.1% PASS
total/synth/DUP6/d1 1.08 1.23 +14.4% PASS
total/synth/DUP7/d0 1.23 0.99 -19.5% PASS
total/synth/DUP7/d1 1.07 1.23 +14.9% PASS
total/synth/DUP8/d0 1.23 0.99 -19.4% PASS
total/synth/DUP8/d1 1.07 1.23 +14.8% PASS
total/synth/DUP9/d0 1.23 1.15 -6.6% PASS
total/synth/DUP9/d1 1.23 1.23 +0.2% PASS
total/synth/EQ/b0 2.72 2.75 +1.1% PASS
total/synth/EQ/b1 1.39 1.39 +0.1% PASS
total/synth/GAS/a0 3.66 3.67 +0.1% PASS
total/synth/GAS/a1 3.71 3.70 -0.3% PASS
total/synth/GT/b0 2.59 2.61 +0.8% PASS
total/synth/GT/b1 1.47 1.39 -5.5% PASS
total/synth/ISZERO/u0 1.02 0.99 -3.2% PASS
total/synth/JUMPDEST/n0 1.30 1.48 +13.2% PASS
total/synth/LT/b0 2.60 2.67 +2.7% PASS
total/synth/LT/b1 1.47 1.38 -5.8% PASS
total/synth/MSIZE/a0 4.27 4.26 -0.3% PASS
total/synth/MSIZE/a1 4.76 4.67 -1.9% PASS
total/synth/MUL/b0 5.31 5.31 -0.0% PASS
total/synth/MUL/b1 5.30 5.31 +0.2% PASS
total/synth/NOT/u0 1.65 1.65 -0.0% PASS
total/synth/OR/b0 1.63 1.65 +1.3% PASS
total/synth/OR/b1 1.71 1.71 -0.0% PASS
total/synth/PC/a0 3.42 3.34 -2.4% PASS
total/synth/PC/a1 3.52 3.38 -4.1% PASS
total/synth/PUSH1/p0 1.15 0.80 -30.3% PASS
total/synth/PUSH1/p1 1.31 1.16 -11.1% PASS
total/synth/PUSH10/p0 0.91 0.98 +8.7% PASS
total/synth/PUSH10/p1 1.33 1.21 -8.8% PASS
total/synth/PUSH11/p0 1.15 0.98 -14.9% PASS
total/synth/PUSH11/p1 1.31 1.19 -9.0% PASS
total/synth/PUSH12/p0 0.91 0.95 +4.3% PASS
total/synth/PUSH12/p1 1.33 1.21 -8.9% PASS
total/synth/PUSH13/p0 1.15 0.99 -13.9% PASS
total/synth/PUSH13/p1 1.31 1.18 -10.0% PASS
total/synth/PUSH14/p0 1.16 0.96 -16.9% PASS
total/synth/PUSH14/p1 1.33 1.23 -8.0% PASS
total/synth/PUSH15/p0 1.15 0.98 -14.8% PASS
total/synth/PUSH15/p1 1.39 1.28 -8.0% PASS
total/synth/PUSH16/p0 1.15 0.99 -14.0% PASS
total/synth/PUSH16/p1 1.33 1.20 -9.6% PASS
total/synth/PUSH17/p0 1.14 0.80 -29.5% PASS
total/synth/PUSH17/p1 1.31 1.20 -8.6% PASS
total/synth/PUSH18/p0 1.13 0.97 -14.7% PASS
total/synth/PUSH18/p1 1.32 1.20 -8.9% PASS
total/synth/PUSH19/p0 0.91 0.87 -4.0% PASS
total/synth/PUSH19/p1 1.31 1.23 -6.4% PASS
total/synth/PUSH2/p0 1.15 0.96 -16.1% PASS
total/synth/PUSH2/p1 1.31 1.17 -10.6% PASS
total/synth/PUSH20/p0 1.14 0.99 -13.4% PASS
total/synth/PUSH20/p1 1.33 1.21 -9.3% PASS
total/synth/PUSH21/p0 1.14 0.98 -13.5% PASS
total/synth/PUSH21/p1 1.31 1.24 -5.7% PASS
total/synth/PUSH22/p0 1.15 0.99 -14.1% PASS
total/synth/PUSH22/p1 1.33 1.21 -9.6% PASS
total/synth/PUSH23/p0 1.15 0.88 -23.6% PASS
total/synth/PUSH23/p1 1.32 1.23 -6.6% PASS
total/synth/PUSH24/p0 0.91 0.98 +7.8% PASS
total/synth/PUSH24/p1 1.34 1.22 -8.4% PASS
total/synth/PUSH25/p0 1.15 0.99 -14.0% PASS
total/synth/PUSH25/p1 1.31 1.22 -7.3% PASS
total/synth/PUSH26/p0 1.15 0.83 -27.2% PASS
total/synth/PUSH26/p1 1.32 1.21 -8.5% PASS
total/synth/PUSH27/p0 1.15 0.93 -19.2% PASS
total/synth/PUSH27/p1 1.31 1.23 -6.5% PASS
total/synth/PUSH28/p0 1.14 0.99 -13.7% PASS
total/synth/PUSH28/p1 1.34 1.21 -9.4% PASS
total/synth/PUSH29/p0 1.15 0.97 -15.8% PASS
total/synth/PUSH29/p1 1.31 1.21 -7.7% PASS
total/synth/PUSH3/p0 1.15 0.98 -14.1% PASS
total/synth/PUSH3/p1 1.31 1.17 -10.6% PASS
total/synth/PUSH30/p0 0.93 1.01 +8.2% PASS
total/synth/PUSH30/p1 1.34 1.23 -8.3% PASS
total/synth/PUSH31/p0 1.15 0.98 -14.3% PASS
total/synth/PUSH31/p1 1.41 1.31 -7.4% PASS
total/synth/PUSH32/p0 1.14 0.89 -22.2% PASS
total/synth/PUSH32/p1 1.34 1.23 -8.3% PASS
total/synth/PUSH4/p0 0.91 0.98 +8.6% PASS
total/synth/PUSH4/p1 1.32 1.19 -9.3% PASS
total/synth/PUSH5/p0 0.91 0.98 +8.4% PASS
total/synth/PUSH5/p1 1.31 1.19 -9.2% PASS
total/synth/PUSH6/p0 1.14 0.99 -13.6% PASS
total/synth/PUSH6/p1 1.31 1.21 -7.5% PASS
total/synth/PUSH7/p0 0.91 0.98 +7.8% PASS
total/synth/PUSH7/p1 1.31 1.22 -6.7% PASS
total/synth/PUSH8/p0 1.15 0.98 -14.2% PASS
total/synth/PUSH8/p1 1.33 1.19 -10.1% PASS
total/synth/PUSH9/p0 1.15 0.98 -14.5% PASS
total/synth/PUSH9/p1 1.31 1.19 -9.1% PASS
total/synth/RETURNDATASIZE/a0 3.91 3.36 -14.0% PASS
total/synth/RETURNDATASIZE/a1 4.02 3.69 -8.0% PASS
total/synth/SAR/b0 3.88 3.86 -0.4% PASS
total/synth/SAR/b1 4.40 4.40 +0.1% PASS
total/synth/SGT/b0 2.61 2.60 -0.3% PASS
total/synth/SGT/b1 1.55 1.55 +0.0% PASS
total/synth/SHL/b0 3.02 3.04 +0.4% PASS
total/synth/SHL/b1 1.62 1.56 -4.0% PASS
total/synth/SHR/b0 3.10 3.12 +0.8% PASS
total/synth/SHR/b1 1.69 1.55 -8.0% PASS
total/synth/SIGNEXTEND/b0 3.50 3.31 -5.7% PASS
total/synth/SIGNEXTEND/b1 3.61 3.64 +0.7% PASS
total/synth/SLT/b0 2.63 2.61 -0.6% PASS
total/synth/SLT/b1 1.63 1.55 -4.8% PASS
total/synth/SUB/b0 1.95 1.94 -0.1% PASS
total/synth/SUB/b1 1.97 1.97 -0.0% PASS
total/synth/SWAP1/s0 1.50 1.48 -0.9% PASS
total/synth/SWAP10/s0 1.51 1.49 -1.0% PASS
total/synth/SWAP11/s0 1.51 1.49 -1.2% PASS
total/synth/SWAP12/s0 1.51 1.50 -0.9% PASS
total/synth/SWAP13/s0 1.51 1.50 -0.8% PASS
total/synth/SWAP14/s0 1.51 1.50 -0.9% PASS
total/synth/SWAP15/s0 1.51 1.50 -0.9% PASS
total/synth/SWAP16/s0 1.52 1.50 -0.9% PASS
total/synth/SWAP2/s0 1.50 1.49 -0.5% PASS
total/synth/SWAP3/s0 1.50 1.49 -0.8% PASS
total/synth/SWAP4/s0 1.50 1.49 -0.3% PASS
total/synth/SWAP5/s0 1.50 1.52 +1.5% PASS
total/synth/SWAP6/s0 1.50 1.49 -1.0% PASS
total/synth/SWAP7/s0 1.50 1.49 -0.9% PASS
total/synth/SWAP8/s0 1.51 1.49 -1.0% PASS
total/synth/SWAP9/s0 1.51 1.49 -0.9% PASS
total/synth/XOR/b0 1.55 1.55 +0.1% PASS
total/synth/XOR/b1 1.55 1.55 +0.1% PASS
total/synth/loop_v1 4.82 4.78 -0.8% PASS
total/synth/loop_v2 4.77 4.77 +0.1% PASS

Summary: 194 benchmarks, 0 regressions


✅ Performance Check Passed (multipass)

Performance Benchmark Results (threshold: 25%)

Benchmark Baseline (us) Current (us) Change Status
total/main/blake2b_huff/8415nulls 1.52 1.54 +1.6% PASS
total/main/blake2b_huff/empty 0.06 0.07 +11.5% PASS
total/main/blake2b_shifts/8415nulls 6.51 4.09 -37.2% PASS
total/main/sha1_divs/5311 3.46 3.43 -0.8% PASS
total/main/sha1_divs/empty 0.04 0.05 +1.0% PASS
total/main/sha1_shifts/5311 3.73 0.61 -83.6% PASS
total/main/sha1_shifts/empty 0.05 0.01 -71.6% PASS
total/main/snailtracer/benchmark 54.71 54.79 +0.4% PASS
total/main/structarray_alloc/nfts_rank 0.31 0.31 +0.2% PASS
total/main/swap_math/insufficient_liquidity 0.02 0.02 -0.2% PASS
total/main/swap_math/received 0.02 0.02 -4.4% PASS
total/main/swap_math/spent 0.02 0.02 -4.2% PASS
total/main/weierstrudel/1 0.39 0.36 -7.7% PASS
total/main/weierstrudel/15 3.21 3.22 +0.3% PASS
total/micro/JUMPDEST_n0/empty 0.14 0.14 +0.1% PASS
total/micro/jump_around/empty 0.63 0.62 -1.3% PASS
total/micro/loop_with_many_jumpdests/empty 1.98 1.98 -0.3% PASS
total/micro/memory_grow_mload/by1 0.17 0.18 +1.1% PASS
total/micro/memory_grow_mload/by16 0.19 0.19 +0.8% PASS
total/micro/memory_grow_mload/by32 0.20 0.21 +1.4% PASS
total/micro/memory_grow_mload/nogrow 0.18 0.18 -0.9% PASS
total/micro/memory_grow_mstore/by1 0.18 0.19 +3.4% PASS
total/micro/memory_grow_mstore/by16 0.19 0.20 +3.9% PASS
total/micro/memory_grow_mstore/by32 0.21 0.21 +3.4% PASS
total/micro/memory_grow_mstore/nogrow 0.18 0.18 +4.5% PASS
total/micro/signextend/one 0.38 0.35 -9.0% PASS
total/micro/signextend/zero 0.38 0.35 -9.2% PASS
total/synth/ADD/b0 0.01 0.01 +9.2% PASS
total/synth/ADD/b1 0.01 0.01 +5.8% PASS
total/synth/ADDRESS/a0 0.95 0.98 +3.8% PASS
total/synth/ADDRESS/a1 0.93 1.01 +8.4% PASS
total/synth/AND/b0 0.01 0.01 +9.3% PASS
total/synth/AND/b1 0.01 0.01 +6.2% PASS
total/synth/BYTE/b0 1.97 1.99 +0.8% PASS
total/synth/BYTE/b1 2.32 2.32 +0.0% PASS
total/synth/CALLDATASIZE/a0 0.57 0.49 -14.2% PASS
total/synth/CALLDATASIZE/a1 0.60 0.52 -13.6% PASS
total/synth/CALLER/a0 0.91 0.98 +7.3% PASS
total/synth/CALLER/a1 0.98 1.01 +2.8% PASS
total/synth/CALLVALUE/a0 0.49 0.49 +0.2% PASS
total/synth/CALLVALUE/a1 0.52 0.52 +0.6% PASS
total/synth/CODESIZE/a0 0.57 0.49 -14.2% PASS
total/synth/CODESIZE/a1 0.60 0.52 -13.6% PASS
total/synth/DUP1/d0 0.01 0.01 +7.8% PASS
total/synth/DUP1/d1 0.01 0.01 +5.9% PASS
total/synth/DUP10/d0 0.01 0.01 +7.9% PASS
total/synth/DUP10/d1 0.01 0.01 +6.2% PASS
total/synth/DUP11/d0 0.01 0.01 +8.0% PASS
total/synth/DUP11/d1 0.01 0.01 +5.9% PASS
total/synth/DUP12/d0 0.01 0.01 +7.9% PASS
total/synth/DUP12/d1 0.01 0.01 +5.8% PASS
total/synth/DUP13/d0 0.01 0.01 +7.9% PASS
total/synth/DUP13/d1 0.01 0.01 +6.0% PASS
total/synth/DUP14/d0 0.01 0.01 +7.9% PASS
total/synth/DUP14/d1 0.01 0.01 +5.6% PASS
total/synth/DUP15/d0 0.01 0.01 +8.0% PASS
total/synth/DUP15/d1 0.01 0.01 +5.8% PASS
total/synth/DUP16/d0 0.01 0.01 +7.7% PASS
total/synth/DUP16/d1 0.01 0.01 +5.9% PASS
total/synth/DUP2/d0 0.01 0.01 +7.8% PASS
total/synth/DUP2/d1 0.01 0.01 +5.8% PASS
total/synth/DUP3/d0 0.01 0.01 +7.5% PASS
total/synth/DUP3/d1 0.01 0.01 +6.0% PASS
total/synth/DUP4/d0 0.01 0.01 +7.8% PASS
total/synth/DUP4/d1 0.01 0.01 +5.8% PASS
total/synth/DUP5/d0 0.01 0.01 +7.8% PASS
total/synth/DUP5/d1 0.01 0.01 +5.9% PASS
total/synth/DUP6/d0 0.01 0.01 +8.1% PASS
total/synth/DUP6/d1 0.01 0.01 +5.8% PASS
total/synth/DUP7/d0 0.01 0.01 +7.8% PASS
total/synth/DUP7/d1 0.01 0.01 +5.9% PASS
total/synth/DUP8/d0 0.01 0.01 +7.9% PASS
total/synth/DUP8/d1 0.01 0.01 +5.9% PASS
total/synth/DUP9/d0 0.01 0.01 +7.8% PASS
total/synth/DUP9/d1 0.01 0.01 +5.8% PASS
total/synth/EQ/b0 0.01 0.01 +9.2% PASS
total/synth/EQ/b1 0.01 0.01 +5.8% PASS
total/synth/GAS/a0 0.91 0.91 -0.0% PASS
total/synth/GAS/a1 0.95 0.95 +0.0% PASS
total/synth/GT/b0 0.01 0.01 +9.2% PASS
total/synth/GT/b1 0.01 0.01 +5.8% PASS
total/synth/ISZERO/u0 0.01 0.01 +11.5% PASS
total/synth/JUMPDEST/n0 0.14 0.14 -0.6% PASS
total/synth/LT/b0 0.01 0.01 +9.1% PASS
total/synth/LT/b1 0.01 0.01 +5.6% PASS
total/synth/MSIZE/a0 0.01 0.01 +11.6% PASS
total/synth/MSIZE/a1 0.01 0.01 +8.6% PASS
total/synth/MUL/b0 5.31 5.30 -0.1% PASS
total/synth/MUL/b1 5.30 5.30 -0.0% PASS
total/synth/NOT/u0 0.01 0.01 +11.4% PASS
total/synth/OR/b0 0.01 0.01 +9.2% PASS
total/synth/OR/b1 0.01 0.01 +5.8% PASS
total/synth/PC/a0 0.01 0.01 +11.8% PASS
total/synth/PC/a1 0.01 0.01 +8.6% PASS
total/synth/PUSH1/p0 0.01 0.01 +8.8% PASS
total/synth/PUSH1/p1 0.01 0.01 +9.0% PASS
total/synth/PUSH10/p0 0.01 0.01 +8.8% PASS
total/synth/PUSH10/p1 0.01 0.01 +8.9% PASS
total/synth/PUSH11/p0 0.01 0.01 +8.8% PASS
total/synth/PUSH11/p1 0.01 0.01 +8.8% PASS
total/synth/PUSH12/p0 0.01 0.01 +8.8% PASS
total/synth/PUSH12/p1 0.01 0.01 +8.9% PASS
total/synth/PUSH13/p0 0.01 0.01 +8.8% PASS
total/synth/PUSH13/p1 0.01 0.01 +7.4% PASS
total/synth/PUSH14/p0 0.01 0.01 +8.9% PASS
total/synth/PUSH14/p1 0.01 0.01 +8.4% PASS
total/synth/PUSH15/p0 0.01 0.01 +8.8% PASS
total/synth/PUSH15/p1 0.01 0.01 +8.9% PASS
total/synth/PUSH16/p0 0.01 0.01 +8.8% PASS
total/synth/PUSH16/p1 0.01 0.01 +8.8% PASS
total/synth/PUSH17/p0 0.01 0.01 +8.8% PASS
total/synth/PUSH17/p1 0.01 0.01 +9.1% PASS
total/synth/PUSH18/p0 0.01 0.01 +8.9% PASS
total/synth/PUSH18/p1 0.01 0.01 +8.8% PASS
total/synth/PUSH19/p0 0.01 0.01 +8.8% PASS
total/synth/PUSH19/p1 0.01 0.01 +8.7% PASS
total/synth/PUSH2/p0 0.01 0.01 +8.7% PASS
total/synth/PUSH2/p1 0.01 0.01 +8.7% PASS
total/synth/PUSH20/p0 0.01 0.01 +9.2% PASS
total/synth/PUSH20/p1 0.01 0.01 +8.9% PASS
total/synth/PUSH21/p0 0.01 0.01 +9.4% PASS
total/synth/PUSH21/p1 0.01 0.01 +8.9% PASS
total/synth/PUSH22/p0 1.16 1.00 -14.0% PASS
total/synth/PUSH22/p1 1.32 1.22 -7.4% PASS
total/synth/PUSH23/p0 1.16 1.00 -13.8% PASS
total/synth/PUSH23/p1 1.33 1.23 -7.6% PASS
total/synth/PUSH24/p0 1.16 1.00 -13.9% PASS
total/synth/PUSH24/p1 1.35 1.24 -8.2% PASS
total/synth/PUSH25/p0 1.16 1.00 -13.9% PASS
total/synth/PUSH25/p1 1.36 1.22 -10.1% PASS
total/synth/PUSH26/p0 0.92 0.83 -10.2% PASS
total/synth/PUSH26/p1 1.35 1.25 -7.4% PASS
total/synth/PUSH27/p0 1.16 1.00 -13.7% PASS
total/synth/PUSH27/p1 1.35 1.22 -10.0% PASS
total/synth/PUSH28/p0 1.16 1.00 -14.0% PASS
total/synth/PUSH28/p1 1.35 1.23 -8.9% PASS
total/synth/PUSH29/p0 1.16 1.00 -13.8% PASS
total/synth/PUSH29/p1 1.35 1.22 -9.3% PASS
total/synth/PUSH3/p0 0.01 0.01 +8.9% PASS
total/synth/PUSH3/p1 0.01 0.01 +8.9% PASS
total/synth/PUSH30/p0 1.20 1.01 -15.8% PASS
total/synth/PUSH30/p1 1.35 1.25 -7.5% PASS
total/synth/PUSH31/p0 1.17 1.00 -14.3% PASS
total/synth/PUSH31/p1 1.55 1.35 -13.1% PASS
total/synth/PUSH32/p0 1.16 1.00 -13.9% PASS
total/synth/PUSH32/p1 1.36 1.26 -7.2% PASS
total/synth/PUSH4/p0 0.01 0.01 +8.8% PASS
total/synth/PUSH4/p1 0.01 0.01 +8.8% PASS
total/synth/PUSH5/p0 0.01 0.01 +8.7% PASS
total/synth/PUSH5/p1 0.01 0.01 +8.9% PASS
total/synth/PUSH6/p0 0.01 0.01 +8.8% PASS
total/synth/PUSH6/p1 0.01 0.01 +8.9% PASS
total/synth/PUSH7/p0 0.01 0.01 +8.7% PASS
total/synth/PUSH7/p1 0.01 0.01 +8.9% PASS
total/synth/PUSH8/p0 0.01 0.01 +8.9% PASS
total/synth/PUSH8/p1 0.01 0.01 +8.8% PASS
total/synth/PUSH9/p0 0.01 0.01 +8.8% PASS
total/synth/PUSH9/p1 0.01 0.01 +8.2% PASS
total/synth/RETURNDATASIZE/a0 0.49 0.49 +0.1% PASS
total/synth/RETURNDATASIZE/a1 0.52 0.52 +0.1% PASS
total/synth/SAR/b0 3.87 3.88 +0.2% PASS
total/synth/SAR/b1 4.37 4.47 +2.2% PASS
total/synth/SGT/b0 0.01 0.01 +9.7% PASS
total/synth/SGT/b1 0.01 0.01 +5.8% PASS
total/synth/SHL/b0 3.04 3.04 -0.0% PASS
total/synth/SHL/b1 1.64 1.57 -4.2% PASS
total/synth/SHR/b0 3.11 3.11 -0.1% PASS
total/synth/SHR/b1 1.72 1.56 -9.2% PASS
total/synth/SIGNEXTEND/b0 3.29 3.31 +0.5% PASS
total/synth/SIGNEXTEND/b1 3.61 3.68 +2.1% PASS
total/synth/SLT/b0 0.01 0.01 +9.2% PASS
total/synth/SLT/b1 0.01 0.01 +5.8% PASS
total/synth/SUB/b0 0.01 0.01 +9.0% PASS
total/synth/SUB/b1 0.01 0.01 +5.9% PASS
total/synth/SWAP1/s0 0.01 0.01 +6.1% PASS
total/synth/SWAP10/s0 0.01 0.01 +5.8% PASS
total/synth/SWAP11/s0 0.01 0.01 +6.0% PASS
total/synth/SWAP12/s0 0.01 0.01 +5.7% PASS
total/synth/SWAP13/s0 0.01 0.01 +5.3% PASS
total/synth/SWAP14/s0 0.01 0.01 +5.4% PASS
total/synth/SWAP15/s0 0.01 0.01 +5.4% PASS
total/synth/SWAP16/s0 0.01 0.01 +5.4% PASS
total/synth/SWAP2/s0 0.01 0.01 +5.5% PASS
total/synth/SWAP3/s0 0.01 0.01 +5.2% PASS
total/synth/SWAP4/s0 0.01 0.01 +5.2% PASS
total/synth/SWAP5/s0 0.01 0.01 +5.4% PASS
total/synth/SWAP6/s0 0.01 0.01 +5.2% PASS
total/synth/SWAP7/s0 0.01 0.01 +4.9% PASS
total/synth/SWAP8/s0 0.01 0.01 +5.2% PASS
total/synth/SWAP9/s0 0.01 0.01 +5.3% PASS
total/synth/XOR/b0 0.01 0.01 +9.3% PASS
total/synth/XOR/b1 0.01 0.01 +5.6% PASS
total/synth/loop_v1 1.42 1.40 -1.6% PASS
total/synth/loop_v2 1.34 1.32 -1.0% PASS

Summary: 194 benchmarks, 0 regressions


Contributor

Copilot AI left a comment


Pull request overview

This PR adds constant-shift fast paths in the EVM MIR compiler to avoid the current O(4²) select/cmp-based limb selection when the shift amount is a compile-time constant, targeting improved performance in shift-heavy workloads (e.g., SHA1/BLAKE2b).

Changes:

  • Add a helper to detect constant shift amounts from MIR instructions.
  • Implement constant-shift fast paths for SHL, logical SHR, and arithmetic SAR using direct limb/carry logic.
  • Add required includes for MIR constants, LLVM casting, and std::optional.


Comment on lines +2073 to +2084
// Fast path: constant shift amount — direct limb logic, no Select/cmp loops.
if (auto ShiftOpt = getConstShiftAmount(ShiftAmount)) {
  uint64_t Shift = *ShiftOpt;
  if (Shift >= 256) {
    for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I)
      Result[I] = Zero;
    return Result;
  }
  uint64_t CompShift = Shift / 64;
  uint64_t ShiftMod = Shift % 64;
  uint64_t CarryShift = (ShiftMod == 0) ? 0 : (64 - ShiftMod);
  for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {

Copilot AI Mar 10, 2026


Same issue as SHL fast path: the constant-shift path uses only ShiftAmount (low 64 bits) and ignores IsLargeShift, so shifts with any non-zero high limb but small low limb will incorrectly behave like a small shift instead of producing 0 per EVM spec. The fast path should still incorporate IsLargeShift (e.g., per-limb select to 0) to preserve correctness for large 256-bit shift values.
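
To make the expected behaviour concrete, the following is a small reference model of a 256-bit logical right shift that looks at all four limbs of the shift amount. It is a sketch on plain integers, not the compiler code; `Word` and `shrFull` are names invented here, and the `isLarge` flag only plays the role that `IsLargeShift` has in the comment above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-bit word: four 64-bit limbs, limb 0 least significant.
using Word = std::array<uint64_t, 4>;

// Logical right shift where the shift amount is itself a 256-bit word.
Word shrFull(const Word &value, const Word &shift) {
  Word result{}; // zero-filled
  // Large-shift guard: any high limb set, or the low limb is >= 256.
  const bool isLarge =
      (shift[1] | shift[2] | shift[3]) != 0 || shift[0] >= 256;
  if (isLarge)
    return result;
  const std::size_t limbShift = static_cast<std::size_t>(shift[0] / 64);
  const unsigned bitShift = static_cast<unsigned>(shift[0] % 64);
  for (std::size_t i = 0; i + limbShift < 4; ++i) {
    const std::size_t src = i + limbShift;
    uint64_t r = value[src] >> bitShift;
    // Bits that fall out of the next-higher limb land in this one.
    if (src + 1 < 4 && bitShift != 0)
      r |= value[src + 1] << (64 - bitShift);
    result[i] = r;
  }
  return result;
}
```

A shift amount of 2^64 has a low limb of 0, so a model that only reads `shift[0]` would return the value unchanged, whereas the guard above returns zero.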

Comment on lines +2248 to +2259
// Fast path: constant shift amount — direct limb logic, no Select/cmp loops.
if (auto ShiftOpt = getConstShiftAmount(ShiftAmount)) {
  uint64_t Shift = *ShiftOpt;
  if (Shift >= 256) {
    for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I)
      Result[I] = LargeShiftResult;
    return Result;
  }
  uint64_t CompShift = Shift / 64;
  uint64_t ShiftMod = Shift % 64;
  uint64_t CarryShift = (ShiftMod == 0) ? 0 : (64 - ShiftMod);
  for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {

Copilot AI Mar 10, 2026


The SAR constant-shift fast path ignores IsLargeShift and only checks Shift >= 256 based on the low 64 bits. This is incorrect for 256-bit shift values with high limbs set but Shift[0] < 256 (EVM requires full sign-extension result when shift >= 256). Make the fast path depend on IsLargeShift (e.g., select LargeShiftResult vs computed limb result) or lift constant-shift evaluation to handleShift() where the full 256-bit shift is known.
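
The sign-extension requirement can be sketched with a plain-integer reference model. The code below is an illustration only: `SWord` and `sarFull` are hypothetical names, and the `fill` limb stands in for what the quoted code calls `LargeShiftResult` (all ones for a negative value, zero otherwise).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-bit word: four 64-bit limbs, limb 0 least significant.
using SWord = std::array<uint64_t, 4>;

// Arithmetic right shift with a full 256-bit shift amount.
SWord sarFull(const SWord &value, const SWord &shift) {
  const bool negative = (value[3] >> 63) != 0;
  const uint64_t fill = negative ? ~0ULL : 0ULL;
  SWord result;
  result.fill(fill); // limbs shifted in from above carry the sign
  const bool isLarge =
      (shift[1] | shift[2] | shift[3]) != 0 || shift[0] >= 256;
  if (isLarge)
    return result; // every bit becomes a copy of the sign bit
  const std::size_t limbShift = static_cast<std::size_t>(shift[0] / 64);
  const unsigned bitShift = static_cast<unsigned>(shift[0] % 64);
  for (std::size_t i = 0; i + limbShift < 4; ++i) {
    const std::size_t src = i + limbShift;
    uint64_t r = value[src] >> bitShift; // logical shift
    if (bitShift != 0) {
      // Above the top limb there is only sign fill; OR-ing it in makes
      // the top limb behave like an arithmetic shift.
      const uint64_t upper = (src + 1 < 4) ? value[src + 1] : fill;
      r |= upper << (64 - bitShift);
    }
    result[i] = r;
  }
  return result;
}
```

Using a fill limb instead of a signed shift keeps the model independent of how the compiler treats right shifts of negative values.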

Comment on lines +1914 to +1929
for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
  MInstruction *R = Zero;
  if (I >= CompShift) {
    size_t SrcIdx = I - CompShift;
    MInstruction *SrcVal = Value[SrcIdx];
    MInstruction *Shifted = createInstruction<BinaryInstruction>(
        false, OP_shl, MirI64Type, SrcVal,
        createIntConstInstruction(MirI64Type, ShiftMod));
    if (SrcIdx > 0 && RemainingBits > 0) {
      MInstruction *Carry = createInstruction<BinaryInstruction>(
          false, OP_ushr, MirI64Type, Value[SrcIdx - 1],
          createIntConstInstruction(MirI64Type, RemainingBits));
      R = createInstruction<BinaryInstruction>(false, OP_or, MirI64Type,
                                               Shifted, Carry);
    } else {
      R = Shifted;

Copilot AI Mar 10, 2026


In the constant-SHL fast path, createIntConstInstruction(..., ShiftMod) and createIntConstInstruction(..., RemainingBits) are called inside the loop, creating a new OP_const each iteration even though the value is invariant. Hoist these constant instructions outside the loop (and consider skipping the shift/or entirely when ShiftMod == 0 by directly using the source limb) to keep the fast path from inflating MIR instruction count.

Suggested change
for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
  MInstruction *R = Zero;
  if (I >= CompShift) {
    size_t SrcIdx = I - CompShift;
    MInstruction *SrcVal = Value[SrcIdx];
    MInstruction *Shifted = createInstruction<BinaryInstruction>(
        false, OP_shl, MirI64Type, SrcVal,
        createIntConstInstruction(MirI64Type, ShiftMod));
    if (SrcIdx > 0 && RemainingBits > 0) {
      MInstruction *Carry = createInstruction<BinaryInstruction>(
          false, OP_ushr, MirI64Type, Value[SrcIdx - 1],
          createIntConstInstruction(MirI64Type, RemainingBits));
      R = createInstruction<BinaryInstruction>(false, OP_or, MirI64Type,
                                               Shifted, Carry);
    } else {
      R = Shifted;

Replacement:

// Hoist loop-invariant constant instructions out of the limb loop.
MInstruction *ShiftModConst = nullptr;
MInstruction *RemainingBitsConst = nullptr;
if (ShiftMod != 0) {
  ShiftModConst = createIntConstInstruction(MirI64Type, ShiftMod);
}
if (RemainingBits > 0) {
  RemainingBitsConst =
      createIntConstInstruction(MirI64Type, RemainingBits);
}
for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
  MInstruction *R = Zero;
  if (I >= CompShift) {
    size_t SrcIdx = I - CompShift;
    if (ShiftMod == 0) {
      // Pure limb shift (multiple of 64): no intra-limb shift/carry needed.
      R = Value[SrcIdx];
    } else {
      MInstruction *SrcVal = Value[SrcIdx];
      MInstruction *Shifted = createInstruction<BinaryInstruction>(
          false, OP_shl, MirI64Type, SrcVal, ShiftModConst);
      if (SrcIdx > 0 && RemainingBitsConst) {
        MInstruction *Carry = createInstruction<BinaryInstruction>(
            false, OP_ushr, MirI64Type, Value[SrcIdx - 1],
            RemainingBitsConst);
        R = createInstruction<BinaryInstruction>(false, OP_or, MirI64Type,
                                                 Shifted, Carry);
      } else {
        R = Shifted;
      }

Comment on lines +2084 to +2095
for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
  MInstruction *R = Zero;
  if (I + CompShift < EVM_ELEMENTS_COUNT) {
    size_t SrcIdx = I + CompShift;
    MInstruction *SrcVal = Value[SrcIdx];
    MInstruction *Shifted = createInstruction<BinaryInstruction>(
        false, OP_ushr, MirI64Type, SrcVal,
        createIntConstInstruction(MirI64Type, ShiftMod));
    if (SrcIdx + 1 < EVM_ELEMENTS_COUNT && CarryShift > 0) {
      MInstruction *Carry = createInstruction<BinaryInstruction>(
          false, OP_shl, MirI64Type, Value[SrcIdx + 1],
          createIntConstInstruction(MirI64Type, CarryShift));

Copilot AI Mar 10, 2026


In the constant-SHR fast path, the shift amounts are materialized with createIntConstInstruction inside the per-limb loop, which produces duplicate OP_const instructions for ShiftMod/CarryShift. Hoist these constants outside the loop (and when ShiftMod == 0, avoid generating the shift/or at all by directly using the source limb) to maximize the intended perf win and reduce MIR bloat.

Suggested change
for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
  MInstruction *R = Zero;
  if (I + CompShift < EVM_ELEMENTS_COUNT) {
    size_t SrcIdx = I + CompShift;
    MInstruction *SrcVal = Value[SrcIdx];
    MInstruction *Shifted = createInstruction<BinaryInstruction>(
        false, OP_ushr, MirI64Type, SrcVal,
        createIntConstInstruction(MirI64Type, ShiftMod));
    if (SrcIdx + 1 < EVM_ELEMENTS_COUNT && CarryShift > 0) {
      MInstruction *Carry = createInstruction<BinaryInstruction>(
          false, OP_shl, MirI64Type, Value[SrcIdx + 1],
          createIntConstInstruction(MirI64Type, CarryShift));

Replacement:

// If the shift is a multiple of 64, we only need to move whole limbs.
if (ShiftMod == 0) {
  for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
    MInstruction *R = Zero;
    if (I + CompShift < EVM_ELEMENTS_COUNT) {
      size_t SrcIdx = I + CompShift;
      R = Value[SrcIdx];
    }
    Result[I] = protectUnsafeValue(R, MirI64Type);
  }
  return Result;
}
// Hoist loop-invariant shift constants out of the limb loop.
MInstruction *ShiftModConst =
    createIntConstInstruction(MirI64Type, ShiftMod);
MInstruction *CarryShiftConst = nullptr;
if (CarryShift > 0)
  CarryShiftConst = createIntConstInstruction(MirI64Type, CarryShift);
for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
  MInstruction *R = Zero;
  if (I + CompShift < EVM_ELEMENTS_COUNT) {
    size_t SrcIdx = I + CompShift;
    MInstruction *SrcVal = Value[SrcIdx];
    MInstruction *Shifted = createInstruction<BinaryInstruction>(
        false, OP_ushr, MirI64Type, SrcVal, ShiftModConst);
    if (SrcIdx + 1 < EVM_ELEMENTS_COUNT && CarryShiftConst != nullptr) {
      MInstruction *Carry = createInstruction<BinaryInstruction>(
          false, OP_shl, MirI64Type, Value[SrcIdx + 1], CarryShiftConst);

Comment on lines +2264 to +2274
// Use arithmetic shift for the high component (contains sign bit)
bool UseArithShift = (SrcIdx == EVM_ELEMENTS_COUNT - 1);
MInstruction *Shifted = createInstruction<BinaryInstruction>(
    false, UseArithShift ? OP_sshr : OP_ushr, MirI64Type, SrcVal,
    createIntConstInstruction(MirI64Type, ShiftMod));
if (SrcIdx + 1 < EVM_ELEMENTS_COUNT && CarryShift > 0) {
  MInstruction *Carry = createInstruction<BinaryInstruction>(
      false, OP_shl, MirI64Type, Value[SrcIdx + 1],
      createIntConstInstruction(MirI64Type, CarryShift));
  R = createInstruction<BinaryInstruction>(false, OP_or, MirI64Type,
                                           Shifted, Carry);

Copilot AI Mar 10, 2026


In the constant-SAR fast path, createIntConstInstruction(..., ShiftMod) / createIntConstInstruction(..., CarryShift) are created inside the loop even though they are loop-invariant, adding extra OP_const nodes. Hoist the constants outside the loop (and if ShiftMod == 0, avoid emitting a redundant shift-by-0 instruction) so this path stays minimal.
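
To get a feel for how much the hoisting matters, the toy counter below mirrors the shape of the right-shift limb loop and tallies how many constant nodes and binary operations each variant would emit. Nothing here touches the real MIR builder: `EmitStats`, `countNaive`, and `countHoisted` are invented for this sketch, and both functions assume a shift below 256.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tally of emitted nodes; not a project type.
struct EmitStats {
  int consts = 0; // constant nodes created
  int ops = 0;    // shift / or nodes created
};

// A fresh constant is created at every use, as in the quoted loop.
EmitStats countNaive(uint64_t shift) {
  EmitStats s;
  const uint64_t limbShift = shift / 64;
  const uint64_t bitShift = shift % 64;
  const uint64_t carryShift = (bitShift == 0) ? 0 : 64 - bitShift;
  for (uint64_t i = 0; i + limbShift < 4; ++i) {
    const uint64_t src = i + limbShift;
    ++s.consts; // const(bitShift)
    ++s.ops;    // main shift, even when bitShift == 0
    if (src + 1 < 4 && carryShift > 0) {
      ++s.consts; // const(carryShift)
      s.ops += 2; // carry shift + or
    }
  }
  return s;
}

// Each constant is created at most once; whole-limb shifts emit nothing.
EmitStats countHoisted(uint64_t shift) {
  EmitStats s;
  const uint64_t limbShift = shift / 64;
  const uint64_t bitShift = shift % 64;
  if (bitShift == 0)
    return s; // source limbs are reused directly
  bool haveBitConst = false;
  bool haveCarryConst = false;
  for (uint64_t i = 0; i + limbShift < 4; ++i) {
    const uint64_t src = i + limbShift;
    if (!haveBitConst) {
      ++s.consts;
      haveBitConst = true;
    }
    ++s.ops;
    if (src + 1 < 4) {
      if (!haveCarryConst) {
        ++s.consts;
        haveCarryConst = true;
      }
      s.ops += 2;
    }
  }
  return s;
}
```

Under these assumptions a shift by 8 needs seven constants in the naive form and two in the hoisted form, while a shift by 64 drops from three constants and three shift-by-zero operations to none.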

Comment on lines +1903 to +1935
// Fast path: constant shift amount — direct limb logic, no Select/cmp loops.
if (auto ShiftOpt = getConstShiftAmount(ShiftAmount)) {
  uint64_t Shift = *ShiftOpt;
  if (Shift >= 256) {
    for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I)
      Result[I] = Zero;
    return Result;
  }
  uint64_t CompShift = Shift / 64;
  uint64_t ShiftMod = Shift % 64;
  uint64_t RemainingBits = (ShiftMod == 0) ? 0 : (64 - ShiftMod);
  for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
    MInstruction *R = Zero;
    if (I >= CompShift) {
      size_t SrcIdx = I - CompShift;
      MInstruction *SrcVal = Value[SrcIdx];
      MInstruction *Shifted = createInstruction<BinaryInstruction>(
          false, OP_shl, MirI64Type, SrcVal,
          createIntConstInstruction(MirI64Type, ShiftMod));
      if (SrcIdx > 0 && RemainingBits > 0) {
        MInstruction *Carry = createInstruction<BinaryInstruction>(
            false, OP_ushr, MirI64Type, Value[SrcIdx - 1],
            createIntConstInstruction(MirI64Type, RemainingBits));
        R = createInstruction<BinaryInstruction>(false, OP_or, MirI64Type,
                                                 Shifted, Carry);
      } else {
        R = Shifted;
      }
    }
    Result[I] = protectUnsafeValue(R, MirI64Type);
  }
  return Result;
}

Copilot AI Mar 10, 2026


The constant-shift fast path derives Shift only from ShiftAmount (low 64 bits) and bypasses the IsLargeShift guard. This breaks EVM semantics when the 256-bit shift has any high limb set but Shift[0] < 256 (e.g. shift = 2^64 => ShiftAmount constant 0 while IsLargeShift should force the result to 0). Ensure the fast path still applies IsLargeShift (e.g., select between 0 and the computed result per limb) or move constant-shift handling up to handleShift() where all 4 limbs are available.

Suggested change
// Fast path: constant shift amount — direct limb logic, no Select/cmp loops.
if (auto ShiftOpt = getConstShiftAmount(ShiftAmount)) {
  uint64_t Shift = *ShiftOpt;
  if (Shift >= 256) {
    for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I)
      Result[I] = Zero;
    return Result;
  }
  uint64_t CompShift = Shift / 64;
  uint64_t ShiftMod = Shift % 64;
  uint64_t RemainingBits = (ShiftMod == 0) ? 0 : (64 - ShiftMod);
  for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
    MInstruction *R = Zero;
    if (I >= CompShift) {
      size_t SrcIdx = I - CompShift;
      MInstruction *SrcVal = Value[SrcIdx];
      MInstruction *Shifted = createInstruction<BinaryInstruction>(
          false, OP_shl, MirI64Type, SrcVal,
          createIntConstInstruction(MirI64Type, ShiftMod));
      if (SrcIdx > 0 && RemainingBits > 0) {
        MInstruction *Carry = createInstruction<BinaryInstruction>(
            false, OP_ushr, MirI64Type, Value[SrcIdx - 1],
            createIntConstInstruction(MirI64Type, RemainingBits));
        R = createInstruction<BinaryInstruction>(false, OP_or, MirI64Type,
                                                 Shifted, Carry);
      } else {
        R = Shifted;
      }
    }
    Result[I] = protectUnsafeValue(R, MirI64Type);
  }
  return Result;
}

Replacement:

// Note: We deliberately avoid a constant-shift fast path here because
// deriving the 256-bit shift solely from the low 64-bit ShiftAmount
// can bypass the IsLargeShift guard and break EVM semantics when any
// high limb of the shift value is non-zero. All shifts are handled
// by the generic implementation below, which correctly applies
// IsLargeShift to enforce zeroing for large shifts.
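
The other option raised in this comment, evaluating the constant where all four limbs of the shift are visible, could look roughly like the sketch below. It uses a made-up miniature IR (`ToyInstr`, `ToyWord`) rather than the project's `MInstruction`, and `constShift256` is a hypothetical helper, so treat it as an outline of the idea rather than a drop-in for `getConstShiftAmount`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical miniature IR node; not the project's MInstruction.
struct ToyInstr {
  bool isConst;
  uint64_t value; // meaningful only when isConst is true
};

// Four limbs of a 256-bit shift amount, limb 0 least significant.
using ToyWord = std::array<ToyInstr, 4>;

// Returns the shift amount, saturated to 256, when every limb is a
// compile-time constant; returns std::nullopt if any limb is dynamic.
std::optional<uint64_t> constShift256(const ToyWord &shift) {
  for (const ToyInstr &limb : shift) {
    if (!limb.isConst)
      return std::nullopt; // fall back to the generic path
  }
  const bool highSet = shift[1].value != 0 || shift[2].value != 0 ||
                       shift[3].value != 0;
  if (highSet || shift[0].value >= 256)
    return uint64_t{256}; // any value >= 256 behaves the same
  return shift[0].value;
}
```

Saturating to 256 lets a caller keep a single `Shift >= 256` check while still treating a value such as 2^64 as a large shift.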

@starwarfan starwarfan marked this pull request as draft March 12, 2026 08:58