test(evm): add opcode microbenchmarks generation #382

starwarfan wants to merge 2 commits into DTVMStack:main
Conversation
⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter)
Performance Benchmark Results (threshold: 20%)
Summary: 194 benchmarks, 0 regressions

✅ Performance Check Passed (multipass)
Performance Benchmark Results (threshold: 20%)
Summary: 194 benchmarks, 0 regressions
Pull request overview
Adds infrastructure to generate and run EVM opcode microbenchmarks (and expands Solidity-based benchmark/test fixtures) to better track opcode-level performance regressions against libevmone, with CI wiring to execute the expanded benchmark suite.
Changes:
- Add opcode microbenchmark generator (`tools/generate_opcode_benchmarks.py`) and broaden performance benchmark filtering defaults.
- Extend Solidity test case support to include typed ABI arguments (`args`) and add multiple new Solidity benchmark categories (DeFi, ERC20, NFT, DAO, Layer2).
- Update benchmark/CI execution paths (new contract benchmark harness, CI script updates, interpreter/JIT behavior tweaks).
Reviewed changes
Copilot reviewed 35 out of 35 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tools/solc_batch_compile.sh | Compile all .sol files in each Solidity test directory into a single combined JSON artifact. |
| tools/generate_opcode_benchmarks.py | New Python generator producing state-test JSONs for opcode microbenchmarks. |
| tools/check_performance_regression.py | Adjust default Google Benchmark filter pattern for evmone-bench runs. |
| tests/evm_solidity/nft/test_cases.json | New NFT Solidity test/benchmark cases configuration. |
| tests/evm_solidity/nft/OnChainMetadataNFT.sol | New on-chain metadata NFT contract fixture. |
| tests/evm_solidity/nft/NFTWrapper.sol | New wrapper contract to exercise NFT flows. |
| tests/evm_solidity/nft/ERC721Enumerable.sol | New enumerable ERC721-like fixture for benchmarks/tests. |
| tests/evm_solidity/layer2/test_cases.json | New Layer2 Solidity test/benchmark cases configuration. |
| tests/evm_solidity/layer2/RollupState.sol | New rollup-state contract fixture. |
| tests/evm_solidity/layer2/MerkleProofVerifier.sol | New Merkle proof verifier fixture. |
| tests/evm_solidity/layer2/Layer2Wrapper.sol | New wrapper contract to exercise Layer2 flows. |
| tests/evm_solidity/erc20_bench/test_cases.json | New ERC20 benchmark cases, including typed args setup calls. |
| tests/evm_solidity/erc20_bench/PausableBurnableERC20.sol | New pausable/burnable ERC20-like fixture. |
| tests/evm_solidity/erc20_bench/FeeOnTransferERC20.sol | New fee-on-transfer ERC20-like fixture. |
| tests/evm_solidity/erc20_bench/ERC20BenchWrapper.sol | New wrapper contract to exercise ERC20 flows. |
| tests/evm_solidity/defi/test_cases.json | New DeFi Solidity test/benchmark cases configuration. |
| tests/evm_solidity/defi/SimpleDEX.sol | New simple DEX contract fixture. |
| tests/evm_solidity/defi/LendingPool.sol | New lending pool fixture. |
| tests/evm_solidity/defi/DeFiWrapper.sol | New wrapper contract to exercise DeFi flows. |
| tests/evm_solidity/dao/test_cases.json | New DAO Solidity test/benchmark cases configuration. |
| tests/evm_solidity/dao/SimpleGovernor.sol | New governor fixture contract. |
| tests/evm_solidity/dao/MultiSigWallet.sol | New multisig wallet fixture contract. |
| tests/evm_solidity/dao/DAOWrapper.sol | New wrapper contract to exercise DAO flows. |
| src/tests/solidity_test_helpers.h | Extend SolidityTestCase with typed ABI Args. |
| src/tests/solidity_test_helpers.cpp | Parse args from JSON and adjust calldata derivation behavior. |
| src/tests/solidity_contract_tests.cpp | Generate calldata from function + args and pre-resolve constructor addresses. |
| src/tests/evm_fallback_execution_tests.cpp | Update expected status code for a fallback-related test. |
| src/evm/interpreter.cpp | Remove undefined-opcode pre-check in one dispatch path (status behavior changes). |
| src/evm/evm_cache.cpp | Adjust gas chunk cost source used during cache build. |
| src/compiler/evm_frontend/evm_mir_compiler.cpp | Modify MLOAD lowering (remove pinning) and adjust gas reload for REVERT path. |
| src/compiler/evm_frontend/evm_imported.cpp | Only preserve return data on REVERT for CREATE; clear otherwise. |
| src/action/evm_bytecode_visitor.h | Adjust metering behavior for consecutive JUMPDEST runs. |
| benchmarks/evm_contract_benchmark.cpp | New Google Benchmark harness to run Solidity contract benchmarks via EVMC. |
| .github/workflows/dtvm_evm_test_x86.yml | Minor workflow formatting change. |
| .ci/run_test_suite.sh | Add JIT logging CMake opts in some modes; update benchmark invocation to include opcode benchmark dir. |
```python
def build_ternary_op(opcode: str, iterations: int) -> str:
    """
    Template for opcodes that take 3 inputs and push 1 output (e.g. ADDMOD, MULMOD).
    Setup: PUSH1 0x07 (Modulus), PUSH1 0x01 (Operand), PUSH1 0x01 (Accumulator).
    Loop: DUP3 DUP3 <OP>. Duplicates the modulus and operand, applying <OP> on
    (modulus, operand, accumulator). The result becomes the new accumulator.
    """
    setup = OP_PUSH1 + "07" + OP_PUSH1 + "01" + OP_PUSH1 + "01"
    loop_body = OP_DUP3 + OP_DUP3 + opcode
    end = OP_PUSH1 + "00" + OP_MSTORE + OP_PUSH1 + "20" + OP_PUSH1 + "00" + OP_RETURN
    return setup + (loop_body * iterations) + end
```
build_ternary_op() does not keep enough stack items alive for more than one iteration: after the first iteration, the opcode consumes 3 items and pushes 1, so the next loop iteration will hit a stack underflow when executing DUP3. The loop body needs to preserve the constant operand/modulus (e.g., by duplicating them and rearranging the stack so the opcode consumes the duplicates while the originals remain). As written, the generated ADDMOD/MULMOD benchmarks will be invalid bytecode for --iterations > 1 (including the default 10000).
```python
def build_memory_op_mstore(iterations: int) -> str:
    """
    Template for MSTORE.
    Setup: PUSH1 0x00 (Address), PUSH1 0x01 (Value).
    Loop: DUP2 DUP2 MSTORE (duplicate addr and value, then store).
    Note: MSTORE doesn't produce an output on stack, so we just keep DUPing.
    Actually, DUP2 DUP2 MSTORE consumes 2 items and pushes 0, so DUP2 DUP2 perfectly offsets it.
    To create a data dependency, we can increment the value: DUP2 DUP2 MSTORE PUSH1 0x01 ADD.
    """
    setup = OP_PUSH1 + "00" + OP_PUSH1 + "00"  # Addr, Value
    loop_body = OP_DUP2 + OP_DUP2 + OP_MSTORE + OP_PUSH1 + "01" + OP_ADD
    end = OP_PUSH1 + "00" + OP_MSTORE + OP_PUSH1 + "20" + OP_PUSH1 + "00" + OP_RETURN
```
build_memory_op_mstore()'s setup/comment are inconsistent and the initial stack order looks wrong for MSTORE. The docstring says value starts as 0x01, but the code pushes 00 twice, and for MSTORE the stack convention is [..., value, offset] (e.g. PUSH1 0x42; PUSH1 0x00; MSTORE). Please align the pushed constants with the intended (value, address) order and update the comment accordingly so the generated microbenchmark is doing the intended store pattern.
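A corrected version of this template could look like the following sketch (hypothetical `build_memory_op_mstore_fixed`, not the PR's actual fix; EVM `MSTORE` pops the offset from the top of the stack, then the value, so the value must be pushed first):

```python
# Hypothetical corrected template (a sketch, not the PR's code). Opcode byte
# constants: PUSH1=0x60, DUP2=0x81, MSTORE=0x52, RETURN=0xF3.
OP_PUSH1, OP_DUP2, OP_MSTORE, OP_RETURN = "60", "81", "52", "f3"

def build_memory_op_mstore_fixed(iterations: int) -> str:
    # Stack after setup (bottom -> top): [value=0x01, offset=0x00]
    setup = OP_PUSH1 + "01" + OP_PUSH1 + "00"
    # DUP2 DUP2 copies (value, offset); MSTORE consumes the copies while
    # the originals remain on the stack for the next iteration.
    loop_body = OP_DUP2 + OP_DUP2 + OP_MSTORE
    # Return the 32 bytes stored at offset 0 so the work is observable.
    end = OP_PUSH1 + "20" + OP_PUSH1 + "00" + OP_RETURN
    return setup + loop_body * iterations + end
```

The loop body has a net stack effect of zero (+1, +1, -2), so any iteration count produces valid bytecode.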
```cpp
// Currently, SolidityTestCase does not expose typed arguments; rely on raw calldata.
(void)Addrs; // unused until dynamic argument encoding is wired through SolidityTestCase
if (!Tc.Calldata.empty()) {
  auto Opt = fromHex(Tc.Calldata);
  if (Opt) return *Opt;
}
```
This benchmark harness currently ignores SolidityTestCase.Args / Function and only uses raw Calldata. With the new args support in test_cases.json (e.g., the setup_* ERC20 funding calls), Tc.Calldata will be empty and these setup calls will be silently skipped, changing benchmark behavior. BuildCalldata() should mirror the logic in src/tests/solidity_contract_tests.cpp by generating selector + ABI-encoded args when Tc.Args is non-empty.
Suggested change:

```diff
-// Currently, SolidityTestCase does not expose typed arguments; rely on raw calldata.
-(void)Addrs; // unused until dynamic argument encoding is wired through SolidityTestCase
-if (!Tc.Calldata.empty()) {
-  auto Opt = fromHex(Tc.Calldata);
-  if (Opt) return *Opt;
-}
+// Prefer raw calldata when provided.
+(void)Addrs; // reserved for future use (e.g., resolving address arguments)
+if (!Tc.Calldata.empty()) {
+  auto Opt = fromHex(Tc.Calldata);
+  if (Opt) return *Opt;
+}
+// If no raw calldata is provided but a function selector and ABI-encoded args are,
+// build calldata as: selector || encoded_args.
+if (!Tc.Function.empty() && !Tc.Args.empty()) {
+  std::string CalldataHex;
+  CalldataHex.reserve(Tc.Function.size() +
+                      std::accumulate(Tc.Args.begin(), Tc.Args.end(), std::size_t{0},
+                                      [](std::size_t Acc, const auto& Arg) {
+                                        return Acc + Arg.size();
+                                      }));
+  CalldataHex += Tc.Function;
+  for (const auto& Arg : Tc.Args) {
+    CalldataHex += Arg;
+  }
+  auto Opt = fromHex(CalldataHex);
+  if (Opt) return *Opt;
+}
```
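The `selector || encoded_args` layout can be sketched in Python (hypothetical helper, assuming each arg is already a 32-byte ABI-encoded hex word, as the typed `args` in `test_cases.json` become after encoding):

```python
# Sketch of the "selector || encoded_args" calldata layout.
# Assumes: selector_hex is a 4-byte function selector (8 hex chars),
# and each entry in encoded_args is one 32-byte ABI word (64 hex chars).
def build_calldata(selector_hex: str, encoded_args: list[str]) -> bytes:
    for arg in encoded_args:
        assert len(arg) == 64, "each ABI word must be 32 bytes (64 hex chars)"
    return bytes.fromhex(selector_hex + "".join(encoded_args))
```

For example, an ERC20 `transfer(address,uint256)` call concatenates its selector with two 32-byte words (padded recipient, amount).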
```cpp
// Currently, SolidityTestCase does not expose typed arguments; rely on raw calldata.
(void)Addrs; // unused until dynamic argument encoding is wired through SolidityTestCase
```
The comment in BuildCalldata() says SolidityTestCase does not expose typed arguments, but this PR adds SolidityTestCase::Args and the rest of the codebase now uses it to build calldata. Please update/remove this comment to avoid misleading future changes (and once args are supported here, the (void)Addrs workaround should no longer be needed).
Suggested change:

```diff
-// Currently, SolidityTestCase does not expose typed arguments; rely on raw calldata.
-(void)Addrs; // unused until dynamic argument encoding is wired through SolidityTestCase
+// Currently this helper relies on raw calldata when provided.
+if (Addrs.size() == static_cast<size_t>(-1)) {
+  // This block is never executed; Addrs is intentionally unused here for now.
+}
```
```diff
 Byte OpcodeByte = Code[Frame->Pc];
 evmc_opcode Op = static_cast<evmc_opcode>(OpcodeByte);
 const uint8_t OpcodeU8 = static_cast<uint8_t>(OpcodeByte);

-if (NamesTable[OpcodeU8] == NULL) {
-  Context.setStatus(EVMC_UNDEFINED_INSTRUCTION);
-  if (handleExecutionStatus(Frame, Context)) {
-    return;
-  }
-  break;
-}

 switch (Op) {
```
After removing the NamesTable check, undefined opcodes now fall through the default case and set EVMC_INVALID_INSTRUCTION (unless they match PUSH/DUP/SWAP/CALL/CREATE ranges). This changes EVMC semantics: EVMC_INVALID_INSTRUCTION is meant for the INVALID (0xFE) opcode, while unknown opcodes like 0xEE should typically return EVMC_UNDEFINED_INSTRUCTION. Consider restoring an undefined-opcode check in this dispatch path (similar to the gas-chunk loop) so status codes are consistent and spec-aligned.
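The status-code distinction this comment draws can be illustrated with a small sketch (the numeric values mirror evmc's `evmc_status_code` enum; the opcode set is an illustrative subset, not DTVM's actual dispatch table):

```python
# Status values mirror evmc's evmc_status_code enum.
EVMC_INVALID_INSTRUCTION = 4    # reserved for the designated INVALID opcode (0xFE)
EVMC_UNDEFINED_INSTRUCTION = 5  # for bytes with no assigned instruction

# Illustrative subset of defined opcodes; a real table covers the full ISA.
DEFINED_OPCODES = {0x00, 0x01, 0x52, 0x56, 0x5B, 0xF3, 0xFD, 0xFE}

def classify(opcode: int):
    """Return the failure status an opcode byte should produce, or None."""
    if opcode == 0xFE:
        return EVMC_INVALID_INSTRUCTION    # INVALID: defined, aborts by design
    if opcode not in DEFINED_OPCODES:
        return EVMC_UNDEFINED_INSTRUCTION  # e.g. 0xEE: simply not assigned
    return None                            # defined opcode: dispatch normally
```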
```diff
@@ -2713,18 +2713,6 @@ EVMMirBuilder::handleMLoad(Operand AddrComponents) {

   Operand Bytes32Op(MemPtr, EVMType::BYTES32);
   Operand Result = convertBytes32ToU256Operand(Bytes32Op);
```
handleMLoad() removed the protectUnsafeValue() pinning of the loaded 64-bit parts. Without this, backend optimization can potentially reorder or CSE the underlying loads across later memory-writing opcodes/calls, changing EVM semantics (observing newer memory contents than the MLOAD point-in-time). If that reordering risk still exists (as the removed comment described), the loads should remain protected (or be made explicitly non-movable/volatile) to preserve correctness.
Suggested change:

```diff
 Operand Result = convertBytes32ToU256Operand(Bytes32Op);
+// Protect the loaded value from being reordered or CSE'd across later
+// memory-writing opcodes/calls to preserve EVM MLOAD semantics.
+protectUnsafeValue(Result);
```
```diff
 echo "Saving performance baseline..."
 python3 check_performance_regression.py \
-    --save-baseline "$BENCHMARK_SAVE_BASELINE" \
+    --save-baseline "$ ENCHMARK_SAVE_BASELINE" \
     --output-summary "$BENCHMARK_SUMMARY_FILE" \
     --lib ./libdtvmapi.so \
     --mode "$BENCHMARK_MODE" \
-    --benchmark-dir test/evm-benchmarks/benchmarks
+    --benchmark-dir test/evm-benchmarks/benchmarks $OPCODE_BENCH_DIR
```
The --save-baseline argument is using a misspelled/garbled variable name ($ ENCHMARK_SAVE_BASELINE), which will expand to an empty string or a different variable and cause the baseline-save path to write to an invalid location (or fail argument parsing). Fix the variable reference to BENCHMARK_SAVE_BASELINE and remove the embedded spaces so the argument value is correct.
```diff
 python3 check_performance_regression.py \
     --baseline "$BASELINE_CACHE" \
     --threshold "$BENCHMARK_THRESHOLD" \
     --output-summary "$BENCHMARK_SUMMARY_FILE" \
     --lib ./libdtvmapi.so \
     --mode "$BENCHMARK_MODE" \
-    --benchmark-dir test/evm-benchmarks/benchmarks
+    --benchmark-dir test/evm-benchmarks/benchmarks $OPCODE_BENCH_DIR
 elif [ -n "$BENCHMARK_BASELINE_LIB" ]; then
```
check_performance_regression.py currently defines --benchmark-dir as a single string argument. This invocation appends $OPCODE_BENCH_DIR as an extra positional argument, which will be treated as an unknown CLI argument if it expands to a path (and will make the script exit with a usage error). Either update check_performance_regression.py to accept multiple benchmark directories (e.g., action='append' / nargs='+') and loop over them in run_benchmark(), or change $OPCODE_BENCH_DIR to include its own --benchmark-dir ... flag when set.
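The `action='append'` option suggested above could look like this sketch (the flag name matches the script's CLI; the default-handling logic is an assumption, not the script's current behavior):

```python
import argparse

# Accept repeated --benchmark-dir flags; a None default lets us detect
# "flag not given" and fall back to the historical single directory.
parser = argparse.ArgumentParser()
parser.add_argument("--benchmark-dir", action="append", dest="benchmark_dirs",
                    default=None, metavar="DIR",
                    help="Benchmark directory; may be given multiple times.")

args = parser.parse_args(["--benchmark-dir", "test/evm-benchmarks/benchmarks",
                          "--benchmark-dir", "test/opcode-benchmarks"])
dirs = args.benchmark_dirs or ["test/evm-benchmarks/benchmarks"]
```

With this shape, the shell side can simply emit one `--benchmark-dir` flag per directory instead of appending a bare positional path.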
Summary

- `tools/generate_opcode_benchmarks.py` to programmatically generate EVM bytecodes for testing the performance of individual opcodes.
- `tools/check_performance_regression.py` to allow specifying multiple benchmark directories and filters.
- `.ci/run_test_suite.sh` to invoke the generator and include the new `test/opcode-benchmarks` directory in the `evmone-bench` suite.

This allows us to track performance regression of individual opcodes against libevmone, while carefully using data-dependency chaining to defeat dead-code elimination in the DTVM multipass JIT.

Made with Cursor
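The data-dependency chaining mentioned above can be sketched for a binary opcode such as ADD (hypothetical `build_binary_op`, mirroring the generator's template style; constant names are illustrative, not the tool's actual identifiers):

```python
# Illustrative opcode byte constants: PUSH1=0x60, DUP2=0x81, MSTORE=0x52, RETURN=0xF3.
OP_PUSH1, OP_DUP2, OP_MSTORE, OP_RETURN = "60", "81", "52", "f3"

def build_binary_op(opcode: str, iterations: int) -> str:
    # Stack after setup (bottom -> top): [operand=1, accumulator=1]
    setup = OP_PUSH1 + "01" + OP_PUSH1 + "01"
    # DUP2 re-pushes the constant operand; <OP> folds it into the accumulator,
    # so each iteration's result feeds the next one. A JIT cannot eliminate the
    # loop as dead code without changing the returned value.
    loop_body = OP_DUP2 + opcode
    # Store the accumulator at memory[0] and return those 32 bytes, making the
    # final value observable output.
    end = OP_PUSH1 + "00" + OP_MSTORE + OP_PUSH1 + "20" + OP_PUSH1 + "00" + OP_RETURN
    return setup + loop_body * iterations + end

add_bench = build_binary_op("01", 10000)  # "01" is the ADD opcode byte
```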