feat: estimate cycles #17

Draft
not-matthias wants to merge 4 commits into master from feat/cycle-estimation
@not-matthias (Member)
  • feat: add cycle estimation based on LUT
  • fix(callgrind): make Capstone cycle-estimation build and link
  • chore: wip [skip ci]

@codspeed-hq

codspeed-hq Bot commented Jun 9, 2026

Merging this PR will not alter performance

⚡ 1 improved benchmark
❌ 1 regressed benchmark
✅ 18 untouched benchmarks
⏩ 100 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| test_valgrind[valgrind-3.25.1, python3 testdata/test.py, full-with-inline] | 6.9 s | 210.9 s | -96.72% |
| test_valgrind[valgrind-3.25.1, echo Hello, World!, full-with-inline] | 612.6 s | 19.4 s | ×32 |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing feat/cycle-estimation (22e0934) with master (fa9ee2e)

Footnotes

  1. 100 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Add callgrind/lut-gen/ (offline generators for the --cycle-estimation cost
tables) and the regenerated x86_caps_lut.inc / arm64_caps_lut.inc.

- lib/: sigkey.py (Python mirror of sigkey.h's packed-key contract) and
  gen_common.py (shared measurement parsing, collision collapse, emit).
- x86/gen_x86_lut.py: Zen4-tuned table from uops.info instructions.xml.
- arm64/: measured Cortex-A72 table from ocxtal/insn_bench_aarch64 via
  insn_bench_to_xml.py -> gen_arm64_lut.py -> merge_arm64_lut.py, with a
  hand-frozen guide supplement for ops insn_bench does not benchmark.
- test_lut.py: validates the committed tables against the runtime keying
  contract and asserts sigkey.py stays in lockstep with sigkey.h.
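
For reviewers unfamiliar with the keying scheme, here is a minimal sketch of what a packed (insn-id, signature) key and the generator's collision collapse could look like. The bit widths, the function names, and the choice of the median as the collapse rule are all assumptions made for illustration; the real layout is defined in sigkey.h and the real collapse logic in gen_common.py.

```python
from statistics import median

# Hypothetical field widths: the real layout is defined by sigkey.h.
INSN_ID_BITS = 16
SIG_BITS = 16


def pack_key(insn_id: int, signature: int) -> int:
    """Pack an instruction id and an operand signature into one integer key."""
    if not 0 <= insn_id < (1 << INSN_ID_BITS):
        raise ValueError("insn_id out of range")
    if not 0 <= signature < (1 << SIG_BITS):
        raise ValueError("signature out of range")
    return (insn_id << SIG_BITS) | signature


def unpack_key(key: int) -> tuple[int, int]:
    """Inverse of pack_key: recover (insn_id, signature)."""
    return key >> SIG_BITS, key & ((1 << SIG_BITS) - 1)


def collapse_collisions(measurements):
    """Reduce measurements that share a key to a single cost per key.

    `measurements` is an iterable of (insn_id, signature, recip_throughput).
    The median is an assumed collapse rule, not necessarily the one the
    generators use.
    """
    buckets = {}
    for insn_id, signature, cost in measurements:
        buckets.setdefault(pack_key(insn_id, signature), []).append(cost)
    return {key: median(costs) for key, costs in sorted(buckets.items())}
```

Under these assumptions, two measurements for the same (insn-id, signature) pair end up as one table entry, which is the property a lockstep test between the Python and C key definitions would need to preserve.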

Add the --cycle-estimation=yes runtime: decode each guest instruction with
Capstone and look up its reciprocal-throughput cost in the generated cost
table (x86_caps_lut.inc / arm64_caps_lut.inc), accumulating a per-instruction
Cy event.

- cycledecode.c/.h: isolated Capstone decode + LUT lookup behind a plain-C
  ABI (no Valgrind tool headers, to avoid type clashes); freestanding libc
  shims since the tool links -nodefaultlibs.
- sigkey.h: packed (insn-id, signature) key contract shared with the offline
  LUT generators in lut-gen/.
- main.c/sim.c/bbcc.c/global.h/clo.c: wire Cy through callgrind (CLI option,
  EventSet, self + inclusive cost, fallback warning when an instruction has
  no table match).
- configure.ac/Makefile.am: build cycledecode against Capstone.

Emits only Cy (reciprocal throughput); no latency event.
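
The runtime flow described above (decode, look up the cost, accumulate Cy, warn on a miss) can be sketched roughly as follows. This is a Python model for illustration only: the actual implementation is C in cycledecode.c, and the key layout, the `estimate_cy` name, the fallback cost of 1, and the warn-once policy are assumptions rather than what the PR necessarily does.

```python
import warnings

SIG_BITS = 16  # hypothetical signature width; the real layout is in sigkey.h


def pack_key(insn_id, signature):
    """Pack (insn_id, signature) into one lookup key (assumed layout)."""
    return (insn_id << SIG_BITS) | signature


def estimate_cy(decoded, table, fallback_cost=1):
    """Sum the per-instruction cost for a sequence of decoded instructions.

    `decoded` yields (insn_id, signature) pairs, standing in for the decoder
    output. `table` maps packed keys to reciprocal-throughput costs.
    Instructions with no table match get `fallback_cost` (an assumed policy)
    and trigger one warning per distinct missing key.
    """
    total = 0
    warned = set()
    for insn_id, signature in decoded:
        key = pack_key(insn_id, signature)
        cost = table.get(key)
        if cost is None:
            if key not in warned:
                warnings.warn(f"no cost-table match for key {key:#x}; using fallback")
                warned.add(key)
            cost = fallback_cost
        total += cost
    return total
```

For example, with a table holding costs 4 and 2 for two keys, a block that executes the first instruction twice and the second once would accumulate a Cy of 10 in this model.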

not-matthias force-pushed the feat/cycle-estimation branch from 4b2e2b7 to 22e0934 on June 11, 2026 17:56