VOLT is a cycle-approximate out-of-order RISC-V CPU simulator paired with a genetic algorithm that evolves Pareto-optimal microarchitectures. It executes real compiled binaries (CoreMark, Dhrystone) through a full OoO pipeline (rename, out-of-order issue with age-based select, TAGE branch prediction, store-to-load forwarding, and an L1I/L1D/L2 cache hierarchy), then feeds the measured performance into analytical frequency/area/power models to score and evolve configurations.
Best config found: IPC 5.16 @ 4.71 GHz — 10.65 mm², 2.42 W on a 5 nm process with the Hybrid branch predictor.
| Area | What it does |
|---|---|
| Full OoO Pipeline | Fetch → Rename (RAT + freelist) → Issue (age-based oldest-ready-first, broadcast wakeup) → Execute (ALU/load/store/branch) → Commit (in-order retirement). |
| Branch Predictors | TAGE (4 tagged tables, geometric history 4/8/12/16, 3-bit + useful counters), Hybrid (Gshare + Bimodal with meta-predictor), Gshare (12-bit global history, XOR-indexed PHT). |
| Store-to-Load Forwarding | Address-matched forwarding from the LSQ with size truncation. Loads stall behind unknown-address older stores (conservative memory ordering). |
| Cache Hierarchy | L1I / L1D / L2 — set-associative, LRU, write-allocate, write-back. Configurable size, associativity, and latency per level. |
| RV64IM Execution | Full RV64I base + M-extension (multiply/divide). Runs real ELF binaries — CoreMark, Dhrystone, or your own. |
| Analytical Models | Frequency: FO4-based critical path across 7 pipeline stages with wire delay, PVT (1.30×), and clock jitter (1.05×). Area: per-structure SRAM/CAM/ALUs scaled by process node. Power: dynamic (αCV²f) + leakage. |
| Design Space Exploration | 22 tunable parameters encoded in a 34-bit chromosome (~17 billion configurations). GA uses tournament selection, uniform crossover, and adaptive mutation to evolve Pareto-optimal CPUs. |
| Zero Heavy Dependencies | Pure Rust — only rand + serde. No Python, no external simulators, no LLVM passes. |
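
To make the Gshare row above concrete, here is a minimal sketch of a predictor with a 12-bit global history and an XOR-indexed pattern history table. The struct and method names are hypothetical, and the 2-bit saturating counters and the 4-byte instruction alignment are assumptions; this is not VOLT's own code from `pipeline.rs`.

```rust
/// Sketch of a Gshare predictor: 12-bit global history, XOR-indexed PHT.
/// Names, counter width, and PC alignment are illustrative assumptions.
const HISTORY_BITS: u32 = 12;
const PHT_SIZE: usize = 1 << HISTORY_BITS;

pub struct Gshare {
    history: u64, // global branch history, newest outcome in bit 0
    pht: Vec<u8>, // 2-bit saturating counters in the range 0..=3
}

impl Gshare {
    pub fn new() -> Self {
        // Every counter starts at 1 (weakly not-taken).
        Gshare { history: 0, pht: vec![1; PHT_SIZE] }
    }

    fn index(&self, pc: u64) -> usize {
        // Drop the two low PC bits (assumes 4-byte instructions),
        // XOR with the history, and keep the low 12 bits.
        (((pc >> 2) ^ self.history) as usize) & (PHT_SIZE - 1)
    }

    /// Predict taken when the selected counter is 2 or 3.
    pub fn predict(&self, pc: u64) -> bool {
        self.pht[self.index(pc)] >= 2
    }

    /// Train the selected counter, then shift the outcome into the history.
    pub fn update(&mut self, pc: u64, taken: bool) {
        let i = self.index(pc);
        if taken {
            self.pht[i] = (self.pht[i] + 1).min(3);
        } else {
            self.pht[i] = self.pht[i].saturating_sub(1);
        }
        self.history = ((self.history << 1) | taken as u64) & (PHT_SIZE as u64 - 1);
    }
}
```

Because the history is folded into the index, the same branch can occupy several table entries depending on the path that led to it, which is why the predictor needs a few iterations of a loop before its prediction settles.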

```bash
# Run the synthetic microbenchmark (6-chain ALU + branches + pointer-chase)
cargo run --release -- --synth

# Explore CPU designs with the GA (15 generations, 40 individuals, CoreMark eval)
cargo run --release -- --ga-full 15

# Run a RISC-V binary
cargo run --release -- --bin benchmarks/coremark/coremark.bin

# Show analytical frequency/area/power for the default config
cargo run --release -- --freq

# Run the built-in self-test
cargo run --release -- --self-test

# Trace instruction execution (first 50 instructions)
cargo run --release -- --trace --bin <path>

# Dump cycle-by-cycle state (first 50 cycles)
cargo run --release -- --dump-cycle --bin <path>
```

| Metric | Value |
|---|---|
| Name | GA-8w-256r |
| IPC | 5.16 |
| Frequency | 4.71 GHz |
| Area | 10.65 mm² |
| Power | 2.42 W |
| Process | 5 nm |
| Branch Predictor | Hybrid (Gshare + Bimodal) |
| Fitness | 17.06 |
| IPC | Freq (GHz) | Area (mm²) | Power (W) | Fitness |
|---|---|---|---|---|
| 5.16 | 4.71 | 10.65 | 2.42 | 17.06 |
| 5.16 | 4.71 | 10.81 | 2.42 | 16.95 |
| 5.16 | 4.67 | 10.59 | 2.41 | 16.93 |
| 5.16 | 4.71 | 10.88 | 2.46 | 16.91 |
| 5.16 | 4.67 | 10.77 | 2.51 | 16.81 |
Full results are in `Volt CPU Designs/report.md` and `Volt CPU Designs/ga_results.csv`.
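
Since the GA is described as evolving Pareto-optimal designs, a small dominance check helps when reading tables like the one above. The sketch below assumes that IPC and frequency are maximized while area and power are minimized; the `Design` struct and both function names are hypothetical and are not taken from `nsga2.rs`.

```rust
/// One evaluated design; the fields mirror the columns of the results table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Design {
    pub ipc: f64,
    pub freq_ghz: f64,
    pub area_mm2: f64,
    pub power_w: f64,
}

/// True if `a` is at least as good as `b` on every objective
/// and strictly better on at least one (assumed objective directions).
pub fn dominates(a: &Design, b: &Design) -> bool {
    let no_worse = a.ipc >= b.ipc
        && a.freq_ghz >= b.freq_ghz
        && a.area_mm2 <= b.area_mm2
        && a.power_w <= b.power_w;
    let strictly_better = a.ipc > b.ipc
        || a.freq_ghz > b.freq_ghz
        || a.area_mm2 < b.area_mm2
        || a.power_w < b.power_w;
    no_worse && strictly_better
}

/// Keep only the designs that no other design dominates (quadratic scan).
pub fn pareto_front(designs: &[Design]) -> Vec<Design> {
    designs
        .iter()
        .filter(|d| !designs.iter().any(|o| dominates(o, d)))
        .copied()
        .collect()
}
```

Under these assumed objectives, the first row of the table dominates the second (equal IPC, frequency, and power, but smaller area), while neither the first nor the third row dominates the other: the first is faster, the third is smaller and draws less power.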
22 tunable parameters, 34-bit chromosome, ~17 billion configurations:
| Domain | Parameters | Bits |
|---|---|---|
| Frontend | Fetch width (4/6/8), Decode width (4/6), Predictor (Gshare/TAGE/Hybrid), BTB entries (256/512/1024) | 7 |
| Backend | ROB size (256–448), Issue queue (24–64), Int phys regs (192–384), ALUs (3–5), Load/Store units (2/3), Issue width (3–6) | 13 |
| Memory | L1I/D size (32/64 KB), L1 assoc (4/8), L2 size (0/256K/512K/1M), MSHR (8/12), Load/Store queue sizes | 10 |
| Pipeline | Depth (14/16/18 stages) | 2 |
| Process | 5 nm / 7 nm / 10 nm | 2 |
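
As an illustration of how a packed chromosome could map to parameters, the sketch below decodes the 7 frontend bits from the table. The bit order (least significant bits first), the per-field widths (2 + 1 + 2 + 2), and the modulo wrap for unused 2-bit codes are all assumptions, and `BitReader`, `Frontend`, and `decode_frontend` are hypothetical names; the real layout lives in `ga/chromosome.rs`.

```rust
/// 2^34 bit patterns, which is where the "~17 billion" figure comes from.
pub const SEARCH_SPACE: u64 = 1u64 << 34;

/// Hypothetical reader that pulls fixed-width fields off a chromosome, LSB first.
pub struct BitReader {
    bits: u64,
    pos: u32,
}

impl BitReader {
    pub fn new(bits: u64) -> Self {
        BitReader { bits, pos: 0 }
    }

    /// Return the next `width` bits as a table index (width must be below 64).
    pub fn take(&mut self, width: u32) -> usize {
        let value = (self.bits >> self.pos) & ((1u64 << width) - 1);
        self.pos += width;
        value as usize
    }
}

#[derive(Debug, PartialEq)]
pub struct Frontend {
    pub fetch_width: u32,
    pub decode_width: u32,
    pub predictor: &'static str,
    pub btb_entries: u32,
}

/// Decode the assumed 7-bit frontend group: 2 + 1 + 2 + 2 bits.
pub fn decode_frontend(chromosome: u64) -> Frontend {
    const FETCH: [u32; 3] = [4, 6, 8];
    const DECODE: [u32; 2] = [4, 6];
    const PREDICTOR: [&str; 3] = ["Gshare", "TAGE", "Hybrid"];
    const BTB: [u32; 3] = [256, 512, 1024];

    let mut r = BitReader::new(chromosome);
    // Three-option fields stored in 2 bits have one spare code;
    // wrapping it with modulo is an assumption of this sketch.
    Frontend {
        fetch_width: FETCH[r.take(2) % FETCH.len()],
        decode_width: DECODE[r.take(1) % DECODE.len()],
        predictor: PREDICTOR[r.take(2) % PREDICTOR.len()],
        btb_entries: BTB[r.take(2) % BTB.len()],
    }
}
```

With this assumed layout, `decode_frontend(0b01_10_1_10)` yields an 8-wide fetch, 6-wide decode, the Hybrid predictor, and a 512-entry BTB. A table-lookup encoding like this keeps every bit pattern decodable, so crossover and bit-flip mutation never produce an invalid configuration.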
```
Cycle N   Cycle N+1   Cycle N+2
┌─────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌───────────┐
│ FETCH   │ → │ RENAME   │ → │ ISSUE    │ → │ EXECUTE  │ → │ COMMIT    │
│ L1I +   │   │ RAT +    │   │ select() │   │ ALU/Ld/  │   │ arch RF   │
│ BTB +   │   │ FreeList │   │ wakeup() │   │ St/Br    │   │ free PRF  │
│ pred    │   │ + PRF    │   │ IQ scan  │   │ LSQ fwd  │   │ drain LSQ │
└─────────┘   └──────────┘   └──────────┘   └──────────┘   └───────────┘
```
The pipeline processes up to `fetch_width` instructions per cycle through all stages. A branch misprediction triggers a full squash: the ROB is walked backward to free physical registers, the RAT is restored, the issue queue is drained, and fetch restarts at the correct target.
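
The squash sequence described above can be sketched as follows. All types and field names here are hypothetical. The sketch also assumes that the RAT is restored by undoing each mapping during the backward ROB walk (rather than from a checkpoint) and that only issue-queue entries younger than the branch are dropped; VOLT's actual recovery logic may differ on both points.

```rust
use std::collections::VecDeque;

/// Hypothetical ROB entry; field names are illustrative, not VOLT's.
pub struct RobEntry {
    pub seq: u64, // program-order sequence number
    /// (architectural reg, newly allocated phys reg, previous phys reg)
    pub dest: Option<(usize, usize, usize)>,
}

pub struct Core {
    pub rob: VecDeque<RobEntry>, // oldest at the front, youngest at the back
    pub rat: Vec<usize>,         // architectural reg -> physical reg
    pub free_list: Vec<usize>,   // physical regs available for rename
    pub issue_queue: Vec<u64>,   // sequence numbers waiting to issue
    pub fetch_pc: u64,
}

impl Core {
    /// Squash everything younger than the mispredicted branch `branch_seq`.
    pub fn squash_after(&mut self, branch_seq: u64, correct_target: u64) {
        // Walk the ROB from youngest to oldest, undoing each rename.
        while self.rob.back().map_or(false, |e| e.seq > branch_seq) {
            let entry = self.rob.pop_back().unwrap();
            if let Some((arch, new_phys, old_phys)) = entry.dest {
                self.rat[arch] = old_phys;     // restore the previous mapping
                self.free_list.push(new_phys); // release the squashed register
            }
        }
        // Drop squashed instructions that were still waiting to issue.
        self.issue_queue.retain(|&seq| seq <= branch_seq);
        // Redirect fetch to the resolved branch target.
        self.fetch_pc = correct_target;
    }
}
```

Walking youngest-first matters when the same architectural register was renamed more than once on the wrong path: each step restores the mapping that was live just before that instruction, so the RAT ends in the state it had at the branch.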
| Doc | What you'll find |
|---|---|
| ARCHITECTURE.md | Full pipeline walkthrough, TAGE predictor internals, store-to-load forwarding, IPC wormhole fix |
| ANALYTICAL_MODELS.md | FO4 frequency model by stage, area breakdown, power equations |
| DESIGN_SPACE_EXPLORATION.md | Chromosome encoding, GA operators, fitness function, Pareto frontier analysis |
| TUTORIAL.md | Step-by-step: build → run workloads → explore designs → interpret results |

```
src/
├── main.rs                  # CLI entrypoint, GA orchestration
├── lib.rs                   # Library root
├── volt/                    # CPU simulator core
│   ├── cpu.rs               # Top-level pipeline (4-phase cycle)
│   ├── pipeline.rs          # Branch predictors: TAGE, Hybrid, Gshare, Bimodal
│   ├── ooo.rs               # PRF, RAT, FreeList, ROB, Issue Queue (wakeup/select)
│   ├── config.rs            # VoltConfig: all 22 microarchitectural parameters
│   ├── instr.rs             # Instruction opcodes and decoded representation
│   ├── isa.rs               # RV64IM decoder (~90 instructions)
│   ├── debug.rs             # Per-phase trace flags
│   ├── types.rs             # Shared types (Addr, Word, Exception, etc.)
│   ├── frontend/            # Fetch logic + BTB (currently inlined)
│   ├── backend/             # Execute unit dispatch (currently inlined)
│   ├── mem/                 # Cache hierarchy + Load/Store Queue
│   │   ├── cache.rs         # Set-associative cache (L1I, L1D, L2)
│   │   └── lsq.rs           # LSQ with store-to-load forwarding
│   └── model/               # Analytical models
│       ├── freq_model.rs    # FO4 critical path → frequency
│       ├── area_model.rs    # Per-structure area estimation
│       └── power_model.rs   # Dynamic + leakage power
├── ga/                      # Genetic algorithm
│   ├── chromosome.rs        # 34-bit encoding → VoltConfig decoding
│   ├── population.rs        # Individual + Population structs
│   ├── crossover.rs         # Uniform + single-point crossover
│   ├── mutation.rs          # Adaptive bit-flip mutation
│   ├── selection.rs         # Tournament + roulette selection
│   └── nsga2.rs             # Multi-objective Pareto frontier computation
├── workloads/               # Benchmark workloads
│   ├── coremark.rs          # CoreMark RISC-V binary runner (GA default)
│   ├── dhrystone.rs         # Dhrystone binary runner
│   └── synthetic.rs         # Synthetic microbenchmark (6-chain ALU + branches)
└── report/                  # GA reporting
    ├── stats.rs             # Generation statistics
    ├── pareto.rs            # Pareto frontier printing
    └── plot.rs              # CSV export
```

MIT — do what you want with it.
Built with Rust 🦀 and a lot of pipelined patience.