btkhaled/volt-cpu

VOLT CPU ⚡ — Out-of-Order RISC-V Core Simulator + Genetic Algorithm Design Space Explorer

VOLT is a cycle-approximate out-of-order RISC-V CPU simulator paired with a genetic algorithm that evolves Pareto-optimal microarchitectures. It executes real compiled binaries (CoreMark, Dhrystone) through a full OoO pipeline — rename, out-of-order issue with age-based select, TAGE branch prediction, store-to-load forwarding, and a three-level cache hierarchy — then feeds performance into analytical frequency/area/power models to score and evolve configurations.

Best config found: IPC 5.16 @ 4.71 GHz — 10.65 mm², 2.42 W on a 5 nm process with the Hybrid branch predictor.


✨ Features

| Area | What it does |
|---|---|
| Full OoO Pipeline | Fetch → Rename (RAT + freelist) → Issue (age-based oldest-ready-first, broadcast wakeup) → Execute (ALU/load/store/branch) → Commit (in-order retirement). |
| Branch Predictors | TAGE (4 tagged tables, history lengths 4/8/12/16, 3-bit prediction counters plus useful counters), Hybrid (Gshare + Bimodal with meta-predictor), Gshare (12-bit global history, XOR-indexed PHT). |
| Store-to-Load Forwarding | Address-matched forwarding from the LSQ with size truncation. Loads stall behind older stores whose addresses are not yet known (conservative memory ordering). |
| Cache Hierarchy | L1I / L1D / L2: set-associative, LRU, write-allocate, write-back. Configurable size, associativity, and latency per level. |
| RV64IM Execution | Full RV64I base + M-extension (multiply/divide). Runs real ELF binaries: CoreMark, Dhrystone, or your own. |
| Analytical Models | Frequency: FO4-based critical path across 7 pipeline stages with wire delay, PVT margin (1.30×), and clock jitter (1.05×). Area: per-structure SRAM/CAM/ALU estimates scaled by process node. Power: dynamic (αCV²f) + leakage. |
| Design Space Exploration | 22 tunable parameters encoded in a 34-bit chromosome (2^34, about 17 billion encodings). The GA uses tournament selection, uniform crossover, and adaptive mutation to evolve Pareto-optimal CPUs. |
| Zero Heavy Dependencies | Pure Rust; only rand + serde. No Python, no external simulators, no LLVM passes. |
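
To make the Gshare entry above concrete, here is a minimal sketch of a predictor with a 12-bit global history and an XOR-indexed pattern history table. Only the history width and the XOR indexing come from this README; the `Gshare` type, the 2-bit saturating counters, the weakly-not-taken reset value, and dropping the low two PC bits are assumptions for illustration and may not match `src/volt/pipeline.rs`.

```rust
/// Hypothetical gshare sketch: a pattern history table (PHT) of 2-bit
/// saturating counters indexed by (PC XOR global history).
const HISTORY_BITS: u32 = 12; // from the README
const PHT_SIZE: usize = 1 << HISTORY_BITS;

pub struct Gshare {
    history: u64, // global branch history, newest outcome in bit 0
    pht: Vec<u8>, // counters: 0..=1 predict not-taken, 2..=3 predict taken
}

impl Gshare {
    pub fn new() -> Self {
        // Assumption: every counter starts weakly not-taken.
        Gshare { history: 0, pht: vec![1; PHT_SIZE] }
    }

    fn index(&self, pc: u64) -> usize {
        // Assumption: 4-byte instructions (no C extension), so the low
        // two PC bits carry no information and are dropped.
        (((pc >> 2) ^ self.history) as usize) & (PHT_SIZE - 1)
    }

    pub fn predict(&self, pc: u64) -> bool {
        self.pht[self.index(pc)] >= 2
    }

    pub fn update(&mut self, pc: u64, taken: bool) {
        let i = self.index(pc);
        if taken {
            self.pht[i] = (self.pht[i] + 1).min(3);
        } else {
            self.pht[i] = self.pht[i].saturating_sub(1);
        }
        // Shift the outcome into the history and keep only HISTORY_BITS bits.
        let mask = (1u64 << HISTORY_BITS) - 1;
        self.history = ((self.history << 1) | taken as u64) & mask;
    }
}
```

Because the index depends on the history, a branch that is always taken touches a different counter on each of its first twelve updates; the prediction only settles once the history register is saturated with ones.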

🚀 Quick Start

# Run the synthetic microbenchmark (6-chain ALU + branches + pointer-chase)
cargo run --release -- --synth

# Explore CPU designs with the GA (15 generations, 40 individuals, CoreMark eval)
cargo run --release -- --ga-full 15

# Run a RISC-V binary
cargo run --release -- --bin benchmarks/coremark/coremark.bin

# Show analytical frequency/area/power for the default config
cargo run --release -- --freq

# Run the built-in self-test
cargo run --release -- --self-test

# Trace instruction execution (first 50 instructions)
cargo run --release -- --trace --bin <path>

# Dump cycle-by-cycle state (first 50 cycles)
cargo run --release -- --dump-cycle --bin <path>

🏆 Results

Best Configuration Found

| Metric | Value |
|---|---|
| Name | GA-8w-256r |
| IPC | 5.16 |
| Frequency | 4.71 GHz |
| Area | 10.65 mm² |
| Power | 2.42 W |
| Process | 5 nm |
| Branch Predictor | Hybrid (Gshare + Bimodal) |
| Fitness | 17.06 |

Pareto Frontier (Top 5)

| IPC | Freq (GHz) | Area (mm²) | Power (W) | Fitness |
|---|---|---|---|---|
| 5.16 | 4.71 | 10.65 | 2.42 | 17.06 |
| 5.16 | 4.71 | 10.81 | 2.42 | 16.95 |
| 5.16 | 4.67 | 10.59 | 2.41 | 16.93 |
| 5.16 | 4.71 | 10.88 | 2.46 | 16.91 |
| 5.16 | 4.67 | 10.77 | 2.51 | 16.81 |
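
For readers new to Pareto frontiers, the sketch below shows a plain dominance filter over four objectives. It assumes IPC and frequency are maximized while area and power are minimized; the `Design` struct and both function names are hypothetical, and the objectives and algorithm actually used in `src/ga/nsga2.rs` may differ.

```rust
/// Hypothetical design point; field names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Design {
    pub ipc: f64,
    pub freq_ghz: f64,
    pub area_mm2: f64,
    pub power_w: f64,
}

/// `a` dominates `b` if it is no worse on every objective and strictly
/// better on at least one. Assumption: maximize IPC and frequency,
/// minimize area and power.
pub fn dominates(a: &Design, b: &Design) -> bool {
    let no_worse = a.ipc >= b.ipc
        && a.freq_ghz >= b.freq_ghz
        && a.area_mm2 <= b.area_mm2
        && a.power_w <= b.power_w;
    let strictly_better = a.ipc > b.ipc
        || a.freq_ghz > b.freq_ghz
        || a.area_mm2 < b.area_mm2
        || a.power_w < b.power_w;
    no_worse && strictly_better
}

/// Keep only the designs that no other design dominates (O(n^2) scan).
pub fn pareto_front(designs: &[Design]) -> Vec<Design> {
    designs
        .iter()
        .filter(|d| !designs.iter().any(|other| dominates(other, d)))
        .copied()
        .collect()
}
```

The quadratic scan is fine for a population of a few dozen individuals; a larger population would usually call for non-dominated sorting instead.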

Full results in Volt CPU Designs/report.md and Volt CPU Designs/ga_results.csv.


🔬 Design Space

22 tunable parameters, 34-bit chromosome, ~17 billion configurations:

| Domain | Parameters | Bits |
|---|---|---|
| Frontend | Fetch width (4/6/8), Decode width (4/6), Predictor (Gshare/TAGE/Hybrid), BTB entries (256/512/1024) | 7 |
| Backend | ROB size (256–448), Issue queue (24–64), Int phys regs (192–384), ALUs (3–5), Load/Store units (2/3), Issue width (3–6) | 13 |
| Memory | L1I/D size (32/64 KB), L1 assoc (4/8), L2 size (0/256K/512K/1M), MSHR (8/12), Load/Store queue sizes | 10 |
| Pipeline | Depth (14/16/18 stages) | 2 |
| Process | 5 nm / 7 nm / 10 nm | 2 |
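
The bit budget in the table can be checked and unpacked with a few lines of Rust. This is a sketch under stated assumptions: the per-domain bit counts come from the table, but the field order (frontend in the lowest bits), the function names, and the modulo mapping for 2-bit fields that have only three legal values are hypothetical and may not match `src/ga/chromosome.rs`.

```rust
/// Per-domain bit widths from the table above. The low-to-high ordering
/// is an assumption made for this sketch.
pub const DOMAIN_BITS: [(&str, u32); 5] = [
    ("frontend", 7),
    ("backend", 13),
    ("memory", 10),
    ("pipeline", 2),
    ("process", 2),
];

/// Total chromosome width; sums to 34 for the table above.
pub fn total_bits() -> u32 {
    DOMAIN_BITS.iter().map(|&(_, bits)| bits).sum()
}

/// Split a packed chromosome into its raw per-domain bit fields.
pub fn split_domains(chromosome: u64) -> Vec<(&'static str, u64)> {
    let mut shift = 0;
    let mut out = Vec::new();
    for &(name, bits) in DOMAIN_BITS.iter() {
        let mask = (1u64 << bits) - 1;
        out.push((name, (chromosome >> shift) & mask));
        shift += bits;
    }
    out
}

/// Assumption: a 2-bit field with three legal values wraps with modulo,
/// so the spare encoding 3 maps back onto the first option.
pub fn decode_depth(raw: u64) -> u32 {
    [14, 16, 18][(raw % 3) as usize]
}

pub fn decode_process_nm(raw: u64) -> u32 {
    [5, 7, 10][(raw % 3) as usize]
}
```

With 34 bits there are 2^34 = 17,179,869,184 raw encodings, which is where the "~17 billion" figure comes from. Under a wrapping scheme like the one assumed here, several encodings decode to the same configuration, so the number of distinct designs would be somewhat lower.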

🏗️ Architecture at a Glance

 Cycle N                    Cycle N+1                    Cycle N+2
┌─────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌─────────┐
│  FETCH  │ → │  RENAME  │ → │  ISSUE   │ → │ EXECUTE  │ → │  COMMIT │
│ L1I +   │   │ RAT +    │   │ select() │   │ ALU/Ld/  │   │ arch RF │
│ BTB +   │   │ FreeList │   │ wakeup() │   │ St/Br    │   │ free PRF│
│ pred    │   │ + PRF    │   │ IQ scan  │   │ LSQ fwd  │   │drain LSQ│
└─────────┘   └──────────┘   └──────────┘   └──────────┘   └─────────┘

The pipeline processes up to fetch_width instructions per cycle through all stages. A branch misprediction triggers a full squash: the ROB is walked backward to free physical registers, the RAT is restored, the issue queue is drained, and fetch restarts at the correct target.
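
One common way to implement the squash described above is to have each ROB entry remember the previous mapping of its destination register, then undo entries youngest-first. The sketch below illustrates that idea only: `RenameState`, `RobEntry`, and the method names are hypothetical, and the real logic in `src/volt/ooo.rs` may restore the RAT differently (for example from checkpoints).

```rust
use std::collections::VecDeque;

/// Hypothetical ROB entry: records enough to undo one rename.
#[derive(Debug, Clone, Copy)]
pub struct RobEntry {
    pub arch_reg: usize,
    pub new_phys: usize, // physical register allocated at rename
    pub old_phys: usize, // mapping that was overwritten in the RAT
}

pub struct RenameState {
    pub rat: Vec<usize>,            // architectural -> physical mapping
    pub free_list: VecDeque<usize>, // unallocated physical registers
    pub rob: VecDeque<RobEntry>,    // front = oldest, back = youngest
}

impl RenameState {
    pub fn new(arch_regs: usize, phys_regs: usize) -> Self {
        RenameState {
            rat: (0..arch_regs).collect(),
            free_list: (arch_regs..phys_regs).collect(),
            rob: VecDeque::new(),
        }
    }

    /// Allocate a new physical register for `arch_reg`.
    /// Returns None when the free list is empty (rename would stall).
    pub fn rename_dest(&mut self, arch_reg: usize) -> Option<usize> {
        let new_phys = self.free_list.pop_front()?;
        let old_phys = self.rat[arch_reg];
        self.rat[arch_reg] = new_phys;
        self.rob.push_back(RobEntry { arch_reg, new_phys, old_phys });
        Some(new_phys)
    }

    /// Squash everything younger than the oldest `keep` entries by walking
    /// the ROB backward: restore the RAT and return registers to the free list.
    pub fn squash_after(&mut self, keep: usize) {
        while self.rob.len() > keep {
            let e = self.rob.pop_back().expect("len > keep implies non-empty");
            self.rat[e.arch_reg] = e.old_phys;
            self.free_list.push_front(e.new_phys);
        }
    }
}
```

Walking youngest-first matters when the same architectural register was renamed more than once on the wrong path: each undo step exposes the mapping that the next-older entry expects to restore.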


📚 Deep Dives

| Doc | What you'll find |
|---|---|
| ARCHITECTURE.md | Full pipeline walkthrough, TAGE predictor internals, store-to-load forwarding, IPC wormhole fix |
| ANALYTICAL_MODELS.md | FO4 frequency model by stage, area breakdown, power equations |
| DESIGN_SPACE_EXPLORATION.md | Chromosome encoding, GA operators, fitness function, Pareto frontier analysis |
| TUTORIAL.md | Step-by-step: build → run workloads → explore designs → interpret results |

📁 Project Structure

src/
├── main.rs              # CLI entrypoint, GA orchestration
├── lib.rs               # Library root
├── volt/                # CPU simulator core
│   ├── cpu.rs           # Top-level pipeline (4-phase cycle)
│   ├── pipeline.rs      # Branch predictors: TAGE, Hybrid, Gshare, Bimodal
│   ├── ooo.rs           # PRF, RAT, FreeList, ROB, Issue Queue (wakeup/select)
│   ├── config.rs        # VoltConfig — all 22 microarchitectural parameters
│   ├── instr.rs         # Instruction opcodes and decoded representation
│   ├── isa.rs           # RV64IM decoder (~90 instructions)
│   ├── debug.rs         # Per-phase trace flags
│   ├── types.rs         # Shared types (Addr, Word, Exception, etc.)
│   ├── frontend/        # Fetch logic + BTB (currently inlined)
│   ├── backend/         # Execute unit dispatch (currently inlined)
│   ├── mem/             # Cache hierarchy + Load/Store Queue
│   │   ├── cache.rs     # Set-associative cache (L1I, L1D, L2)
│   │   └── lsq.rs       # LSQ with store-to-load forwarding
│   └── model/           # Analytical models
│       ├── freq_model.rs  # FO4 critical path → frequency
│       ├── area_model.rs  # Per-structure area estimation
│       └── power_model.rs # Dynamic + leakage power
├── ga/                  # Genetic algorithm
│   ├── chromosome.rs    # 34-bit encoding → VoltConfig decoding
│   ├── population.rs    # Individual + Population structs
│   ├── crossover.rs     # Uniform + single-point crossover
│   ├── mutation.rs      # Adaptive bit-flip mutation
│   ├── selection.rs     # Tournament + roulette selection
│   └── nsga2.rs         # Multi-objective Pareto frontier computation
├── workloads/           # Benchmark workloads
│   ├── coremark.rs      # CoreMark RISC-V binary runner (GA default)
│   ├── dhrystone.rs     # Dhrystone binary runner
│   └── synthetic.rs     # Synthetic microbenchmark (6-chain ALU + branches)
└── report/              # GA reporting
    ├── stats.rs         # Generation statistics
    ├── pareto.rs        # Pareto frontier printing
    └── plot.rs          # CSV export

📄 License

MIT — do what you want with it.


Built with Rust 🦀 and a lot of pipelined patience.
