PCA-based decorrelation + channel truncation + hybrid quantization + layer-adaptive precision for LLM inference.
ChaosEngine compresses the KV cache of large language models by 3.7x with 0.034 average attention output error, saving up to 13 GB of VRAM at 128K context length on an 8B model.
| Metric | FP16 (baseline) | ChaosEngine |
|---|---|---|
| Compression | 1.0x | 3.7x |
| KV cache @ 128K ctx | 18.0 GB | 4.9 GB |
| Avg attention error | 0.0 | 0.034 |
| Easy layer error | 0.0 | 0.013 |
| PCA decorrelation | -- | 100% |
| Adaptive precision | No | Yes (4 tiers) |
| Calibration time | -- | 32s |
Validated on RTX 4090 (CUDA) and M4 Max (MPS) with identical results. Cross-model tested on Qwen3-8B (3.7x) and Mistral 7B (2.6x).
- PCA achieves 100% decorrelation where Givens rotation achieves only 3% -- full covariance capture vs pairwise
- Key-only rotation outperforms rotating both K and V -- values have different correlation structure
- Bottom PCA channels can be truncated to zero on easy layers with no quality loss -- 62.5% key bit savings for free
- Value quantization dominates error on hard layers (75% of total) -- motivates PCA-hybrid value treatment
- Group-wise quantization after PCA is 3-5x worse than per-channel -- PCA sorts by variance, breaking group assumptions
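The decorrelation claim can be illustrated with a small NumPy sketch (not the project's actual code): projecting correlated vectors onto their PCA basis, i.e. the eigenvectors of the channel covariance, drives every off-diagonal covariance entry to numerical zero at once, whereas a single pairwise Givens rotation can only zero one off-diagonal entry at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake "key" vectors with strong cross-channel correlation.
d, n = 8, 4096
mixing = rng.normal(size=(d, d))
keys = rng.normal(size=(n, d)) @ mixing.T

# PCA basis = eigenvectors of the channel covariance matrix.
cov = np.cov(keys, rowvar=False)
_, eigvecs = np.linalg.eigh(cov)

# Rotate into the PCA basis and re-measure the covariance.
rotated = keys @ eigvecs
cov_rot = np.cov(rotated, rowvar=False)

off_diag = cov_rot - np.diag(np.diag(cov_rot))
print(np.abs(off_diag).max())  # numerically zero: fully decorrelated
```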
ChaosEngine uses a four-tier layer-adaptive system based on per-layer quantization sensitivity:
| Tier | Layers | Keys | Values | Avg Bits |
|---|---|---|---|---|
| easy | L1-L12, L14, L19-20 | PCA top48@K4 + truncate | Uniform V4 | 2.75 |
| mid | L0, L13, L15, L17-18, L21 | PCA K4/V4 | Uniform V4 | 4.00 |
| mhard | L16, L22 | PCA top80@K8 + bot48@K4 | Uniform V4 | 5.25 |
| vhard | L23-L35 | PCA top96@K8 + bot32@K4 | PCA top48@V8 + bot80@V4 | 6.25 |
Layer assignments shown for Qwen3-8B. Automatically profiled per model.
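The per-tier averages in the table follow directly from the channel splits, assuming a 128-channel head dimension (Qwen3-8B) and weighting keys and values equally. A quick arithmetic check (illustrative; not the project's actual config format):

```python
HEAD_DIM = 128  # Qwen3-8B head dimension (assumed)

def avg_bits(key_splits, value_splits):
    """Average bits/channel over keys and values.

    Each split is a list of (num_channels, bits); truncated
    channels are simply not listed (0 bits).
    """
    key = sum(n * b for n, b in key_splits) / HEAD_DIM
    val = sum(n * b for n, b in value_splits) / HEAD_DIM
    return (key + val) / 2

tiers = {
    "easy":  ([(48, 4)],          [(128, 4)]),         # bottom 80 key channels truncated
    "mid":   ([(128, 4)],         [(128, 4)]),
    "mhard": ([(80, 8), (48, 4)], [(128, 4)]),
    "vhard": ([(96, 8), (32, 4)], [(48, 8), (80, 4)]),
}

for name, (k, v) in tiers.items():
    print(f"{name}: {avg_bits(k, v):.2f} bits")
# easy 2.75, mid 4.00, mhard 5.25, vhard 6.25 -- matching the table
```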
```bash
# Clone
git clone https://github.com/cryptopoly/ChaosEngine.git
cd ChaosEngine

# Create virtual environment
python -m venv .venv
source .venv/bin/activate        # macOS/Linux
# .venv\Scripts\Activate.ps1     # Windows PowerShell

# Install PyTorch (pick one)
pip install torch                                                     # CPU / MPS (macOS)
pip install torch --index-url https://download.pytorch.org/whl/cu124  # CUDA 12.4

# Install ChaosEngine + dependencies
pip install -e ".[dev]"
pip install transformers accelerate safetensors

# Run tests
python -m pytest tests/ -v

# Run benchmark (auto-detects CUDA/MPS/CPU)
python benchmarks/bench_4090.py

# Run on a different model
python benchmarks/bench_4090.py --model mistralai/Mistral-7B-v0.3
```

- Python 3.10-3.12 (3.13+ not yet supported by PyTorch CUDA)
- PyTorch 2.2+
- 16+ GB RAM (for 8B models)
- CUDA GPU recommended (MPS and CPU also supported)
```
chaos_engine/
    config.py              # Tier definitions, hyperparameters
    calibration/           # PCA center computation, sensitivity profiling
    scoring/               # Trigonometric importance, friendliness, triage
    quantization/
        pca_rotation.py    # PCA decorrelation (100% off-diagonal removal)
        whitening.py       # PCA whitening + norm factoring + importance-weighted quant
        scalar_quantize.py # 8/4/2-bit asymmetric per-channel quantization
        pack_unpack.py     # Bit packing for 4-bit and 2-bit storage
        givens_rotation.py # Givens rotation (baseline comparison)
    kernels/               # Triton fused attention kernels (Linux/CUDA)
    cache/                 # Mixed-precision KV cache data structures
    integration/           # HuggingFace transformers + vLLM integration
benchmarks/
    bench_4090.py          # Main benchmark (works on CUDA, MPS, CPU)
    bench_m4_v2.py         # M4 Max specific benchmark
paper/
    ChaosEngine_Technical_Report.pdf
    generate_paper.py      # Regenerate the PDF
tests/                     # 66 unit + integration tests
```
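For orientation, asymmetric per-channel quantization (the scheme `scalar_quantize.py` is described as implementing) stores one scale and one zero-point per channel. A minimal NumPy sketch of the idea, not the repository's actual code:

```python
import numpy as np

def quantize_per_channel(x, bits):
    """Asymmetric per-channel quantization (channels = last axis).

    Returns integer codes plus the per-channel scale and offset
    needed to dequantize. Illustrative sketch only.
    """
    qmax = (1 << bits) - 1
    lo = x.min(axis=0, keepdims=True)            # per-channel min
    hi = x.max(axis=0, keepdims=True)            # per-channel max
    scale = np.maximum(hi - lo, 1e-8) / qmax     # step size per channel
    q = np.clip(np.round((x - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q * scale + lo

rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 128)).astype(np.float32)
q, scale, lo = quantize_per_channel(x, bits=4)
err = np.abs(dequantize(q, scale, lo) - x).mean()
print(err)  # mean abs error well under one quantization step
```

Per-channel parameters matter after PCA precisely because the rotation sorts channels by variance, so a single group-wise scale would be dominated by the highest-variance channel in the group.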
```bash
pip install -e ".[dev]"
python -m pytest tests/ -v
```

| Context Length | FP16 KV Cache | ChaosEngine | Saved |
|---|---|---|---|
| 4,096 | 0.6 GB | 0.2 GB | 0.4 GB |
| 8,192 | 1.1 GB | 0.3 GB | 0.8 GB |
| 32,768 | 4.5 GB | 1.2 GB | 3.3 GB |
| 65,536 | 9.0 GB | 2.5 GB | 6.5 GB |
| 131,072 | 18.0 GB | 4.9 GB | 13.1 GB |
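The FP16 column follows from the standard KV-cache size formula, assuming Qwen3-8B's shape (36 layers, 8 KV heads, head dim 128, 2 bytes per FP16 value) and the 3.7x compression ratio reported above:

```python
LAYERS, KV_HEADS, HEAD_DIM = 36, 8, 128  # Qwen3-8B shape (assumed)
BYTES_FP16 = 2
COMPRESSION = 3.7

def kv_cache_gib(ctx_len):
    # keys + values, per layer, per KV head, per channel
    total = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_FP16 * ctx_len
    return total / 2**30

for ctx in (4096, 32768, 131072):
    fp16 = kv_cache_gib(ctx)
    print(f"{ctx}: {fp16:.1f} GB -> {fp16 / COMPRESSION:.1f} GB")
# 131072: 18.0 GB -> 4.9 GB, matching the table
```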
Apache 2.0
```bibtex
@techreport{chaosengine2026,
  title={ChaosEngine: Spectral Triage KV Cache Compression via PCA Truncation and Layer-Adaptive Hybrid Quantization},
  year={2026},
  url={https://github.com/cryptopoly/ChaosEngine}
}
```