Project-JEPA

A self-modifying compiler that learns from execution experience

⚠️ Project Status: Research Prototype

What Works ✅:

Hardware fingerprinting via NVML (512-bit genome)
Energy measurement system (simulation mode tested; real GPU mode implemented but not yet validated)
Working memory cache (1024×768 embeddings)
Function profiling and energy ranking
Statistical validation framework (26 experiments, 780 measurements)

What's Aspirational 🎯:

Shadow execution (parallel original vs. replacement)
Automatic function replacement with optimized kernels
2-5× speedup demonstrations
Real CUDA kernel synthesis

Current Validation: Simulation mode only. Real GPU validation pending.

What is Project-JEPA?

Project-JEPA is an experimental system that treats code execution as a sensory experience. It creates an "artificial twin" that:

Watches your code execute (via energy sensors)
Learns which functions consume the most energy
Profiles different implementations to find the most efficient
Guides optimization decisions with actual energy measurements

The Core Idea

Traditional profiling measures time, not energy. But on modern heterogeneous hardware (CPU + GPU), a fast function might use more energy than a slower alternative. Project-JEPA optimizes for energy efficiency, not just speed.
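To make the distinction concrete: energy is average power multiplied by duration, so a faster function that draws more power can still cost more energy. A minimal Go sketch follows; the two implementations and their power and timing figures are hypothetical numbers chosen for illustration, not measurements from this project.

```go
package main

import "fmt"

// energyMilliJoules returns the energy, in millijoules, of a run that draws
// avgPowerWatts on average for the given number of seconds (E = P × t).
func energyMilliJoules(avgPowerWatts, seconds float64) float64 {
	return avgPowerWatts * seconds * 1000
}

func main() {
	// Hypothetical numbers, chosen only to illustrate the trade-off.
	fast := energyMilliJoules(80, 0.010) // 80 W for 10 ms
	slow := energyMilliJoules(30, 0.020) // 30 W for 20 ms

	fmt.Printf("fast: %.0f mJ, slow: %.0f mJ\n", fast, slow)
	if slow < fast {
		fmt.Println("the slower implementation uses less energy")
	}
}
```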

How It Works

Your Code → Energy Sensor → Measure Execution → Learn Pattern
                                              ↓
                                      Cache Experience
                                              ↓
                                      Predict Energy
                                              ↓
                                      Identify Targets
                                              ↓
                                      Replace (Future)
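The "Identify Targets" step in the flow above could be as simple as ranking profiled functions by the total energy attributed to them. The sketch below assumes a hypothetical `FunctionProfile` record and `rankByEnergy` helper; neither name comes from the project's code, and the sample values are invented.

```go
package main

import (
	"fmt"
	"sort"
)

// FunctionProfile is a hypothetical record of one profiled function.
type FunctionProfile struct {
	Name     string
	EnergyMJ float64 // mean energy per call, in millijoules
	Calls    int     // number of calls observed
}

// TotalEnergyMJ is the energy attributed to the function across all calls.
func (p FunctionProfile) TotalEnergyMJ() float64 {
	return p.EnergyMJ * float64(p.Calls)
}

// rankByEnergy returns a copy of profiles sorted by total energy, highest first.
func rankByEnergy(profiles []FunctionProfile) []FunctionProfile {
	ranked := append([]FunctionProfile(nil), profiles...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].TotalEnergyMJ() > ranked[j].TotalEnergyMJ()
	})
	return ranked
}

func main() {
	// Invented sample data for illustration.
	profiles := []FunctionProfile{
		{Name: "parse", EnergyMJ: 0.5, Calls: 1000},
		{Name: "matmul", EnergyMJ: 12.0, Calls: 200},
		{Name: "log", EnergyMJ: 0.1, Calls: 50},
	}
	for i, p := range rankByEnergy(profiles) {
		fmt.Printf("%d. %-8s %.1f mJ total\n", i+1, p.Name, p.TotalEnergyMJ())
	}
}
```

Ranking by total rather than per-call energy keeps cheap but very frequently called functions visible as optimization targets.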

Quick Start

Prerequisites

Go 1.21+
NVIDIA GPU with compute capability ≥ 7.0 (for real GPU mode)
CUDA 12.0+ (for real GPU mode)
Linux/WSL2 (Windows supported via WSL2)

Installation

# Clone repository
git clone https://github.com/caseywatts/project-jepa
cd project-jepa

# Install dependencies
go mod download

# Build all binaries
make build

# Birth the twin (simulation mode for testing)
./quickstart.sh --simulate

Your First Energy Measurement

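package main

import (
    "fmt"
    "log"

    // Go only allows internal/ packages to be imported from inside the
    // project-jepa module, so build this file within the repository.
    "github.com/caseywatts/project-jepa/internal/hardware"
    "github.com/caseywatts/project-jepa/internal/sensors"
)

// myFunction is a placeholder for the code you want to profile.
func myFunction() error {
    return nil
}

func main() {
    // Load the twin's hardware genome
    genome, err := hardware.Load(".twin/genesis.bin")
    if err != nil {
        log.Fatalf("load genome: %v", err)
    }
    sensor, err := sensors.NewEnergySensor(0, genome, true)
    if err != nil {
        log.Fatalf("create energy sensor: %v", err)
    }
    defer sensor.Close()
    sensor.Calibrate()

    // Profile your function
    measurement, err := sensor.Measure(func() error {
        return myFunction()
    })
    if err != nil {
        log.Fatalf("measure: %v", err)
    }

    fmt.Printf("Energy: %.2f mJ\n", measurement.Energy)
    fmt.Printf("Time: %v\n", measurement.Duration)
}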

Documentation

Document	Audience	Description
QUICK_TEST_GUIDE.md	Users	Quick validation guide
TESTING_GUIDE.md	Users	Testing on your codebase
QUICK_REFERENCE.md	Users	System overview
TROUBLESHOOTING.md	Users	Common issues and fixes
DEVELOPER_GUIDE.md	Developers	Architecture and contribution
CONTRIBUTING.md	Contributors	Contribution guidelines
RESEARCH_PAPER.md	Researchers	Peer-reviewed paper (7,500 words)
MATHEMATICAL_FOUNDATIONS.md	Researchers	50+ equations and proofs
METHODOLOGY.md	Researchers	Experimental protocol
STATISTICAL_SUMMARY.md	Researchers	Statistical analysis
CLAUDE.md	AI Assistants	Project instructions for AI helpers

Experimental Results

What We've Validated ✅

26 experiments across 5 algorithmic categories:

Matrix Multiplication (16×16 to 128×128): Naive vs. cache-tiling
Sorting (1K to 10K elements): Quicksort, Mergesort, Heapsort
Memory Access (1M elements): Sequential, Random, Strided
Search (10K elements): Linear, Binary, Hash
Numerical (512 points): FFT, Newton-Raphson, Monte Carlo

Findings (Simulation Mode):

✅ System measures energy with <1% coefficient of variation
✅ Can rank algorithms by energy consumption
✅ Time-energy correlation: R² = 0.32, which suggests time is a poor proxy for energy (measured on simulated energy only)
⚠️ No significant energy differences found at tested scale (all ~2.0 mJ/op)
⚠️ Small effect sizes (Cohen's d < 0.3 for all comparisons)
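For readers unfamiliar with the two statistics quoted above, here is a minimal Go sketch of how a coefficient of variation and Cohen's d are commonly computed. It uses the sample standard deviation and the pooled-variance form of Cohen's d; whether the project's validation suite uses these exact definitions is an assumption, and the sample readings are invented.

```go
package main

import (
	"fmt"
	"math"
)

// mean returns the arithmetic mean of xs (xs must be non-empty).
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// sampleStdDev returns the sample standard deviation (n-1 denominator),
// or 0 when fewer than two values are given.
func sampleStdDev(xs []float64) float64 {
	if len(xs) < 2 {
		return 0
	}
	m := mean(xs)
	ss := 0.0
	for _, x := range xs {
		d := x - m
		ss += d * d
	}
	return math.Sqrt(ss / float64(len(xs)-1))
}

// coefficientOfVariation returns stddev/mean as a percentage.
func coefficientOfVariation(xs []float64) float64 {
	return 100 * sampleStdDev(xs) / mean(xs)
}

// cohensD returns (mean(a) - mean(b)) divided by the pooled standard deviation.
func cohensD(a, b []float64) float64 {
	na, nb := float64(len(a)), float64(len(b))
	sa, sb := sampleStdDev(a), sampleStdDev(b)
	pooled := math.Sqrt(((na-1)*sa*sa + (nb-1)*sb*sb) / (na + nb - 2))
	return (mean(a) - mean(b)) / pooled
}

func main() {
	// Invented energy readings (mJ) for two algorithms.
	quick := []float64{2.01, 1.99, 2.00, 2.02, 1.98}
	merge := []float64{2.00, 2.01, 1.99, 2.02, 2.00}
	fmt.Printf("CV (quick): %.2f%%\n", coefficientOfVariation(quick))
	fmt.Printf("Cohen's d:  %.2f\n", cohensD(quick, merge))
}
```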

Honest Assessment:

The simulation mode shows consistent energy measurements (~2.0 mJ/op) across all algorithms because synthetic energy doesn't capture real GPU power variations. Real GPU validation is needed to observe meaningful energy differences between algorithms.

What We Haven't Tested Yet ❓

Experiment	Status	Why It Matters
Real GPU energy (RTX 4050)	❌ Not done	Actual NVML measurements show real variance
Large-scale matrices (1024×1024)	❌ Not done	Larger problems amplify energy differences
CUDA kernels (cuBLAS vs. naive)	❌ Not done	Demonstrates 2-5× speedup potential
Shadow execution	❌ Not done	Validates replacement safety
Real workloads (image processing, ML)	❌ Not done	Practical value demonstration

Roadmap to Validation

Phase 1: Real GPU Mode (Current Priority)

# On your RTX 4050 laptop:
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
rm -rf .twin/
./build/project-jepa-genesis  # Real GPU, no --simulate flag
./build/experimental-suite       # Actual NVML measurements

Phase 2: Scale Up (1 week)

Test 1024×1024 matrix multiplication
Compare naive vs. cuBLAS
Expect: 50-90% energy savings

Phase 3: Shadow Execution (1 month)

Implement parallel execution
Validate correctness
Measure improvement
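Since shadow execution is not implemented yet, the following is only a sketch of what the three steps above might look like in Go: run the original and a candidate replacement concurrently on copies of the same input, compare their outputs within a tolerance, and always return the original's result. `shadowRun`, `ShadowResult`, and `equalWithin` are hypothetical names, not part of the current codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// ShadowResult reports the trusted output and whether the candidate agreed.
type ShadowResult struct {
	Output  []float64 // always the original implementation's output
	Matched bool      // true if the candidate's output agreed within tol
}

// equalWithin reports whether a and b have equal length and every pair of
// elements differs by at most tol. NaN differences count as mismatches.
func equalWithin(a, b []float64, tol float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		diff := a[i] - b[i]
		if !(diff >= -tol && diff <= tol) {
			return false
		}
	}
	return true
}

// shadowRun executes original and candidate concurrently, each on its own
// copy of input so neither can corrupt the other's data.
func shadowRun(original, candidate func([]float64) []float64, input []float64, tol float64) ShadowResult {
	var wg sync.WaitGroup
	var origOut, candOut []float64
	wg.Add(2)
	go func() {
		defer wg.Done()
		origOut = original(append([]float64(nil), input...))
	}()
	go func() {
		defer wg.Done()
		candOut = candidate(append([]float64(nil), input...))
	}()
	wg.Wait()
	return ShadowResult{Output: origOut, Matched: equalWithin(origOut, candOut, tol)}
}

func main() {
	double := func(xs []float64) []float64 {
		out := make([]float64, len(xs))
		for i, x := range xs {
			out[i] = x + x
		}
		return out
	}
	scale := func(xs []float64) []float64 {
		out := make([]float64, len(xs))
		for i, x := range xs {
			out[i] = 2 * x
		}
		return out
	}
	res := shadowRun(double, scale, []float64{1, 2, 3}, 1e-9)
	fmt.Println("output:", res.Output, "candidate matched:", res.Matched)
}
```

A real implementation would also need to recover from panics in the candidate and measure both runs' energy, which this sketch leaves out.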

Architecture

System Components

┌─────────────────────────────────────────────────────────────┐
│                     Application Layer                        │
└────────────────────────────┬────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│                   Project-JEPA Twin                       │
│  ┌──────────────┐  ┌─────────────┐  ┌──────────────┐      │
│  │   Hardware   │  │   Sensory   │  │    Memory    │      │
│  │   Genome     │  │   Layer     │  │    Cache     │      │
│  │  (512 bits)  │  │  (NVML)     │  │  (1024×768)  │      │
│  └──────────────┘  └─────────────┘  └──────────────┘      │
│         │                  │                  │            │
│         └──────────────────┼──────────────────┘            │
│                            ▼                                │
│                 ┌─────────────────────┐                    │
│                 │  Prediction Engine  │                    │
│                 └─────────────────────┘                    │
└────────────────────────────┬────────────────────────────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │ Optimization    │
                    │ Decision Engine │
                    └─────────────────┘
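This README does not describe how the 512-bit hardware genome is derived. One way to obtain a stable 512-bit fingerprint from device properties is to hash a canonical string with SHA-512, as sketched below. The `HardwareInfo` fields, the `fingerprint` function, and the sample values are all assumptions for illustration; the actual genome encoding in `internal/hardware/genome.go` may differ entirely.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// HardwareInfo holds a few device properties a fingerprint could be built
// from. The field set is hypothetical.
type HardwareInfo struct {
	Name         string
	UUID         string
	MemoryBytes  uint64
	ComputeMajor int
	ComputeMinor int
}

// fingerprint hashes a canonical description of h into 64 bytes (512 bits).
// The same input always yields the same fingerprint.
func fingerprint(h HardwareInfo) [sha512.Size]byte {
	canonical := fmt.Sprintf("name=%s|uuid=%s|mem=%d|cc=%d.%d",
		h.Name, h.UUID, h.MemoryBytes, h.ComputeMajor, h.ComputeMinor)
	return sha512.Sum512([]byte(canonical))
}

func main() {
	// Placeholder values; a real tool would query these from the device.
	gpu := HardwareInfo{
		Name:         "Example GPU",
		UUID:         "GPU-00000000-0000-0000-0000-000000000000",
		MemoryBytes:  6 << 30,
		ComputeMajor: 8,
		ComputeMinor: 9,
	}
	fp := fingerprint(gpu)
	fmt.Printf("%d bits: %x...\n", len(fp)*8, fp[:8])
}
```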

Key Modules

Module	File	Purpose
Genesis	`cmd/genesis/main.go`	Birth the twin via hardware fingerprinting
Energy Sensor	`internal/sensors/energy.go`	Measure energy consumption
Genome	`internal/hardware/genome.go`	512-bit hardware fingerprint
Cache	`internal/twin/cache.go`	Working memory (1024×768 embeddings)
Mirror Test	`cmd/mirror/main.go`	Self-prediction validation
Validation	`cmd/validate/main.go`	Practical benchmark
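The cache is described as working memory with cosine similarity retrieval. A minimal sketch of that idea in Go: a fixed-capacity store that overwrites its oldest entry when full and returns the stored entry most similar to a query. Reading "1024×768" as 1024 entries of 768 dimensions is an assumption, as are the `Cache`, `Entry`, and `Nearest` names; the example uses tiny vectors for readability.

```go
package main

import (
	"fmt"
	"math"
)

// Entry pairs an embedding with the energy observed for it.
type Entry struct {
	Embedding []float64
	EnergyMJ  float64
}

// Cache is a fixed-capacity store that overwrites its oldest entry when full.
type Cache struct {
	capacity int
	entries  []Entry
	next     int // index of the oldest entry once the cache is full
}

// NewCache returns a cache holding at most capacity entries (minimum 1).
func NewCache(capacity int) *Cache {
	if capacity < 1 {
		capacity = 1
	}
	return &Cache{capacity: capacity}
}

// Add stores e, replacing the oldest entry if the cache is full.
func (c *Cache) Add(e Entry) {
	if len(c.entries) < c.capacity {
		c.entries = append(c.entries, e)
		return
	}
	c.entries[c.next] = e
	c.next = (c.next + 1) % c.capacity
}

// cosineSimilarity returns the cosine of the angle between a and b, or 0 if
// the lengths differ or either vector has zero norm.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Nearest returns the stored entry most similar to query and its similarity.
// The boolean is false when the cache is empty.
func (c *Cache) Nearest(query []float64) (Entry, float64, bool) {
	best, bestSim, found := Entry{}, math.Inf(-1), false
	for _, e := range c.entries {
		if sim := cosineSimilarity(query, e.Embedding); sim > bestSim {
			best, bestSim, found = e, sim, true
		}
	}
	if !found {
		return Entry{}, 0, false
	}
	return best, bestSim, true
}

func main() {
	cache := NewCache(1024)
	cache.Add(Entry{Embedding: []float64{1, 0, 0}, EnergyMJ: 2.0})
	cache.Add(Entry{Embedding: []float64{0, 1, 0}, EnergyMJ: 5.0})

	if e, sim, ok := cache.Nearest([]float64{0.9, 0.1, 0}); ok {
		fmt.Printf("nearest energy: %.1f mJ (similarity %.3f)\n", e.EnergyMJ, sim)
	}
}
```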

Development Status

✅ Implemented (v1.0)

Hardware fingerprinting (NVML API)
Energy measurement (simulation + real GPU)
Working memory cache (cosine similarity retrieval)
Baseline prediction model
Function profiling framework
Statistical validation suite (26 experiments)
Comprehensive documentation
WSL2 compatibility (go-nvml v0.13.0-1)

🚧 In Progress (v1.1)

Real GPU validation on RTX 4050
Large-scale experiments (1024×1024 matrices)
CUDA kernel comparison (cuBLAS vs. naive)
Improved prediction accuracy (target: <10% error)
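The "<10% error" target needs a concrete error metric. One common choice is mean absolute percentage error (MAPE), sketched below; whether the project measures prediction error this way is an assumption, and the sample values are invented.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// meanAbsPercentError returns the mean of |predicted-actual|/|actual| as a
// percentage. Pairs whose actual value is zero are skipped, because the
// percentage error is undefined for them.
func meanAbsPercentError(predicted, actual []float64) (float64, error) {
	if len(predicted) != len(actual) {
		return 0, errors.New("predicted and actual must have the same length")
	}
	sum, n := 0.0, 0
	for i, a := range actual {
		if a == 0 {
			continue
		}
		sum += math.Abs(predicted[i]-a) / math.Abs(a)
		n++
	}
	if n == 0 {
		return 0, errors.New("no non-zero actual values to compare against")
	}
	return 100 * sum / float64(n), nil
}

func main() {
	// Invented predicted vs. measured energies in mJ.
	predicted := []float64{1.9, 2.2, 2.0}
	measured := []float64{2.0, 2.0, 2.1}
	mape, err := meanAbsPercentError(predicted, measured)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("MAPE: %.1f%% (target: under 10%%)\n", mape)
}
```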

🎯 Planned (v2.0)

❌ Not Started (v3.0+)

CPU energy profiling (RAPL)
Cross-architecture optimization
Multi-objective optimization (energy + latency + accuracy)
Distributed training
Commercial deployment

Use Cases

Use Case 1: Algorithm Selection

Problem: Should I use Quicksort or Mergesort?

Traditional: Test which is faster on your data.

Project-JEPA: Test which uses less energy on your data.

// Profile both
quickEnergy := profile(quicksort, testData)
mergeEnergy := profile(mergesort, testData)

// Choose based on energy, not just time
if mergeEnergy < quickEnergy {
    fmt.Println("Mergesort saves energy!")
}
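The `profile` helper in the snippet above is not defined in this README. One possible shape for it, sketched below, averages energy over repeated calls to a sensor's `Measure` method. The `Measurer` interface mirrors the `Measure` call shown in Quick Start, but the interface itself, the `Measurement` type, and `fakeSensor` (which assumes a constant power draw so the example runs without a GPU) are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"time"
)

// Measurement mirrors the two fields used in the Quick Start example.
type Measurement struct {
	Energy   float64 // millijoules
	Duration time.Duration
}

// Measurer is a hypothetical interface matching the shape of sensor.Measure.
type Measurer interface {
	Measure(fn func() error) (Measurement, error)
}

// fakeSensor stands in for a real sensor. It assumes a constant power draw
// and derives energy from wall-clock time.
type fakeSensor struct{ powerWatts float64 }

func (s fakeSensor) Measure(fn func() error) (Measurement, error) {
	start := time.Now()
	err := fn()
	elapsed := time.Since(start)
	energy := s.powerWatts * elapsed.Seconds() * 1000
	return Measurement{Energy: energy, Duration: elapsed}, err
}

// profile returns the mean energy in mJ of fn over the given number of runs.
func profile(s Measurer, runs int, fn func() error) (float64, error) {
	if runs <= 0 {
		return 0, errors.New("runs must be positive")
	}
	total := 0.0
	for i := 0; i < runs; i++ {
		m, err := s.Measure(fn)
		if err != nil {
			return 0, err
		}
		total += m.Energy
	}
	return total / float64(runs), nil
}

func main() {
	data := make([]int, 10000)
	for i := range data {
		data[i] = (i * 7919) % 10007 // deterministic, unsorted values
	}
	sensor := fakeSensor{powerWatts: 50} // hypothetical constant draw

	unstable, _ := profile(sensor, 30, func() error {
		sort.Ints(append([]int(nil), data...))
		return nil
	})
	stable, _ := profile(sensor, 30, func() error {
		sort.Stable(sort.IntSlice(append([]int(nil), data...)))
		return nil
	})
	fmt.Printf("sort.Ints: %.3f mJ, sort.Stable: %.3f mJ\n", unstable, stable)
}
```

With a constant-power fake, energy is simply proportional to time, so the comparison only becomes interesting once a real sensor supplies the measurements.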

Use Case 2: Library Comparison

Problem: Is NumPy faster than pure Python?

Project-JEPA: Measure energy efficiency, not just speed.

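# Illustrative only: the current implementation is Go-only (see FAQ), so
# profile() stands for a hypothetical Python binding returning energy in mJ.
# a and b are matrices, and matrix_multiply is a pure-Python implementation,
# all defined elsewhere.
import numpy as np

# Profile numpy array ops
numpy_energy = profile(lambda: np.dot(a, b))

# Profile pure Python list ops
python_energy = profile(lambda: matrix_multiply(a, b))

# Compare
print(f"NumPy energy: {numpy_energy:.2f} mJ")
print(f"Python energy: {python_energy:.2f} mJ")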

Use Case 3: Hardware Upgrades

Problem: Will upgrading my GPU save energy?

Project-JEPA: Profile on both GPUs, compare energy efficiency.

# Profile on RTX 3060
./build/project-jepa-genesis --device=0
./build/experimental-suite > rtx3060_results.json

# Profile on RTX 4090
./build/project-jepa-genesis --device=1
./build/experimental-suite > rtx4090_results.json

# Compare energy efficiency
python compare_energy.py rtx3060_results.json rtx4090_results.json

Contributing

We welcome contributions! See DEVELOPER_GUIDE.md and CONTRIBUTING.md for details.

Areas for Contribution

Real GPU Validation: Test on your GPU, report results
New Algorithms: Add to experimental_suite.go
Language Bindings: Python, C++, Rust
Documentation: Improve guides, fix typos
Bug Reports: Issues with NVML, crashes, etc.

Contribution Process

Fork repository
Create branch: git checkout -b feature/my-feature
Make changes, add tests
Run make validate (format + test + build)
Commit with clear message
Push and create PR

Performance

Current Performance (Simulation Mode)

Operation	Energy (mJ/op)	Time (ns/op)	Throughput
Matrix (64×64)	2.02 ± 0.16	510,355	1,959 ops/sec
Sort (10K)	2.00 ± 0.16	~5,000,000	200 ops/sec
Search (10K)	2.03 ± 0.17	~50,000	20,000 ops/sec
Memory (1M)	2.02 ± 0.18	~930,000	1,075 ops/sec

Note: All values similar because simulation mode uses synthetic energy.
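The Throughput column follows directly from the Time column: operations per second is 10⁹ divided by nanoseconds per operation. A short Go check of the table's figures (the `opsPerSecond` helper is introduced here for illustration):

```go
package main

import "fmt"

// opsPerSecond converts a latency in nanoseconds per operation into a
// throughput in operations per second.
func opsPerSecond(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

func main() {
	// Time values taken from the table above.
	rows := []struct {
		name    string
		nsPerOp float64
	}{
		{"Matrix (64×64)", 510355},
		{"Sort (10K)", 5000000},
		{"Search (10K)", 50000},
		{"Memory (1M)", 930000},
	}
	for _, r := range rows {
		fmt.Printf("%-16s %.0f ops/sec\n", r.name, opsPerSecond(r.nsPerOp))
	}
}
```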

Target Performance (Real GPU - Not Yet Achieved)

Operation	Expected Energy	Expected Energy Savings
Naive Matrix (1024×1024)	500 mJ	Baseline
cuBLAS Matrix (1024×1024)	50 mJ	10× (90% less energy)
Naive FFT (1024)	100 mJ	Baseline
cuFFT (1024)	10 mJ	10× (90% less energy)

Status: These are hypotheses. Real GPU validation needed.

Citation

If you use Project-JEPA in your research, please cite:

@software{perception_jepa_2026,
  title = {Project-JEPA: A Self-Modifying Compiler that Learns from Execution Experience},
  author = {Watts, Casey},
  year = {2026},
  version = {1.0},
  url = {https://github.com/caseywatts/project-jepa},
  license = {MIT}
}

Acknowledgments

NVIDIA: For NVML API and CUDA toolkit
Go community: For excellent tooling and libraries
Research community: For prior work on energy-aware computing

License

MIT License - see LICENSE for details.

TL;DR: Free to use, modify, distribute. Attribution appreciated but not required.

FAQ

Q: Is this production-ready?

A: No. This is a research prototype. Production deployment requires:

Shadow execution validation
Extensive real-world testing
Safety certification
Legal review (for self-modifying code)

Timeline: 6-12 months before production-ready.

Q: Will this work on my GPU?

A: Probably. Requirements:

NVIDIA GPU
Compute capability ≥ 7.0 (Volta, RTX 20-series, and newer)
CUDA 12.0+
Linux or WSL2

Check: Run nvidia-smi to verify GPU detection.
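If you want to check the compute capability requirement programmatically, a small Go helper can compare a version string such as "8.9" against the 7.0 minimum. On recent drivers such a string can reportedly be obtained with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`; treat the availability of that query field on your driver as an assumption. The `meetsMinimum` function is hypothetical and not part of the project.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// meetsMinimum reports whether a compute capability string such as "8.9"
// is at least minMajor.minMinor.
func meetsMinimum(computeCap string, minMajor, minMinor int) (bool, error) {
	majorStr, minorStr, ok := strings.Cut(strings.TrimSpace(computeCap), ".")
	if !ok {
		return false, fmt.Errorf("unexpected compute capability %q", computeCap)
	}
	major, err := strconv.Atoi(majorStr)
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %w", computeCap, err)
	}
	minor, err := strconv.Atoi(minorStr)
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %w", computeCap, err)
	}
	return major > minMajor || (major == minMajor && minor >= minMinor), nil
}

func main() {
	for _, cc := range []string{"8.9", "7.5", "6.1"} {
		ok, err := meetsMinimum(cc, 7, 0)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("compute capability %s supported: %v\n", cc, ok)
	}
}
```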

Q: Why no significant energy differences in experiments?

A: Two reasons:

Simulation mode: Uses synthetic energy, not real NVML measurements
Small scale: 128×128 matrices don't show large differences

Solution: Run real GPU experiments with larger inputs.

Q: Can I use this for [Python/C++/Rust]?

A: Not directly yet. Current implementation is Go-only. Future work:

C bindings via CGO
Python bindings via pybind11
RPC server for language-agnostic usage

Q: What's the difference between this and [AutoTVM/Halide]?

A: Key differences:

Learning: Online (experience) vs. offline (static analysis)
Objective: Energy vs. performance
Approach: Phenomenological (sensors) vs. analytical (models)

See research/RESEARCH_PAPER.md Section 7 for detailed comparison.

Q: How accurate are energy measurements?

A:

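NVML documentation: power readings are stated as accurate to within ±5% on Fermi and Kepler GPUs; check the NVML reference for your architecture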
Our validation: <1% coefficient of variation (consistent)
Real-world accuracy: Depends on GPU, workload, system load

Recommendation: Always average over multiple measurements (we use 30).
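Averaging is most useful when it comes with an uncertainty estimate. The sketch below computes the mean and a 95% confidence interval for 30 readings, using the t critical value for 29 degrees of freedom. The readings are invented, and `meanAndStdErr` is a hypothetical helper rather than the project's own code.

```go
package main

import (
	"fmt"
	"math"
)

// meanAndStdErr returns the sample mean and the standard error of the mean.
// It requires at least two values.
func meanAndStdErr(xs []float64) (mean, stdErr float64) {
	n := float64(len(xs))
	for _, x := range xs {
		mean += x
	}
	mean /= n
	var ss float64
	for _, x := range xs {
		ss += (x - mean) * (x - mean)
	}
	stdDev := math.Sqrt(ss / (n - 1))
	return mean, stdDev / math.Sqrt(n)
}

func main() {
	// 30 invented energy readings in mJ, varying slightly around 2.0.
	readings := make([]float64, 30)
	for i := range readings {
		readings[i] = 2.0 + 0.01*float64(i%5-2)
	}
	mean, se := meanAndStdErr(readings)

	// Two-sided 95% t critical value for 29 degrees of freedom (n = 30).
	const t95 = 2.045
	fmt.Printf("%.3f ± %.3f mJ (95%% CI, n=%d)\n", mean, t95*se, len(readings))
}
```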

Contact

Issues: https://github.com/caseywatts/project-jepa/issues
Discussions: https://github.com/caseywatts/project-jepa/discussions
Email: casey@example.com

Star History

If you find this interesting, please give us a star! ⭐

Last Updated: January 7, 2026
Version: 1.0.0
Status: Research Prototype - Not Production Ready
