
feat(sync): GPU (torch/Metal/CUDA) and numba backends for all 9 metrics#264

Merged
Ramdam17 merged 1 commit into ppsp-team:master from Ramdam17:feat/gpu-numba-backends
Apr 17, 2026

Conversation

@Ramdam17
Collaborator

Summary

Add hardware-accelerated backends to all 9 connectivity metrics in hypyp.sync:

  • numba JIT (prange): PLV, CCorr, Coh, ImCoh, PLI, wPLI, EnvCorr, PowCorr
  • PyTorch (MPS/CUDA/CPU, batched einsum): all 9 metrics
  • Metal compute shaders (Apple Silicon): PLI, wPLI, ACCorr
  • CUDA raw kernels (CuPy, float64): all 9 metrics
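The batched-einsum formulation used by the torch backend can be mirrored in NumPy (the torch version would swap `np.einsum` for `torch.einsum` on a device tensor). This is an illustrative sketch of how PLV reduces to one einsum over unit phasors, not hypyp's actual implementation; the function name and array shapes are assumptions:

```python
import numpy as np

def plv_einsum(phases: np.ndarray) -> np.ndarray:
    """Batched PLV via a single einsum.

    phases: real phase angles, shape (E, F, C, T)
            (epochs, freq bands, channels, time samples).
    Returns a PLV matrix of shape (E, F, C, C).
    """
    z = np.exp(1j * phases)                       # unit phasors e^{i phi}
    # Sum over time of e^{i(phi_c - phi_d)} for every channel pair at once
    cross = np.einsum('efct,efdt->efcd', z, z.conj())
    return np.abs(cross) / phases.shape[-1]

rng = np.random.default_rng(0)
phases = rng.uniform(-np.pi, np.pi, size=(2, 3, 4, 100))
plv = plv_einsum(phases)
assert plv.shape == (2, 3, 4, 4)
# The diagonal is exactly 1: a channel is perfectly phase-locked with itself
assert np.allclose(np.diagonal(plv, axis1=-2, axis2=-1), 1.0)
```

Batching epochs and frequency bands into one einsum is what lets the GPU amortize kernel-launch overhead across the whole (E, F) grid.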

Backend Support Matrix

| Metric  | numpy | numba | torch | metal | cuda_kernel |
|---------|-------|-------|-------|-------|-------------|
| PLV     | x     | x     | x     | --    | x           |
| CCorr   | x     | x     | x     | --    | x           |
| Coh     | x     | x     | x     | --    | x           |
| ImCoh   | x     | x     | x     | --    | x           |
| EnvCorr | x     | x     | x     | --    | x           |
| PowCorr | x     | x     | x     | --    | x           |
| PLI     | x     | x     | x     | x     | x           |
| wPLI    | x     | x     | x     | x     | x           |
| ACCorr  | x     | x     | x     | x     | x           |

Benchmark Highlights

Benchmarked on Mac M4 Max (131 runs) and Narval A100 (111 runs). Key speedups at medium profile (64ch, 20ep, 5 freq bands):

| Metric | torch MPS vs numpy | torch CUDA vs numpy |
|--------|--------------------|---------------------|
| PLV    | 86x                | 138x                |
| Coh    | 133x               | 139x                |
| ImCoh  | 111x               | 157x                |
| ACCorr | 40x                | 191x                |

Full benchmark data: Ramdam17/hypyp-sync-benchmarks

optimization='auto' — Benchmark-Driven Dispatch

The new AUTO_PRIORITY table selects the best GPU backend per metric and platform:

  • MPS: torch for einsum metrics, Metal for sign-based + ACCorr
  • CUDA: cuda_kernel first (pairwise, OOM-safe at 512+ channels), torch fallback
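A priority-table dispatch of this kind can be sketched as follows. The table contents, keys, and helper name below are illustrative stand-ins, not the actual AUTO_PRIORITY values or hypyp internals:

```python
# Illustrative dispatch: pick the first available backend for a metric/platform.
AUTO_PRIORITY = {
    # (platform, metric) -> ordered backend preference (made-up example values)
    ('mps', 'plv'):  ['torch', 'numba', 'numpy'],
    ('mps', 'pli'):  ['metal', 'torch', 'numpy'],
    ('cuda', 'plv'): ['cuda_kernel', 'torch', 'numpy'],
}

def pick_backend(platform: str, metric: str, available: set) -> str:
    """Return the first preferred backend that is installed, else numpy."""
    for backend in AUTO_PRIORITY.get((platform, metric), ['numpy']):
        if backend in available:
            return backend
    return 'numpy'

# cuda_kernel is preferred but not installed here, so torch wins
assert pick_backend('cuda', 'plv', {'torch', 'numpy'}) == 'torch'
```

Keeping the preference list ordered per (platform, metric) pair is what lets the benchmarks, rather than a single global rule, drive the choice.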

CUDA kernels are kept for all metrics as a safety net — torch.cuda OOMs at ≥512 channels due to large intermediate tensors, while custom kernels compute pairwise without materializing the full (E,F,C,C,T) tensor.
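Rough back-of-the-envelope arithmetic (with illustrative sizes) shows why the full cross tensor blows up: a complex128 (E, F, C, C, T) intermediate at 20 epochs, 5 bands, 512 channels, and 1000 samples already needs hundreds of gigabytes, while pairwise kernels only ever hold (E, F, C, C) accumulators:

```python
E, F, C, T = 20, 5, 512, 1000          # illustrative epoch/band/channel/sample counts
bytes_c128 = 16                         # bytes per complex128 value

full = E * F * C * C * T * bytes_c128   # materialized (E, F, C, C, T) tensor
pairwise = E * F * C * C * bytes_c128   # per-pair accumulators only

print(f"full tensor: {full / 1e9:.1f} GB, pairwise accumulators: {pairwise / 1e6:.1f} MB")
```

The ~1000x gap (the length of the time axis) is exactly what the custom kernels save by reducing over time inside the kernel instead of materializing it.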

New priority Parameter

Custom backend ordering per-call:

from hypyp.analyses import compute_sync
con = compute_sync(signal, 'plv', optimization='auto', priority=['torch', 'cuda_kernel'])

New Optional Dependencies

metal = ["pyobjc-framework-Metal>=10.0"]
cupy = ["cupy-cuda12x>=13.0.0"]

Breaking Changes

None beyond what was already introduced in the hypyp.sync module refactor (ACCorr shape change).

Test Plan

  • 83 tests (74 passed, 9 skipped — CUDA tests skip on non-NVIDIA machines)
  • All backends tested: numpy vs numba, numpy vs torch, numpy vs Metal, numpy vs CUDA
  • Tolerances: rtol=1e-9 for CPU/CUDA, rtol=1e-5 for MPS/Metal, rtol=1e-2 for sign-based on MPS
  • Graceful fallback: requesting unavailable backend → numpy with UserWarning
  • Auto-dispatch tests verify correct backend selection per platform
  • CI will run numpy + numba tests (no GPU in GitHub Actions)
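The graceful-fallback behavior above can be sketched with a hypothetical helper (not hypyp's internals): requesting a backend that isn't installed emits a UserWarning and drops to numpy:

```python
import warnings

AVAILABLE = {'numpy', 'numba'}          # illustrative: no GPU stack installed

def resolve_backend(requested: str) -> str:
    """Fall back to numpy with a UserWarning if `requested` is unavailable."""
    if requested in AVAILABLE:
        return requested
    warnings.warn(f"backend '{requested}' unavailable, falling back to numpy",
                  UserWarning)
    return 'numpy'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    backend = resolve_backend('metal')

assert backend == 'numpy'
assert any(issubclass(w.category, UserWarning) for w in caught)
```

Warning instead of raising keeps existing pipelines running on machines without the optional GPU dependencies.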

🤖 Generated with Claude Code

Add hardware-accelerated backends to all connectivity metrics:
- numba JIT (prange): PLV, CCorr, Coh, ImCoh, PLI, wPLI, EnvCorr, PowCorr
- PyTorch MPS/CUDA/CPU (einsum): all 9 metrics
- Metal compute shaders: PLI, wPLI, ACCorr (Apple Silicon)
- CUDA raw kernels (CuPy): all 9 metrics (NVIDIA GPUs)

Benchmark-driven AUTO_PRIORITY compiled from Mac M4 Max (131 runs) and
Narval A100 (111 runs). The 'auto' optimization selects the best GPU
backend per metric and platform:
- MPS: torch for einsum metrics, Metal for sign-based + ACCorr
- CUDA: cuda_kernel first (OOM-safe at 512ch), torch as fallback

Add `priority` parameter on get_metric() and compute_sync() for custom
backend ordering.

New optional deps: pyobjc-framework-Metal, cupy-cuda12x.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@claude bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@Ramdam17 Ramdam17 merged commit cd0a11d into ppsp-team:master Apr 17, 2026
4 checks passed
