feat(loader): support stock-recipe (Q8_0/F32) abliterated GGUFs end-to-end on Metal #60

Closed

audreyt wants to merge 8 commits into antirez:main from audreyt:support-q8_0-token-embd

Conversation

@audreyt audreyt commented May 10, 2026

What this changes

DeepSeek-V4-Flash GGUFs produced by the upstream llama.cpp converter without
per-tensor type overrides ship most of the small projections at Q8_0 (and the
routed-expert router at F32) where the antirez recipe keeps them at F16.

Examples: the cyberneurova/CyberNeurova-DeepSeek-V4-Flash-abliterated-GGUF
models. On stock ds4 main these fail to load at the first F16-strict
validator (token_embd, then output_hc_fn, then hc_attn_fn, …), and even
after the validators are relaxed, several Metal kernel paths read weight bytes
directly via offset arithmetic that hard-codes F16/F32 strides, producing
silently wrong output for Q8_0.

This PR makes the embed / HC / compressor / indexer / router validators and
the corresponding Metal kernel paths polymorphic, so the same GGUF loads and
runs on Metal end-to-end on audreyt/pi-ds4.

Validators (ds4.c)

  • New tensor_expect_dispatch_layout helper (sketched after this list)
    accepts F16, F32, or Q8_0 and is applied to every projection that flows
    through a type-dispatching matvec/matmul: output_hc_fn, hc_attn_fn,
    hc_ffn_fn, attn_compressor_{ape,gate,kv}, indexer.{attn_q_b,proj},
    indexer_compressor_{ape,gate,kv}, ffn_gate_inp.
  • token_embd keeps its own inline F16/Q8_0 check because its CPU embed
    kernel doesn't go through matvec_any.
  • The two compressor decode-time guards (attn_compressor and
    indexer_compressor pair-projection paths) are relaxed from "F16 only" to
    "F16 or Q8_0, paired type must match".

CPU paths (ds4.c)

  • Refactor embed_token_f16 into an embed_token dispatcher; add
    embed_token_q8_0 (block-wise dequant of block_q8_0; see the sketch
    after this list).
  • Replace the remaining direct matvec_f16 / matvec_f16_serial callers
    (HC fn, output_hc_fn, ffn_gate_inp) with the existing matvec_any
    dispatcher; add matvec_any_serial for the HC pre/post path.
  • Add polymorphic Metal-side dispatch helpers metal_graph_matmul_plain_tensor
    and metal_graph_matmul_pair_plain_tensor (extended for Q8_0; the pair
    variant fuses with the existing F16-pair kernel when both tensors are
    F16, otherwise dispatches to two single matmuls). All 22 hardcoded
    ds4_metal_matmul_f16{,_pair}_tensor call sites in ds4.c (HC mix,
    attn/indexer compressors, indexer projections, output head, router)
    are converted to use these wrappers.
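
The Q8_0 embed path is small enough to sketch in full. This is a minimal
illustration assuming ggml's block_q8_0 layout (34 bytes per block: one fp16
scale followed by 32 int8 quants); all names are illustrative rather than the
actual ds4.c signatures:

```c
#include <stdint.h>
#include <string.h>

/* Decode raw fp16 bits to float (the role ds4_metal_half_bits_to_float
 * is described as playing below); handles normals, subnormals, signed
 * zero, and Inf/NaN. */
static float half_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0x1Fu) {                       /* Inf / NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {                    /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man != 0) {                    /* subnormal: renormalize */
        int e = -1;
        do { man <<= 1; e++; } while (!(man & 0x400u));
        bits = sign | ((uint32_t)(112 - e) << 23) | ((man & 0x3FFu) << 13);
    } else {
        bits = sign;                          /* signed zero */
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Dequantize one Q8_0 embedding row (n must be a multiple of 32).
 * The embed_token dispatcher would switch on token_embd->type and call
 * this for type 8, the existing F16 copy for type 1. */
static void embed_token_q8_0(float *dst, const uint8_t *row, int n) {
    for (int b = 0; b < n / 32; b++) {
        const uint8_t *blk = row + (size_t)b * 34;   /* 34-byte block */
        uint16_t d_bits;
        memcpy(&d_bits, blk, 2);
        const float d = half_bits_to_float(d_bits);  /* fp16 scale */
        const int8_t *q = (const int8_t *)(blk + 2); /* 32 int8 quants */
        for (int j = 0; j < 32; j++)
            dst[b * 32 + j] = d * (float)q[j];
    }
}
```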

Metal kernels

  • metal/get_rows.metal: kernel_get_rows_q8_0 (one float per thread,
    dequantizes its source block on the fly; a CPU reference of the
    per-thread math follows this list).
  • metal/dense.metal: kernel_mul_mm_f32_f32 template instantiation for the
    multi-token F32 weight matmul that the F32 router path needs in prefill
    (mirrors the existing F16/Q8_0 mul_mm_t instantiations).
  • metal/dsv4_kv.metal: a Q8_0 branch added to
    kernel_dsv4_compressor_store_one. Without this, the decode-time
    single-row compressor store treats Q8_0 ape as F32 and reads garbage.

Metal wiring (ds4_metal.m)

  • Register g_get_rows_q8_0_pipeline at init; clear at cleanup.
  • Both ds4_metal_embed_{token,tokens}_hc_tensor and the shared
    ds4_metal_encode_get_rows helper take a new weight_type parameter
    (GGUF type code: 1=F16, 8=Q8_0). 8 callers in ds4.c forward
    weights->token_embd->type unchanged. ds4_metal_embed_row_layout picks
    the right per-row stride and pipeline.
  • ds4_metal_matmul_f32_tensor extended with a multi-token branch that
    dispatches to kernel_mul_mm_f32_f32 (n_tok > 1); existing n_tok = 1
    path unchanged.
  • ds4_metal_encode_compressor_score_with_ape and the equivalent loop in
    ds4_metal_compressor_store_batch_tensor: for Q8_0 ape, dequantize on the
    CPU into a per-call private MTLBuffer and feed that into the existing
    add_f32_1d. Two new helpers (ds4_metal_half_bits_to_float,
    ds4_metal_cpu_dequant_q8_0_rows) implement the conversion; the CPU
    dequant matches gguf-py's dequantize reference byte-for-byte (verified
    in a standalone numeric check). A per-call buffer is required because
    multiple CPU writes to the previously-shared
    g_compressor_store_ape_buffer within one command buffer collapse to the
    last write at execute time (Metal kernels run in encode order, but CPU
    writes don't participate in that ordering when the same scratch is
    reused). The per-call buffer is retained until cb completion via
    addCompletedHandler because Metal does not strongly retain buffers
    bound to encoders.
  • Six ape_type validators relaxed to also accept 8 (Q8_0).
  • Six ape_bytes calculations centralized through a new
    ds4_metal_ape_bytes(ape_type, n_elems) helper that returns the correct
    stride for F16 / F32 / Q8_0 (sketched after this list).
  • metal_graph_matmul_plain_tensor extended with a Q8_0 branch.
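
The stride helper is worth spelling out, since Q8_0 is the only non-obvious
case. A sketch assumed to mirror the real ds4_metal_ape_bytes (type codes as
above; ds4_die as before):

```c
/* Bytes occupied by n_elems elements of an APE tensor. For Q8_0, n_elems
 * is assumed to be a multiple of 32: each 34-byte block_q8_0 holds a
 * 2-byte fp16 scale plus 32 int8 quants. */
static size_t ds4_metal_ape_bytes(int ape_type, size_t n_elems) {
    switch (ape_type) {
    case 0:  return n_elems * 4;            /* F32 */
    case 1:  return n_elems * 2;            /* F16 */
    case 8:  return (n_elems / 32) * 34;    /* Q8_0 */
    default: ds4_die("ape_bytes: unsupported type %d", ape_type);
    }
    return 0;
}
```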

Why CPU dequant for Q8_0 ape (and not a Metal kernel)

I first wrote a kernel_cpy_q8_0_f32 Metal kernel using the same
block_q8_0 indexing pattern that the working dense Q8_0 matvec/matmul
kernels in metal/dense.metal use. It compiled cleanly but produced
silently wrong values for the actual compressor APE shapes (4 rows × 1024
cols of block_q8_0). I confirmed this by a side-by-side numeric check
against gguf-py's dequantize reference: my CPU dequant matches
byte-for-byte; the Metal kernel does not. I left kernel_cpy_q8_0_f32 in
metal/cpy.metal (its registration in ds4_metal.m is harmless) so that a
future debug session can pick it up; the compressor paths use the CPU
dequant as the active route.
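
For reference, the per-call buffer pattern from the wiring section above
looks roughly like this in Objective-C (ARC assumed; every name here is
hypothetical, and the CPU dequant helper is the one sketched earlier, not
the literal ds4_metal.m code):

```objc
#import <Metal/Metal.h>
#include <stdlib.h>

/* Dequantize a Q8_0 APE region on the CPU into a buffer created for this
 * one encode, then keep the buffer alive until the command buffer retires. */
static void encode_ape_as_f32(id<MTLDevice> dev,
                              id<MTLCommandBuffer> cb,
                              id<MTLComputeCommandEncoder> enc,
                              const uint8_t *ape_q8_0,
                              int rows, int cols, int buf_index) {
    size_t n = (size_t)rows * (size_t)cols;
    float *tmp = malloc(n * sizeof(float));
    cpu_dequant_q8_0_rows(ape_q8_0, tmp, rows, cols);  /* hypothetical */

    /* A fresh buffer per call: with one shared scratch, several CPU writes
     * inside the same command buffer would collapse to the last write by
     * the time the GPU executes the earlier encodes. */
    id<MTLBuffer> buf = [dev newBufferWithBytes:tmp
                                         length:n * sizeof(float)
                                        options:MTLResourceStorageModeShared];
    free(tmp);
    [enc setBuffer:buf offset:0 atIndex:buf_index];

    /* Capturing buf in the completion block retains it until the command
     * buffer finishes, covering the lifetime gap described above. */
    [cb addCompletedHandler:^(id<MTLCommandBuffer> done) { (void)buf; }];
}
```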

What this PR does not cover

  • The CPU MoE path (ds4.c:5198, 5291) still hardcodes
    if (gate_w->type != IQ2_XXS) ds4_die(...). That path is reference/debug
    per AGENT.md and the production Metal flow doesn't touch it. If
    something forces CPU fallback (Metal disabled, MTP without Metal, certain
    trace modes) on a stock-recipe Q8_0 GGUF, you'll see "expected IQ2_XXS
    expert tensors" until a Q8_0 dispatch is added there too. Out of scope
    for this PR.
  • No quantization changes, no recipe changes, no new GGUF formats. This is
    a loader/dispatcher change so existing GGUFs that happen to use the
    stock recipe become loadable.

Test matrix (macOS / M5 / Metal)

  • make ds4-server builds clean (one pre-existing -Wpointer-sign warning
    from the unrelated MoE path, not introduced by this PR).
  • Cyberneurova Q2_K GGUF entirely unmodified, default flags:
    21-token prompt → coherent generation
    ("An LLM, or Large Language Model, is a type of artificial intelligence").
    Without the compressor APE fix, this prompt generated a few coherent
    tokens then <BOS> token spam.
  • Pre-harmonized variant (token_embd / HC / compressor / indexer all
    F16): still works byte-for-byte the same as before, no F16/F32 path
    regressions.
  • make ds4-server builds clean across both branches.

audreyt added 2 commits May 10, 2026 06:22
DeepSeek-V4-Flash GGUFs produced by the upstream llama.cpp converter
without per-tensor type overrides ship most of the small projections at
Q8_0 (and routed-expert router weights at F32) where the antirez recipe
keeps them at F16. Examples include the cyberneurova abliterated GGUFs.
On stock ds4 main these fail loudly at the first F16-strict
validator (token_embd, then output_hc_fn, then hc_attn_fn, ...), and
even after the validators are relaxed, several Metal kernel paths read
weight bytes directly via offset arithmetic that hard-codes F16/F32
strides.

This change makes the embed/HC/compressor/indexer/router validators
*and* the corresponding Metal kernel paths polymorphic, so the same
GGUF loads and runs with no harmonizer step.

Validators (ds4.c):

  * New tensor_expect_dispatch_layout helper accepts F16, F32, or Q8_0
    and is applied to every projection that flows through a
    type-dispatching matvec/matmul: output_hc_fn, hc_attn_fn,
    hc_ffn_fn, attn_compressor_{ape,gate,kv}, indexer.{attn_q_b,proj},
    indexer_compressor_{ape,gate,kv}, ffn_gate_inp.
  * token_embd keeps its own inline F16/Q8_0 check because its CPU
    embed kernel doesn't go through matvec_any.
  * Two compressor decode-time guards (attn_compressor and
    indexer_compressor pair-projection paths) are relaxed from "F16 only"
    to "F16 or Q8_0, paired type must match".

CPU paths (ds4.c):

  * Refactor embed_token_f16 into an embed_token dispatcher; add
    embed_token_q8_0 (block-wise dequant of block_q8_0).
  * Replace the remaining direct matvec_f16 / matvec_f16_serial
    callers (HC fn, output_hc_fn, ffn_gate_inp) with the existing
    matvec_any dispatcher; add matvec_any_serial for the HC pre/post
    path.
  * Add polymorphic Metal-side dispatch helpers
    metal_graph_matmul_plain_tensor and metal_graph_matmul_pair_plain_tensor
    (extended for Q8_0; the pair variant fuses with the existing F16-pair
    kernel when both tensors are F16, otherwise dispatches to two single
    matmuls). All 22 hardcoded ds4_metal_matmul_f16{,_pair}_tensor call
    sites in ds4.c (HC mix, attn/indexer compressors, indexer projections,
    output head, router) are converted to use these wrappers.

Metal kernels:

  * metal/get_rows.metal: kernel_get_rows_q8_0 (one float per thread,
    dequantizes its source block on the fly).
  * metal/dense.metal: kernel_mul_mm_f32_f32 template instantiation for
    the multi-token F32 weight matmul that the F32 router path needs in
    prefill (mirrors the existing F16/Q8_0 mul_mm_t instantiations).
  * metal/cpy.metal: kernel_cpy_q8_0_f32 (dequantizing 1D copy used by
    the compressor APE byte-strided reader).

Metal wiring (ds4_metal.m):

  * Register g_get_rows_q8_0_pipeline and g_cpy_q8_0_f32_pipeline at
    init; clear them at cleanup.
  * Both ds4_metal_embed_{token,tokens}_hc_tensor and the shared
    ds4_metal_encode_get_rows helper take a new weight_type parameter
    (GGUF type code: 1=F16, 8=Q8_0). 8 callers in ds4.c forward
    weights->token_embd->type unchanged. ds4_metal_embed_row_layout
    picks the right per-row stride and pipeline.
  * ds4_metal_matmul_f32_tensor extended with a multi-token branch
    that dispatches to kernel_mul_mm_f32_f32 (n_tok > 1); existing
    n_tok = 1 path unchanged.
  * ds4_metal_encode_compressor_score_with_ape and the equivalent loop
    in ds4_metal_compressor_prefill_tensor add a Q8_0 branch
    (ds4_metal_encode_cpy_q8_0_f32_1d) and use a per-row stride that
    accounts for the block_q8_0 layout.
  * Six ape_type validators relaxed to also accept 8 (Q8_0).
  * Six ape_bytes calculations centralized through a new
    ds4_metal_ape_bytes(ape_type, n_elems) helper that returns the
    correct stride for F16/F32/Q8_0.
  * metal_graph_matmul_plain_tensor extended with a Q8_0 branch.

Tested on macOS / M-series / Metal:

  * make ds4-server builds clean (no new warnings).
  * Cyberneurova Q2_K GGUF entirely unmodified: loads, prefill +
    decode through to coherent generation ("PASS" returned for the
    "reply with the single word PASS" prompt).
  * Pre-harmonized variant (token_embd / hc / compressor / indexer all
    F16, ffn_gate_inp F16): still works byte-for-byte the same as
    before this change, no F16 path regressions.

Caveat for reviewers running ivanfioravanti's M5 PR (antirez#15) on top of
this: the unmodified cyberneurova file generates garbage (BOS spam)
when MPP F16 prefill is engaged, but produces coherent output with
DS4_METAL_MPP_F16_DISABLE=1. The garbage is reproducible from antirez#15's
MPP path alone and is independent of the changes here; it surfaces only
because this PR makes the Q8_0 file loadable in the first place.
This PR's loader changes accept Q8_0 `*compressor_ape*` weights at the
validator level, but two follow-on Metal paths still treat them as F16
(or fall through to F32) and produce silently wrong output, which shows
up as <BOS>-token spam in generation for any prompt long enough to
exercise the multi-token compressor path on M-series hardware.

1. `kernel_cpy_q8_0_f32` (added in this PR for the prefill APE
   byte-strided dequant) compiles cleanly and follows the same
   block_q8_0 indexing pattern used by other working Q8_0 kernels in
   dense.metal, but emits silently wrong values for the actual ape
   shapes (4 rows x 1024 cols of block_q8_0).  Confirmed by isolating
   the kernel: a CPU-side dequant of the same byte region matches
   gguf-py's `dequantize` reference byte-for-byte, while the Metal
   kernel's output is wrong.

2. `kernel_dsv4_compressor_store_one` (decode-time single-row store
   in metal/dsv4_kv.metal): only handled `ape_type == 1` (F16) and
   fell through to F32 for everything else, so Q8_0 ape was reading
   garbage at decode time.

Fix:

* Replace the prefill APE Q8_0 path in
  `ds4_metal_encode_compressor_score_with_ape` and
  `ds4_metal_compressor_store_batch_tensor` with a CPU-side dequant
  via two new helpers (`ds4_metal_half_bits_to_float` and
  `ds4_metal_cpu_dequant_q8_0_rows`) into a *per-call* private
  MTLBuffer.  A per-call buffer is required because multiple CPU writes
  to the previously-shared `g_compressor_store_ape_buffer` within one
  command buffer collapse to the last write at execute time (Metal
  kernels run in encode order, but CPU writes don't participate in that
  ordering when the same scratch is reused).  The per-call buffer is
  retained until cb completion via `addCompletedHandler` because Metal
  does not strongly retain buffers bound to encoders.
* Add a Q8_0 branch to `kernel_dsv4_compressor_store_one` that walks
  block_q8_0 layout (uint16_t scale + 32 int8 quants per 34-byte block)
  inline.

The buggy `kernel_cpy_q8_0_f32` Metal kernel is left in place but is
no longer reached from the compressor paths; its registration in
ds4_metal.m is harmless and a future debug session can either fix it
or drop it.

Tested on macOS / M-series / Metal:

* make ds4-server builds clean (one pre-existing -Wpointer-sign warning
  from the unrelated MoE path).
* Cyberneurova Q2_K GGUF entirely unmodified, default flags:
  21-token prompt -> coherent generation
  ("An LLM, or Large Language Model, is a type of artificial intelligence").
  Previously this prompt generated a few coherent tokens then <BOS>
  token spam.
* Pre-harmonized variant (token_embd / hc / compressor / indexer all
  F16): still works byte-for-byte the same as before this fix; no F16
  / F32 path regressions.
@audreyt audreyt marked this pull request as draft May 10, 2026 12:36
@audreyt audreyt marked this pull request as ready for review May 10, 2026 18:13
The decode-time indexer code at metal_graph_encode_decode_layer (ds4.c:9082-9095)
still has two F16-only validators on indexer_attn_q_b and indexer_proj that I
missed in the initial loader pass.

These validators only fire after `g->layer_n_comp[il] > decode_top_k` — i.e.
once the compressor has accumulated more rows than the decode-time top-k.
For short generations the path isn't reached; for ~400+ token generations
on stock-recipe (Q8_0) GGUFs the validator trips and the request finishes
with finish_reason="error" / "Metal decode failed".

The downstream calls already use metal_graph_matmul_plain_tensor (which
dispatches to ds4_metal_matmul_q8_0_tensor for Q8_0). The loader-time
validator at line 2211-2212 already uses tensor_expect_dispatch_layout,
which accepts F16/F32/Q8_0. Only these runtime guards were stuck on F16.

Reproducer (cyberneurova Q2_K, default flags): a "write a long story"
prompt that generates ~800 tokens hits the validator after ~400 tokens
and the request errors out. After this fix, the same prompt streams 800+
tokens cleanly.
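
The shape of the fix is just swapping the stale guard for the loader's
dispatch check; illustratively (names assumed, not the literal ds4.c diff):

```c
/* Before: decode-time guard stuck on F16. */
if (w_indexer_q_b->type != 1 /* F16 */)
    ds4_die("Metal graph indexer q projection expects F16 weights");

/* After: defer to the same check the loader applies at line 2211, since
 * the downstream call is metal_graph_matmul_plain_tensor either way. */
tensor_expect_dispatch_layout("indexer_attn_q_b", w_indexer_q_b->type);
```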

audreyt commented May 10, 2026

ds4: Metal graph indexer q projection expects F16 weights

Fixed in c2144e5!

@audreyt audreyt changed the title from "feat(loader): support stock-recipe (Q8_0/F32) GGUFs end-to-end on Metal" to "feat(loader): support stock-recipe (Q8_0/F32) abliterated GGUFs end-to-end on Metal" May 10, 2026
audreyt and others added 2 commits May 10, 2026 14:44
The two callers of ds4_metal_encode_cpy_q8_0_f32_1d were removed in 79b08bb
(switched to CPU-side dequant to avoid an encode-time race on the shared
compressor scratch buffer), leaving the function unused and tripping
-Wunused-function on stock Make builds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

audreyt commented May 10, 2026

There is a small, likely cosmetic warning while compiling:

Fixed and synced from main. Ready for review from @antirez.

audreyt commented May 11, 2026

I cannot rebase this PR branch cleanly against current main (ae302c2).

It works for me — would you like to try audreyt:main?

audreyt commented May 11, 2026

Apologies for the confusion — yes, my main is current, but gh pr checkout 60 && git rebase main isn't quite the right path to it, and that's on me for not being more explicit.

The PR branch head (147b0d7) already merges current origin/main (ae302c2, "Project renamed to DwarfStar 4."); the three merge commits in the PR history are where I resolved the conflicts from the recent backend refactor (ds4_metal.h → ds4_gpu.h, CUDA support landing). git rebase main discards those merges and replays the original 24022c2 (written against the pre-refactor layout) onto today's main, which reintroduces every conflict I already resolved.

So the branch, as gh pr checkout 60 leaves it, should just build:

gh pr checkout 60
make ds4-server

Thank you for taking the time to test this — really appreciate the careful repro.

fry69 commented May 11, 2026

Ah, sorry for this confusion.

Merge commits always get me, I forget about them, sorry again.

@antirez antirez added the "weights loading or handling" label May 11, 2026
@audreyt audreyt force-pushed the support-q8_0-token-embd branch from 63cb942 to e8f5bdc May 12, 2026 12:48

audreyt commented May 12, 2026

I do not know if it is a good idea to widen this PR, e.g. incorporating unrelated things like the Responses API or the count_tokens endpoint.

Indeed, thank you for the catch. Those are now factored out into PRs #90 and #91.

audreyt commented May 13, 2026

Closing — the motivating case (loading the cyberneurova abliterated stock-recipe Q2_K GGUF on main Metal) is now resolved at the model layer instead of the loader layer.

I've published https://huggingface.co/audreyt/CyberNeurova-DeepSeek-V4-Flash-abliterated-GGUF#audrey-tang-ds4-re-quants — antirez-style q2-imatrix and q4-imatrix re-quants of the cyberneurova abliterated weights, with F16 token embedding so the stock ds4 loader takes them directly, no support-q8_0-token-embd workaround needed. The recipe matches ds4flash.gguf byte-for-byte except for the abliterated routed experts.

This keeps the loader narrow (one canonical recipe, matching the README's "intentionally narrow" framing) and pushes behavioral surface — abliteration character, hedge register, etc. — onto the model and runtime layers (re-quants for the model side, dir-steering for the runtime side). Closing #60 makes that separation explicit and avoids carrying a 600+-line loader/Metal patch through every backend refactor (compressor APE, M5 simdgroup, MTL4) when the use case it addressed has evaporated.

audreyt/pi-ds4 has switched its download_model.sh to the new IQ2XXS-w2Q2K imatrix re-quant (~87 GB) and now runs end-to-end on stock main minus only ivanfioravanti's PR #15.

Big thanks to @fry69 for the careful reproductions on both the long-prompt indexer crash and the rebase-conflict reports — you kept this honest.

@audreyt audreyt closed this May 13, 2026

fry69 commented May 13, 2026

I can confirm that this model works perfectly with the current main branch:

$ cd gguf
$ hf download audreyt/CyberNeurova-DeepSeek-V4-Flash-abliterated-GGUF \
  cyberneurova-DeepSeek-V4-Flash-abliterated-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf \
  --local-dir .
$ cd ..
$ rm -f tmp/ds4-kv/* && ./ds4-server --ctx 100000 --kv-disk-dir ./tmp/ds4-kv --kv-disk-space-mb 8192 --port 28000 -m ./gguf/cyberneurova-DeepSeek-V4-Flash-abliterated-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf 
ds4: Metal device Apple M5 Max, 128.00 GiB RAM
ds4: requesting Metal residency (may take tens of seconds)... done
ds4: warming Metal model views... done
ds4: Metal model views created in 2.206 ms, residency requested in 8366.213 ms, warmup 3.422 ms (mapped 82697.67 MiB from offset 5.08 MiB)
ds4: Metal mapped mmaped model as 2 overlapping shared buffers
ds4: metal backend initialized for graph diagnostics
0513 07:18:45 ds4-server: context buffers 1896.58 MiB (ctx=100000, backend=metal, prefill_chunk=2048, raw_kv_rows=2304, compressed_kv_rows=25002)
0513 07:18:45 ds4-server: KV disk cache ./tmp/ds4-kv (budget=8192 MiB, cross-quant=accept, min=512, cold_max=30000, continued=10000, trim=32, align=2048)
0513 07:18:45 ds4-server: listening on http://127.0.0.1:28000
0513 07:19:12 ds4-server: chat ctx=0..1205:1205 prompt start
0513 07:19:16 ds4-server: chat ctx=0..1205:1205 prompt done 4.160s
0513 07:19:16 ds4-server: kv cache stored tokens=1205 trimmed=0 reason=cold size=38.68 MiB save=16.6 ms
0513 07:19:17 ds4-server: chat ctx=0..1205:1205 gen=50 THINKING decoding chunk=36.45 t/s avg=36.45 t/s 1.372s
[...]
0513 07:19:27 ds4-server: chat ctx=0..1205:1205 gen=400 THINKING decoding chunk=37.40 t/s avg=37.23 t/s 10.744s
0513 07:19:28 ds4-server: chat ctx=0..1205:1205 gen=450 decoding chunk=37.22 t/s avg=37.23 t/s 12.088s
[...]
0513 07:21:40 ds4-server: chat ctx=0..1205:1205 gen=4551 decoding chunk=28.88 t/s avg=31.52 t/s 144.368s
0513 07:21:40 ds4-server: thinking checkpoint canonicalization needs rebuild ctx=0..1205:1205 common=1204 live=5756 canonical=5345 reason="rewrite needs rebuild: common=1204 live=5756 canonical=5345"
0513 07:21:57 ds4-server: thinking checkpoint canonicalized ctx=0..1205:1205 common=1204 live=5756 canonical=5345 via=rebuild
0513 07:21:57 ds4-server: chat ctx=0..1205:1205 gen=4551 finish=stop 165.557s
