kv-cache: SWA checkpoints store only non-masked cells (cherry-pick #23981) by lalalune · Pull Request #27 · elizaOS/llama.cpp

lalalune · 2026-06-22T19:28:09Z

Cherry-picks upstream ggml-org/llama.cpp#23981 (commit 2365315) into the eliza fork.

Why: Gemma 4 is notorious for KV-checkpoint RAM blow-up (upstream #21690 OOM). This makes llama_kv_cache::state_write skip SWA-masked cells, shrinking every spec-decode rollback checkpoint (our FFI common_prompt_checkpoint → llama_state_seq_get_data_ext → state_write path). Directly relevant to the eliza-1 Gemma 4 cutover (#9033 in elizaOS/eliza).

Verified: cherry-pick clean; CPU rebuild green (build 10027); Gemma 4 E2B (Q8_0) still runs (llama-bench pp64/tg32 nominal). Deps (is_masked_swa/n_swa/swa_type) already present in the fork.
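To illustrate the idea behind the change, here is a minimal, self-contained sketch of filtering out SWA-masked cells before a checkpoint is written. It is not the upstream implementation: `ToyCell`, `toy_is_masked_swa` and `toy_checkpoint_cells` are hypothetical names, and the masking rule (a cell is masked once it is `n_swa` or more positions behind the newest position) is an assumption modeled on a standard sliding window.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one KV cell; a real cache also holds K/V tensor rows.
struct ToyCell {
    int32_t pos;  // token position stored in this cell
};

// Assumed standard sliding-window rule: a cell at position p0 is masked for a
// query at position p1 once it is n_swa or more positions behind it.
inline bool toy_is_masked_swa(uint32_t n_swa, int32_t p0, int32_t p1) {
    return p1 - p0 >= static_cast<int32_t>(n_swa);
}

// Returns the indices of the cells a checkpoint would still need to store:
// everything that is not masked relative to the newest position in the cache.
inline std::vector<size_t> toy_checkpoint_cells(const std::vector<ToyCell>& cells,
                                                uint32_t n_swa) {
    std::vector<size_t> keep;
    if (cells.empty()) {
        return keep;
    }

    // Find the newest position currently held by the cache.
    int32_t pos_max = cells.front().pos;
    for (const ToyCell& c : cells) {
        if (c.pos > pos_max) {
            pos_max = c.pos;
        }
    }

    // Keep only cells that can still be attended to inside the window.
    for (size_t i = 0; i < cells.size(); ++i) {
        if (!toy_is_masked_swa(n_swa, cells[i].pos, pos_max)) {
            keep.push_back(i);
        }
    }
    return keep;
}
```

Under these assumptions, a cache holding positions 0 through 9 with `n_swa = 4` keeps only the four newest cells, so the checkpoint size tracks the window rather than the full context length.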

🤖 Generated with Claude Code

…eRT/MLX scaffolds (M4/M5)

Lets the one streaming-LLM FFI pipe (eliza_inference_llm_stream_*) be served by more than one in-process runtime, selected per-_open, without touching the default llama.cpp path. Realizes M3 of the Gemma 4 cutover and lands the device-gated M4/M5 backends on top.

M3 seam (always compiled, inert by default):
- src/llm-backend.h - LlmBackendSession / LlmBackendFactory pure-virtual interfaces mirroring the FFI 1:1, plus llm_backend_context_bundle_dir(ctx), the one accessor a backend uses to read the bundle root from the otherwise opaque EliInferenceContext (no can_serve->open bundle-dir caching).
- src/llm-backend-selector.cpp - idempotent registry + selection: ELIZA_LLM_BACKEND env hard-select, else highest preference_rank among available()+can_serve(); nullptr+no-error => keep in-tree llama.cpp. With no -DELIZA_ENABLE_* gate, no backend registers, so select() always returns nullptr.
- eliza-inference-ffi.cpp - one `if (stream->backend) return stream->backend->X()` branch inserted ABOVE each existing llama.cpp/MTP branch in open/prefill/next/cancel/reset/reset_keep/save_slot/restore_slot/close. Device-critical path untouched, just guarded.

M4 LiteRT-LM (gate -DELIZA_ENABLE_LITERT, OFF):
src/backends/litert-backend.{h,cpp} - Engine/Session against the researched LiteRT-LM C++ API, NPU->GPU->CPU ladder, text/*.litertlm probe; no-SDK stub when OFF.

M5 CoreML/MLX (gate -DELIZA_ENABLE_MLX + __APPLE__, OFF; FATALs on non-Apple):
src/backends/mlx-coreml-backend.{h,mm} - MLX-primary (mlx-c decode graph) + CoreML-alternate (stateful MLState KV); no-SDK stub when OFF.

CMake: selector folded into OMNIVOICE_FFI_SOURCES (always built); the two accelerator backends gated with SDK include/link knobs.

Default fused build verified on Linux: libelizainference.so links, the FFI pipe stays exported, litert_/mlx_coreml_backend_factory absent (gates OFF), byte-for-byte the prior llama.cpp path. Every hardware assumption tagged DEVICE-VERIFY.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
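
The selection rule described in the commit message (environment hard-select, otherwise the highest preference_rank among backends that are both available and able to serve the bundle, otherwise nullptr to keep the in-tree llama.cpp path) could be sketched roughly as follows. This is an illustration only: `ToyBackendFactory`, `ToyFixedBackend`, `toy_select` and `toy_select_from_env` are hypothetical names, not the actual contents of src/llm-backend.h or src/llm-backend-selector.cpp, and the behavior for an unknown hard-selected name is an assumption.

```cpp
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, trimmed-down factory interface; the real one mirrors the FFI.
struct ToyBackendFactory {
    virtual ~ToyBackendFactory() = default;
    virtual const char* name() const = 0;
    virtual int preference_rank() const = 0;
    virtual bool available() const = 0;                           // SDK/device present?
    virtual bool can_serve(const std::string& bundle_dir) const = 0;
};

// Simple concrete factory with fixed answers, used only for illustration.
struct ToyFixedBackend final : ToyBackendFactory {
    std::string id;
    int rank;
    bool avail;
    bool serves;
    ToyFixedBackend(std::string id_, int rank_, bool avail_, bool serves_)
        : id(std::move(id_)), rank(rank_), avail(avail_), serves(serves_) {}
    const char* name() const override { return id.c_str(); }
    int preference_rank() const override { return rank; }
    bool available() const override { return avail; }
    bool can_serve(const std::string&) const override { return serves; }
};

// Returns the chosen factory, or nullptr meaning "keep the default path".
// `forced` plays the role of the ELIZA_LLM_BACKEND value (may be nullptr).
inline const ToyBackendFactory* toy_select(
        const std::vector<const ToyBackendFactory*>& registry,
        const std::string& bundle_dir,
        const char* forced) {
    if (forced != nullptr && *forced != '\0') {
        // Hard-select by name. Assumption: an unknown name yields nullptr here;
        // the real selector may report an error instead.
        for (const ToyBackendFactory* f : registry) {
            if (std::string(forced) == f->name()) {
                return f;
            }
        }
        return nullptr;
    }
    // Otherwise pick the highest-ranked backend that is usable for this bundle.
    const ToyBackendFactory* best = nullptr;
    for (const ToyBackendFactory* f : registry) {
        if (!f->available() || !f->can_serve(bundle_dir)) {
            continue;
        }
        if (best == nullptr || f->preference_rank() > best->preference_rank()) {
            best = f;
        }
    }
    return best;  // nullptr when the registry is empty or nothing qualifies
}

// Convenience wrapper that reads the override from the environment.
inline const ToyBackendFactory* toy_select_from_env(
        const std::vector<const ToyBackendFactory*>& registry,
        const std::string& bundle_dir) {
    return toy_select(registry, bundle_dir, std::getenv("ELIZA_LLM_BACKEND"));
}
```

With an empty registry, which is what the commit message describes for a build without any -DELIZA_ENABLE_* gate, the sketch always returns nullptr, so a caller guarding each FFI entry point with `if (stream->backend)` would fall through to the existing llama.cpp branch.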

coderabbitai · 2026-06-22T19:28:17Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 67bde25a-18bf-46d0-b7a4-ae8a3d848cc2


ggerganov and others added 2 commits June 22, 2026 12:26

kv-cache : SWA checkpoints store only non-masked cells (#23981)

3e81729

github-actions Bot added the examples label Jun 22, 2026

lalalune mentioned this pull request Jun 22, 2026

Sync fork to upstream master (+604 commits) — Gemma4 verified #29

Open

lalalune wants to merge 2 commits into main from eliza/gemma-kv-swa-checkpoint-fix
