kv-cache: SWA checkpoints store only non-masked cells (cherry-pick #23981)#27

Open
lalalune wants to merge 2 commits into main from eliza/gemma-kv-swa-checkpoint-fix

@lalalune

Member

Cherry-picks upstream ggml-org/llama.cpp#23981 (commit 2365315) into the eliza fork.

Why: Gemma 4 is notorious for KV-checkpoint RAM blow-up (upstream #21690 OOM). This change makes llama_kv_cache::state_write skip SWA-masked cells, which shrinks every spec-decode rollback checkpoint (our FFI path: common_prompt_checkpoint → llama_state_seq_get_data_ext → state_write). Directly relevant to the eliza-1 Gemma 4 cutover (#9033 in elizaOS/eliza).
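
For readers unfamiliar with the idea, here is a minimal standalone sketch of "store only non-masked cells". It is not the upstream code: `Cell`, `is_masked_swa_sketch`, and `cells_to_write` are hypothetical names, the mask models only a standard sliding window, and measuring the window from the next position after the highest stored one is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a KV cell; pos < 0 means "empty".
struct Cell {
    int32_t pos;
};

// Standard sliding window (assumed): a cell at p0 is masked for a query at p1
// once it is n_swa or more positions behind. n_swa == 0 means "no window".
static bool is_masked_swa_sketch(uint32_t n_swa, int32_t p0, int32_t p1) {
    return n_swa > 0 && p1 - p0 >= static_cast<int32_t>(n_swa);
}

// Return the indices of the cells a checkpoint would still need to store.
// Assumption: the reference query is the next position after the highest one.
static std::vector<size_t> cells_to_write(const std::vector<Cell>& cells, uint32_t n_swa) {
    int32_t pos_max = -1;
    for (const Cell& c : cells) {
        pos_max = std::max(pos_max, c.pos);
    }

    std::vector<size_t> out;
    for (size_t i = 0; i < cells.size(); ++i) {
        if (cells[i].pos < 0) {
            continue; // empty cell, nothing to store
        }
        if (is_masked_swa_sketch(n_swa, cells[i].pos, pos_max + 1)) {
            continue; // outside the window, can never be attended to again
        }
        out.push_back(i);
    }
    return out;
}
```

With ten cells at positions 0..9 and a window of 4, only the cells at positions 7, 8, and 9 survive in this sketch, so the checkpoint size tracks the window rather than the full context length.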

Verified: cherry-pick clean; CPU rebuild green (build 10027); Gemma 4 E2B (Q8_0) still runs (llama-bench pp64/tg32 nominal). Deps (is_masked_swa/n_swa/swa_type) already present in the fork.

🤖 Generated with Claude Code

ggerganov and others added 2 commits June 22, 2026 12:26
…eRT/MLX scaffolds (M4/M5)

Lets the one streaming-LLM FFI pipe (eliza_inference_llm_stream_*) be served
by more than one in-process runtime, selected per-_open, without touching the
default llama.cpp path. Realizes M3 of the Gemma 4 cutover and lands the
device-gated M4/M5 backends on top.

M3 seam (always compiled, inert by default):
- src/llm-backend.h — LlmBackendSession / LlmBackendFactory pure-virtual
  interfaces mirroring the FFI 1:1, plus llm_backend_context_bundle_dir(ctx),
  the one accessor a backend uses to read the bundle root from the otherwise
  opaque EliInferenceContext (no can_serve->open bundle-dir caching).
- src/llm-backend-selector.cpp — idempotent registry + selection: ELIZA_LLM_BACKEND
  env hard-select, else highest preference_rank among available()+can_serve();
  nullptr+no-error => keep in-tree llama.cpp. With no -DELIZA_ENABLE_* gate, no
  backend registers, so select() always returns nullptr.
- eliza-inference-ffi.cpp — one `if (stream->backend) return stream->backend->X()`
  branch inserted ABOVE each existing llama.cpp/MTP branch in open/prefill/next/
  cancel/reset/reset_keep/save_slot/restore_slot/close. Device-critical path
  untouched, just guarded.
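
The selection rule described above can be sketched roughly as follows. This is an illustration only: `BackendFactorySketch`, `FakeBackend`, and `select_backend` are hypothetical names rather than the declarations in src/llm-backend.h, and two behaviours are assumptions: a hard-selected name bypasses the available()/can_serve() checks, and an unknown hard-selected name is reported as an error.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical factory interface, loosely modelled on the commit message.
struct BackendFactorySketch {
    virtual ~BackendFactorySketch() = default;
    virtual const char* name() const = 0;
    virtual int preference_rank() const = 0;
    virtual bool available() const = 0;
    virtual bool can_serve(const std::string& bundle_dir) const = 0;
};

// Trivial concrete factory, used only to exercise the selector.
struct FakeBackend final : BackendFactorySketch {
    std::string name_;
    int rank_;
    bool available_;
    bool serves_;
    FakeBackend(std::string n, int r, bool a, bool s)
        : name_(std::move(n)), rank_(r), available_(a), serves_(s) {}
    const char* name() const override { return name_.c_str(); }
    int preference_rank() const override { return rank_; }
    bool available() const override { return available_; }
    bool can_serve(const std::string&) const override { return serves_; }
};

// forced: value of ELIZA_LLM_BACKEND (may be nullptr or empty).
// Returns nullptr with an empty *err when nothing qualifies, which the caller
// treats as "keep the in-tree llama.cpp path".
static const BackendFactorySketch* select_backend(
        const std::vector<const BackendFactorySketch*>& registry,
        const std::string& bundle_dir,
        const char* forced,
        std::string* err) {
    err->clear();
    if (forced != nullptr && *forced != '\0') {
        for (const BackendFactorySketch* f : registry) {
            if (forced == std::string(f->name())) {
                return f; // assumption: hard-select skips the availability checks
            }
        }
        *err = std::string("unknown backend: ") + forced; // assumption
        return nullptr;
    }

    const BackendFactorySketch* best = nullptr;
    for (const BackendFactorySketch* f : registry) {
        if (!f->available() || !f->can_serve(bundle_dir)) {
            continue;
        }
        if (best == nullptr || f->preference_rank() > best->preference_rank()) {
            best = f;
        }
    }
    return best; // nullptr when the registry is empty or nothing qualifies
}
```

A caller would pass `std::getenv("ELIZA_LLM_BACKEND")` as `forced`. With an empty registry, which is the default build with no -DELIZA_ENABLE_* gate, the function always returns nullptr, matching the "inert by default" claim.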

M4 LiteRT-LM (gate -DELIZA_ENABLE_LITERT, OFF): src/backends/litert-backend.{h,cpp}
  — Engine/Session against the researched LiteRT-LM C++ API, NPU->GPU->CPU ladder,
  text/*.litertlm probe; no-SDK stub when OFF.
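
As a generic illustration of an NPU->GPU->CPU ladder (not the LiteRT-LM API, which is not shown in this PR), a fallback chain can be sketched as "try each device in order, keep the first that opens". `Rung` and `open_on_first_working` are hypothetical names.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// One rung of the ladder: a device label plus a callback that tries to open it.
struct Rung {
    std::string device;
    std::function<bool()> try_open;
};

// Walk the ladder top to bottom and return the first device that opens,
// or std::nullopt if every rung fails.
static std::optional<std::string> open_on_first_working(const std::vector<Rung>& ladder) {
    for (const Rung& rung : ladder) {
        if (rung.try_open && rung.try_open()) {
            return rung.device;
        }
    }
    return std::nullopt;
}
```

Ordering the rungs from most to least specialised means the CPU rung acts as the final fallback, and an all-fail result can be surfaced to the selector so the default llama.cpp path stays in use.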

M5 CoreML/MLX (gate -DELIZA_ENABLE_MLX + __APPLE__, OFF; FATALs on non-Apple):
  src/backends/mlx-coreml-backend.{h,mm} — MLX-primary (mlx-c decode graph) +
  CoreML-alternate (stateful MLState KV); no-SDK stub when OFF.

CMake: selector folded into OMNIVOICE_FFI_SOURCES (always built); the two
accelerator backends gated with SDK include/link knobs. Default fused build
verified on Linux: libelizainference.so links, the FFI pipe stays exported,
litert_/mlx_coreml_backend_factory absent (gates OFF) — byte-for-byte the prior
llama.cpp path. Every hardware assumption tagged DEVICE-VERIFY.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 22, 2026


Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.
