docs: scope the OpenXLA SSM / hybrid / recurrent track (#502) #565

Merged
inureyes merged 1 commit into main from feature/issue-502-ssm-hybrid-recurrent-design
Jul 1, 2026
inureyes commented Jul 1, 2026

Summary

Scoping design for Window D of epic #493: the SSM / hybrid / recurrent architecture track on the OpenXLA/IREE backend (Mamba/Mamba2, Jamba, Falcon-H1, RWKV7, Qwen3-Next, Plamo2, Nemotron-H, Kimi-Linear, RecurrentGemma). Design-doc only; no code changes.

What changed

  • Add spike/openxla/SSM_HYBRID_DESIGN.md, matching the STAGE2_DESIGN.md convention. It grounds the track in the current codebase and covers the three deliverables the issue asks for:
    • The recurrent/state-space mixing primitives (selective scan / SSD, gated linear attention / delta rule, real-gated linear recurrent unit) and why they do not fit the shared attention core AttnLayout.
    • Explicit per-slot recurrent STATE allocation and carry through the continuous-batching engine, distinct from the KV cache: fixed-size (O(1), not growing with position), threaded as parallel device buffers via the same shim mechanism the KV cache already uses (kcache_b / vcache_b, xla_ensure_batch_kv).
    • Coexistence with KV-cache attention layers in hybrid models via a per-layer mixer dispatch in emit_transformer_layer, with a slot carrying both a KV region and a state region.
  • Records an explicit decision to DEFER the implementation, with a staged plan.
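The sizing contrast between the two per-slot regions, and the per-layer mixer choice in a hybrid stack, can be sketched roughly as follows. This is a minimal illustration only: `MixerKind`, `SlotLayout`, `kv_dim`, and `state_dim` are hypothetical names invented for the example, not types or fields from the codebase or the design doc. It shows that the KV region grows linearly with the number of cached positions while the recurrent state region stays fixed.

```rust
// Hypothetical names throughout; none of these types come from the codebase.

/// Which mixer a layer uses in a hybrid stack.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MixerKind {
    Attention, // reads and appends to a KV cache region
    Recurrent, // reads and overwrites a fixed-size state region
}

/// Per-slot sizing for a hybrid model (element counts, not bytes).
struct SlotLayout {
    layers: Vec<MixerKind>,
    kv_dim: usize,    // assumed K (or V) elements per token per attention layer
    state_dim: usize, // assumed state elements per recurrent layer
}

impl SlotLayout {
    /// Number of layers using the given mixer.
    fn count(&self, kind: MixerKind) -> usize {
        self.layers.iter().filter(|&&k| k == kind).count()
    }

    /// KV region size: K and V for every attention layer, per cached position.
    fn kv_elems(&self, positions: usize) -> usize {
        2 * self.count(MixerKind::Attention) * self.kv_dim * positions
    }

    /// State region size: independent of how many positions were processed.
    fn state_elems(&self) -> usize {
        self.count(MixerKind::Recurrent) * self.state_dim
    }
}
```

With three recurrent layers and one attention layer, doubling the cached positions doubles `kv_elems` but leaves `state_elems` unchanged, which is why a slot can reserve its state region once at admission time.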

Design decision

Deferred to a dedicated follow-up epic, #564, per the second option of the issue acceptance criteria. A first reference (Mamba2) requires four new pieces: a scan graph primitive (whose prefill form is an unsolved lowering question), a C-ABI per-slot state resource, a per-layer mixer dispatch, and a state-aware validation harness. Together these are comparable in size to the whole Stage 2 continuous-batching effort.
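To make the decode versus prefill distinction concrete, here is a deliberately simplified sketch of a diagonal linear recurrence, h_t = a_t * h_{t-1} + b_t, applied elementwise. It is not Mamba2's SSD or the primitive the design doc proposes; the function names are hypothetical, and real selective-scan mixers have input-dependent parameters and a larger state. The point is only that a decode step touches a fixed-size state, while prefill, written naively, is a sequential loop over tokens, and that loop is the part a graph backend would need to lower some other way.

```rust
/// One decode step of a diagonal linear recurrence: h <- a * h + b (elementwise).
/// The state slice keeps its length; nothing grows with position.
fn recurrent_step(state: &mut [f32], a: &[f32], b: &[f32]) {
    assert_eq!(state.len(), a.len());
    assert_eq!(state.len(), b.len());
    for ((h, &a_t), &b_t) in state.iter_mut().zip(a).zip(b) {
        *h = a_t * *h + b_t;
    }
}

/// Sequential reference for prefill: apply the decode step once per token.
/// Each iteration depends on the previous one, so this form does not
/// parallelize over the sequence without being reformulated.
fn recurrent_prefill(state: &mut [f32], a_seq: &[Vec<f32>], b_seq: &[Vec<f32>]) {
    assert_eq!(a_seq.len(), b_seq.len());
    for (a, b) in a_seq.iter().zip(b_seq) {
        recurrent_step(state, a, b);
    }
}
```

A sequential reference like this could also serve as the oracle in a state-aware validation harness: run prefill over a prompt, then compare the carried state against whatever the lowered graph produces.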

Test plan

  • Documentation-only change; no source, build, or test surface touched.
  • File references in the doc (AttnLayout, emit_ffn_body, moe_block, XlaBatchEngine, IreeRaggedLlama, xla_ensure_batch_kv, hybrid_ssm.rs, Mamba2Cache, kv_arch.rs) verified against the current tree.
  • No AI attribution and no em dashes.

Closes #502

Add spike/openxla/SSM_HYBRID_DESIGN.md, the Window D scoping design for epic #493. It grounds the recurrent/state-space track in the current OpenXLA/IREE backend: the recurrent/state-space mixing primitives (selective scan, linear recurrence, delta rule) that do not fit the shared attention core, the per-slot recurrent state allocation and carry through the continuous-batching engine (distinct from the KV cache, fixed-size, threaded as parallel device buffers through the same shim mechanism the KV cache uses), and the coexistence with KV-cache attention layers in hybrid models.

Records an explicit decision to defer the implementation to the follow-up epic #564, per the second option of the issue acceptance criteria, with a staged plan (scan spike, primitive, state carry, Mamba2 reference, Nemotron-H hybrid, then breadth).
inureyes added the type:docs, priority:low, area:architecture, area:docs, and status:review labels Jul 1, 2026
inureyes merged commit a93bb6b into main Jul 1, 2026
5 checks passed
inureyes deleted the feature/issue-502-ssm-hybrid-recurrent-design branch July 1, 2026 05:12
Linked issue: design: scope the SSM / hybrid / recurrent track (Mamba2, Jamba, RWKV7, Qwen3-Next, and more)

1 participant