prepare chunk indices before cache initialize #4458

Open
grimoire wants to merge 2 commits into InternLM:main from grimoire:prepare-chunk-indices

grimoire (Collaborator) commented Mar 24, 2026

The chunk gated delta kernel requires `chunk_indices`, and computing them requires a stream synchronization.

This PR computes `chunk_indices` before cache initialization and the forward pass.
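For readers unfamiliar with chunk indices, here is a minimal pure-Python sketch of what such a table might contain. It assumes the indices are (sequence index, chunk index within that sequence) pairs derived from cumulative sequence lengths; the function name `chunk_indices_sketch` is hypothetical, and the real helper in flash-linear-attention works on tensors and may differ in detail. The point of the sketch is that the number of rows depends on the actual sequence lengths, so when those lengths live on the GPU they presumably have to be read back to the host, which is where a synchronization would come from.

```python
from typing import List, Tuple


def chunk_indices_sketch(cu_seqlens: List[int], chunk_size: int) -> List[Tuple[int, int]]:
    """Illustrative only: list (sequence id, chunk id) pairs for packed sequences.

    cu_seqlens holds cumulative sequence lengths, e.g. [0, 5, 12] describes
    two sequences of length 5 and 7 packed back to back.
    """
    pairs: List[Tuple[int, int]] = []
    for seq_id in range(len(cu_seqlens) - 1):
        seq_len = cu_seqlens[seq_id + 1] - cu_seqlens[seq_id]
        # Ceiling division: a partially filled last chunk still counts.
        num_chunks = -(-seq_len // chunk_size)
        for chunk_id in range(num_chunks):
            pairs.append((seq_id, chunk_id))
    return pairs
```

In this sketch the output length is data dependent, so it cannot be allocated without knowing the per-sequence lengths on the host first.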

Copilot AI (Contributor) left a comment

Pull request overview

This PR adjusts the PyTorch engine’s prefill path for SSM / gated-delta (flash-linear-attention) models so that chunk-gated-delta “chunk indices” preparation (which forces a CUDA stream sync) happens during step-context construction, before state-cache initialization and forward execution.

Changes:

  • Move state-cache initialization for SSM from the input-update path into model_forward(), after build_context().
  • In the CUDA backend update_step_context(), eagerly call fla.ops.utils.prepare_chunk_indices(...) during prefill to trigger the required synchronization earlier.
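The ordering described in the two changes above can be sketched as follows. This is a toy model rather than the lmdeploy implementation: the class `PrefillOrderSketch` is hypothetical, it covers only the prefill path, and it assumes that `update_step_context()` runs as part of `build_context()`. The method names `build_context`, `update_step_context` and `model_forward` mirror the names mentioned in this review; everything else is illustrative.

```python
from typing import List


class PrefillOrderSketch:
    """Hypothetical stand-in for the model agent; it only records step order."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def update_step_context(self) -> None:
        # Assumption: this is where the chunk indices are prepared during
        # prefill, which is the step that forces the stream synchronization.
        self.events.append("prepare_chunk_indices")

    def build_context(self) -> None:
        self.update_step_context()
        # Recorded once the step context is fully built.
        self.events.append("build_context")

    def init_state_cache(self) -> None:
        self.events.append("init_state_cache")

    def forward(self) -> None:
        self.events.append("forward")

    def model_forward(self) -> List[str]:
        # Order after this PR: context (with sync) first, then cache, then forward.
        self.build_context()
        self.init_state_cache()
        self.forward()
        return self.events
```

Calling `PrefillOrderSketch().model_forward()` returns the recorded steps in order, with the synchronizing step first and the state-cache initialization placed between context construction and the forward pass.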

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| lmdeploy/pytorch/engine/model_agent/agent.py | Moves SSM state cache initialization to occur after build_context() (and removes prior prefill-only init hook). |
| lmdeploy/pytorch/backends/cuda/op_backend.py | Adds gated-delta chunk-index preparation during prefill step-context update. |

RunningLeon (Collaborator) left a comment

LGTM

4 participants