prepare chunk indices before cache initialize by grimoire · Pull Request #4458 · InternLM/lmdeploy

grimoire · 2026-03-24T13:16:27Z

The chunk gated delta kernel requires chunk_indices, and computing them requires a stream synchronization.

This PR computes the chunk_indices before forward and cache initialization.

Copilot

Pull request overview

This PR adjusts the PyTorch engine’s prefill path for SSM / gated-delta (flash-linear-attention) models so that chunk-gated-delta “chunk indices” preparation (which forces a CUDA stream sync) happens during step-context construction, before state-cache initialization and forward execution.

Changes:

- Move state-cache initialization for SSM from the input-update path into `model_forward()`, after `build_context()`.
- In the CUDA backend `update_step_context()`, eagerly call `fla.ops.utils.prepare_chunk_indices(...)` during prefill to trigger the required synchronization earlier.
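To illustrate why preparing chunk indices forces a synchronization, here is a minimal pure-Python sketch of the idea. It is not the flash-linear-attention implementation: the function name `prepare_chunk_indices_sketch` and the use of plain lists are assumptions made for illustration. The point is that the number of output rows depends on the sequence lengths themselves, so when those lengths live on the GPU the host has to read them back before it can allocate the result.

```python
from typing import List, Tuple


def prepare_chunk_indices_sketch(cu_seqlens: List[int], chunk_size: int) -> List[Tuple[int, int]]:
    """Hypothetical sketch: map each chunk of a packed batch to its sequence.

    cu_seqlens holds cumulative sequence boundaries, e.g. [0, 100, 164, 200]
    for three packed sequences of lengths 100, 64 and 36.
    Returns (sequence_index, chunk_index_within_sequence) pairs.
    """
    pairs: List[Tuple[int, int]] = []
    for seq_idx in range(len(cu_seqlens) - 1):
        seq_len = cu_seqlens[seq_idx + 1] - cu_seqlens[seq_idx]
        # Ceil division: a partially filled trailing chunk still counts.
        num_chunks = -(-seq_len // chunk_size)
        # The loop bound depends on the *values* of the lengths. With the
        # lengths stored in a device tensor, reading them on the host is
        # what would require the stream synchronization.
        pairs.extend((seq_idx, chunk_idx) for chunk_idx in range(num_chunks))
    return pairs


chunk_indices = prepare_chunk_indices_sketch([0, 100, 164, 200], chunk_size=64)
```

With a chunk size of 64, the three sequences above need 2, 1 and 1 chunks respectively, so the result has four rows.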

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `lmdeploy/pytorch/engine/model_agent/agent.py` | Moves SSM state cache initialization to occur after `build_context()` (and removes prior prefill-only init hook). |
| `lmdeploy/pytorch/backends/cuda/op_backend.py` | Adds gated-delta chunk-index preparation during prefill step-context update. |
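The resulting order of operations can be sketched with a toy trace. This is only an illustration of the ordering described in the overview, not the lmdeploy code: `build_context()` and `model_forward()` are names taken from the review summary, while `StepTrace`, `init_state_cache`, `run_forward` and `model_forward_sketch` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StepTrace:
    """Hypothetical recorder of what happens during one forward step."""
    events: List[str] = field(default_factory=list)


def build_context(trace: StepTrace, is_prefill: bool) -> None:
    # Assumption: chunk indices are only prepared for prefill steps,
    # so the host-side sync is paid here, while the context is built.
    if is_prefill:
        trace.events.append("prepare_chunk_indices (host sync)")
    trace.events.append("build_context")


def init_state_cache(trace: StepTrace) -> None:
    # Stand-in for SSM state-cache initialization.
    trace.events.append("init_state_cache")


def run_forward(trace: StepTrace) -> None:
    # Stand-in for the model forward pass.
    trace.events.append("forward")


def model_forward_sketch(is_prefill: bool) -> List[str]:
    trace = StepTrace()
    build_context(trace, is_prefill)  # sync happens first
    init_state_cache(trace)           # cache init now follows build_context
    run_forward(trace)                # forward runs with indices ready
    return trace.events


prefill_events = model_forward_sketch(is_prefill=True)
decode_events = model_forward_sketch(is_prefill=False)
```

In this sketch the synchronization is the first event of a prefill step, ahead of cache initialization and forward execution; a decoding step skips it.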


lmdeploy/pytorch/backends/cuda/op_backend.py

RunningLeon

LGTM

prepare chunk indices before cache initialize

d7ceaf4

lvhan028 added the improvement label Mar 25, 2026

lvhan028 requested review from CUHKSZzxy, RunningLeon and Copilot March 25, 2026 02:53

Copilot started reviewing on behalf of lvhan028 March 25, 2026 02:53

Copilot AI reviewed Mar 25, 2026


solve comment;add flag

2205796

RunningLeon approved these changes Mar 26, 2026

grimoire wants to merge 2 commits into InternLM:main from grimoire:prepare-chunk-indices

4 participants
