Refactor patches: Robust layer_id extraction and consistent return types to prevent Graph Breaks #3
Open
lyj20071013 wants to merge 1 commit into THUDM:main from
Conversation
Hi team,
Congratulations on the great work! I was studying the repository and noticed a few potential engineering and performance issues in both the vLLM and SGLang patches. This PR unifies the fixes across both frameworks.
**1. Robustness of `layer_id` extraction**

In the decoder and attention initializations, `layer_id` was parsed with a hardcoded string split (`split(".")[-1]`). If downstream module naming conventions change, this throws a `ValueError`. I switched to `re.findall(r'\d+', prefix)` to extract the layer index safely, and passed `layer_id` explicitly down to the attention modules.

**2. Performance impact of dynamic return types**

In the attention forward passes, the return signature was inconsistent (a single `Tensor` vs. a `Tuple`), forcing a dynamic type check (`isinstance`) at the call site. This is a known cause of graph breaks in `torch.compile` and complicates CUDA Graph capture. I standardized the return type to a `Tuple` (e.g., `tuple[torch.Tensor, torch.Tensor | None]`), which enables static unpacking at the call sites, keeps the execution path compilation-friendly, and avoids graph fragmentation.

**3. Cleaned up initialization debug prints.**
(Note: I have applied these refactorings consistently across both `indexcache_vllm.patch` and `indexcache.patch`. I verified the logic locally, but as I'm currently on a Windows environment without a full CUDA build, please feel free to double-check that this applies cleanly in your distributed compile tests.)

Thanks!