fix: proper XGrammar integration for guided decoding #4726

Open
windreamer wants to merge 1 commit into InternLM:main from windreamer:fix-xgrammar-integration

@windreamer windreamer commented Jul 2, 2026

Collaborator

Motivation

After the xgrammar upgrade, lmdeploy's unit tests are failing. This PR fixes the XGrammar integration so that it is compatible with the latest xgrammar version.

Modification

This PR makes the following modifications to properly integrate with XGrammar:

1. vocab_size expansion

  • Expand vocab_size to len(tokenizer) to include all token IDs (EOS, special tokens)
  • Some models have vocab_size < len(tokenizer), causing EOS tokens to be out of bitmask range
  • Added logic to detect and expand vocab_size when necessary
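
The sizing problem described above can be sketched in a few lines. This is a minimal illustration, not the code from this PR: the helper names and the example numbers (32000, 32002, EOS ID 32001) are hypothetical, and the 32-bit word packing is an assumption about how a packed token bitmask is typically laid out.

```python
import math


def effective_vocab_size(model_vocab_size: int, tokenizer_len: int) -> int:
    """Return a vocab size that covers every token ID the tokenizer can emit."""
    # Only grow the size; never shrink below what the model config reports.
    return max(model_vocab_size, tokenizer_len)


def bitmask_words(vocab_size: int) -> int:
    """Number of 32-bit words in a packed mask with one bit per token ID (assumed layout)."""
    return math.ceil(vocab_size / 32)


def in_bitmask_range(token_id: int, vocab_size: int) -> bool:
    """A token can be allowed or banned only if the mask has a bit for it."""
    return 0 <= token_id < vocab_size


# Hypothetical model: config vocab_size is smaller than len(tokenizer),
# and the EOS token sits among the added special tokens.
model_vocab_size = 32000
tokenizer_len = 32002
eos_token_id = 32001

before = in_bitmask_range(eos_token_id, model_vocab_size)
expanded = effective_vocab_size(model_vocab_size, tokenizer_len)
after = in_bitmask_range(eos_token_id, expanded)
```

With the unexpanded size the EOS ID has no bit in the mask, so the grammar can never allow it; after expansion it falls inside the range.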

2. Remove terminate_without_stop_token parameter

  • Removed terminate_without_stop_token=True parameter from GrammarMatcher initialization
  • XGrammar now automatically detects stop tokens from the tokenizer

3. Add is_terminated checks

  • Added is_terminated() method to GuidedDecodingManager
  • Added is_terminated() check before fill_bitmap() to prevent operations on terminated processors
  • Added is_terminated() check before accept_token() to safely handle terminated processors
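
The guard pattern in these three bullets might look roughly like the sketch below. `StubMatcher`, `guarded_fill`, and `guarded_accept` are hypothetical names used only for illustration; the stub is not the xgrammar matcher, and its choice to raise after termination is an assumption made so that an unguarded call fails visibly.

```python
class StubMatcher:
    """Hypothetical stand-in for a grammar matcher; not the xgrammar class."""

    def __init__(self, stop_token_id: int) -> None:
        self.stop_token_id = stop_token_id
        self.terminated = False
        self.accepted: list[int] = []

    def is_terminated(self) -> bool:
        return self.terminated

    def accept_token(self, token_id: int) -> bool:
        # Assumption: using a finished matcher is an error.
        if self.terminated:
            raise RuntimeError("matcher already terminated")
        self.accepted.append(token_id)
        if token_id == self.stop_token_id:
            self.terminated = True
        return True

    def fill_next_token_bitmask(self, bitmask: list[bool]) -> None:
        if self.terminated:
            raise RuntimeError("matcher already terminated")
        # Toy rule: every token is allowed.
        for i in range(len(bitmask)):
            bitmask[i] = True


def guarded_fill(matcher: StubMatcher, bitmask: list[bool]) -> bool:
    """Fill the mask only while the matcher is live; report whether it was filled."""
    if matcher.is_terminated():
        return False
    matcher.fill_next_token_bitmask(bitmask)
    return True


def guarded_accept(matcher: StubMatcher, token_id: int) -> bool:
    """Advance the matcher only while it is live; skip silently once terminated."""
    if matcher.is_terminated():
        return False
    return matcher.accept_token(token_id)
```

Once the stop token has been accepted, both helpers become no-ops instead of raising, which is the behavior the checks are meant to provide for sequences that finish earlier than others in the same batch.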

BC-breaking (Optional)

None. This change maintains backward compatibility while fixing integration issues with the latest xgrammar version.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues. ✓
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. (Tests exist in test_grammar.py)
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

Note: This fix is cherry-picked from commit 2b118c5 on the mtp-guided branch, adapted for the main branch.

Copilot AI review requested due to automatic review settings July 2, 2026 07:09

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the PyTorch guided decoding integration to stay compatible with newer xgrammar behavior, focusing on correct vocabulary sizing, termination handling, and stop-token detection.

Changes:

  • Expand vocab_size to len(tokenizer) when needed to ensure the guided bitmask covers EOS/special tokens.
  • Remove terminate_without_stop_token=True from xgr.GrammarMatcher construction (stop tokens are now auto-detected by XGrammar).
  • Add termination checks to avoid calling fill_next_token_bitmask / accept_token on already-terminated matchers.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| lmdeploy/pytorch/engine/guided_process.py | Adjusts XGrammar tokenizer/vocab sizing, matcher construction, and adds termination-aware bitmap filling. |
| lmdeploy/pytorch/engine/logits_process.py | Skips accept_token updates for terminated guided processors during logits processing. |

Comment thread lmdeploy/pytorch/engine/guided_process.py Outdated

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread lmdeploy/pytorch/engine/logits_process.py
Comment thread lmdeploy/pytorch/engine/guided_process.py
Comment thread lmdeploy/pytorch/engine/guided_process.py

@windreamer force-pushed the fix-xgrammar-integration branch from 8d61f88 to c7f9502 on July 2, 2026 08:43

- Expand vocab_size to len(tokenizer) to include all token IDs (EOS, special tokens)
- Remove terminate_without_stop_token parameter (XGrammar auto-detects stop tokens)
- Add is_terminated() check before fill_bitmap and accept_token to handle terminated processors

This fix addresses issues with XGrammar integration after the xgrammar upgrade,
where some models have vocab_size < len(tokenizer), causing EOS tokens to be
out of bitmask range.

Cherry-picked from commit 2b118c5 on mtp-guided branch.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread lmdeploy/pytorch/engine/logits_process.py