
[None][feat] Reduce sampler overhead with min_tokens #13480

Open
galagam wants to merge 2 commits into NVIDIA:main from nv-auto-deploy:dev-gagam-sampler-min-tokens-fix

Conversation

@galagam
Collaborator

@galagam galagam commented Apr 26, 2026

Summary by CodeRabbit

Release Notes

  • Refactor
    • Optimized token sampling performance in text generation. The min-length penalty application now uses more efficient batch processing for faster generation speeds.

Description

  • Move the allocation outside of the loop.
  • Set the dtype to match the logits, avoiding an implicit cast in each iteration.
  • Replace per-request D2D copies (in the loop) with 2 per-batch H2D copies (for the indices) plus 1 per-batch GPU kernel (index_put_).
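The pattern described by these bullets can be sketched as follows. The function name, argument names, and shapes below are assumptions for illustration, not the actual `sampler.py` signature: indices for all still-penalized requests are accumulated on the host, moved to the device in two H2D copies, and applied with a single `index_put_` kernel whose values tensor already matches the logits dtype.

```python
import torch

NEG_INF = float("-inf")

def apply_min_length_penalty_batched(logits: torch.Tensor,
                                     end_token_ids: list[list[int]],
                                     num_generated: list[int],
                                     min_lengths: list[int]) -> torch.Tensor:
    """Mask end-of-sequence tokens for requests that have not yet reached
    their minimum length. Hypothetical sketch: one batched index_put_
    replaces per-request device-to-device copies."""
    row_idx: list[int] = []
    col_idx: list[int] = []
    for row, (eos_ids, produced, min_len) in enumerate(
            zip(end_token_ids, num_generated, min_lengths)):
        if produced < min_len:  # penalty still applies to this request
            for eos in eos_ids:
                row_idx.append(row)
                col_idx.append(eos)
    if not row_idx:  # early return: no request needs the penalty
        return logits
    device = logits.device
    rows = torch.tensor(row_idx, device=device)  # H2D copy 1
    cols = torch.tensor(col_idx, device=device)  # H2D copy 2
    # Values tensor dtype matches logits, so no implicit cast per iteration;
    # index_put_ launches a single kernel for the whole batch.
    penalty = torch.full((len(row_idx),), NEG_INF,
                         dtype=logits.dtype, device=device)
    logits.index_put_((rows, cols), penalty)
    return logits
```

On a batch of two requests where only the first is below its minimum length, only that request's EOS logit is masked.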

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

galagam added 2 commits April 26, 2026 12:36
- move allocation outside of loop
- set dtype to match logits and avoid implicit cast in each iteration

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
…-batch copy

per-request D2D copies (in loop) => 2 per-batch H2D copies (for the indices) + 1 per-batch GPU kernel (index_put_)

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
@galagam galagam self-assigned this Apr 26, 2026
@galagam galagam marked this pull request as ready for review April 27, 2026 13:59
@galagam galagam requested a review from a team as a code owner April 27, 2026 13:59
@galagam galagam requested a review from achartier April 27, 2026 13:59
@coderabbitai
Contributor

coderabbitai Bot commented Apr 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 38e8ff7a-c64c-4395-abe8-3efd8d0dfd71

📥 Commits

Reviewing files that changed from the base of the PR and between dd907c0 and 745dd2f.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/pyexecutor/sampler.py

📝 Walkthrough

Walkthrough

The _apply_min_length_penalty function in the sampler is optimized to use batched indexed tensor operations instead of per-position in-place logits assignment. The refactored approach accumulates affected indices and performs a single index_put_ operation, with early return when no penalty is needed.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Min-length Penalty Optimization: tensorrt_llm/_torch/pyexecutor/sampler.py | Refactored _apply_min_length_penalty to replace nested Python loops with batched indexed updates using index_put_, improving performance through vectorized tensor operations instead of individual element assignments. |
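The equivalence between the two approaches the walkthrough describes can be sanity-checked with a small standalone snippet. The helper names (`penalty_loop`, `penalty_batched`) and arguments are hypothetical, not taken from the PR; they only contrast per-element assignment against one batched `index_put_`.

```python
import torch

def penalty_loop(logits, eos_ids, active_rows):
    # Baseline: per-element in-place assignment (one copy per element).
    out = logits.clone()
    for row in active_rows:
        for eos in eos_ids:
            out[row, eos] = float("-inf")
    return out

def penalty_batched(logits, eos_ids, active_rows):
    # Batched: build the index tensors once, apply one index_put_ kernel.
    out = logits.clone()
    rows = torch.tensor(active_rows).repeat_interleave(len(eos_ids))
    cols = torch.tensor(eos_ids).repeat(len(active_rows))
    vals = torch.full((rows.numel(),), float("-inf"), dtype=out.dtype)
    out.index_put_((rows, cols), vals)
    return out

logits = torch.randn(4, 8)
a = penalty_loop(logits, eos_ids=[2, 7], active_rows=[0, 3])
b = penalty_batched(logits, eos_ids=[2, 7], active_rows=[0, 3])
assert torch.equal(a, b)  # both approaches produce identical logits
```

Both paths mask the same (row, eos) positions; the batched variant simply does it in one kernel launch.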

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The PR description provides specific technical details about the optimization (moving allocations, dtype matching, replacing D2D copies with H2D copies and a GPU kernel), but the Test Coverage section is empty. | Add test coverage details explaining which existing tests validate the changes and whether new tests were added to ensure the optimization maintains correctness. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main change: reducing sampler overhead related to min_tokens through optimization of the min-length penalty application logic. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@galagam
Collaborator Author

galagam commented Apr 27, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #45748 [ run ] triggered by Bot. Commit: 745dd2f Link to invocation


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants