DMS: Reference kvcompress kernels in the README by alancucki · Pull Request #986 · NVIDIA/Model-Optimizer

alancucki · 2026-03-05T23:39:58Z

What does this PR do?

Adds a mention of kvcompress to the DMS README. See NVIDIA/kvpress#184 for context.

Summary by CodeRabbit

Documentation
- Added "Related projects" section to README with links to related Triton kernels and resources for enhanced discovery of relevant tools.

Signed-off-by: Adrian Lancucki <alancucki@nvidia.com>

copy-pr-bot · 2026-03-05T23:40:02Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-03-05T23:40:16Z

📝 Walkthrough

Walkthrough

A new "Related projects" subsection is added to the experimental/dms/README.md file in two locations, each containing a bullet point referencing Steve Westerhouse's Triton kernels for DMS with descriptive text and a link, without modifying existing documentation content.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation `experimental/dms/README.md` | Added duplicate "Related projects" sections with links to Triton kernels for DMS, highlighting performance benefits of faster prefilling. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title references 'kvcompress kernels' but the summary indicates the changes add a 'Related projects' section referencing Steve Westerhouse's Triton kernels for DMS, not kvcompress kernels specifically. | Update the title to accurately reflect that it adds a 'Related projects' section, or clarify if kvcompress is the specific project being referenced. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Security Anti-Patterns | ✅ Passed | No Python code changes present; only markdown documentation modified. Security anti-patterns checks are not applicable. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@experimental/dms/README.md`:
- Line 117: Replace the phrase "speed up" with the single word "speedup" in the
README sentence that references Steve Westerhouse's Triton kernels for DMS (the
line mentioning "kvcompress" and "dms_prefill_flex()") so the sentence reads
"...Although the speedup is lower than with `dms_prefill_flex()`..., it's a
great example..." to maintain technical wording consistency.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 09d796f2-e048-4ef0-9c1e-7bce57af6bc2

📥 Commits

Reviewing files that changed from the base of the PR and between 31f0783 and ecd56b0.

📒 Files selected for processing (1)

experimental/dms/README.md

coderabbitai · 2026-03-05T23:41:45Z

experimental/dms/README.md


+## Related projects
+
+ - [Steve Westerhouse's Triton kernels for DMS](https://github.com/westers/kvcompress): Triton kernels for faster prefilling along with benchmarking scripts. Although the speed up is lower than with `dms_prefill_flex()` ([results](https://github.com/westers/kvcompress?tab=readme-ov-file#dms-prefill-optimization-results)), it's a great example of a clean implementation of FlashAttention with DMS.


⚠️ Potential issue | 🟡 Minor

Use “speedup” instead of “speed up” in this sentence.

Minor wording fix for technical writing consistency.

✏️ Suggested edit

- - [Steve Westerhouse's Triton kernels for DMS](https://github.com/westers/kvcompress): Triton kernels for faster prefilling along with benchmarking scripts. Although the speed up is lower than with `dms_prefill_flex()` ([results](https://github.com/westers/kvcompress?tab=readme-ov-file#dms-prefill-optimization-results)), it's a great example of a clean implementation of FlashAttention with DMS.
+ - [Steve Westerhouse's Triton kernels for DMS](https://github.com/westers/kvcompress): Triton kernels for faster prefilling along with benchmarking scripts. Although the speedup is lower than with `dms_prefill_flex()` ([results](https://github.com/westers/kvcompress?tab=readme-ov-file#dms-prefill-optimization-results)), it's a great example of a clean implementation of FlashAttention with DMS.


🧰 Tools

🪛 LanguageTool

[grammar] ~117-~117: Ensure spelling is correct
Context: ...with benchmarking scripts. Although the speed up is lower than with dms_prefill_flex()...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


Reference kvcompress DMS kernels in the README

ecd56b0

Signed-off-by: Adrian Lancucki <alancucki@nvidia.com>

coderabbitai bot reviewed Mar 5, 2026

alancucki wants to merge 1 commit into NVIDIA:main from alancucki:dms_readme_update
