DMS: Reference kvcompress kernels in the README #986
alancucki wants to merge 1 commit into NVIDIA:main
Signed-off-by: Adrian Lancucki <alancucki@nvidia.com>
📝 Walkthrough
A new "Related projects" subsection is added to the experimental/dms/README.md file in two locations, each containing a bullet point referencing Steve Westerhouse's Triton kernels for DMS with descriptive text and a link, without modifying existing documentation content.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@experimental/dms/README.md`:
- Line 117: Replace the phrase "speed up" with the single word "speedup" in the
README sentence that references Steve Westerhouse's Triton kernels for DMS (the
line mentioning "kvcompress" and "dms_prefill_flex()") so the sentence reads
"...Although the speedup is lower than with `dms_prefill_flex()`..., it's a
great example..." to maintain technical wording consistency.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 09d796f2-e048-4ef0-9c1e-7bce57af6bc2
📒 Files selected for processing (1)
experimental/dms/README.md
## Related projects

- [Steve Westerhouse's Triton kernels for DMS](https://github.com/westers/kvcompress): Triton kernels for faster prefilling along with benchmarking scripts. Although the speed up is lower than with `dms_prefill_flex()` ([results](https://github.com/westers/kvcompress?tab=readme-ov-file#dms-prefill-optimization-results)), it's a great example of a clean implementation of FlashAttention with DMS.
Use “speedup” instead of “speed up” in this sentence.
Minor wording fix for technical writing consistency.
✏️ Suggested edit
- - [Steve Westerhouse's Triton kernels for DMS](https://github.com/westers/kvcompress): Triton kernels for faster prefilling along with benchmarking scripts. Although the speed up is lower than with `dms_prefill_flex()` ([results](https://github.com/westers/kvcompress?tab=readme-ov-file#dms-prefill-optimization-results)), it's a great example of a clean implementation of FlashAttention with DMS.
+ - [Steve Westerhouse's Triton kernels for DMS](https://github.com/westers/kvcompress): Triton kernels for faster prefilling along with benchmarking scripts. Although the speedup is lower than with `dms_prefill_flex()` ([results](https://github.com/westers/kvcompress?tab=readme-ov-file#dms-prefill-optimization-results)), it's a great example of a clean implementation of FlashAttention with DMS.
🧰 Tools
🪛 LanguageTool
[grammar] ~117-~117: Ensure spelling is correct
Context: ...with benchmarking scripts. Although the speed up is lower than with dms_prefill_flex()...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
What does this PR do?
Adds a mention of kvcompress to the DMS README. See NVIDIA/kvpress#184 for context.
Summary by CodeRabbit