
[None][perf] Drop cubin and Eliminate ~6s FMHA JIT recompile in eager generation by aligning kernel selection with CUDA graph warmup#13505

Open
yunruis wants to merge 4 commits into NVIDIA:main from yunruis:user/yunruis/drop_cubin_fmha_narrow_pathsel_cache_miss

Conversation

@yunruis
Contributor

@yunruis yunruis commented Apr 27, 2026

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed attention window parameter handling during multi-turn generation in MLA scenarios.
  • Performance

    • Updated kernel implementations for optimized attention computations on SM100/SM100f architectures.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

yunruis added 3 commits April 26, 2026 19:24
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
…rnel cached, avoid JIT on runtime

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
@yunruis yunruis changed the title User/yunruis/drop cubin fmha narrow pathsel cache miss [None][perf] Drop cubin and Eliminate ~6s FMHA JIT recompile in eager generation by aligning kernel selection with CUDA graph warmup Apr 27, 2026
@yunruis
Contributor Author

yunruis commented Apr 27, 2026

/bot run --disable-fail-fast

@yunruis
Contributor Author

yunruis commented Apr 27, 2026

/bot kill

@yunruis
Contributor Author

yunruis commented Apr 27, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45724 [ run ] triggered by Bot. Commit: 2567c94 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45725 [ run ] triggered by Bot. Commit: 2567c94 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45724 [ run ] completed with state ABORTED. Commit: 2567c94

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45726 [ kill ] triggered by Bot. Commit: 2567c94 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45725 [ run ] completed with state ABORTED. Commit: 2567c94

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45726 [ kill ] completed with state SUCCESS. Commit: 2567c94
Successfully killed previous jobs for commit 2567c94

Link to invocation

@coderabbitai
Contributor

coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

Walkthrough

The PR modifies MLA generation parameter handling in the attention operation dispatcher, changing the KV-cache maximum sequence length source from max_past_kv_length to max_attention_window_size with a TODO for future SWA alignment. Additionally, all FMHA kernel binaries (cubin files) are updated to newer compiled versions.

Changes

  • MLA Generation Parameter Handling — cpp/tensorrt_llm/common/attentionOp.cpp
    Changes the tllmRunnerParams.mMaxSeqLenKv assignment during MLA generation from generation_params.max_past_kv_length to generation_params.max_attention_window_size, with an inline rationale comment and a TODO for future SWA logic mirroring.
  • FMHA Kernel Cubin LFS Pointer Updates (SM100a/SM100f) — cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100*_cubin.cpp (770+ files)
    Updates Git LFS OID (SHA-256 hash) references for precompiled FMHA kernel binaries across multiple configurations (varying head dimensions, sequence lengths, kernel types). File sizes remain unchanged; only binary artifact references are updated.
  • FMHA Kernel Cubin LFS Pointer Deletions — cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100fKernel_QkvBfloat16OBfloat16H128PagedKv*..._cubin.cpp (18 files)
    Removes Git LFS pointer metadata for specific QKV kernel variants (with MultiCtasKvCga/MultiCtasKvVarSeq/SkipsSoftmax/SwapsAbForGen suffixes), eliminating stored references to those precompiled binaries.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

VisualGen

Suggested reviewers

  • yuxianq
  • niukuo
  • PerkzZheng
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check — ⚠️ Warning: The PR description is incomplete; it contains only a template with no actual explanatory content filled in. Required sections like Description, Test Coverage, and proper PR checklist details are missing. Resolution: fill in the Description section explaining the issue and solution, add a Test Coverage section listing relevant tests, and ensure all PR Checklist items are properly reviewed and documented.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)

  • Title check — ✅ Passed: The PR title clearly summarizes the main change: dropping cubins and eliminating FMHA JIT recompile overhead during eager generation by aligning kernel selection with CUDA graph warmup.
  • Linked Issues check — ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp (1)

1-3: Exclude Git LFS pointer artifacts from C++ static-analysis/compile targets.

This file is a valid Git LFS pointer, so clang parsing errors here are tooling noise and can hide real diagnostics in actual source files. Please filter cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp (when unresolved as LFS pointers) from C++ analysis inputs.
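One way to implement such a filter is to reject any candidate file whose first line is the LFS pointer header before handing the list to clang-based tools. A minimal sketch (the `filter_lfs_pointers` helper and the output file name are hypothetical, not existing repo tooling):

```shell
#!/bin/sh
# Emit only file paths that are NOT unresolved Git LFS pointers, so the
# remaining list can be passed safely to compilers/static analyzers.
# An unresolved pointer's first line is "version https://git-lfs.github.com/spec/v1".
filter_lfs_pointers() {
    while IFS= read -r f; do
        if head -n 1 "$f" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'; then
            echo "skipping unresolved LFS pointer: $f" >&2
        else
            printf '%s\n' "$f"
        fi
    done
}

# Usage sketch:
# find cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin -name '*_cubin.cpp' \
#   | filter_lfs_pointers > sources_to_analyze.txt
```

Checking the file header rather than only the filename glob also keeps hydrated `_cubin.cpp` files analyzable once LFS has materialized them.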

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp`
around lines 1 - 3, The file under
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp is a Git LFS
pointer (starts with "version https://git-lfs.github.com/spec/v1") and should be
excluded from C++ parsing/compile analysis; modify the C++ analysis inputs
(e.g., the build/CI config, clang-tidy/clangd file list, or the source discovery
logic) to detect and skip files that begin with the LFS pointer header and/or
match the glob cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp
when unresolved as LFS pointers so they are not passed to the compiler or static
analyzers.
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp (1)

1-3: Consider excluding LFS cubin pointer files from C++ compilation/static-analysis inputs.

The reported clang errors here are consistent with parsing Git LFS pointer text as C++; filtering this path pattern in clang-based jobs would reduce noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`
around lines 1 - 3, The CI/clang static-analysis is parsing Git LFS pointer
files (e.g. the file named
FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp)
as C++ source; update the clang-based job or compile-command generator to filter
out LFS pointer/cubin artifacts by skipping files that match the cubin pointer
pattern (e.g. *.cubin.cpp or path/glob containing "trtllmGenKernels/*/cubin") or
by detecting files whose first line starts with "version
https://git-lfs.github.com/spec/v1" and excluding them from analysis/compilation
inputs so clang doesn't attempt to parse pointer text as C++.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H128PagedKvDenseP32MultiCtasKvVarSeqQ8Kv128StaticSwapsAbForGen_cubin.cpp`:
- Line 2: CI workflows fail because Git LFS pointer files like the *.cubin.cpp
files are not being materialized before analysis; add an explicit LFS fetch step
(e.g., run "git lfs pull --all" or "git lfs fetch --all && git lfs checkout") at
the start of the jobs that run clang/clang-format/clang-tidy and compilation in
the pr-check.yml, precommit-check.yml and blossom-ci.yml workflows so the actual
binary files are present before any analysis/compile steps; place this step
immediately before the first analysis/compile action to ensure tools operate on
real files rather than LFS pointers.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H256PagedKvSlidingOrChunkedCausalP32VarSeqQ16Kv128PersistentSwapsAbForGen_cubin.cpp`:
- Around line 1-3: The build/lint pipeline is attempting to parse unresolved Git
LFS pointer files (e.g., the cubin artifact named
FmhaSm100aKernel_QE4m3KvE2m1OE4m3H256PagedKvSlidingOrChunkedCausalP32VarSeqQ16Kv128PersistentSwapsAbForGen_cubin.cpp)
as C++ source; update CI to run git lfs pull (or git lfs fetch && git lfs
checkout) before invoking the C++ parser/compiler, and/or exclude the cubin
artifacts directory/pattern from C++ analysis (add the cubin filename pattern or
directory to the linter/clang-tidy excludes in the build/lint job) so the
pointer files are not fed to the compiler.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128PackedQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The CMake build is globbing all .cpp files (file(GLOB_RECURSE
SRC_CPP *.cpp)) and later adding them to the target, which accidentally includes
LFS pointer `_cubin.cpp` files; update
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt to explicitly
filter out files matching "*_cubin.cpp" (or similar pattern) from the SRC_CPP
list before calling target_sources/target_link_libraries, e.g., add a remove or
list(FILTER) step after SRC_CPP is populated (and ensure this happens before
filter_source_cuda_architectures or any add_*_sources call) so that
functions/variables referenced like SRC_CPP and filter_source_cuda_architectures
are preserved but `_cubin.cpp` entries are excluded from compilation.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H64PackedQkvDenseVarSeqSkipsSoftmaxQ128Kv128StaticContext_cubin.cpp`:
- Line 2: The CI is compiling an LFS pointer file
(FmhaSm100aKernel_QkInt8VE4m3OBfloat16H64PackedQkvDenseVarSeqSkipsSoftmaxQ128Kv128StaticContext_cubin.cpp)
because Git LFS objects are not hydrated; add an explicit hydration step in the
pipeline before any compile/static-analysis stage (e.g., run git lfs pull ||
(git lfs fetch --all && git lfs checkout)) and fail early if hydration fails;
update the CI job(s) that run CMake/clang/compilation (the build stage that
references CMakeLists.txt) to execute this command and verify
.gitattributes-handled patterns are present so the actual .cubin.cpp sources are
present for the compiler.

---

Nitpick comments:
In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp`:
- Around line 1-3: The file under
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp is a Git LFS
pointer (starts with "version https://git-lfs.github.com/spec/v1") and should be
excluded from C++ parsing/compile analysis; modify the C++ analysis inputs
(e.g., the build/CI config, clang-tidy/clangd file list, or the source discovery
logic) to detect and skip files that begin with the LFS pointer header and/or
match the glob cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp
when unresolved as LFS pointers so they are not passed to the compiler or static
analyzers.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The CI/clang static-analysis is parsing Git LFS pointer files
(e.g. the file named
FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp)
as C++ source; update the clang-based job or compile-command generator to filter
out LFS pointer/cubin artifacts by skipping files that match the cubin pointer
pattern (e.g. *.cubin.cpp or path/glob containing "trtllmGenKernels/*/cubin") or
by detecting files whose first line starts with "version
https://git-lfs.github.com/spec/v1" and excluding them from analysis/compilation
inputs so clang doesn't attempt to parse pointer text as C++.

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:331b82e248db2885154eb327f46161623026d4db48602b6e804b5112c8398dbd
+oid sha256:9198f47281ac5310819d3796860f7d6c26c03213456fdba9f81ba307421c606e
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Confirm these files are LFS-tracked via attributes:"
fd '_cubin\.cpp$' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin \
  | xargs -I{} git check-attr -a "{}" | rg -n 'filter: lfs|diff: lfs|merge: lfs'

echo
echo "2) Confirm pointer format is present in working tree (indicates smudge may not have run):"
fd '_cubin\.cpp$' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin \
  | xargs -I{} sh -c 'head -n 2 "{}" | sed "s|^|{}: |"'

echo
echo "3) Inspect CI/workflow scripts for explicit git-lfs install/fetch/pull before analysis:"
rg -n -C2 'git lfs|lfs pull|lfs fetch|checkout.*lfs|actions/checkout' .github

Repository: NVIDIA/TensorRT-LLM

Length of output: 50377


🏁 Script executed:

fd -e yml -e yaml .github/workflows/ -type f 2>/dev/null \
  | head -20 \
  | xargs -I{} sh -c 'echo "=== {} ===" && cat "{}"' 2>/dev/null

Repository: NVIDIA/TensorRT-LLM

Length of output: 45


🏁 Script executed:

find . -maxdepth 3 -type d -name ".github" 2>/dev/null

Repository: NVIDIA/TensorRT-LLM

Length of output: 73


🏁 Script executed:

find .github -type f \( -name "*.yml" -o -name "*.yaml" \) 2>/dev/null | head -20

Repository: NVIDIA/TensorRT-LLM

Length of output: 846


🏁 Script executed:

cat .github/workflows/blossom-ci.yml | head -150

Repository: NVIDIA/TensorRT-LLM

Length of output: 3812


🏁 Script executed:

cat .github/workflows/pr-check.yml

Repository: NVIDIA/TensorRT-LLM

Length of output: 2719


🏁 Script executed:

cat .github/workflows/precommit-check.yml

Repository: NVIDIA/TensorRT-LLM

Length of output: 2136


🏁 Script executed:

cat .github/workflows/l0-test.yml | head -100

Repository: NVIDIA/TensorRT-LLM

Length of output: 3171


🏁 Script executed:

rg -l "git lfs|lfs pull|lfs fetch|checkout.*lfs" .github/workflows/ 2>/dev/null || echo "No LFS configuration found in workflows"

Repository: NVIDIA/TensorRT-LLM

Length of output: 103


🏁 Script executed:

cat .gitattributes | grep -A2 -B2 "cubin\|lfs" | head -30

Repository: NVIDIA/TensorRT-LLM

Length of output: 1217


Add explicit LFS fetch step to CI workflows before analysis.

These *cubin.cpp files are Git LFS-tracked (confirmed in .gitattributes), but the file pointer (not the binary) is committed to the repository. Without an explicit git lfs pull or similar fetch step in CI workflows (pr-check.yml, precommit-check.yml, blossom-ci.yml), static analysis tools will attempt to parse the LFS pointer text as C++ code, causing clang, clang-format, and clang-tidy to fail with parse errors. Add LFS materialization before any analysis/compile steps in CI.
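One possible shape for that CI step, as a sketch (the helper names are hypothetical; it assumes `git-lfs` is installed on the runner, and the fallback command covers setups where `git lfs pull` alone is insufficient):

```shell
#!/bin/sh
set -eu

# True (exit 0) if DIR still contains unresolved LFS pointer files.
has_unresolved_pointers() {
    grep -rl '^version https://git-lfs.github.com/spec/v1' "$1" 2>/dev/null | grep -q .
}

# Materialize Git LFS objects for the current checkout.
hydrate_lfs() {
    git lfs install --local
    git lfs pull || { git lfs fetch --all && git lfs checkout; }
}

# In a CI job, run immediately before the first analysis/compile step and fail early:
# hydrate_lfs
# if has_unresolved_pointers cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin; then
#     echo "error: unresolved LFS pointers remain after hydration" >&2
#     exit 1
# fi
```

The post-hydration check matters as much as the pull itself: without it, a partially hydrated checkout still feeds pointer text to clang and produces the same parse errors.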

🧰 Tools
🪛 Clang (14.0.6)

[error] 2-2: unknown type name 'oid'; did you mean 'void'?

(clang-diagnostic-error)


[error] 2-2: variable has incomplete type 'void'

(clang-diagnostic-error)


[error] 2-2: expected ';' after top level declarator

(clang-diagnostic-error)


[error] 2-2: expected unqualified-id

(clang-diagnostic-error)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H128PagedKvDenseP32MultiCtasKvVarSeqQ8Kv128StaticSwapsAbForGen_cubin.cpp`
at line 2, CI workflows fail because Git LFS pointer files like the *.cubin.cpp
files are not being materialized before analysis; add an explicit LFS fetch step
(e.g., run "git lfs pull --all" or "git lfs fetch --all && git lfs checkout") at
the start of the jobs that run clang/clang-format/clang-tidy and compilation in
the pr-check.yml, precommit-check.yml and blossom-ci.yml workflows so the actual
binary files are present before any analysis/compile steps; place this step
immediately before the first analysis/compile action to ensure tools operate on
real files rather than LFS pointers.

Comment on lines 1 to 3
 version https://git-lfs.github.com/spec/v1
-oid sha256:729458cc7556985c685fc24680021d4d6d386999684bcb0b18a65d47a1695d6e
+oid sha256:eaa85c572e89a2a2f5aaa703533fd490fd556e2babfd5499809210ccfe663fce
 size 579887
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only check: detect unresolved LFS pointers among FMHA cubin .cpp artifacts.
fd '_cubin\.cpp$' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin \
  | xargs -I{} sh -c 'head -n 1 "{}" | grep -q "version https://git-lfs.github.com/spec/v1" && echo "UNRESOLVED_LFS: {}"'

# Expected:
# - No output in environments where LFS blobs are resolved for build/lint.
# - Any printed paths indicate unresolved pointers that should not be compiled as C++.

Repository: NVIDIA/TensorRT-LLM

Length of output: 50376


Ensure Git LFS blobs are resolved before C++ compilation in build/lint jobs.

The cubin kernel directory contains ~500+ unresolved Git LFS pointers (files starting with version https://git-lfs.github.com/spec/v1). The toolchain appears to be attempting to parse these pointer stubs as C++ source, causing clang errors. Verify that build and lint jobs execute git lfs pull before invoking the C++ parser/compiler, or exclude these cubin artifacts from C++ analysis entirely.

🧰 Tools
🪛 Clang (14.0.6)

[error] 1-1: unknown type name 'version'

(clang-diagnostic-error)


[error] 1-1: expected ';' after top level declarator

(clang-diagnostic-error)


[error] 2-2: unknown type name 'oid'; did you mean 'void'?

(clang-diagnostic-error)


[error] 2-2: variable has incomplete type 'void'

(clang-diagnostic-error)


[error] 2-2: expected ';' after top level declarator

(clang-diagnostic-error)


[error] 2-2: unknown type name 'eaa85c572e89a2a2f5aaa703533fd490fd556e2babfd5499809210ccfe663fce'

(clang-diagnostic-error)


[error] 3-3: expected ';' after top level declarator

(clang-diagnostic-error)

🪛 Cppcheck (2.20.0)

[error] 3-3: syntax error

(syntaxError)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H256PagedKvSlidingOrChunkedCausalP32VarSeqQ16Kv128PersistentSwapsAbForGen_cubin.cpp`
around lines 1 - 3, The build/lint pipeline is attempting to parse unresolved
Git LFS pointer files (e.g., the cubin artifact named
FmhaSm100aKernel_QE4m3KvE2m1OE4m3H256PagedKvSlidingOrChunkedCausalP32VarSeqQ16Kv128PersistentSwapsAbForGen_cubin.cpp)
as C++ source; update CI to run git lfs pull (or git lfs fetch && git lfs
checkout) before invoking the C++ parser/compiler, and/or exclude the cubin
artifacts directory/pattern from C++ analysis (add the cubin filename pattern or
directory to the linter/clang-tidy excludes in the build/lint job) so the
pointer files are not fed to the compiler.

Comment on lines 1 to 3
 version https://git-lfs.github.com/spec/v1
-oid sha256:04440fc4f0a05b4faa4dd444b156cd8eae913147f9a9aea451755f77d56dddbd
+oid sha256:0d65300266e58bd37ed448107ece7d3df90d1ca643998708cab703b1bd114804
 size 759880
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether cubin LFS pointer files are referenced by build/lint configs.

set -euo pipefail

echo "== Locate build config files =="
fd -i '^(CMakeLists\.txt|.*\.cmake|BUILD|BUILD\.bazel|WORKSPACE|meson\.build)$'

echo
echo "== Search references to cubin pointer path/patterns =="
rg -n -C2 --iglob '*CMakeLists.txt' --iglob '*.cmake' --iglob 'BUILD*' --iglob 'WORKSPACE' --iglob 'meson.build' \
  'trtllmGenKernels/fmha/cubin|_cubin\.cpp|FmhaSm100[a-zA-Z0-9_]*_cubin\.cpp'

echo
echo "== Search lint/static-analysis config references (if present) =="
rg -n -C2 --iglob '*clang*' --iglob '*lint*' --iglob '*.yml' --iglob '*.yaml' --iglob '*.json' \
  'trtllmGenKernels/fmha/cubin|_cubin\.cpp|cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin'

Repository: NVIDIA/TensorRT-LLM

Length of output: 6949


🏁 Script executed:

cat -n cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt

Repository: NVIDIA/TensorRT-LLM

Length of output: 4987


🏁 Script executed:

rg -B5 -A10 'filter_source_cuda_architectures' cpp/cmake/ --type cmake

Repository: NVIDIA/TensorRT-LLM

Length of output: 3981


🏁 Script executed:

rg -A50 'function(filter_source_cuda_architectures)' cpp/cmake/modules/cuda_configuration.cmake

Repository: NVIDIA/TensorRT-LLM

Length of output: 45


🏁 Script executed:

sed -n '/^function(filter_source_cuda_architectures)/,/^endfunction()/p' cpp/cmake/modules/cuda_configuration.cmake | head -80

Repository: NVIDIA/TensorRT-LLM

Length of output: 3131


🏁 Script executed:

find cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin -type f -name '*.cpp' 2>/dev/null | head -20

Repository: NVIDIA/TensorRT-LLM

Length of output: 3269


🏁 Script executed:

head -5 'cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H128PagedKvCausalP32VarSeqQ128Kv128PersistentContext_cubin.cpp'

Repository: NVIDIA/TensorRT-LLM

Length of output: 194


🏁 Script executed:

git check-attr filter cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*.cpp 2>/dev/null | head -20

Repository: NVIDIA/TensorRT-LLM

Length of output: 3529


🏁 Script executed:

cat -n .gitattributes | grep -A5 -B5 'cubin\|lfs'

Repository: NVIDIA/TensorRT-LLM

Length of output: 1329


🏁 Script executed:

# Check if CMakeLists handles LFS pointer files or if there's a .gitignore that might help
cat cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/.gitignore 2>/dev/null || echo "No .gitignore in fmha dir"
echo "---"
# Check the actual directory structure and how files are handled
ls -la cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/ | head -20

Repository: NVIDIA/TensorRT-LLM

Length of output: 1113


🏁 Script executed:

# Check if there are any comments or special handling in CMakeLists.txt around the SRC_CPP glob
sed -n '45,65p' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt

Repository: NVIDIA/TensorRT-LLM

Length of output: 893


🏁 Script executed:

# Check how CUDA compilation handles LFS pointer files - does it skip them or fail?
# Also verify if the architecture filter actually excludes these files
python3 << 'EOF'
import re

# Simulate the filter logic for architecture 100
arch = 100
match_pattern = f".*[Ss][Mm]_?{arch}.*(cubin\.cpp|\.cu)$"
exclude_pattern = f".*[Ss][Mm]_?{arch}f.*(cubin\.cpp|\.cu)$"

# Test against actual filenames from the cubin directory
test_files = [
    "FmhaSm100aKernel_QE4m3KvE2m1OE4m3H128PagedKvCausalP32VarSeqQ128Kv128PersistentContext_cubin.cpp",
    "fmhaRunner.cpp",
    "fmhaReduction.cu",
]

print("Architecture 100 filtering:")
print(f"  Match pattern: {match_pattern}")
print(f"  Exclude pattern: {exclude_pattern}")
print()

for fname in test_files:
    matches = re.match(match_pattern, fname)
    excludes = re.match(exclude_pattern, fname) if exclude_pattern else False
    status = "EXCLUDED" if matches and excludes else ("INCLUDED" if matches else "NOT MATCHED")
    print(f"  {fname}: {status}")
EOF

Repository: NVIDIA/TensorRT-LLM

Length of output: 371


🏁 Script executed:

# Let me verify the full filter function to see if there's additional logic I missed
sed -n '/function(filter_source_cuda_architectures)/,/^endfunction()/p' cpp/cmake/modules/cuda_configuration.cmake | tail -50

Repository: NVIDIA/TensorRT-LLM

Length of output: 1672


🏁 Script executed:

# Check if there's any special handling in CMake for LFS pointer files
# Look for how add_library handles them or if there's language detection
rg -n 'LANGUAGE|LINKER_LANGUAGE|cubin' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt

Repository: NVIDIA/TensorRT-LLM

Length of output: 45


🏁 Script executed:

# Check if CMake has any special handling for LFS files or language detection
# Also check if there's any build documentation about cubin files
find cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha -name '*.md' -o -name 'README*' 2>/dev/null
echo "---"
# Check if there are any comments in the CMakeLists about cubin handling
grep -n -i 'cubin\|lfs\|binary' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt

Repository: NVIDIA/TensorRT-LLM

Length of output: 655


🏁 Script executed:

# Check if there's any custom CMake rule or property that marks cubin files specially
# Also look for how the kernels are actually loaded at runtime
rg -n 'cubin\|kernel.*load\|dlopen\|dlsym' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.cpp | head -30

Repository: NVIDIA/TensorRT-LLM

Length of output: 45


🏁 Script executed:

# Check if there's a .gitattributes diff/merge/clean filter that might prevent LFS pointers from being compiled
cat .gitattributes | head -20
echo "---"
# Check if there's any documentation about how the build handles LFS files
rg -n 'git.*lfs\|lfs.*filter' . --type md --type txt 2>/dev/null | head -20

Repository: NVIDIA/TensorRT-LLM

Length of output: 1221


Ensure _cubin.cpp pointer files are explicitly excluded from C++ compilation targets.

The CMakeLists.txt at line 49 uses file(GLOB_RECURSE SRC_CPP *.cpp) to collect all .cpp files, which includes LFS pointer files. Line 60 adds all collected files to the compilation target. When LFS is not pulled (e.g., shallow clones or CI without git-lfs), these pointer files contain plaintext like version https://git-lfs.github.com/spec/v1... and will cause hard C++ parser failures during compilation. While .pre-commit-config.yaml correctly excludes cubin.cpp from linting, the build system lacks explicit exclusion from compilation. The filter_source_cuda_architectures function filters only by SM architecture patterns, not by file type or LFS status. Add an explicit filter in cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt to remove *_cubin.cpp files before adding sources to the compilation target.
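The suggested exclusion could look like the following fragment (a sketch against the `SRC_CPP` variable named above; exact placement within cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt is an assumption):

```cmake
# Collect candidate sources as the existing build does.
file(GLOB_RECURSE SRC_CPP *.cpp)

# Exclude Git LFS pointer artifacts: *_cubin.cpp files hold binary cubin data
# (or, when LFS is not hydrated, pointer text) and must not be compiled as C++.
list(FILTER SRC_CPP EXCLUDE REGEX "_cubin\\.cpp$")
```

`list(FILTER ... EXCLUDE REGEX ...)` requires CMake 3.6 or newer; on older CMake, a `foreach` loop with `list(REMOVE_ITEM)` achieves the same effect.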

🧰 Tools
🪛 Clang (14.0.6)

[error] 1-1: unknown type name 'version'

(clang-diagnostic-error)


[error] 1-1: expected ';' after top level declarator

(clang-diagnostic-error)


[error] 2-2: unknown type name 'oid'; did you mean 'void'?

(clang-diagnostic-error)


[error] 2-2: variable has incomplete type 'void'

(clang-diagnostic-error)


[error] 2-2: expected ';' after top level declarator

(clang-diagnostic-error)


[error] 2-2: expected unqualified-id

(clang-diagnostic-error)

🪛 Cppcheck (2.20.0)

[error] 3-3: syntax error

(syntaxError)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128PackedQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`
around lines 1 - 3, The CMake build is globbing all .cpp files
(file(GLOB_RECURSE SRC_CPP *.cpp)) and later adding them to the target, which
accidentally includes LFS pointer `_cubin.cpp` files; update
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt to explicitly
filter out files matching "*_cubin.cpp" (or similar pattern) from the SRC_CPP
list before calling target_sources/target_link_libraries, e.g., add a remove or
list(FILTER) step after SRC_CPP is populated (and ensure this happens before
filter_source_cuda_architectures or any add_*_sources call) so that
functions/variables referenced like SRC_CPP and filter_source_cuda_architectures
are preserved but `_cubin.cpp` entries are excluded from compilation.

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:024b568fbadb7ed013f975f8b95fd3b81c58aafb9194005bb6c64badfa889108
+oid sha256:c45588917e7f4d80700d85b67be5e8030520ac450dec3d44a6d979c6eea4abe5
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Check LFS tracking for these cubin .cpp files"
rg -n --hidden --glob '.gitattributes' 'trtllmGenKernels/fmha/cubin/.*\.cpp|filter=lfs' -C2

echo
echo "2) Check CI/scripts for explicit LFS hydration"
rg -n --hidden -S 'git lfs (install|fetch|pull|checkout)' -C2

echo
echo "3) Spot pointer placeholders currently present in repo checkout"
rg -n --hidden -S '^version https://git-lfs.github.com/spec/v1$|^oid sha256:' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin -C1

Repository: NVIDIA/TensorRT-LLM

Length of output: 50379


Ensure Git LFS objects are hydrated before C++ compilation in CI.

Line 2 of this file contains a Git LFS pointer record, not actual source code. The .gitattributes file confirms *cubin.cpp filter=lfs diff=lfs merge=lfs -text, and verification shows these files currently exist as pointers in the repository. If clang or any C++ compiler attempts to parse this LFS pointer (version https://git-lfs.github.com/spec/v1 followed by oid sha256:...) as source, it will fail with parsing errors.

CI must explicitly run git lfs pull (or equivalent hydration) before any compilation or static analysis stage that touches these files. While documentation and CMakeLists.txt reference LFS handling, there should be an enforced, mandatory hydration step in the build pipeline—not just an error message after-the-fact.

🧰 Tools
🪛 Clang (14.0.6)

[error] 2-2: unknown type name 'oid'; did you mean 'void'?

(clang-diagnostic-error)


[error] 2-2: variable has incomplete type 'void'

(clang-diagnostic-error)


[error] 2-2: expected ';' after top level declarator

(clang-diagnostic-error)


[error] 2-2: unknown type name 'c45588917e7f4d80700d85b67be5e8030520ac450dec3d44a6d979c6eea4abe5'

(clang-diagnostic-error)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H64PackedQkvDenseVarSeqSkipsSoftmaxQ128Kv128StaticContext_cubin.cpp`
at line 2, The CI is compiling an LFS pointer file
(FmhaSm100aKernel_QkInt8VE4m3OBfloat16H64PackedQkvDenseVarSeqSkipsSoftmaxQ128Kv128StaticContext_cubin.cpp)
because Git LFS objects are not hydrated; add an explicit hydration step in the
pipeline before any compile/static-analysis stage (e.g., run git lfs pull ||
(git lfs fetch --all && git lfs checkout)) and fail early if hydration fails;
update the CI job(s) that run CMake/clang/compilation (the build stage that
references CMakeLists.txt) to execute this command and verify
.gitattributes-handled patterns are present so the actual .cubin.cpp sources are
present for the compiler.

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
@yunruis yunruis force-pushed the user/yunruis/drop_cubin_fmha_narrow_pathsel_cache_miss branch from 2567c94 to c977b5d on April 27, 2026 at 16:42
@yunruis
Contributor Author

yunruis commented Apr 27, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45754 [ run ] triggered by Bot. Commit: c977b5d Link to invocation
