
[None][perf] Drop cubin and Eliminate ~6s FMHA JIT recompile in eager generation by aligning kernel selection with CUDA graph warmup#13505

Open
yunruis wants to merge 4 commits into NVIDIA:main from yunruis:user/yunruis/drop_cubin_fmha_narrow_pathsel_cache_miss

Conversation

@yunruis
Contributor

@yunruis yunruis commented Apr 27, 2026

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed attention window parameter handling during multi-turn generation in MLA scenarios.
  • Performance

    • Updated kernel implementations for optimized attention computations on SM100/SM100f architectures.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

yunruis added 3 commits April 26, 2026 19:24
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
…rnel cached, avoid JIT on runtime

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
@yunruis yunruis changed the title User/yunruis/drop cubin fmha narrow pathsel cache miss [None][perf] Drop cubin and Eliminate ~6s FMHA JIT recompile in eager generation by aligning kernel selection with CUDA graph warmup Apr 27, 2026
@yunruis
Contributor Author

yunruis commented Apr 27, 2026

/bot run --disable-fail-fast

@yunruis
Contributor Author

yunruis commented Apr 27, 2026

/bot kill

@yunruis
Contributor Author

yunruis commented Apr 27, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45724 [ run ] triggered by Bot. Commit: 2567c94 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45725 [ run ] triggered by Bot. Commit: 2567c94 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45724 [ run ] completed with state ABORTED. Commit: 2567c94

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45726 [ kill ] triggered by Bot. Commit: 2567c94 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45725 [ run ] completed with state ABORTED. Commit: 2567c94

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45726 [ kill ] completed with state SUCCESS. Commit: 2567c94
Successfully killed previous jobs for commit 2567c94

Link to invocation

@coderabbitai
Contributor

coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

Walkthrough

The PR modifies MLA generation parameter handling in the attention operation dispatcher, changing the KV-cache maximum sequence length source from max_past_kv_length to max_attention_window_size with a TODO for future SWA alignment. Additionally, all FMHA kernel binaries (cubin files) are updated to newer compiled versions.

Changes

  • MLA Generation Parameter Handling — cpp/tensorrt_llm/common/attentionOp.cpp
    Changes the tllmRunnerParams.mMaxSeqLenKv assignment during MLA generation from generation_params.max_past_kv_length to generation_params.max_attention_window_size, with an inline rationale comment and a TODO for future SWA logic mirroring.
  • FMHA Kernel Cubin LFS Pointer Updates (SM100a/SM100f) — cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100*_cubin.cpp (770+ files)
    Updates Git LFS OID (SHA-256 hash) references for precompiled FMHA kernel binaries across multiple configurations (varying head dimensions, sequence lengths, kernel types). File sizes remain unchanged; only binary artifact references are updated.
  • FMHA Kernel Cubin LFS Pointer Deletions — cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100fKernel_QkvBfloat16OBfloat16H128PagedKv*..._cubin.cpp (18 files)
    Removes Git LFS pointer metadata for specific QKV kernel variants (with MultiCtasKvCga/MultiCtasKvVarSeq/SkipsSoftmax/SwapsAbForGen suffixes), eliminating stored references to those precompiled binaries.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

VisualGen

Suggested reviewers

  • yuxianq
  • niukuo
  • PerkzZheng
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check — ⚠️ Warning: The PR description is incomplete; it contains only a template with no actual explanatory content filled in. Required sections like Description, Test Coverage, and proper PR checklist details are missing. Resolution: fill in the Description section explaining the issue and solution, add a Test Coverage section listing relevant tests, and ensure all PR Checklist items are properly reviewed and documented.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)

  • Title check — ✅ Passed: The PR title clearly summarizes the main change: dropping cubins and eliminating FMHA JIT recompile overhead during eager generation by aligning kernel selection with CUDA graph warmup.
  • Linked Issues check — ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp (1)

1-3: Exclude Git LFS pointer artifacts from C++ static-analysis/compile targets.

This file is a valid Git LFS pointer, so clang parsing errors here are tooling noise and can hide real diagnostics in actual source files. Please filter cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp (when unresolved as LFS pointers) from C++ analysis inputs.
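One way to implement such a filter is to reject any candidate file whose first line is the LFS pointer header before handing the list to clang-based tools. A minimal sketch (the `filter_lfs_pointers` helper and the output file name are hypothetical, not existing repo tooling):

```shell
#!/bin/sh
# Emit only file paths that are NOT unresolved Git LFS pointers, so the
# remaining list can be passed safely to compilers/static analyzers.
# An unresolved pointer's first line is "version https://git-lfs.github.com/spec/v1".
filter_lfs_pointers() {
    while IFS= read -r f; do
        if head -n 1 "$f" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'; then
            echo "skipping unresolved LFS pointer: $f" >&2
        else
            printf '%s\n' "$f"
        fi
    done
}

# Usage sketch:
# find cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin -name '*_cubin.cpp' \
#   | filter_lfs_pointers > sources_to_analyze.txt
```

Checking the file header rather than only the filename glob also keeps hydrated `_cubin.cpp` files analyzable once LFS has materialized them.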

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp`
around lines 1 - 3, The file under
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp is a Git LFS
pointer (starts with "version https://git-lfs.github.com/spec/v1") and should be
excluded from C++ parsing/compile analysis; modify the C++ analysis inputs
(e.g., the build/CI config, clang-tidy/clangd file list, or the source discovery
logic) to detect and skip files that begin with the LFS pointer header and/or
match the glob cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp
when unresolved as LFS pointers so they are not passed to the compiler or static
analyzers.
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp (1)

1-3: Consider excluding LFS cubin pointer files from C++ compilation/static-analysis inputs.

The reported clang errors here are consistent with parsing Git LFS pointer text as C++; filtering this path pattern in clang-based jobs would reduce noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`
around lines 1 - 3, The CI/clang static-analysis is parsing Git LFS pointer
files (e.g. the file named
FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp)
as C++ source; update the clang-based job or compile-command generator to filter
out LFS pointer/cubin artifacts by skipping files that match the cubin pointer
pattern (e.g. *.cubin.cpp or path/glob containing "trtllmGenKernels/*/cubin") or
by detecting files whose first line starts with "version
https://git-lfs.github.com/spec/v1" and excluding them from analysis/compilation
inputs so clang doesn't attempt to parse pointer text as C++.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H128PagedKvDenseP32MultiCtasKvVarSeqQ8Kv128StaticSwapsAbForGen_cubin.cpp`:
- Line 2: CI workflows fail because Git LFS pointer files like the *.cubin.cpp
files are not being materialized before analysis; add an explicit LFS fetch step
(e.g., run "git lfs pull --all" or "git lfs fetch --all && git lfs checkout") at
the start of the jobs that run clang/clang-format/clang-tidy and compilation in
the pr-check.yml, precommit-check.yml and blossom-ci.yml workflows so the actual
binary files are present before any analysis/compile steps; place this step
immediately before the first analysis/compile action to ensure tools operate on
real files rather than LFS pointers.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H256PagedKvSlidingOrChunkedCausalP32VarSeqQ16Kv128PersistentSwapsAbForGen_cubin.cpp`:
- Around line 1-3: The build/lint pipeline is attempting to parse unresolved Git
LFS pointer files (e.g., the cubin artifact named
FmhaSm100aKernel_QE4m3KvE2m1OE4m3H256PagedKvSlidingOrChunkedCausalP32VarSeqQ16Kv128PersistentSwapsAbForGen_cubin.cpp)
as C++ source; update CI to run git lfs pull (or git lfs fetch && git lfs
checkout) before invoking the C++ parser/compiler, and/or exclude the cubin
artifacts directory/pattern from C++ analysis (add the cubin filename pattern or
directory to the linter/clang-tidy excludes in the build/lint job) so the
pointer files are not fed to the compiler.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128PackedQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The CMake build is globbing all .cpp files (file(GLOB_RECURSE
SRC_CPP *.cpp)) and later adding them to the target, which accidentally includes
LFS pointer `_cubin.cpp` files; update
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt to explicitly
filter out files matching "*_cubin.cpp" (or similar pattern) from the SRC_CPP
list before calling target_sources/target_link_libraries, e.g., add a remove or
list(FILTER) step after SRC_CPP is populated (and ensure this happens before
filter_source_cuda_architectures or any add_*_sources call) so that
functions/variables referenced like SRC_CPP and filter_source_cuda_architectures
are preserved but `_cubin.cpp` entries are excluded from compilation.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H64PackedQkvDenseVarSeqSkipsSoftmaxQ128Kv128StaticContext_cubin.cpp`:
- Line 2: The CI is compiling an LFS pointer file
(FmhaSm100aKernel_QkInt8VE4m3OBfloat16H64PackedQkvDenseVarSeqSkipsSoftmaxQ128Kv128StaticContext_cubin.cpp)
because Git LFS objects are not hydrated; add an explicit hydration step in the
pipeline before any compile/static-analysis stage (e.g., run git lfs pull ||
(git lfs fetch --all && git lfs checkout)) and fail early if hydration fails;
update the CI job(s) that run CMake/clang/compilation (the build stage that
references CMakeLists.txt) to execute this command and verify
.gitattributes-handled patterns are present so the actual .cubin.cpp sources are
present for the compiler.

---

Nitpick comments:
In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp`:
- Around line 1-3: The file under
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp is a Git LFS
pointer (starts with "version https://git-lfs.github.com/spec/v1") and should be
excluded from C++ parsing/compile analysis; modify the C++ analysis inputs
(e.g., the build/CI config, clang-tidy/clangd file list, or the source discovery
logic) to detect and skip files that begin with the LFS pointer header and/or
match the glob cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp
when unresolved as LFS pointers so they are not passed to the compiler or static
analyzers.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The CI/clang static-analysis is parsing Git LFS pointer files
(e.g. the file named
FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp)
as C++ source; update the clang-based job or compile-command generator to filter
out LFS pointer/cubin artifacts by skipping files that match the cubin pointer
pattern (e.g. *.cubin.cpp or path/glob containing "trtllmGenKernels/*/cubin") or
by detecting files whose first line starts with "version
https://git-lfs.github.com/spec/v1" and excluding them from analysis/compilation
inputs so clang doesn't attempt to parse pointer text as C++.

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:331b82e248db2885154eb327f46161623026d4db48602b6e804b5112c8398dbd
+oid sha256:9198f47281ac5310819d3796860f7d6c26c03213456fdba9f81ba307421c606e
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Confirm these files are LFS-tracked via attributes:"
fd '_cubin\.cpp$' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin \
  | xargs -I{} git check-attr -a "{}" | rg -n 'filter: lfs|diff: lfs|merge: lfs'

echo
echo "2) Confirm pointer format is present in working tree (indicates smudge may not have run):"
fd '_cubin\.cpp$' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin \
  | xargs -I{} sh -c 'head -n 2 "{}" | sed "s|^|{}: |"'

echo
echo "3) Inspect CI/workflow scripts for explicit git-lfs install/fetch/pull before analysis:"
rg -n -C2 'git lfs|lfs pull|lfs fetch|checkout.*lfs|actions/checkout' .github

Repository: NVIDIA/TensorRT-LLM

Length of output: 50377


🏁 Script executed:

fd -e yml -e yaml .github/workflows/ -type f 2>/dev/null \
  | head -20 \
  | xargs -I{} sh -c 'echo "=== {} ===" && cat "{}"' 2>/dev/null

Repository: NVIDIA/TensorRT-LLM

Length of output: 45


🏁 Script executed:

find . -maxdepth 3 -type d -name ".github" 2>/dev/null

Repository: NVIDIA/TensorRT-LLM

Length of output: 73


🏁 Script executed:

find .github -type f \( -name "*.yml" -o -name "*.yaml" \) 2>/dev/null | head -20

Repository: NVIDIA/TensorRT-LLM

Length of output: 846


🏁 Script executed:

cat .github/workflows/blossom-ci.yml | head -150

Repository: NVIDIA/TensorRT-LLM

Length of output: 3812


🏁 Script executed:

cat .github/workflows/pr-check.yml

Repository: NVIDIA/TensorRT-LLM

Length of output: 2719


🏁 Script executed:

cat .github/workflows/precommit-check.yml

Repository: NVIDIA/TensorRT-LLM

Length of output: 2136


🏁 Script executed:

cat .github/workflows/l0-test.yml | head -100

Repository: NVIDIA/TensorRT-LLM

Length of output: 3171


🏁 Script executed:

rg -l "git lfs|lfs pull|lfs fetch|checkout.*lfs" .github/workflows/ 2>/dev/null || echo "No LFS configuration found in workflows"

Repository: NVIDIA/TensorRT-LLM

Length of output: 103


🏁 Script executed:

cat .gitattributes | grep -A2 -B2 "cubin\|lfs" | head -30

Repository: NVIDIA/TensorRT-LLM

Length of output: 1217


Add explicit LFS fetch step to CI workflows before analysis.

These *cubin.cpp files are Git LFS-tracked (confirmed in .gitattributes), but the file pointer (not the binary) is committed to the repository. Without an explicit git lfs pull or similar fetch step in CI workflows (pr-check.yml, precommit-check.yml, blossom-ci.yml), static analysis tools will attempt to parse the LFS pointer text as C++ code, causing clang, clang-format, and clang-tidy to fail with parse errors. Add LFS materialization before any analysis/compile steps in CI.
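One possible shape for that CI step, as a sketch (the helper names are hypothetical; it assumes `git-lfs` is installed on the runner, and the fallback command covers setups where `git lfs pull` alone is insufficient):

```shell
#!/bin/sh
set -eu

# True (exit 0) if DIR still contains unresolved LFS pointer files.
has_unresolved_pointers() {
    grep -rl '^version https://git-lfs.github.com/spec/v1' "$1" 2>/dev/null | grep -q .
}

# Materialize Git LFS objects for the current checkout.
hydrate_lfs() {
    git lfs install --local
    git lfs pull || { git lfs fetch --all && git lfs checkout; }
}

# In a CI job, run immediately before the first analysis/compile step and fail early:
# hydrate_lfs
# if has_unresolved_pointers cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin; then
#     echo "error: unresolved LFS pointers remain after hydration" >&2
#     exit 1
# fi
```

The post-hydration check matters as much as the pull itself: without it, a partially hydrated checkout still feeds pointer text to clang and produces the same parse errors.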

🧰 Tools
🪛 Clang (14.0.6)

[error] 2-2: unknown type name 'oid'; did you mean 'void'?

(clang-diagnostic-error)


[error] 2-2: variable has incomplete type 'void'

(clang-diagnostic-error)


[error] 2-2: expected ';' after top level declarator

(clang-diagnostic-error)


[error] 2-2: expected unqualified-id

(clang-diagnostic-error)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H128PagedKvDenseP32MultiCtasKvVarSeqQ8Kv128StaticSwapsAbForGen_cubin.cpp`
at line 2, CI workflows fail because Git LFS pointer files like the *.cubin.cpp
files are not being materialized before analysis; add an explicit LFS fetch step
(e.g., run "git lfs pull --all" or "git lfs fetch --all && git lfs checkout") at
the start of the jobs that run clang/clang-format/clang-tidy and compilation in
the pr-check.yml, precommit-check.yml and blossom-ci.yml workflows so the actual
binary files are present before any analysis/compile steps; place this step
immediately before the first analysis/compile action to ensure tools operate on
real files rather than LFS pointers.

Comment on lines 1 to 3
 version https://git-lfs.github.com/spec/v1
-oid sha256:729458cc7556985c685fc24680021d4d6d386999684bcb0b18a65d47a1695d6e
+oid sha256:eaa85c572e89a2a2f5aaa703533fd490fd556e2babfd5499809210ccfe663fce
 size 579887
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only check: detect unresolved LFS pointers among FMHA cubin .cpp artifacts.
fd '_cubin\.cpp$' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin \
  | xargs -I{} sh -c 'head -n 1 "{}" | grep -q "version https://git-lfs.github.com/spec/v1" && echo "UNRESOLVED_LFS: {}"'

# Expected:
# - No output in environments where LFS blobs are resolved for build/lint.
# - Any printed paths indicate unresolved pointers that should not be compiled as C++.

Repository: NVIDIA/TensorRT-LLM

Length of output: 50376


Ensure Git LFS blobs are resolved before C++ compilation in build/lint jobs.

The cubin kernel directory contains ~500+ unresolved Git LFS pointers (files starting with version https://git-lfs.github.com/spec/v1). The toolchain appears to be attempting to parse these pointer stubs as C++ source, causing clang errors. Verify that build and lint jobs execute git lfs pull before invoking the C++ parser/compiler, or exclude these cubin artifacts from C++ analysis entirely.

🧰 Tools
🪛 Clang (14.0.6)

[error] 1-1: unknown type name 'version'

(clang-diagnostic-error)


[error] 1-1: expected ';' after top level declarator

(clang-diagnostic-error)


[error] 2-2: unknown type name 'oid'; did you mean 'void'?

(clang-diagnostic-error)


[error] 2-2: variable has incomplete type 'void'

(clang-diagnostic-error)


[error] 2-2: expected ';' after top level declarator

(clang-diagnostic-error)


[error] 2-2: unknown type name 'eaa85c572e89a2a2f5aaa703533fd490fd556e2babfd5499809210ccfe663fce'

(clang-diagnostic-error)


[error] 3-3: expected ';' after top level declarator

(clang-diagnostic-error)

🪛 Cppcheck (2.20.0)

[error] 3-3: syntax error

(syntaxError)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H256PagedKvSlidingOrChunkedCausalP32VarSeqQ16Kv128PersistentSwapsAbForGen_cubin.cpp`
around lines 1 - 3, The build/lint pipeline is attempting to parse unresolved
Git LFS pointer files (e.g., the cubin artifact named
FmhaSm100aKernel_QE4m3KvE2m1OE4m3H256PagedKvSlidingOrChunkedCausalP32VarSeqQ16Kv128PersistentSwapsAbForGen_cubin.cpp)
as C++ source; update CI to run git lfs pull (or git lfs fetch && git lfs
checkout) before invoking the C++ parser/compiler, and/or exclude the cubin
artifacts directory/pattern from C++ analysis (add the cubin filename pattern or
directory to the linter/clang-tidy excludes in the build/lint job) so the
pointer files are not fed to the compiler.

Comment on lines 1 to 3
 version https://git-lfs.github.com/spec/v1
-oid sha256:04440fc4f0a05b4faa4dd444b156cd8eae913147f9a9aea451755f77d56dddbd
+oid sha256:0d65300266e58bd37ed448107ece7d3df90d1ca643998708cab703b1bd114804
 size 759880
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether cubin LFS pointer files are referenced by build/lint configs.

set -euo pipefail

echo "== Locate build config files =="
fd -i '^(CMakeLists\.txt|.*\.cmake|BUILD|BUILD\.bazel|WORKSPACE|meson\.build)$'

echo
echo "== Search references to cubin pointer path/patterns =="
rg -n -C2 --iglob '*CMakeLists.txt' --iglob '*.cmake' --iglob 'BUILD*' --iglob 'WORKSPACE' --iglob 'meson.build' \
  'trtllmGenKernels/fmha/cubin|_cubin\.cpp|FmhaSm100[a-zA-Z0-9_]*_cubin\.cpp'

echo
echo "== Search lint/static-analysis config references (if present) =="
rg -n -C2 --iglob '*clang*' --iglob '*lint*' --iglob '*.yml' --iglob '*.yaml' --iglob '*.json' \
  'trtllmGenKernels/fmha/cubin|_cubin\.cpp|cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin'

Repository: NVIDIA/TensorRT-LLM

Length of output: 6949


🏁 Script executed:

cat -n cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt

Repository: NVIDIA/TensorRT-LLM

Length of output: 4987


🏁 Script executed:

rg -B5 -A10 'filter_source_cuda_architectures' cpp/cmake/ --type cmake

Repository: NVIDIA/TensorRT-LLM

Length of output: 3981


🏁 Script executed:

rg -A50 'function(filter_source_cuda_architectures)' cpp/cmake/modules/cuda_configuration.cmake

Repository: NVIDIA/TensorRT-LLM

Length of output: 45


🏁 Script executed:

sed -n '/^function(filter_source_cuda_architectures)/,/^endfunction()/p' cpp/cmake/modules/cuda_configuration.cmake | head -80

Repository: NVIDIA/TensorRT-LLM

Length of output: 3131


🏁 Script executed:

find cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin -type f -name '*.cpp' 2>/dev/null | head -20

Repository: NVIDIA/TensorRT-LLM

Length of output: 3269


🏁 Script executed:

head -5 'cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H128PagedKvCausalP32VarSeqQ128Kv128PersistentContext_cubin.cpp'

Repository: NVIDIA/TensorRT-LLM

Length of output: 194


🏁 Script executed:

git check-attr filter cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*.cpp 2>/dev/null | head -20

Repository: NVIDIA/TensorRT-LLM

Length of output: 3529


🏁 Script executed:

cat -n .gitattributes | grep -A5 -B5 'cubin\|lfs'

Repository: NVIDIA/TensorRT-LLM

Length of output: 1329


🏁 Script executed:

# Check if CMakeLists handles LFS pointer files or if there's a .gitignore that might help
cat cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/.gitignore 2>/dev/null || echo "No .gitignore in fmha dir"
echo "---"
# Check the actual directory structure and how files are handled
ls -la cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/ | head -20

Repository: NVIDIA/TensorRT-LLM

Length of output: 1113


🏁 Script executed:

# Check if there are any comments or special handling in CMakeLists.txt around the SRC_CPP glob
sed -n '45,65p' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt

Repository: NVIDIA/TensorRT-LLM

Length of output: 893


🏁 Script executed:

# Check how CUDA compilation handles LFS pointer files - does it skip them or fail?
# Also verify if the architecture filter actually excludes these files
python3 << 'EOF'
import re

# Simulate the filter logic for architecture 100
arch = 100
match_pattern = f".*[Ss][Mm]_?{arch}.*(cubin\.cpp|\.cu)$"
exclude_pattern = f".*[Ss][Mm]_?{arch}f.*(cubin\.cpp|\.cu)$"

# Test against actual filenames from the cubin directory
test_files = [
    "FmhaSm100aKernel_QE4m3KvE2m1OE4m3H128PagedKvCausalP32VarSeqQ128Kv128PersistentContext_cubin.cpp",
    "fmhaRunner.cpp",
    "fmhaReduction.cu",
]

print("Architecture 100 filtering:")
print(f"  Match pattern: {match_pattern}")
print(f"  Exclude pattern: {exclude_pattern}")
print()

for fname in test_files:
    matches = re.match(match_pattern, fname)
    excludes = re.match(exclude_pattern, fname) if exclude_pattern else False
    status = "EXCLUDED" if matches and excludes else ("INCLUDED" if matches else "NOT MATCHED")
    print(f"  {fname}: {status}")
EOF

Repository: NVIDIA/TensorRT-LLM

Length of output: 371


🏁 Script executed:

# Let me verify the full filter function to see if there's additional logic I missed
sed -n '/function(filter_source_cuda_architectures)/,/^endfunction()/p' cpp/cmake/modules/cuda_configuration.cmake | tail -50

Repository: NVIDIA/TensorRT-LLM

Length of output: 1672


🏁 Script executed:

# Check if there's any special handling in CMake for LFS pointer files
# Look for how add_library handles them or if there's language detection
rg -n 'LANGUAGE|LINKER_LANGUAGE|cubin' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt

Repository: NVIDIA/TensorRT-LLM

Length of output: 45


🏁 Script executed:

# Check if CMake has any special handling for LFS files or language detection
# Also check if there's any build documentation about cubin files
find cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha -name '*.md' -o -name 'README*' 2>/dev/null
echo "---"
# Check if there are any comments in the CMakeLists about cubin handling
grep -n -i 'cubin\|lfs\|binary' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt

Repository: NVIDIA/TensorRT-LLM

Length of output: 655


🏁 Script executed:

# Check if there's any custom CMake rule or property that marks cubin files specially
# Also look for how the kernels are actually loaded at runtime
rg -n 'cubin\|kernel.*load\|dlopen\|dlsym' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.cpp | head -30

Repository: NVIDIA/TensorRT-LLM

Length of output: 45


🏁 Script executed:

# Check if there's a .gitattributes diff/merge/clean filter that might prevent LFS pointers from being compiled
cat .gitattributes | head -20
echo "---"
# Check if there's any documentation about how the build handles LFS files
rg -n 'git.*lfs\|lfs.*filter' . --type md --type txt 2>/dev/null | head -20

Repository: NVIDIA/TensorRT-LLM

Length of output: 1221


Ensure _cubin.cpp pointer files are explicitly excluded from C++ compilation targets.

The CMakeLists.txt at line 49 uses file(GLOB_RECURSE SRC_CPP *.cpp) to collect all .cpp files, which includes LFS pointer files. Line 60 adds all collected files to the compilation target. When LFS is not pulled (e.g., shallow clones or CI without git-lfs), these pointer files contain plaintext like version https://git-lfs.github.com/spec/v1... and will cause hard C++ parser failures during compilation. While .pre-commit-config.yaml correctly excludes cubin.cpp from linting, the build system lacks explicit exclusion from compilation. The filter_source_cuda_architectures function filters only by SM architecture patterns, not by file type or LFS status. Add an explicit filter in cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt to remove *_cubin.cpp files before adding sources to the compilation target.
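The suggested exclusion could look like the following fragment (a sketch against the `SRC_CPP` variable named above; exact placement within cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt is an assumption):

```cmake
# Collect candidate sources as the existing build does.
file(GLOB_RECURSE SRC_CPP *.cpp)

# Exclude Git LFS pointer artifacts: *_cubin.cpp files hold binary cubin data
# (or, when LFS is not hydrated, pointer text) and must not be compiled as C++.
list(FILTER SRC_CPP EXCLUDE REGEX "_cubin\\.cpp$")
```

`list(FILTER ... EXCLUDE REGEX ...)` requires CMake 3.6 or newer; on older CMake, a `foreach` loop with `list(REMOVE_ITEM)` achieves the same effect.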

🧰 Tools
🪛 Clang (14.0.6)

[error] 1-1: unknown type name 'version'

(clang-diagnostic-error)


[error] 1-1: expected ';' after top level declarator

(clang-diagnostic-error)


[error] 2-2: unknown type name 'oid'; did you mean 'void'?

(clang-diagnostic-error)


[error] 2-2: variable has incomplete type 'void'

(clang-diagnostic-error)


[error] 2-2: expected ';' after top level declarator

(clang-diagnostic-error)


[error] 2-2: expected unqualified-id

(clang-diagnostic-error)

🪛 Cppcheck (2.20.0)

[error] 3-3: syntax error

(syntaxError)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128PackedQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`
around lines 1 - 3, The CMake build is globbing all .cpp files
(file(GLOB_RECURSE SRC_CPP *.cpp)) and later adding them to the target, which
accidentally includes LFS pointer `_cubin.cpp` files; update
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt to explicitly
filter out files matching "*_cubin.cpp" (or similar pattern) from the SRC_CPP
list before calling target_sources/target_link_libraries, e.g., add a remove or
list(FILTER) step after SRC_CPP is populated (and ensure this happens before
filter_source_cuda_architectures or any add_*_sources call) so that
functions/variables referenced like SRC_CPP and filter_source_cuda_architectures
are preserved but `_cubin.cpp` entries are excluded from compilation.

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:024b568fbadb7ed013f975f8b95fd3b81c58aafb9194005bb6c64badfa889108
+oid sha256:c45588917e7f4d80700d85b67be5e8030520ac450dec3d44a6d979c6eea4abe5
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Check LFS tracking for these cubin .cpp files"
rg -n --hidden --glob '.gitattributes' 'trtllmGenKernels/fmha/cubin/.*\.cpp|filter=lfs' -C2

echo
echo "2) Check CI/scripts for explicit LFS hydration"
rg -n --hidden -S 'git lfs (install|fetch|pull|checkout)' -C2

echo
echo "3) Spot pointer placeholders currently present in repo checkout"
rg -n --hidden -S '^version https://git-lfs.github.com/spec/v1$|^oid sha256:' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin -C1

Repository: NVIDIA/TensorRT-LLM

Length of output: 50379


Ensure Git LFS objects are hydrated before C++ compilation in CI.

Line 2 of this file contains a Git LFS pointer record, not actual source code. The .gitattributes file confirms *cubin.cpp filter=lfs diff=lfs merge=lfs -text, and verification shows these files currently exist as pointers in the repository. If clang or any C++ compiler attempts to parse this LFS pointer (version https://git-lfs.github.com/spec/v1 followed by oid sha256:...) as source, it will fail with parsing errors.

CI must explicitly run git lfs pull (or equivalent hydration) before any compilation or static analysis stage that touches these files. While documentation and CMakeLists.txt reference LFS handling, there should be an enforced, mandatory hydration step in the build pipeline—not just an error message after-the-fact.

🧰 Tools
🪛 Clang (14.0.6)

[error] 2-2: unknown type name 'oid'; did you mean 'void'?

(clang-diagnostic-error)


[error] 2-2: variable has incomplete type 'void'

(clang-diagnostic-error)


[error] 2-2: expected ';' after top level declarator

(clang-diagnostic-error)


[error] 2-2: unknown type name 'c45588917e7f4d80700d85b67be5e8030520ac450dec3d44a6d979c6eea4abe5'

(clang-diagnostic-error)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H64PackedQkvDenseVarSeqSkipsSoftmaxQ128Kv128StaticContext_cubin.cpp`
at line 2, The CI is compiling an LFS pointer file
(FmhaSm100aKernel_QkInt8VE4m3OBfloat16H64PackedQkvDenseVarSeqSkipsSoftmaxQ128Kv128StaticContext_cubin.cpp)
because Git LFS objects are not hydrated; add an explicit hydration step in the
pipeline before any compile/static-analysis stage (e.g., run git lfs pull ||
(git lfs fetch --all && git lfs checkout)) and fail early if hydration fails;
update the CI job(s) that run CMake/clang/compilation (the build stage that
references CMakeLists.txt) to execute this command and verify
.gitattributes-handled patterns are present so the actual .cubin.cpp sources are
present for the compiler.

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
@yunruis yunruis force-pushed the user/yunruis/drop_cubin_fmha_narrow_pathsel_cache_miss branch from 2567c94 to c977b5d on April 27, 2026 at 16:42
@yunruis
Contributor Author

yunruis commented Apr 27, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45754 [ run ] triggered by Bot. Commit: c977b5d Link to invocation
