[None][perf] Drop cubin and eliminate ~6s FMHA JIT recompile in eager generation by aligning kernel selection with CUDA graph warmup #13505
Conversation
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
…rnel cached, avoid JIT on runtime Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
/bot run --disable-fail-fast
/bot kill
/bot run --disable-fail-fast
PR_Github #45724 [ run ] triggered by Bot. Commit:
PR_Github #45725 [ run ] triggered by Bot. Commit:
PR_Github #45724 [ run ] completed with state
PR_Github #45726 [ kill ] triggered by Bot. Commit:
PR_Github #45725 [ run ] completed with state
PR_Github #45726 [ kill ] completed with state
📝 Walkthrough
The PR modifies MLA generation parameter handling in the attention operation dispatcher, changing the KV-cache maximum sequence length source from
Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 4
🧹 Nitpick comments (2)
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp (1)
1-3: Exclude Git LFS pointer artifacts from C++ static-analysis/compile targets. This file is a valid Git LFS pointer, so clang parsing errors here are tooling noise and can hide real diagnostics in actual source files. Please filter cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp (when unresolved as LFS pointers) from C++ analysis inputs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp` around lines 1 - 3, The file under cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp is a Git LFS pointer (starts with "version https://git-lfs.github.com/spec/v1") and should be excluded from C++ parsing/compile analysis; modify the C++ analysis inputs (e.g., the build/CI config, clang-tidy/clangd file list, or the source discovery logic) to detect and skip files that begin with the LFS pointer header and/or match the glob cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp when unresolved as LFS pointers so they are not passed to the compiler or static analyzers.
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp (1)
1-3: Consider excluding LFS cubin pointer files from C++ compilation/static-analysis inputs.The reported clang errors here are consistent with parsing Git LFS pointer text as C++; filtering this path pattern in clang-based jobs would reduce noise.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp` around lines 1 - 3, The CI/clang static-analysis is parsing Git LFS pointer files (e.g. the file named FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128SeparateQkvDenseVarSeqSkipsSoftmaxQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp) as C++ source; update the clang-based job or compile-command generator to filter out LFS pointer/cubin artifacts by skipping files that match the cubin pointer pattern (e.g. *.cubin.cpp or path/glob containing "trtllmGenKernels/*/cubin") or by detecting files whose first line starts with "version https://git-lfs.github.com/spec/v1" and excluding them from analysis/compilation inputs so clang doesn't attempt to parse pointer text as C++.
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:331b82e248db2885154eb327f46161623026d4db48602b6e804b5112c8398dbd
+oid sha256:9198f47281ac5310819d3796860f7d6c26c03213456fdba9f81ba307421c606e
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "1) Confirm these files are LFS-tracked via attributes:"
fd '_cubin\.cpp$' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin \
| xargs -I{} git check-attr -a "{}" | rg -n 'filter: lfs|diff: lfs|merge: lfs'
echo
echo "2) Confirm pointer format is present in working tree (indicates smudge may not have run):"
fd '_cubin\.cpp$' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin \
| xargs -I{} sh -c 'head -n 2 "{}" | sed "s|^|{}: |"'
echo
echo "3) Inspect CI/workflow scripts for explicit git-lfs install/fetch/pull before analysis:"
rg -n -C2 'git lfs|lfs pull|lfs fetch|checkout.*lfs|actions/checkout' .github
Repository: NVIDIA/TensorRT-LLM
Length of output: 50377
🏁 Script executed:
fd -e yml -e yaml .github/workflows/ -type f 2>/dev/null \
| head -20 \
| xargs -I{} sh -c 'echo "=== {} ===" && cat "{}"' 2>/dev/null
Repository: NVIDIA/TensorRT-LLM
Length of output: 45
🏁 Script executed:
find . -maxdepth 3 -type d -name ".github" 2>/dev/null
Repository: NVIDIA/TensorRT-LLM
Length of output: 73
🏁 Script executed:
find .github -type f \( -name "*.yml" -o -name "*.yaml" \) 2>/dev/null | head -20
Repository: NVIDIA/TensorRT-LLM
Length of output: 846
🏁 Script executed:
cat .github/workflows/blossom-ci.yml | head -150
Repository: NVIDIA/TensorRT-LLM
Length of output: 3812
🏁 Script executed:
cat .github/workflows/pr-check.yml
Repository: NVIDIA/TensorRT-LLM
Length of output: 2719
🏁 Script executed:
cat .github/workflows/precommit-check.yml
Repository: NVIDIA/TensorRT-LLM
Length of output: 2136
🏁 Script executed:
cat .github/workflows/l0-test.yml | head -100
Repository: NVIDIA/TensorRT-LLM
Length of output: 3171
🏁 Script executed:
rg -l "git lfs|lfs pull|lfs fetch|checkout.*lfs" .github/workflows/ 2>/dev/null || echo "No LFS configuration found in workflows"
Repository: NVIDIA/TensorRT-LLM
Length of output: 103
🏁 Script executed:
cat .gitattributes | grep -A2 -B2 "cubin\|lfs" | head -30
Repository: NVIDIA/TensorRT-LLM
Length of output: 1217
Add explicit LFS fetch step to CI workflows before analysis.
These *cubin.cpp files are Git LFS-tracked (confirmed in .gitattributes), but the file pointer (not the binary) is committed to the repository. Without an explicit git lfs pull or similar fetch step in CI workflows (pr-check.yml, precommit-check.yml, blossom-ci.yml), static analysis tools will attempt to parse the LFS pointer text as C++ code, causing clang, clang-format, and clang-tidy to fail with parse errors. Add LFS materialization before any analysis/compile steps in CI.
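The hydration-plus-verification step suggested above can be sketched as a small shell guard. This is a sketch only: it assumes git-lfs is installed on the CI runner, the cubin path mirrors the files discussed in this review, and the helper name `find_unresolved_lfs_pointers` is illustrative, not an existing script in the repository.

```shell
#!/bin/sh
# Sketch of a CI guard per the review suggestion above (assumptions noted
# in the lead-in; not the repository's actual CI wiring).
set -eu

# List files under $1 that still contain the Git LFS pointer header line,
# i.e. files whose LFS blobs were never materialized.
find_unresolved_lfs_pointers() {
  grep -rl -- '^version https://git-lfs.github.com/spec/v1' "$1" 2>/dev/null || true
}

# Usage sketch in the CI job, before any compile/analysis step:
#   git lfs pull || { git lfs fetch --all && git lfs checkout; }
#   leftovers=$(find_unresolved_lfs_pointers cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin)
#   [ -z "$leftovers" ] || { echo "unresolved LFS pointers: $leftovers" >&2; exit 1; }
```

Failing early on leftover pointers gives a clear diagnostic instead of the clang parse errors shown in this review.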
🧰 Tools
🪛 Clang (14.0.6)
[error] 2-2: unknown type name 'oid'; did you mean 'void'?
(clang-diagnostic-error)
[error] 2-2: variable has incomplete type 'void'
(clang-diagnostic-error)
[error] 2-2: expected ';' after top level declarator
(clang-diagnostic-error)
[error] 2-2: expected unqualified-id
(clang-diagnostic-error)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H128PagedKvDenseP32MultiCtasKvVarSeqQ8Kv128StaticSwapsAbForGen_cubin.cpp`
at line 2, CI workflows fail because Git LFS pointer files like the *.cubin.cpp
files are not being materialized before analysis; add an explicit LFS fetch step
(e.g., run "git lfs pull --all" or "git lfs fetch --all && git lfs checkout") at
the start of the jobs that run clang/clang-format/clang-tidy and compilation in
the pr-check.yml, precommit-check.yml and blossom-ci.yml workflows so the actual
binary files are present before any analysis/compile steps; place this step
immediately before the first analysis/compile action to ensure tools operate on
real files rather than LFS pointers.
 version https://git-lfs.github.com/spec/v1
-oid sha256:729458cc7556985c685fc24680021d4d6d386999684bcb0b18a65d47a1695d6e
+oid sha256:eaa85c572e89a2a2f5aaa703533fd490fd556e2babfd5499809210ccfe663fce
 size 579887
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read-only check: detect unresolved LFS pointers among FMHA cubin .cpp artifacts.
fd '_cubin\.cpp$' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin \
| xargs -I{} sh -c 'head -n 1 "{}" | grep -q "version https://git-lfs.github.com/spec/v1" && echo "UNRESOLVED_LFS: {}"'
# Expected:
# - No output in environments where LFS blobs are resolved for build/lint.
# - Any printed paths indicate unresolved pointers that should not be compiled as C++.
Repository: NVIDIA/TensorRT-LLM
Length of output: 50376
Ensure Git LFS blobs are resolved before C++ compilation in build/lint jobs.
The cubin kernel directory contains ~500+ unresolved Git LFS pointers (files starting with version https://git-lfs.github.com/spec/v1). The toolchain appears to be attempting to parse these pointer stubs as C++ source, causing clang errors. Verify that build and lint jobs execute git lfs pull before invoking the C++ parser/compiler, or exclude these cubin artifacts from C++ analysis entirely.
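The "exclude these cubin artifacts from C++ analysis" alternative mentioned above could look like the filter below. This is a sketch: the function name and the idea of piping a newline-separated file list through it are illustrative, not the repository's actual lint configuration.

```shell
#!/bin/sh
# Sketch: drop cubin artifacts from a file list before it is fed to
# clang-tidy/clang-format (illustrative helper, see lead-in).

# Remove *_cubin.cpp entries from a newline-separated file list on stdin.
filter_cubin_artifacts() {
  grep -v '_cubin\.cpp$' || true
}
```

A hypothetical use in a lint job: `git ls-files 'cpp/**/*.cpp' | filter_cubin_artifacts | xargs clang-tidy ...`, so pointer stubs never reach the analyzer even on un-hydrated checkouts.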
🧰 Tools
🪛 Clang (14.0.6)
[error] 1-1: unknown type name 'version'
(clang-diagnostic-error)
[error] 1-1: expected ';' after top level declarator
(clang-diagnostic-error)
[error] 2-2: unknown type name 'oid'; did you mean 'void'?
(clang-diagnostic-error)
[error] 2-2: variable has incomplete type 'void'
(clang-diagnostic-error)
[error] 2-2: expected ';' after top level declarator
(clang-diagnostic-error)
[error] 2-2: unknown type name 'eaa85c572e89a2a2f5aaa703533fd490fd556e2babfd5499809210ccfe663fce'
(clang-diagnostic-error)
[error] 3-3: expected ';' after top level declarator
(clang-diagnostic-error)
🪛 Cppcheck (2.20.0)
[error] 3-3: syntax error
(syntaxError)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H256PagedKvSlidingOrChunkedCausalP32VarSeqQ16Kv128PersistentSwapsAbForGen_cubin.cpp`
around lines 1 - 3, The build/lint pipeline is attempting to parse unresolved
Git LFS pointer files (e.g., the cubin artifact named
FmhaSm100aKernel_QE4m3KvE2m1OE4m3H256PagedKvSlidingOrChunkedCausalP32VarSeqQ16Kv128PersistentSwapsAbForGen_cubin.cpp)
as C++ source; update CI to run git lfs pull (or git lfs fetch && git lfs
checkout) before invoking the C++ parser/compiler, and/or exclude the cubin
artifacts directory/pattern from C++ analysis (add the cubin filename pattern or
directory to the linter/clang-tidy excludes in the build/lint job) so the
pointer files are not fed to the compiler.
 version https://git-lfs.github.com/spec/v1
-oid sha256:04440fc4f0a05b4faa4dd444b156cd8eae913147f9a9aea451755f77d56dddbd
+oid sha256:0d65300266e58bd37ed448107ece7d3df90d1ca643998708cab703b1bd114804
 size 759880
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether cubin LFS pointer files are referenced by build/lint configs.
set -euo pipefail
echo "== Locate build config files =="
fd -i '^(CMakeLists\.txt|.*\.cmake|BUILD|BUILD\.bazel|WORKSPACE|meson\.build)$'
echo
echo "== Search references to cubin pointer path/patterns =="
rg -n -C2 --iglob '*CMakeLists.txt' --iglob '*.cmake' --iglob 'BUILD*' --iglob 'WORKSPACE' --iglob 'meson.build' \
'trtllmGenKernels/fmha/cubin|_cubin\.cpp|FmhaSm100[a-zA-Z0-9_]*_cubin\.cpp'
echo
echo "== Search lint/static-analysis config references (if present) =="
rg -n -C2 --iglob '*clang*' --iglob '*lint*' --iglob '*.yml' --iglob '*.yaml' --iglob '*.json' \
'trtllmGenKernels/fmha/cubin|_cubin\.cpp|cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin'
Repository: NVIDIA/TensorRT-LLM
Length of output: 6949
🏁 Script executed:
cat -n cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt
Repository: NVIDIA/TensorRT-LLM
Length of output: 4987
🏁 Script executed:
rg -B5 -A10 'filter_source_cuda_architectures' cpp/cmake/ --type cmake
Repository: NVIDIA/TensorRT-LLM
Length of output: 3981
🏁 Script executed:
rg -A50 'function(filter_source_cuda_architectures)' cpp/cmake/modules/cuda_configuration.cmake
Repository: NVIDIA/TensorRT-LLM
Length of output: 45
🏁 Script executed:
sed -n '/^function(filter_source_cuda_architectures)/,/^endfunction()/p' cpp/cmake/modules/cuda_configuration.cmake | head -80
Repository: NVIDIA/TensorRT-LLM
Length of output: 3131
🏁 Script executed:
find cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin -type f -name '*.cpp' 2>/dev/null | head -20
Repository: NVIDIA/TensorRT-LLM
Length of output: 3269
🏁 Script executed:
head -5 'cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H128PagedKvCausalP32VarSeqQ128Kv128PersistentContext_cubin.cpp'
Repository: NVIDIA/TensorRT-LLM
Length of output: 194
🏁 Script executed:
git check-attr filter cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*.cpp 2>/dev/null | head -20
Repository: NVIDIA/TensorRT-LLM
Length of output: 3529
🏁 Script executed:
cat -n .gitattributes | grep -A5 -B5 'cubin\|lfs'
Repository: NVIDIA/TensorRT-LLM
Length of output: 1329
🏁 Script executed:
# Check if CMakeLists handles LFS pointer files or if there's a .gitignore that might help
cat cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/.gitignore 2>/dev/null || echo "No .gitignore in fmha dir"
echo "---"
# Check the actual directory structure and how files are handled
ls -la cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/ | head -20
Repository: NVIDIA/TensorRT-LLM
Length of output: 1113
🏁 Script executed:
# Check if there are any comments or special handling in CMakeLists.txt around the SRC_CPP glob
sed -n '45,65p' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt
Repository: NVIDIA/TensorRT-LLM
Length of output: 893
🏁 Script executed:
# Check how CUDA compilation handles LFS pointer files - does it skip them or fail?
# Also verify if the architecture filter actually excludes these files
python3 << 'EOF'
import re
# Simulate the filter logic for architecture 100
arch = 100
match_pattern = f".*[Ss][Mm]_?{arch}.*(cubin\.cpp|\.cu)$"
exclude_pattern = f".*[Ss][Mm]_?{arch}f.*(cubin\.cpp|\.cu)$"
# Test against actual filenames from the cubin directory
test_files = [
"FmhaSm100aKernel_QE4m3KvE2m1OE4m3H128PagedKvCausalP32VarSeqQ128Kv128PersistentContext_cubin.cpp",
"fmhaRunner.cpp",
"fmhaReduction.cu",
]
print("Architecture 100 filtering:")
print(f" Match pattern: {match_pattern}")
print(f" Exclude pattern: {exclude_pattern}")
print()
for fname in test_files:
matches = re.match(match_pattern, fname)
excludes = re.match(exclude_pattern, fname) if exclude_pattern else False
status = "EXCLUDED" if matches and excludes else ("INCLUDED" if matches else "NOT MATCHED")
print(f" {fname}: {status}")
EOF
Repository: NVIDIA/TensorRT-LLM
Length of output: 371
🏁 Script executed:
# Let me verify the full filter function to see if there's additional logic I missed
sed -n '/function(filter_source_cuda_architectures)/,/^endfunction()/p' cpp/cmake/modules/cuda_configuration.cmake | tail -50
Repository: NVIDIA/TensorRT-LLM
Length of output: 1672
🏁 Script executed:
# Check if there's any special handling in CMake for LFS pointer files
# Look for how add_library handles them or if there's language detection
rg -n 'LANGUAGE|LINKER_LANGUAGE|cubin' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt
Repository: NVIDIA/TensorRT-LLM
Length of output: 45
🏁 Script executed:
# Check if CMake has any special handling for LFS files or language detection
# Also check if there's any build documentation about cubin files
find cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha -name '*.md' -o -name 'README*' 2>/dev/null
echo "---"
# Check if there are any comments in the CMakeLists about cubin handling
grep -n -i 'cubin\|lfs\|binary' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt
Repository: NVIDIA/TensorRT-LLM
Length of output: 655
🏁 Script executed:
# Check if there's any custom CMake rule or property that marks cubin files specially
# Also look for how the kernels are actually loaded at runtime
rg -n 'cubin\|kernel.*load\|dlopen\|dlsym' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.cpp | head -30
Repository: NVIDIA/TensorRT-LLM
Length of output: 45
🏁 Script executed:
# Check if there's a .gitattributes diff/merge/clean filter that might prevent LFS pointers from being compiled
cat .gitattributes | head -20
echo "---"
# Check if there's any documentation about how the build handles LFS files
rg -n 'git.*lfs\|lfs.*filter' . --type md --type txt 2>/dev/null | head -20
Repository: NVIDIA/TensorRT-LLM
Length of output: 1221
Ensure _cubin.cpp pointer files are explicitly excluded from C++ compilation targets.
The CMakeLists.txt at line 49 uses file(GLOB_RECURSE SRC_CPP *.cpp) to collect all .cpp files, which includes LFS pointer files. Line 60 adds all collected files to the compilation target. When LFS is not pulled (e.g., shallow clones or CI without git-lfs), these pointer files contain plaintext like version https://git-lfs.github.com/spec/v1... and will cause hard C++ parser failures during compilation. While .pre-commit-config.yaml correctly excludes cubin.cpp from linting, the build system lacks explicit exclusion from compilation. The filter_source_cuda_architectures function filters only by SM architecture patterns, not by file type or LFS status. Add an explicit filter in cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt to remove *_cubin.cpp files before adding sources to the compilation target.
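A minimal sketch of the suggested CMake filter. The `SRC_CPP` variable, the `file(GLOB_RECURSE ...)` line, and the placement before `filter_source_cuda_architectures` come from the review findings above; the exact regex is an assumption, and note that once LFS objects are hydrated these `_cubin.cpp` files are real sources, so any exclusion would need to be conditional on hydration state.

```cmake
# Sketch only: exclude LFS-pointer cubin sources from the compile list in
# cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt.
file(GLOB_RECURSE SRC_CPP *.cpp)
# Drop *_cubin.cpp entries before filter_source_cuda_architectures or any
# add_*_sources call (regex pattern is illustrative).
list(FILTER SRC_CPP EXCLUDE REGEX "_cubin\\.cpp$")
```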
🧰 Tools
🪛 Clang (14.0.6)
[error] 1-1: unknown type name 'version'
(clang-diagnostic-error)
[error] 1-1: expected ';' after top level declarator
(clang-diagnostic-error)
[error] 2-2: unknown type name 'oid'; did you mean 'void'?
(clang-diagnostic-error)
[error] 2-2: variable has incomplete type 'void'
(clang-diagnostic-error)
[error] 2-2: expected ';' after top level declarator
(clang-diagnostic-error)
[error] 2-2: expected unqualified-id
(clang-diagnostic-error)
🪛 Cppcheck (2.20.0)
[error] 3-3: syntax error
(syntaxError)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H128PackedQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`
around lines 1 - 3, The CMake build is globbing all .cpp files
(file(GLOB_RECURSE SRC_CPP *.cpp)) and later adding them to the target, which
accidentally includes LFS pointer `_cubin.cpp` files; update
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt to explicitly
filter out files matching "*_cubin.cpp" (or similar pattern) from the SRC_CPP
list before calling target_sources/target_link_libraries, e.g., add a remove or
list(FILTER) step after SRC_CPP is populated (and ensure this happens before
filter_source_cuda_architectures or any add_*_sources call) so that
functions/variables referenced like SRC_CPP and filter_source_cuda_architectures
are preserved but `_cubin.cpp` entries are excluded from compilation.
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:024b568fbadb7ed013f975f8b95fd3b81c58aafb9194005bb6c64badfa889108
+oid sha256:c45588917e7f4d80700d85b67be5e8030520ac450dec3d44a6d979c6eea4abe5
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "1) Check LFS tracking for these cubin .cpp files"
rg -n --hidden --glob '.gitattributes' 'trtllmGenKernels/fmha/cubin/.*\.cpp|filter=lfs' -C2
echo
echo "2) Check CI/scripts for explicit LFS hydration"
rg -n --hidden -S 'git lfs (install|fetch|pull|checkout)' -C2
echo
echo "3) Spot pointer placeholders currently present in repo checkout"
rg -n --hidden -S '^version https://git-lfs.github.com/spec/v1$|^oid sha256:' cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin -C1
Repository: NVIDIA/TensorRT-LLM
Length of output: 50379
Ensure Git LFS objects are hydrated before C++ compilation in CI.
Line 2 of this file contains a Git LFS pointer record, not actual source code. The .gitattributes file confirms *cubin.cpp filter=lfs diff=lfs merge=lfs -text, and verification shows these files currently exist as pointers in the repository. If clang or any C++ compiler attempts to parse this LFS pointer (version https://git-lfs.github.com/spec/v1 followed by oid sha256:...) as source, it will fail with parsing errors.
CI must explicitly run git lfs pull (or equivalent hydration) before any compilation or static analysis stage that touches these files. While documentation and CMakeLists.txt reference LFS handling, there should be an enforced, mandatory hydration step in the build pipeline—not just an error message after-the-fact.
🧰 Tools
🪛 Clang (14.0.6)
[error] 2-2: unknown type name 'oid'; did you mean 'void'?
(clang-diagnostic-error)
[error] 2-2: variable has incomplete type 'void'
(clang-diagnostic-error)
[error] 2-2: expected ';' after top level declarator
(clang-diagnostic-error)
[error] 2-2: unknown type name 'c45588917e7f4d80700d85b67be5e8030520ac450dec3d44a6d979c6eea4abe5'
(clang-diagnostic-error)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16H64PackedQkvDenseVarSeqSkipsSoftmaxQ128Kv128StaticContext_cubin.cpp`
at line 2, The CI is compiling an LFS pointer file
(FmhaSm100aKernel_QkInt8VE4m3OBfloat16H64PackedQkvDenseVarSeqSkipsSoftmaxQ128Kv128StaticContext_cubin.cpp)
because Git LFS objects are not hydrated; add an explicit hydration step in the
pipeline before any compile/static-analysis stage (e.g., run git lfs pull ||
(git lfs fetch --all && git lfs checkout)) and fail early if hydration fails;
update the CI job(s) that run CMake/clang/compilation (the build stage that
references CMakeLists.txt) to execute this command and verify
.gitattributes-handled patterns are present so the actual .cubin.cpp sources are
present for the compiler.
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2567c94 to c977b5d
/bot run --disable-fail-fast
PR_Github #45754 [ run ] triggered by Bot. Commit:
Summary by CodeRabbit
Release Notes
Bug Fixes
Performance
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.