Commit 744e407
Fix uncoalesced global memory access in decode attention bf16 kernel (#5109)
Summary:
Pull Request resolved: #5109
X-link: https://github.com/facebookresearch/FBGEMM/pull/2114
Issue reported in an ncu (Nsight Compute) profile:
{F1983281351}
Reviewed By: Aya-ZIbra
Differential Revision: D85631783
fbshipit-source-id: 563a81df3bbc02109b466f3b3e83c35abbfab76f
1 parent a0c6f1a commit 744e407
1 file changed, 6 additions and 2 deletions:
fbgemm_gpu/experimental/gen_ai/src/attention/cuda/cutlass_blackwell_fmha/collective
Diff hunk @@ -171,8 +171,12 @@ (line contents not captured):
- original lines 174-175 removed
- new lines 174-179 added
- original lines 171-173 and 176-178 (new 180-182) are unchanged context
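The body of the diff is not available here, so the sketch below is not the change made in this commit. It is a minimal host-side C++ model of what "uncoalesced global memory access" means for bf16 loads, assuming 32 threads per warp and 32-byte memory sectors. The function `sectors_touched` and all constants are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical model parameters (illustrative, not taken from the commit).
constexpr std::size_t kWarpSize = 32;     // threads per warp
constexpr std::size_t kSectorBytes = 32;  // assumed global memory sector size
constexpr std::size_t kBf16Bytes = 2;     // size of one bf16 element

// Count the distinct sectors touched when each thread of one warp loads
// `elems_per_thread` contiguous bf16 values (must be >= 1) starting at
// element index tid * stride_elems. Fewer sectors for the same amount of
// useful data means better coalescing.
std::size_t sectors_touched(std::size_t stride_elems,
                            std::size_t elems_per_thread) {
  std::set<std::size_t> sectors;
  for (std::size_t tid = 0; tid < kWarpSize; ++tid) {
    const std::size_t first_byte = tid * stride_elems * kBf16Bytes;
    const std::size_t last_byte =
        first_byte + elems_per_thread * kBf16Bytes - 1;
    for (std::size_t s = first_byte / kSectorBytes;
         s <= last_byte / kSectorBytes; ++s) {
      sectors.insert(s);
    }
  }
  return sectors.size();
}
```

In this model, adjacent threads reading adjacent bf16 elements (`sectors_touched(1, 1)`) touch 2 sectors for 64 bytes of data, while a stride of 128 elements between threads (`sectors_touched(128, 1)`) touches 32 sectors for the same 64 bytes. The second pattern is the kind that Nsight Compute flags as uncoalesced.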