Added planar types to speed up complex half precision GEMMs #1142
cliffburdick wants to merge 6 commits into main
Greptile Summary
This PR introduces Key changes:
Confidence Score: 4/5
Core planar GEMM and tensor machinery is correct; two P1 issues in the cuSPARSE/sparse-matmul fallback paths need review before merge. The three previously flagged critical issues (SetOp EPT regression, TotalSize non-contiguous offset, c_adj ldc mismatch) are all resolved. Two new P1 findings remain in the cuSPARSE fallbacks: the isSameView write-back guards are dead code that obscures intent. ReshapeOp's unconditional scalar EPT is a P2 performance concern. All other findings are style/cleanup.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["User calls matmul(a, b) → c\nwhere types are complex half"] --> B{is_complex_half_v?}
B -- No --> Z["Normal GEMM path"]
B -- Yes --> C{a_is_planar?}
C -- No --> D["Alloc a_hp\nplanar(a) → a_planar\na_adj.Reset(a_planar.Data())"]
C -- Yes --> E["a_adj unchanged\n(already planar layout)"]
D --> F{b_is_planar?}
E --> F
F -- No --> G["Alloc b_hp\nplanar(b) → b_planar\nb_adj.Reset(b_planar.Data())"]
F -- Yes --> H["b_adj unchanged"]
G --> I{c_is_planar?}
H --> I
I -- No --> J["Alloc c_hp\nc_adj.Reset(c_planar.Data())"]
I -- Yes --> K["c_adj.Reset(c.Data())\n(no-op, already correct)"]
J --> L["cuBLASLt / cuBLAS GEMM\nusing a_adj, b_adj, c_adj\nparams.ldc = c.Size(RANK-1)"]
K --> L
L --> M{c_is_planar?}
M -- No --> N["interleaved(c_planar) → c\n(convert back to user buffer)"]
M -- Yes --> O["c already holds planar result\nno conversion needed"]
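
The planar(...) and interleaved(...) conversion steps in the flowchart can be illustrated with a minimal host-side sketch. It assumes that "planar" means all real parts stored contiguously, followed by all imaginary parts; the PR's actual memory layout and element type (complex half) are not shown here, so this uses float, and the helper names to_planar and to_interleaved are hypothetical, not MatX APIs.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a MatX API): split interleaved complex data
// [re0, im0, re1, im1, ...] into an assumed planar layout
// [re0, re1, ..., im0, im1, ...].
std::vector<float> to_planar(const std::vector<std::complex<float>>& in) {
  const std::size_t n = in.size();
  std::vector<float> out(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i].real();      // real plane occupies the first n elements
    out[n + i] = in[i].imag();  // imaginary plane occupies the last n elements
  }
  return out;
}

// Hypothetical helper (not a MatX API): inverse of to_planar, merging the
// two planes back into interleaved complex values.
std::vector<std::complex<float>> to_interleaved(const std::vector<float>& planar) {
  assert(planar.size() % 2 == 0);  // two planes of equal length are expected
  const std::size_t n = planar.size() / 2;
  std::vector<std::complex<float>> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = std::complex<float>(planar[i], planar[n + i]);
  }
  return out;
}
```

In the flow above, an input that is not already planar would pass through something like to_planar before the GEMM, and a non-planar output would pass through something like to_interleaved afterwards; inputs and outputs that are already planar skip both conversions.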
Reviews (5): Last reviewed commit: "Fix failing sparse and reshape unit test..."
/build
No description provided.