Added planar types to speed up complex half precision GEMMs by cliffburdick · Pull Request #1142 · NVIDIA/MatX

cliffburdick · 2026-03-19T20:08:30Z

No description provided.

copy-pr-bot · 2026-03-19T20:08:34Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-03-19T20:15:43Z

Greptile Summary

This PR introduces matxFp16ComplexPlanar and matxBf16ComplexPlanar tag types that allow callers to express at the type level that a complex tensor uses split real/imaginary planes ([real_0 … real_{n-1}][imag_0 … imag_{n-1}]) rather than interleaved storage. The primary motivation is avoiding the per-call interleaved→planar→interleaved conversion overhead in complex-half GEMM paths: when the user already owns a planar-layout buffer, the conversion steps are simply skipped.
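To make the two layouts concrete, here is a minimal host-side sketch of the conversion that planar buffers let the GEMM path skip. It is not MatX code: `to_planar` and `to_interleaved` are hypothetical helper names, and `float` stands in for the half-precision element types because standard C++ has no portable half type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not MatX API): split n interleaved complex values
// [re0, im0, re1, im1, ...] into planar form [re0..re(n-1)][im0..im(n-1)].
std::vector<float> to_planar(const std::vector<float>& interleaved) {
  assert(interleaved.size() % 2 == 0);
  const std::size_t n = interleaved.size() / 2;
  std::vector<float> planar(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    planar[i]     = interleaved[2 * i];      // real plane
    planar[n + i] = interleaved[2 * i + 1];  // imaginary plane
  }
  return planar;
}

// Hypothetical helper (not MatX API): the inverse conversion.
std::vector<float> to_interleaved(const std::vector<float>& planar) {
  assert(planar.size() % 2 == 0);
  const std::size_t n = planar.size() / 2;
  std::vector<float> interleaved(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    interleaved[2 * i]     = planar[i];      // real part
    interleaved[2 * i + 1] = planar[n + i];  // imaginary part
  }
  return interleaved;
}
```

Each conversion is a full pass over the buffer, which is the overhead the summary says is avoided when the caller already holds planar data.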

Key changes:

New planar types (half_complex.h, type_utils_both.h, type_utils.h): thin tag structs inheriting from the interleaved complex types; propagated through all existing type-trait machinery.
tensor_impl.h: PlanarComplexProxy return type from mutable operator() for planar tensors, with LoadPlanarComplex/StorePlanarComplex helpers using TotalSize() as the plane offset (safe because contiguity is enforced at construction time).
tensor.h: ValidatePlanarLayoutOnCreate_ asserts unit innermost stride and contiguity for every planar tensor at construction/reset, closing the non-contiguous view loophole flagged in the prior review round.
set.h: scalar EPT is gated on is_planar_complex_v<T::value_type>, preserving vectorized EPT for all non-planar SetOp assignments.
matmul_cuda.h: each of A, B, C is individually tested for planarity; already-planar inputs skip the interleaved→planar conversion step, and ldc is forced to c.Size(RANK-1) for all complex-half C outputs.
sparse2dense_cusparse.h / matmul_cusparse.h: hard stride assertions replaced with contiguous-temporary fallback paths.
fft_fftw.h: num_threads added to the FFTW plan cache key, fixing a latent cache-collision bug.
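The proxy idea in the tensor_impl.h item can be illustrated with a small sketch. In a planar buffer the real and imaginary parts of one element are not adjacent, so mutable element access cannot return a reference to a single complex object. The class below is a hypothetical stand-in, not the PR's PlanarComplexProxy: the names `PlanarBuffer`, `Proxy`, and `raw` are assumptions, `float` replaces the half types, and the element count `n_` plays the role the summary assigns to TotalSize() as the plane offset.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a contiguous planar complex tensor of n elements.
class PlanarBuffer {
 public:
  explicit PlanarBuffer(std::size_t n) : n_(n), data_(2 * n, 0.0f) {}

  // Returned by mutable access: holds pointers into both planes so that
  // reads and writes go to the two separate locations.
  class Proxy {
   public:
    Proxy(float* re, float* im) : re_(re), im_(im) {}
    // Load: gather the two planes into one complex value.
    operator std::complex<float>() const { return {*re_, *im_}; }
    // Store: scatter one complex value into the two planes.
    Proxy& operator=(const std::complex<float>& v) {
      *re_ = v.real();
      *im_ = v.imag();
      return *this;
    }
   private:
    float* re_;
    float* im_;
  };

  Proxy operator()(std::size_t i) {
    assert(i < n_);
    return Proxy(&data_[i], &data_[n_ + i]);  // imaginary plane starts at n_
  }

  std::complex<float> operator()(std::size_t i) const {
    assert(i < n_);
    return {data_[i], data_[n_ + i]};
  }

  const std::vector<float>& raw() const { return data_; }

 private:
  std::size_t n_;
  std::vector<float> data_;
};
```

The fixed offset `n_` is only valid because the buffer is contiguous, which mirrors why the summary ties the TotalSize() offset to the contiguity check at construction.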

Confidence Score: 4/5

Core planar GEMM and tensor machinery is correct; two P1 issues in the cuSPARSE/sparse-matmul fallback paths need review before merge.

The three previously flagged critical issues (SetOp EPT regression, TotalSize non-contiguous offset, c_adj ldc mismatch) are all resolved. Two new P1 findings remain in the cuSPARSE fallbacks (sparse2dense_cusparse.h and matmul_cusparse.h): in each, the isSameView write-back guard is dead code that obscures intent. The unconditional scalar EPT in ReshapeOp is a P2 performance concern. All other findings are style/cleanup.

include/matx/transforms/convert/sparse2dense_cusparse.h and include/matx/transforms/matmul/matmul_cusparse.h (dead write-back guards); include/matx/operators/reshape.h (unconditional scalar EPT).

Important Files Changed

Filename	Overview
include/matx/core/half_complex.h	Adds `matxFp16ComplexPlanar` and `matxBf16ComplexPlanar` tag types that inherit from the interleaved counterparts; clean, minimal change.
include/matx/core/type_utils_both.h	Adds planar types to all relevant type-trait concepts/variables and introduces `is_planar_complex_v`.
include/matx/core/tensor_impl.h	Adds PlanarComplexProxy for non-addressable planar memory, LoadPlanarComplex/StorePlanarComplex helpers, and routes operator() through them; proxy lifetime and EPT forcing look correct but proxy return-type changes are subtle.
include/matx/core/tensor.h	Adds `ValidatePlanarLayoutOnCreate_` to all constructors and Reset overloads to enforce contiguous unit-stride constraint for planar types at construction time.
include/matx/operators/set.h	Gates scalar EPT on planar-complex output type only, preserving vectorized EPT for non-planar SetOp; addresses the previous regression comment.
include/matx/transforms/matmul/matmul_cuda.h	Skips interleaved-to-planar conversion for already-planar inputs/outputs; updates ldc to `c.Size(RANK-1)` for complex-half; addresses previous c_adj pointer mismatch comment.
include/matx/operators/reshape.h	Forces scalar EPT for all ReshapeOp instances unconditionally (not scoped to planar types); also adds initializer-list constraint to prevent ambiguous overload resolution.
include/matx/transforms/fft/fft_fftw.h	Adds `num_threads` to FFTW params struct, hash, and equality check — fixes a latent cache key collision when the same FFT shape is called with different thread counts.
include/matx/transforms/convert/sparse2dense_cusparse.h	Replaces hard MATX_ASSERT with a contiguous-temporary fallback; the write-back guard `!o.isSameView(O)` is dead code in practice.
include/matx/transforms/matmul/matmul_cusparse.h	Similarly replaces a hard stride assertion with a contiguous-temporary fallback; mirrors the sparse2dense approach with the same dead write-back guard concern.
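The fft_fftw.h row describes a cache-key fix that is easy to show in isolation. The sketch below is an assumption-laden illustration, not the MatX structs: `PlanParams`, its fields other than `num_threads`, and `get_or_create_plan` are hypothetical, and an integer id stands in for a real FFTW plan. The point is that a field left out of the equality check makes two different configurations share one cached plan.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical plan parameters; only num_threads is named in the review.
struct PlanParams {
  std::size_t fft_size;
  std::size_t batch;
  int num_threads;
};

struct PlanParamsHash {
  std::size_t operator()(const PlanParams& p) const {
    std::size_t h = std::hash<std::size_t>{}(p.fft_size);
    h ^= std::hash<std::size_t>{}(p.batch) + 0x9e3779b9u + (h << 6) + (h >> 2);
    // num_threads participates in the hash ...
    h ^= std::hash<int>{}(p.num_threads) + 0x9e3779b9u + (h << 6) + (h >> 2);
    return h;
  }
};

struct PlanParamsEq {
  bool operator()(const PlanParams& a, const PlanParams& b) const {
    // ... and in equality, so plans built for different thread counts
    // are never treated as the same cache entry.
    return a.fft_size == b.fft_size && a.batch == b.batch &&
           a.num_threads == b.num_threads;
  }
};

using PlanCache =
    std::unordered_map<PlanParams, int, PlanParamsHash, PlanParamsEq>;

// Returns the cached plan id for p, creating a new id on a cache miss.
int get_or_create_plan(PlanCache& cache, const PlanParams& p) {
  assert(p.num_threads > 0);
  auto it = cache.find(p);
  if (it != cache.end()) return it->second;
  const int id = static_cast<int>(cache.size());
  cache.emplace(p, id);
  return id;
}
```

With this key, the same FFT shape requested with 1 thread and with 8 threads yields two distinct entries, while a repeated request reuses the existing one.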

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["User calls matmul(a, b) → c\nwhere types are complex half"] --> B{is_complex_half_v?}
    B -- No --> Z["Normal GEMM path"]
    B -- Yes --> C{a_is_planar?}
    C -- No --> D["Alloc a_hp\nplanar(a) → a_planar\na_adj.Reset(a_planar.Data())"]
    C -- Yes --> E["a_adj unchanged\n(already planar layout)"]
    D --> F{b_is_planar?}
    E --> F
    F -- No --> G["Alloc b_hp\nplanar(b) → b_planar\nb_adj.Reset(b_planar.Data())"]
    F -- Yes --> H["b_adj unchanged"]
    G --> I{c_is_planar?}
    H --> I
    I -- No --> J["Alloc c_hp\nc_adj.Reset(c_planar.Data())"]
    I -- Yes --> K["c_adj.Reset(c.Data())\n(no-op, already correct)"]
    J --> L["cuBLASLt / cuBLAS GEMM\nusing a_adj, b_adj, c_adj\nparams.ldc = c.Size(RANK-1)"]
    K --> L
    L --> M{c_is_planar?}
    M -- No --> N["interleaved(c_planar) → c\n(convert back to user buffer)"]
    M -- Yes --> O["c already holds planar result\nno conversion needed"]
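The decisions in the flowchart above can be sketched as a small planning function. This is a simplified model under stated assumptions, not the code in matmul_cuda.h: `Layout`, `GemmPlan`, `plan_complex_half_gemm`, and `conversion_passes` are hypothetical names, and the sketch only records which conversion steps would run for a given combination of operand layouts.

```cpp
#include <cassert>

enum class Layout { Interleaved, Planar };

// Hypothetical summary of the work the complex-half path would perform.
struct GemmPlan {
  bool convert_a;       // interleaved -> planar copy of A
  bool convert_b;       // interleaved -> planar copy of B
  bool alloc_c_temp;    // planar scratch buffer for C
  bool convert_c_back;  // planar -> interleaved copy into the user's C
};

// Each operand is tested individually, as in the flowchart.
GemmPlan plan_complex_half_gemm(bool is_complex_half, Layout a, Layout b,
                                Layout c) {
  GemmPlan p{false, false, false, false};
  if (!is_complex_half) return p;  // normal GEMM path, nothing to convert
  p.convert_a = (a != Layout::Planar);
  p.convert_b = (b != Layout::Planar);
  p.alloc_c_temp = (c != Layout::Planar);
  p.convert_c_back = p.alloc_c_temp;  // a scratch C must be copied back
  return p;
}

// Number of full-buffer conversion passes implied by a plan.
int conversion_passes(const GemmPlan& p) {
  assert(p.alloc_c_temp == p.convert_c_back);
  return (p.convert_a ? 1 : 0) + (p.convert_b ? 1 : 0) +
         (p.convert_c_back ? 1 : 0);
}
```

In this model, three interleaved operands cost three conversion passes around the GEMM, and three planar operands cost none, which is the saving the PR title refers to.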

Reviews (5): Last reviewed commit: "Fix failing sparse and reshape unit test..."

include/matx/operators/set.h

include/matx/core/tensor_impl.h

include/matx/transforms/matmul/matmul_cuda.h

cliffburdick · 2026-03-19T21:04:14Z

/build

cliffburdick · 2026-03-20T15:41:57Z

/build

cliffburdick · 2026-03-20T21:05:22Z

/build

cliffburdick · 2026-04-03T16:16:17Z

/build

cliffburdick added 2 commits March 19, 2026 13:04

Added planar types to speed up complex half precision GEMMs

33ec90f

Cleanup

2507608

greptile-apps bot reviewed Mar 19, 2026

include/matx/operators/set.h (resolved)

include/matx/core/tensor_impl.h (resolved)

include/matx/transforms/matmul/matmul_cuda.h (resolved)

cliffburdick added 2 commits March 19, 2026 13:29

Code review updates

c47a6cc

Code review updates

59d5320

Compilation error

de287c9

Fix failing sparse and reshape unit tests

4da48da

cliffburdick wants to merge 6 commits into main from planar_tensor
