GGEMM+srelu kernels for MxFP8 Nemotron #2981
/te-ci pytorch
Please sign off your commits @sraman-rgb
Greptile Summary
This PR refactors the MXFP8 fused grouped-MLP kernel infrastructure to support both GLU-style activations (SwiGLU, QGeGLU) and unary activations (SReLU), then adds the new
Confidence Score: 4/5
The new SReLU fused kernel path is logically sound but carries unresolved concerns from prior review rounds in the fused forward and backward files.
The base-class refactoring is clean and ScaledSReLU is correctly wired through both unfused and fused paths. The activation_recompute_in_mlp mechanism is self-consistent. The unreachable grad_scales guard is the main new finding: it cannot cause wrong results today but would silently discard the scale gradient if the kernel interface changed. Three prior-thread concerns remain open.
transformer_engine/pytorch/ops/fused/backward_grouped_mlp.py and forward_grouped_mlp.py deserve a closer look before merging.
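As a rough illustration of the distinction the summary draws, the sketch below contrasts a unary activation with a GLU-style one and shows where a scale gradient would come from in a scaled unary activation. It is not the PR's kernel code: it assumes SReLU means squared ReLU, `max(x, 0)**2`, and that the scaled variant multiplies the activation output by a single scalar; all function names here are hypothetical.

```python
import math


def srelu(x):
    """Unary activation (assumed definition): squared ReLU, max(x, 0)**2."""
    r = max(x, 0.0)
    return r * r


def srelu_grad(x):
    """Derivative of squared ReLU with respect to x: 2 * max(x, 0)."""
    return 2.0 * max(x, 0.0)


def swiglu(gate, linear):
    """GLU-style activation: takes two inputs, silu(gate) * linear."""
    return gate / (1.0 + math.exp(-gate)) * linear


def scaled_srelu_forward(xs, scale):
    """y_i = scale * srelu(x_i); `scale` is a hypothetical per-row scalar."""
    return [scale * srelu(x) for x in xs]


def scaled_srelu_backward(xs, scale, grad_out):
    """Return (grad_xs, grad_scale) for y_i = scale * srelu(x_i)."""
    # Gradient with respect to each input element.
    grad_xs = [scale * srelu_grad(x) * g for x, g in zip(xs, grad_out)]
    # Gradient with respect to the scale: dropping this term is the kind of
    # silent loss the summary warns about.
    grad_scale = sum(g * srelu(x) for x, g in zip(xs, grad_out))
    return grad_xs, grad_scale
```

A unary activation maps one input to one output, while a GLU-style activation consumes a gate and a linear half, which is why the two families plausibly need different handling in a shared base class.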
Reviews (12): Last reviewed commit: "Merge branch 'main' into fc1-srelu-main"
8373402 to
765d2e9
Signed-off-by: sraman-rgb <sraman@nvidia.com>
765d2e9 to
43093cc
timmoon10
left a comment
Overall looks good, but we've gotten to the point where we need to start thinking about how to gracefully handle adding new activations. It seems that every model has a different activation function.
Signed-off-by: Siddhartha Raman S <sraman@login-lyris01.lyris.clusters.nvidia.com>
Signed-off-by: Siddhartha Raman S <sraman@nvidia.com>
Signed-off-by: Siddhartha Raman S <sraman@nvidia.com>
for more information, see https://pre-commit.ci
Signed-off-by: Siddhartha Raman S <sraman@nvidia.com>
912b1d9 to
46b3169
vthumbe1503
left a comment
LGTM. We might want to wait for the cuDNN release and until apt cuDNN guards are added.
Signed-off-by: Siddhartha Raman S <sraman@login-lyris01.lyris.clusters.nvidia.com>
Signed-off-by: Siddhartha Raman S <sraman@login-lyris01.lyris.clusters.nvidia.com>
Signed-off-by: Siddhartha Raman S <sraman@login-lyris01.lyris.clusters.nvidia.com>
Signed-off-by: Siddhartha Raman S <sraman@login-lyris01.lyris.clusters.nvidia.com>
Signed-off-by: Siddhartha Raman S <sraman@login-lyris01.lyris.clusters.nvidia.com>
/te-ci pytorch
Description
Please include a brief summary of the changes, relevant motivation and context.
Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: