Pull requests: NVIDIA/TransformerEngine
[Common] Enable NVFP4 2D block scaling in columnwise only
#3027 opened May 21, 2026 by negvet (Collaborator), 1 of 13 tasks
tests/attention: shrink fp8_vs_f16 configs from B=2 to B=1
#3020 opened May 21, 2026 by vedaanta, 13 tasks
fix(grouped_linear): handle all-zero-token forward and backward
Label: community-contribution
#3019 opened May 21, 2026 by jubick1337, 13 tasks
[Common] Fix fused MoE aux loss for sequence aux loss
#3018 opened May 21, 2026 by harryzhou2000 (Member)
[Common] NVTETensor peer-handle annotation + nccl_comm backend
#3017 opened May 20, 2026 by phu0ngng (Collaborator), 7 of 13 tasks
Update cudnn-frontend to 1.24.0
#3016 opened May 20, 2026 by sudhakarsingh27 (Collaborator), 5 of 13 tasks
Add the getter and setter of skip_fp8_weight_update_tensor
#3015 opened May 20, 2026 by xrennvidia (Collaborator), 6 of 13 tasks
adding NVIDIA_TF32_OVERRIDE=0 to test_numerics.py
#3014 opened May 20, 2026 by francesco-bertolotti (Contributor)
[Common] Optimize fused router forward/backward kernels
#3012 opened May 19, 2026 by harryzhou2000 (Member)
[PyTorch] NVFP4 RHT cast-fusion: emit GEMM-swizzled scale factors directly
#3011 opened May 19, 2026 by cael-ling (Contributor), 8 of 13 tasks
Add MXFP8 quantized_model_init memory profiler for FSDP2 qinit analysis
#3008 opened May 18, 2026 by savitha-eng (Draft), 1 task done
Generalized Tensor Parallelism (GTP)
Label: org-contribution
#3005 opened May 18, 2026 by fanshiqing (Member), 6 of 13 tasks
Add wheel support for Newton-Schulz method via cuSolverMp
#3004 opened May 17, 2026 by ksivaman (Member), 6 of 13 tasks
Optimize function that loads pointers on GPU
Labels: cpu_overhead, refactor
#3001 opened May 16, 2026 by timmoon10 (Collaborator), 8 of 14 tasks
TritonKernelCall: CUDA graph compatibility
#3000 opened May 15, 2026 by tdophung (Collaborator), 6 of 13 tasks
CP Tests batching using subprocess worker pool
Labels: 2.16.0, community-contribution
#2993 opened May 14, 2026 by sudhakarsingh27 (Collaborator), 8 of 9 tasks
Improve TE Group MLP CPU Overhead
Label: cpu_overhead
#2991 opened May 14, 2026 by zhongbozhu (Collaborator), 13 tasks
Add codex/agents to .gitignore
Label: org-contribution
#2990 opened May 14, 2026 by yaox12 (Member), 13 tasks
[JAX] Support for cuDNN-backed flex attention
Labels: 2.16.0, community-contribution
#2985 opened May 13, 2026 by vcherepanov-nv (Collaborator), 4 of 13 tasks
[PyTorch] Support for cuDNN-backed flex attention
Labels: 2.16.0, community-contribution
#2984 opened May 13, 2026 by vcherepanov-nv (Collaborator), 4 of 13 tasks
GGEMM+srelu kernels for MxFP8 Nemotron
Label: community-contribution
#2981 opened May 12, 2026 by sraman-rgb, 8 of 13 tasks
[Common, PyTorch] Improve mHC to match DeepSeek's implementation
#2978 opened May 12, 2026 by kainzhong (Collaborator), 9 of 13 tasks