Pull requests: NVIDIA/TransformerEngine

Pull requests list

[Common] Enable NVFP4 2D block scaling in columnwise only
#3027 opened May 21, 2026 by negvet Collaborator
1 of 13 tasks
tests/attention: shrink fp8_vs_f16 configs from B=2 to B=1
#3020 opened May 21, 2026 by vedaanta
13 tasks
fix(grouped_linear): handle all-zero-token forward and backward (label: community-contribution, PRs from external contributors outside the core maintainers)
#3019 opened May 21, 2026 by jubick1337
13 tasks
[Common] Fix fused MoE aux loss for sequence aux loss
#3018 opened May 21, 2026 by harryzhou2000 Member
[Common] NVTETensor peer-handle annotation + nccl_comm backend
#3017 opened May 20, 2026 by phu0ngng Collaborator
7 of 13 tasks
Update cudnn-frontend to 1.24.0
#3016 opened May 20, 2026 by sudhakarsingh27 Collaborator
5 of 13 tasks
Add the getter and setter of skip_fp8_weight_update_tensor
#3015 opened May 20, 2026 by xrennvidia Collaborator
6 of 13 tasks
Add NVIDIA_TF32_OVERRIDE=0 to test_numerics.py
#3014 opened May 20, 2026 by francesco-bertolotti Contributor
[Common] Optimize fused router forward/backward kernels
#3012 opened May 19, 2026 by harryzhou2000 Member
[PyTorch] NVFP4 RHT cast-fusion: emit GEMM-swizzled scale factors directly
#3011 opened May 19, 2026 by cael-ling Contributor
8 of 13 tasks
Bitmap topk
#3009 opened May 18, 2026 by tdophung Collaborator
13 tasks
Generalized Tensor Parallelism (GTP) (label: org-contribution)
#3005 opened May 18, 2026 by fanshiqing Member
6 of 13 tasks
Add wheel support for Newton-Schulz method via cuSolverMp
#3004 opened May 17, 2026 by ksivaman Member
6 of 13 tasks
Optimize function that loads pointers on GPU (labels: cpu_overhead, refactor)
#3001 opened May 16, 2026 by timmoon10 Collaborator
8 of 14 tasks
TritonKernelCall: CUDA graph compatibility
#3000 opened May 15, 2026 by tdophung Collaborator
6 of 13 tasks
Plumb FP8+THD
#2994 opened May 14, 2026 by sudhakarsingh27 Collaborator
13 tasks
CP Tests batching using subprocess worker pool (labels: 2.16.0, community-contribution)
#2993 opened May 14, 2026 by sudhakarsingh27 Collaborator
8 of 9 tasks
Improve TE Group MLP CPU Overhead (label: cpu_overhead)
#2991 opened May 14, 2026 by zhongbozhu Collaborator
13 tasks
Add codex/agents to .gitignore (label: org-contribution)
#2990 opened May 14, 2026 by yaox12 Member
13 tasks
[JAX] Support for cuDNN-backed flex attention (labels: 2.16.0, community-contribution)
#2985 opened May 13, 2026 by vcherepanov-nv Collaborator
4 of 13 tasks
[PyTorch] Support for cuDNN-backed flex attention (labels: 2.16.0, community-contribution)
#2984 opened May 13, 2026 by vcherepanov-nv Collaborator
4 of 13 tasks
GGEMM+srelu kernels for MxFP8 Nemotron (label: community-contribution)
#2981 opened May 12, 2026 by sraman-rgb
8 of 13 tasks
[Common, PyTorch] Improve mHC to match DeepSeek's implementation
#2978 opened May 12, 2026 by kainzhong Collaborator
9 of 13 tasks