
adding NVIDIA_TF32_OVERRIDE=0 to test_numerics.py #3014

Open
francesco-bertolotti wants to merge 1 commit into NVIDIA:main from francesco-bertolotti:f14-tf32override

Conversation

@francesco-bertolotti
Contributor

PR split from #3013.

I have added NVIDIA_TF32_OVERRIDE=0 to the test_numerics.py invocation; otherwise, tests fail because of small numerical mismatches with layer norms. The same has already been done for test_mhc.py.

@greptile-apps
Contributor

greptile-apps Bot commented May 20, 2026

Greptile Summary

This PR adds NVIDIA_TF32_OVERRIDE=0 to the test_numerics.py invocation in test.sh, mirroring the same environment variable already set for test_mhc.py. The intent is to force full FP32 precision on Ampere and later GPUs, preventing TF32-induced numerical mismatches that were causing intermittent failures in layer norm tests.

  • NVIDIA_TF32_OVERRIDE=0 is appended to the existing set of determinism flags (PYTORCH_JIT=0, NVTE_TORCH_COMPILE=0, NVTE_ALLOW_NONDETERMINISTIC_ALGO=0, NVTE_FUSED_ATTN=0) already used for test_numerics.py.
  • The sibling test test_mhc.py (line 63) already uses the same flag and includes an inline comment explaining the rationale; no such comment was added alongside the new change.

Confidence Score: 4/5

Safe to merge — the change is a one-line addition of a well-understood environment variable that forces FP32 precision, consistent with how other numerics-sensitive tests in the same script are already configured.

The change is minimal and follows an established pattern in the file. The only gap is a missing inline comment explaining the rationale, which the test_mhc.py line directly below already has. No functional risk is introduced.

No files require special attention. test_cuda_graphs.py shares the same determinism flags but does not get NVIDIA_TF32_OVERRIDE=0; this may be intentional but is worth a quick sanity check.

Important Files Changed

Filename Overview
qa/L0_pytorch_unittest/test.sh Adds NVIDIA_TF32_OVERRIDE=0 to the test_numerics.py invocation to prevent TF32-induced numerical mismatches in layer norm tests; mirrors the same pattern already used for test_mhc.py but lacks the inline comment that explains the rationale there.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[test.sh] --> B[test_numerics.py NVIDIA_TF32_OVERRIDE=0 NEW]
    A --> C[test_cuda_graphs.py no NVIDIA_TF32_OVERRIDE]
    A --> D[test_mhc.py NVIDIA_TF32_OVERRIDE=0 + inline comment]
    B -->|disables TF32 on Ampere+| E[full FP32 precision]
    D -->|same effect| E

Reviews (1): Last reviewed commit: "adding NVIDIA_TF32_OVERRIDE=0 to test_nu..." | Re-trigger Greptile

Comment thread qa/L0_pytorch_unittest/test.sh
@francesco-bertolotti
Contributor Author

I do not know if it helps, but these are the tests that fail without the TF32 env flag:

FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[True-True-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [50] with -3.5490684509277344 vs -3.555588245391...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[True-True-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [61] with 2.6603221893310547 vs 2.65149784088134...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[True-False-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [50] with -3.5490684509277344 vs -3.555588245391...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[True-False-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [61] with 2.6603221893310547 vs 2.65149784088134...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[False-True-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [50] with -3.5490684509277344 vs -3.555588245391...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[False-True-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [61] with 2.6603221893310547 vs 2.65149784088134...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[False-False-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [50] with -3.5490684509277344 vs -3.555588245391...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[False-False-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [61] with 2.6603221893310547 vs 2.65149784088134...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-True-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=113. Maximum difference at location [0, 165] with 0.04553138092160225 vs 0.0458293...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-True-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-True-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-True-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-False-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-False-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-False-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-False-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-True-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-True-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-True-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-True-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-False-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-False-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-False-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-False-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-True-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-True-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-True-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-True-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-False-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-False-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-False-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-False-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-True-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-True-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-True-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-True-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-False-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-False-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-False-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-False-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-LayerNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=14. Maximum difference at location [0, 99] with -0.07819076627492905 vs -0.1000889...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-LayerNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=7. Maximum difference at location [0, 99] with -0.07819075882434845 vs -0.10008895...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-LayerNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=105. Maximum difference at location [75] with -0.04723335802555084 vs -0.014243453...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-LayerNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0] with -1.3210333585739136 vs -1.299375057220...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-RMSNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=50. Maximum difference at location [100] with 0.18327626585960388 vs 0.38112670183...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-RMSNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=121. Maximum difference at location [1, 15] with -0.0071725500747561455 vs 0.01547...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-RMSNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=351. Maximum difference at location [77] with -0.04859239235520363 vs -0.076162673...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-RMSNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=263. Maximum difference at location [56] with 0.03622261434793472 vs 0.12370993196...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-LayerNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=14. Maximum difference at location [0, 99] with -0.07819076627492905 vs -0.1000889...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-LayerNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=7. Maximum difference at location [0, 99] with -0.07819075882434845 vs -0.10008895...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-LayerNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=105. Maximum difference at location [75] with -0.04723335802555084 vs -0.014243453...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-LayerNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0] with -1.3210333585739136 vs -1.299375057220...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-RMSNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=50. Maximum difference at location [100] with 0.18327626585960388 vs 0.38112670183...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-RMSNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=121. Maximum difference at location [1, 15] with -0.0071725500747561455 vs 0.01547...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-RMSNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=351. Maximum difference at location [77] with -0.04859239235520363 vs -0.076162673...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-RMSNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=263. Maximum difference at location [56] with 0.03622261434793472 vs 0.12370993196...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-LayerNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=14. Maximum difference at location [0, 99] with -0.07819076627492905 vs -0.1000889...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-LayerNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=7. Maximum difference at location [0, 99] with -0.07819075882434845 vs -0.10008895...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-LayerNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=105. Maximum difference at location [75] with -0.04723335802555084 vs -0.014243453...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-LayerNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0] with -1.3210333585739136 vs -1.299375057220...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-RMSNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=50. Maximum difference at location [100] with 0.18327626585960388 vs 0.38112670183...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-RMSNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=121. Maximum difference at location [1, 15] with -0.0071725500747561455 vs 0.01547...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-RMSNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=351. Maximum difference at location [77] with -0.04859239235520363 vs -0.076162673...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-RMSNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=263. Maximum difference at location [56] with 0.03622261434793472 vs 0.12370993196...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-LayerNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=14. Maximum difference at location [0, 99] with -0.07819076627492905 vs -0.1000889...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-LayerNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=7. Maximum difference at location [0, 99] with -0.07819075882434845 vs -0.10008895...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-LayerNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=105. Maximum difference at location [75] with -0.04723335802555084 vs -0.014243453...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-LayerNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0] with -1.3210333585739136 vs -1.299375057220...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-RMSNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=50. Maximum difference at location [100] with 0.18327626585960388 vs 0.38112670183...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-RMSNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=121. Maximum difference at location [1, 15] with -0.0071725500747561455 vs 0.01547...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-RMSNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=351. Maximum difference at location [77] with -0.04859239235520363 vs -0.076162673...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-RMSNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=263. Maximum difference at location [56] with 0.03622261434793472 vs 0.12370993196...
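To get a rough sense of how large these mismatches are, the relative difference can be computed for a few of the value pairs in the log above. This is only a sketch: the second value of each pair is truncated in the log, so the results are approximate, and the helper name `relative_difference` is made up for this illustration rather than taken from the test suite.

```python
def relative_difference(actual: float, expected: float) -> float:
    """Absolute difference scaled by the larger magnitude of the two values."""
    return abs(actual - expected) / max(abs(actual), abs(expected))


# Value pairs copied from the failure log. The second number of each pair is
# truncated in the log, so it is used here exactly as far as it was printed.
pairs = {
    "test_linear_accuracy[True-True-small-1-dtype0]":
        (-3.5490684509277344, -3.555588245391),
    "test_linear_accuracy[True-True-small-2-dtype0]":
        (2.6603221893310547, 2.65149784088134),
    "test_layernorm_mlp_accuracy[True-True-RMSNorm-relu-small-1-dtype0]":
        (0.18327626585960388, 0.38112670183),
}

for name, (actual, expected) in pairs.items():
    print(f"{name}: relative difference {relative_difference(actual, expected):.2e}")
```

The relative differences vary widely across tests, so the list is probably best read per test family rather than as one uniform mismatch.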

Member

@timmoon10 timmoon10 left a comment


I'm suspicious of this change:

  • Our nightly tests pass without this change.
  • PyTorch enables TF32 by default, so disabling moves us away from typical workloads.
  • This flag won't affect TE since we manually enable TF32 in our high-precision GEMMs.

I think we need to dig more into why you are seeing failures. Also, I think there are better approaches from a test design perspective. If we treat the FP32 implementation as a mathematical ground truth, then we should consider changing to FP64 CPU compute so we get a better approximation and disentangle from GPU bugs. If we treat the FP32 implementation as an implementation ground truth, then TF32 is load-bearing and we shouldn't disable it.
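As a rough illustration of the "mathematical ground truth" option, the sketch below compares a dot product computed in FP64 on the CPU against the same dot product with inputs rounded to TF32 precision (10 explicit mantissa bits). It is a standard-library simulation, not TE or PyTorch code: the names `round_to_tf32`, `dot_reference`, and `dot_tf32_inputs` are hypothetical, round-to-nearest-even is an assumption about the conversion, and accumulation is done in FP64 here rather than in FP32 as on a GPU.

```python
import math
import random
import struct


def round_to_tf32(x: float) -> float:
    """Round x to the nearest value with 10 explicit mantissa bits.

    Assumption: round-to-nearest-even applied to the FP32 bit pattern.
    NaN, infinity, and overflow are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 13) & 1                      # lowest mantissa bit that is kept
    bits = (bits + 0x0FFF + lsb) & 0xFFFFE000   # round, then drop the low 13 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot_reference(a, b):
    """FP64 dot product with exact summation, used as the reference."""
    return math.fsum(x * y for x, y in zip(a, b))


def dot_tf32_inputs(a, b):
    """Dot product whose inputs are first rounded to TF32 precision."""
    return math.fsum(round_to_tf32(x) * round_to_tf32(y) for x, y in zip(a, b))


rng = random.Random(0)
a = [rng.uniform(-1.0, 1.0) for _ in range(1024)]
b = [rng.uniform(-1.0, 1.0) for _ in range(1024)]

reference = dot_reference(a, b)
approx = dot_tf32_inputs(a, b)
abs_error = abs(approx - reference)
print(f"reference={reference:.8f} tf32_inputs={approx:.8f} abs_error={abs_error:.2e}")
```

Comparing both the TF32 and the FP32 results against an FP64 reference like this would show which side of a mismatch is further from the exact answer, which a direct FP32-versus-TF32 comparison cannot do.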

@francesco-bertolotti
Contributor Author

No problem, I will look into it more closely. Thank you for letting me know!

I ran into a few issues while building and running TE on our setup, so it's entirely possible I might have messed something up on my end.
