
Compiling a bfloat16 model triggers float32 precision PyTorch warning #43012

@vadimkantorov

Description


System Info

transformers 5.0.0rc1, torch 2.9.1

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained("PleIAs/Monad", attn_implementation='sdpa', dtype=torch.bfloat16)
model = torch.compile(model, mode='reduce_overhead')
model(...)

Running the compiled model emits:

/.../.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py:312: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
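For reference, the Inductor check behind this warning appears to be keyed to the global TF32 / float32-matmul-precision settings rather than to the dtypes actually present in the compiled graph. The defaults that trigger it can be inspected directly; a minimal sketch (no CUDA-specific calls required):

import torch

print(torch.get_float32_matmul_precision())    # 'highest' by default, which leaves TF32 disabled for fp32 matmuls
print(torch.backends.cuda.matmul.allow_tf32)   # False by default
print(torch.backends.cudnn.allow_tf32)         # True by default (cuDNN convolutions)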

Is this warning emitted because the compiled model still performs some float32 matmuls?
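One way to probe that is to count the dtypes of the loaded model's parameters and buffers; any float32 entries would point at tensors that were not cast to bfloat16. A minimal sketch, assuming `model` from the repro above:

from collections import Counter

dtypes = Counter(p.dtype for p in model.parameters())
dtypes.update(b.dtype for b in model.buffers())
print(dtypes)  # any torch.float32 entries indicate tensors left in fp32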

Expected behavior

No warning when the model is loaded and run entirely in bfloat16.
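Until then, the knob the warning itself suggests can be set globally before `torch.compile`. This is only a sketch of that documented setting; it silences the warning, but it is a real performance fix only if float32 matmuls are actually executed somewhere in the model:

import torch

# Opt in to TF32 for any remaining fp32 matmuls, as the warning suggests.
torch.set_float32_matmul_precision('high')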
