Switch pytorch_inf from float32 to auto to fix OOM on 16GB machines #10
Merged
jewilder merged 1 commit into microsoft:main on Apr 2, 2026
Conversation
… example and fix OOM on 16GB machines

Two bugs fixed:

1. Wrong parameter name: used 'dtype=' instead of 'torch_dtype=' for from_pretrained(), which was likely silently ignored, causing the model to load in default (float32) precision.
2. Hardcoded dtype logic: float16 for CUDA / float32 for CPU instead of using the model's native bfloat16. Phi-4-mini weights are stored as bfloat16 per config.json.

Fix: Use torch_dtype='auto', which reads the dtype from the model's config.json (bfloat16), matching the official HuggingFace example. This halves CPU memory from ~15.2GB to ~7.6GB and uses the correct precision on all devices.

Changes:
- Fixed parameter name from 'dtype' to 'torch_dtype' in both setup_model() and main()
- Changed value to 'auto' (resolves to bfloat16 from model config)
- Applied to both Windows and macOS inference.py
- Bumped prep_version: Windows 9->10, macOS 5->6
jewilder approved these changes on Apr 2, 2026
Problem
The pytorch_inf `inference.py` script was failing with out-of-memory errors on 16GB devices. Investigation revealed two bugs in how the model was being loaded.

Bug 1: Wrong parameter name (`dtype` vs `torch_dtype`)

The script passed `dtype=` when calling `AutoModelForCausalLM.from_pretrained()`. The correct parameter is `torch_dtype`; the `dtype` kwarg was likely silently ignored via `**kwargs`, causing the model to load in its framework default precision (float32).
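A minimal sketch of the broken load path as described above (the model id and surrounding arguments are illustrative, not the script's exact code):

```python
import torch
from transformers import AutoModelForCausalLM

# Broken: 'dtype' was not the expected parameter name here, so it likely
# fell through to **kwargs unused and the weights loaded as float32.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",  # illustrative model id
    dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)
```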
Bug 2: Hardcoded dtype instead of using the model's native precision

Even if the parameter name were correct, the logic hardcoded `float16` for CUDA and `float32` for CPU. Phi-4-mini-instruct's weights are natively stored as bfloat16 (per its config.json: `"torch_dtype": "bfloat16"`).
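You can confirm the checkpoint's native precision without loading any weights; a quick check, assuming the public `microsoft/Phi-4-mini-instruct` model id:

```python
from transformers import AutoConfig

# Fetches and parses config.json only; no weights are loaded into memory.
cfg = AutoConfig.from_pretrained("microsoft/Phi-4-mini-instruct")
print(cfg.torch_dtype)  # torch.bfloat16
```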
Fix

Replaced with `torch_dtype="auto"`, which reads the dtype directly from the model's `config.json`. This matches the official HuggingFace example:
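A sketch of the corrected call (same illustrative model id as above):

```python
from transformers import AutoModelForCausalLM

# Fixed: "auto" resolves to the dtype recorded in the model's
# config.json (bfloat16 for Phi-4-mini) on every device.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",
    torch_dtype="auto",
)
```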
Why `torch_dtype="auto"` is the right approach

| Option | Result |
| --- | --- |
| `dtype=torch.float32` (old, broken) | Parameter name likely ignored; model loads in float32 (~15.2GB on CPU) |
| `torch_dtype=torch.bfloat16` (hardcoded) | Correct precision, but pins a dtype instead of reading it from the model config |
| `torch_dtype="auto"` | Resolves to bfloat16 from config.json (~7.6GB); matches the official HuggingFace example |
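The memory figures line up with a simple bytes-per-parameter estimate, assuming Phi-4-mini's roughly 3.8B parameters:

```python
params = 3.8e9             # approximate Phi-4-mini-instruct parameter count
print(params * 4 / 1e9)    # float32: 4 bytes/param -> ~15.2 GB
print(params * 2 / 1e9)    # bfloat16: 2 bytes/param -> ~7.6 GB
```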
Changes

- Renamed `dtype` to `torch_dtype` in both the `setup_model()` and `main()` load paths
- Changed the value to `"auto"` (resolves to bfloat16 from Phi-4-mini config.json)
- Applied to both the Windows and macOS `inference.py`
- Bumped `prep_version`: Windows 9→10, macOS 5→6
Files Changed

- scenarios/windows/pytorch_inf/pytorch_inf_resources/inference.py
- scenarios/macos/mac_pytorch_inf/mac_pytorch_inf_resources/inference.py
- scenarios/windows/pytorch_inf/pytorch_inf.py
- scenarios/macos/mac_pytorch_inf/mac_pytorch_inf.py