fix(quantize): suppress ORT pre-processing warnings in quantize flow by xieofxie · Pull Request #956 · microsoft/winml-cli

xieofxie · 2026-06-24T07:54:20Z

Summary

Two related cleanups in the quantize/eval CLI surface.

1. Suppress duplicate ORT pre-processing warnings (fixes #557)

winml quantize emitted two near-duplicate ORT warnings per run:

WARNING: Please consider to run pre-processing before quantization. Refer to example: …
WARNING: Please consider pre-processing before quantization. See …

Both originate in ORT's quantize_static, gated on model_has_pre_process_metadata() of the model it reloads from the input path. Previously the pre-process metadata was injected in optimize_onnx, which only suppressed the warnings for models that went through optimize first — a standalone winml quantize on a raw model still warned twice.

Moved the tagging into the quantizer: load the input model, tag it via ORT's own add_pre_process_metadata(), and hand the in-memory ModelProto to quantize(). Passing the in-memory model (rather than the path) is required because ORT checks the flag on the copy it reloads from model_input, and it avoids mutating the user's pristine input file. ORT externalizes weights into its own temp dir, so external-data handling is unaffected.
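A minimal sketch of that load, tag, and hand-off flow, not the actual winml-cli quantizer code. The function name quantize_tagged and its load/tag/run parameters are hypothetical and exist only so the sketch can be exercised with stand-ins. The defaults assume onnx and an onnxruntime version whose quantize() accepts a ModelProto, as the description above implies.

```python
from pathlib import Path
from typing import Any, Callable


def _onnx_load(path: Path) -> Any:
    # Imported lazily so this module loads even without onnx installed.
    import onnx
    return onnx.load(str(path))


def _ort_tag(model: Any) -> None:
    # ORT's own helper; it marks the ModelProto as already pre-processed.
    from onnxruntime.quantization.quant_utils import add_pre_process_metadata
    add_pre_process_metadata(model)


def _ort_quantize(model: Any, output_path: Path, quant_config: Any) -> None:
    from onnxruntime.quantization.quantize import quantize
    quantize(model, str(output_path), quant_config)


def quantize_tagged(
    input_path: str,
    output_path: str,
    quant_config: Any,
    *,
    load: Callable[[Path], Any] = _onnx_load,        # hypothetical injection point
    tag: Callable[[Any], None] = _ort_tag,           # hypothetical injection point
    run: Callable[[Any, Path, Any], None] = _ort_quantize,  # hypothetical
) -> None:
    # Load an in-memory copy; the user's file on disk is never rewritten.
    model = load(Path(input_path))
    # Tag the copy so the pre-processing check sees the metadata.
    tag(model)
    # Pass the tagged ModelProto itself, not the path, to the quantizer.
    run(model, Path(output_path), quant_config)
```

The key design point is the last call: because the tagged object is what gets passed on, the metadata check runs against the tagged copy rather than the untagged file.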

2. Address deferred review comments from #923

max_optim_iterations_option: use a '. ' separator before optional_message to match the optimize/analyze sibling helpers (drop the redundant trailing period from the base help so the joined output stays clean).
eval: warn when build-pipeline flags (--no-quant/--no-optimize/--no-analyze/--max-optim-iterations) are passed with a pre-built ONNX path and skip_build, where they're silently forwarded to from_onnx as no-ops — mirrors the existing --precision-ignored warning.
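For the first item, the separator change can be illustrated with a small sketch. The function name join_help and the help wording below are hypothetical; only the '. ' separator and the dropped trailing period come from the description above.

```python
from typing import Optional

# Hypothetical base help text; note there is no trailing period.
BASE_HELP = "Maximum number of optimization iterations"


def join_help(base: str, optional_message: Optional[str] = None) -> str:
    """Join base help and an optional message with a '. ' separator."""
    if not optional_message:
        return base
    # A trailing period on `base` would produce '.. ' here, which is why
    # the base help is stored without one.
    return f"{base}. {optional_message}"


print(join_help(BASE_HELP, "Ignored for pre-built models"))
```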

Verification

tests/unit/test_quantizer.py — passes
tests/integration/test_quantization.py (3 tests, real ORT) — pass
tests/unit/commands/test_eval.py + tests/unit/utils/test_cli.py (79 tests) — pass

Fixes #557

ORT's quantize_static emits two near-duplicate "consider pre-processing before quantization" warnings, gated on model_has_pre_process_metadata() of the model it reloads from the input path. Move the pre-process metadata tagging out of optimize_onnx (where it only helped models that went through optimize) and into the quantizer: load the input model, tag it via ORT's own add_pre_process_metadata(), and hand the in-memory ModelProto to quantize() so both warnings are suppressed without mutating the user's input file on disk. Fixes #557

- max_optim_iterations_option: use '. ' separator before optional_message to match the optimize/analyze sibling helpers (drop the redundant trailing period from the base help so the joined output stays clean).
- eval: warn when build-pipeline flags (--no-quant/--no-optimize/--no-analyze/--max-optim-iterations) are passed with a pre-built ONNX path and skip_build, where they are silently forwarded as no-ops; this mirrors the existing --precision-ignored warning.

The same no-op situation exists in perf: --no-quant/--no-optimize/--no-analyze/--max-optim-iterations are forwarded to from_onnx but never take effect when a pre-built ONNX is run with skip_build (the default). Extract the detection into cli_utils.ignored_build_flags_warning(), which returns the warning message (or None). eval emits it via logger.warning; perf via console.print, matching each command's existing warning style. Adds unit tests for the helper plus eval/perf CLI coverage.
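A rough sketch of what such a helper could look like. Only the name ignored_build_flags_warning and its "message or None" contract come from the commit message; the parameter names and the message wording below are assumptions, not the actual cli_utils signature.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("eval")


def ignored_build_flags_warning(
    *,
    is_prebuilt_onnx: bool,            # hypothetical parameter names throughout
    skip_build: bool,
    no_quant: bool = False,
    no_optimize: bool = False,
    no_analyze: bool = False,
    max_optim_iterations: Optional[int] = None,
) -> Optional[str]:
    """Return a warning if build-pipeline flags would be silent no-ops."""
    # The flags are only no-ops for a pre-built ONNX run with skip_build.
    if not (is_prebuilt_onnx and skip_build):
        return None

    passed: List[str] = []
    if no_quant:
        passed.append("--no-quant")
    if no_optimize:
        passed.append("--no-optimize")
    if no_analyze:
        passed.append("--no-analyze")
    if max_optim_iterations is not None:
        passed.append("--max-optim-iterations")

    if not passed:
        return None
    # Message wording is illustrative only.
    return f"{', '.join(passed)} ignored: pre-built ONNX is used without a build step"


# eval-style emission; perf would print the same message to its console.
message = ignored_build_flags_warning(
    is_prebuilt_onnx=True, skip_build=True, no_quant=True
)
if message is not None:
    logger.warning(message)
```

Returning the message instead of emitting it lets each command keep its own warning style while sharing a single detection rule.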

xieofxie requested a review from a team as a code owner June 24, 2026 07:54

xieofxie added 2 commits June 24, 2026 15:58

xieofxie wants to merge 3 commits into main from hualxie/hack_preprocess
