fix(quantize): suppress ORT pre-processing warnings in quantize flow#956

Open
xieofxie wants to merge 3 commits into main from hualxie/hack_preprocess

@xieofxie xieofxie commented Jun 24, 2026

Contributor

Summary

Two related cleanups in the quantize/eval CLI surface.

1. Suppress duplicate ORT pre-processing warnings (fixes #557)

winml quantize emitted two near-duplicate ORT warnings per run:

WARNING: Please consider to run pre-processing before quantization. Refer to example: …
WARNING: Please consider pre-processing before quantization. See …

Both originate in ORT's quantize_static, gated on model_has_pre_process_metadata() of the model it reloads from the input path. Previously the pre-process metadata was injected in optimize_onnx, which only suppressed the warnings for models that went through optimize first — a standalone winml quantize on a raw model still warned twice.

Moved the tagging into the quantizer: load the input model, tag it via ORT's own add_pre_process_metadata(), and hand the in-memory ModelProto to quantize(). Passing the in-memory model (rather than the path) is required because ORT checks the flag on the copy it reloads from model_input, and it avoids mutating the user's pristine input file. ORT externalizes weights into its own temp dir, so external-data handling is unaffected.
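The gating described above can be pictured with a small stand-in that does not import ORT at all. Everything in it is hypothetical: `ToyModel`, the `toy_*` functions, and the marker key only mimic the roles of ModelProto, quantize_static, add_pre_process_metadata() and model_has_pre_process_metadata() named in this description. The toy also records one message per failed check rather than two. It is a sketch of why the in-memory hand-off works, not ORT's implementation.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Hypothetical marker key; the real key is an ORT implementation detail.
TOY_PRE_PROCESS_KEY = "toy.pre_process"


@dataclass
class ToyModel:
    """Stand-in for an ONNX ModelProto that only carries metadata."""
    metadata: Dict[str, str] = field(default_factory=dict)


def toy_has_pre_process_metadata(model: ToyModel) -> bool:
    return model.metadata.get(TOY_PRE_PROCESS_KEY) == "done"


def toy_add_pre_process_metadata(model: ToyModel) -> None:
    model.metadata[TOY_PRE_PROCESS_KEY] = "done"


def toy_quantize_static(
    model_input: Union[str, ToyModel],
    disk: Dict[str, ToyModel],
    warnings: List[str],
) -> ToyModel:
    """Mimic the gate: the flag is checked on the copy reloaded from model_input."""
    if isinstance(model_input, ToyModel):
        reloaded = copy.deepcopy(model_input)
    else:
        reloaded = copy.deepcopy(disk[model_input])
    if not toy_has_pre_process_metadata(reloaded):
        warnings.append("consider pre-processing before quantization")
    return reloaded


# A raw, untagged model "on disk".
disk = {"model.onnx": ToyModel()}

# Old standalone flow: pass the path, so the untagged file is what gets checked.
warnings_from_path: List[str] = []
toy_quantize_static("model.onnx", disk, warnings_from_path)

# New flow: load, tag the in-memory copy, and pass that copy instead of the path.
in_memory = copy.deepcopy(disk["model.onnx"])
toy_add_pre_process_metadata(in_memory)
warnings_from_memory: List[str] = []
toy_quantize_static(in_memory, disk, warnings_from_memory)
```

In this toy, the path-based call records a warning, the tagged in-memory call records none, and the entry in `disk` stays untagged, which corresponds to the "pristine input file" point above.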

2. Address deferred review comments from #923

  • max_optim_iterations_option: use '. ' as the separator before optional_message, matching the optimize/analyze sibling helpers, and drop the redundant trailing period from the base help so the joined output stays clean.
  • eval: warn when build-pipeline flags (--no-quant/--no-optimize/--no-analyze/--max-optim-iterations) are passed together with a pre-built ONNX path and skip_build. In that case they are silently forwarded to from_onnx as no-ops. This mirrors the existing --precision-ignored warning.
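The separator change in the first bullet can be illustrated with a minimal sketch. `join_help` and both help texts are hypothetical and are not taken from the repository; the sketch only shows why the base help has to lose its trailing period once the '. ' separator is used.

```python
from typing import Optional


def join_help(base_help: str, optional_message: Optional[str] = None) -> str:
    """Join a base help string and an optional suffix with a '. ' separator."""
    if not optional_message:
        return base_help
    return f"{base_help}. {optional_message}"


# Hypothetical help texts, for illustration only.
clean = join_help(
    "Maximum number of optimization iterations",
    "Ignored when the build is skipped",
)

# Keeping the trailing period on the base help would double it up.
doubled = join_help(
    "Maximum number of optimization iterations.",
    "Ignored when the build is skipped",
)
```

Here `clean` reads as two sentences joined by a single period, while `doubled` contains ".. ", which is the artifact the bullet avoids.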

Verification

  • tests/unit/test_quantizer.py — passes
  • tests/integration/test_quantization.py (3 tests, real ORT) — pass
  • tests/unit/commands/test_eval.py + tests/unit/utils/test_cli.py (79 tests) — pass

Fixes #557

ORT's quantize_static emits two near-duplicate "consider pre-processing
before quantization" warnings, gated on model_has_pre_process_metadata()
of the model it reloads from the input path.

Move the pre-process metadata tagging out of optimize_onnx (where it only
helped models that went through optimize) and into the quantizer: load the
input model, tag it via ORT's own add_pre_process_metadata(), and hand the
in-memory ModelProto to quantize() so both warnings are suppressed without
mutating the user's input file on disk.

Fixes #557
@xieofxie xieofxie requested a review from a team as a code owner June 24, 2026 07:54
xieofxie added 2 commits June 24, 2026 15:58
- max_optim_iterations_option: use '. ' separator before optional_message
  to match the optimize/analyze sibling helpers (drop the redundant trailing
  period from the base help so the joined output stays clean).
- eval: warn when build-pipeline flags (--no-quant/--no-optimize/--no-analyze/
  --max-optim-iterations) are passed with a pre-built ONNX path and skip_build,
  where they are silently forwarded as no-ops — mirrors the existing
  --precision-ignored warning.

The same no-op situation exists in perf: --no-quant/--no-optimize/
--no-analyze/--max-optim-iterations are forwarded to from_onnx but never
take effect when a pre-built ONNX is run with skip_build (the default).

Extract the detection into cli_utils.ignored_build_flags_warning(), which
returns the warning message (or None). eval emits it via logger.warning;
perf via console.print, matching each command's existing warning style.

Adds unit tests for the helper plus eval/perf CLI coverage.
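
A rough sketch of what such a helper could look like follows. Only the function name and the "message or None" contract come from the commit message above; the keyword parameters, the detection condition, and the message wording are assumptions made for illustration.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("eval")


def ignored_build_flags_warning(
    *,
    is_prebuilt_onnx: bool,  # assumed parameter: input is a pre-built ONNX path
    skip_build: bool,        # assumed parameter: build pipeline is skipped
    no_quant: bool = False,
    no_optimize: bool = False,
    no_analyze: bool = False,
    max_optim_iterations: Optional[int] = None,
) -> Optional[str]:
    """Return a warning message if build flags would be no-ops, else None."""
    if not (is_prebuilt_onnx and skip_build):
        return None

    passed: List[str] = []
    if no_quant:
        passed.append("--no-quant")
    if no_optimize:
        passed.append("--no-optimize")
    if no_analyze:
        passed.append("--no-analyze")
    if max_optim_iterations is not None:
        passed.append("--max-optim-iterations")

    if not passed:
        return None
    # Hypothetical wording; the real message may differ.
    return (
        ", ".join(passed)
        + " ignored: a pre-built ONNX model is used and the build is skipped"
    )


# eval-style caller: emit through the logger only when there is something to say.
message = ignored_build_flags_warning(
    is_prebuilt_onnx=True,
    skip_build=True,
    no_quant=True,
    max_optim_iterations=3,
)
if message is not None:
    logger.warning(message)
```

Returning the message instead of emitting it keeps the detection in one place while letting each command choose its own output channel, as the commit message describes for eval and perf.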
Development

Successfully merging this pull request may close these issues.

[winml quantize] [P2] Two near-duplicate ORT WARNING lines per run