fix(quantize): suppress ORT pre-processing warnings in quantize flow #956
Open
xieofxie wants to merge 3 commits into
ORT's quantize_static emits two near-duplicate "consider pre-processing before quantization" warnings, gated on model_has_pre_process_metadata() of the model it reloads from the input path. Move the pre-process metadata tagging out of optimize_onnx (where it only helped models that went through optimize) and into the quantizer: load the input model, tag it via ORT's own add_pre_process_metadata(), and hand the in-memory ModelProto to quantize() so both warnings are suppressed without mutating the user's input file on disk. Fixes #557
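The flow described in this commit can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the function name `quantize_without_preprocess_warnings` and its parameters are hypothetical, `quantize_static` stands in for whichever ORT entry point the quantizer calls, and passing a `ModelProto` as the model input assumes an ORT version that accepts one. The imports are deferred into the function body purely as a choice for this sketch.

```python
def quantize_without_preprocess_warnings(input_path, output_path, reader, **quant_kwargs):
    """Hypothetical helper: tag the model in memory, then quantize it."""
    # Deferred imports (a choice made for this sketch) keep the module
    # importable even when onnx / onnxruntime are not installed.
    import onnx
    from onnxruntime.quantization import quantize_static
    from onnxruntime.quantization.quant_utils import add_pre_process_metadata

    # Load the user's model; the file on disk is never written back.
    model = onnx.load(str(input_path))

    # Tag the in-memory copy so ORT's pre-process metadata check passes.
    add_pre_process_metadata(model)

    # Hand ORT the tagged ModelProto rather than the path, so the check
    # runs against the tagged copy instead of the untagged file.
    quantize_static(model, str(output_path), reader, **quant_kwargs)
    return model
```

Returning the tagged model is only a convenience for inspection; the quantized output is written to `output_path` by ORT.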
- max_optim_iterations_option: use '. ' separator before optional_message to match the optimize/analyze sibling helpers (drop the redundant trailing period from the base help so the joined output stays clean).
- eval: warn when build-pipeline flags (--no-quant/--no-optimize/--no-analyze/--max-optim-iterations) are passed with a pre-built ONNX path and skip_build, where they are silently forwarded as no-ops; this mirrors the existing --precision-ignored warning.
The same no-op situation exists in perf: --no-quant/--no-optimize/--no-analyze/--max-optim-iterations are forwarded to from_onnx but never take effect when a pre-built ONNX is run with skip_build (the default). Extract the detection into cli_utils.ignored_build_flags_warning(), which returns the warning message (or None). eval emits it via logger.warning; perf via console.print, matching each command's existing warning style. Adds unit tests for the helper plus eval/perf CLI coverage.
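A helper with this contract (return the message, or None when nothing is ignored) might look like the sketch below. Only the name `ignored_build_flags_warning` and the four flags come from the commit message; the keyword parameters and the message wording are assumptions made for illustration and may differ from the real helper in `cli_utils`.

```python
from typing import Optional


def ignored_build_flags_warning(
    *,
    prebuilt_onnx: bool,  # hypothetical parameter: input is a pre-built ONNX path
    skip_build: bool,
    no_quant: bool = False,
    no_optimize: bool = False,
    no_analyze: bool = False,
    max_optim_iterations: Optional[int] = None,
) -> Optional[str]:
    """Return a warning if build-pipeline flags would be no-ops, else None."""
    # The flags are only no-ops when a pre-built ONNX runs without a build.
    if not (prebuilt_onnx and skip_build):
        return None

    candidates = [
        ("--no-quant", no_quant),
        ("--no-optimize", no_optimize),
        ("--no-analyze", no_analyze),
        ("--max-optim-iterations", max_optim_iterations is not None),
    ]
    ignored = [flag for flag, was_passed in candidates if was_passed]
    if not ignored:
        return None

    # Illustrative wording only; the real message text is not shown in the PR.
    return f"{', '.join(ignored)} ignored: pre-built ONNX model with skip_build."
```

Returning the message instead of emitting it lets each command keep its own style: eval can pass the result to `logger.warning` and perf to `console.print`, as the commit message describes.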
Summary
Two related cleanups in the quantize/eval CLI surface.
1. Suppress duplicate ORT pre-processing warnings (fixes #557)

`winml quantize` emitted two near-duplicate ORT "consider pre-processing before quantization" warnings per run. Both originate in ORT's `quantize_static`, gated on `model_has_pre_process_metadata()` of the model it reloads from the input path. Previously the pre-process metadata was injected in `optimize_onnx`, which only suppressed the warnings for models that went through optimize first; a standalone `winml quantize` on a raw model still warned twice.

Moved the tagging into the quantizer: load the input model, tag it via ORT's own `add_pre_process_metadata()`, and hand the in-memory `ModelProto` to `quantize()`. Passing the in-memory model (rather than the path) is required because ORT checks the flag on the copy it reloads from `model_input`, and it avoids mutating the user's pristine input file. ORT externalizes weights into its own temp dir, so external-data handling is unaffected.

2. Address deferred review comments from #923

- `max_optim_iterations_option`: use `. ` before `optional_message` to match the `optimize`/`analyze` sibling helpers (drop the redundant trailing period from the base help so the joined output stays clean).
- `eval`: warn when build-pipeline flags (`--no-quant`/`--no-optimize`/`--no-analyze`/`--max-optim-iterations`) are passed with a pre-built ONNX path and `skip_build`, where they're silently forwarded to `from_onnx` as no-ops; this mirrors the existing `--precision`-ignored warning.

Verification

- `tests/unit/test_quantizer.py`: passes
- `tests/integration/test_quantization.py` (3 tests, real ORT): pass
- `tests/unit/commands/test_eval.py` + `tests/unit/utils/test_cli.py` (79 tests): pass

Fixes #557