Add MGP-STR (alibaba-damo/mgp-str-base) image-to-text task support by ssss141414 · Pull Request #952 · microsoft/winml-cli

ssss141414 · 2026-06-24T04:14:31Z

Summary

Adds an Effort-L1-light registration so that MGP-STR scene-text-recognition models resolve under the user-facing image-to-text task label. The vendor MgpstrOnnxConfig (Optimum) already exposes the three head outputs (char_logits, bpe_logits, wp_logits) correctly, but it is registered only under feature-extraction. This PR adds a task-label alias and a MODEL_CLASS_MAPPING binding to MgpstrForSceneTextRecognition, the head-bearing class (MGP-STR is not a generic Vision2Seq model).
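The alias pattern described above can be pictured with a small, dependency-free sketch. It is only an illustration: `ExportRegistry`, `register`, and `resolve` are hypothetical names, not the winml-cli or Optimum APIs, and binding `MgpstrModel` to feature-extraction is an assumption made for the example.

```python
# Hypothetical sketch of a task-alias registration.
# ExportRegistry is NOT a real winml-cli or Optimum class.
from dataclasses import dataclass, field


@dataclass
class ExportRegistry:
    """Maps (model_type, task) to an export config name and a model class name."""

    configs: dict = field(default_factory=dict)
    model_classes: dict = field(default_factory=dict)

    def register(self, model_type, task, config_name, model_class):
        self.configs[(model_type, task)] = config_name
        self.model_classes[(model_type, task)] = model_class

    def resolve(self, model_type, task):
        key = (model_type, task)
        if key not in self.configs:
            raise KeyError(f"{model_type} doesn't support task {task}")
        return self.configs[key], self.model_classes[key]


registry = ExportRegistry()

# Before the PR: only feature-extraction is registered
# (the MgpstrModel binding here is assumed for illustration).
registry.register("mgp-str", "feature-extraction", "MgpstrOnnxConfig", "MgpstrModel")

# The alias this PR describes: same model type, user-facing task label,
# bound to the head-bearing class rather than a generic Vision2Seq class.
registry.register(
    "mgp-str",
    "image-to-text",
    "MgpstrImage2TextOnnxConfig",
    "MgpstrForSceneTextRecognition",
)

config_name, model_class = registry.resolve("mgp-str", "image-to-text")
```

Without the second `register` call, `resolve("mgp-str", "image-to-text")` raises, which mirrors the baseline failure reported under Gate 2.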

Files changed (5)

src/winml/modelkit/models/hf/mgp_str.py (NEW, 58 lines) — MgpstrImage2TextOnnxConfig(MgpstrOnnxConfig) subclass
src/winml/modelkit/models/hf/__init__.py — 3-line wiring
examples/recipes/alibaba-damo_mgp-str-base/image-to-text_config.json (NEW, 49 lines) — recipe
examples/recipes/README.md — catalog row
research/adding-model-support/model_knowledge/mgp_str.json — mgp_str-004 post-mortem finding

Goal-ladder verdict

alibaba-damo/mgp-str-base @ image-to-text @ fp32 @ cpu

Tier	Verdict	Evidence
L0 build	PASS	83.7s, 374 nodes, 564.5 MB optimized; autoconf converged in 2 iters
L1 perf	PASS	avg=100.76ms, P90=123.26ms, 9.92 samples/sec (20 iters CPU)
L2 numerical	PASS	cosine vs PT: char=0.99999999999992, bpe=0.99999999999974, wp=0.99999999999860; max-abs 5.7e-05 / 2.4e-04 / 2.1e-04
L3 eval	CLI-BLOCKED	`image-to-text` task has no default dataset (same as nlpconnect/vit-gpt2-image-captioning)
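For readers unfamiliar with the L2 metrics, the following is a minimal sketch of how cosine similarity and max-abs error between a reference output and an exported-model output can be computed. It assumes each head's logits have been flattened to a plain list of floats; the sample values and function names are made up for illustration and are not taken from the actual check.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


# Illustrative values only (not real MGP-STR logits).
reference = [0.5, -1.25, 2.0, 0.0]          # e.g. PyTorch output, flattened
candidate = [0.5001, -1.2499, 2.0002, -0.0001]  # e.g. ONNX output, flattened

cos = cosine_similarity(reference, candidate)
err = max_abs_diff(reference, candidate)
```

The two metrics complement each other: cosine similarity is insensitive to a uniform scale error, while max-abs catches a single badly-off element that barely moves the cosine.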

Step 1b verification — real engineering vs catalog-only

Gate 1 (auto-config-diff): the recipe is identical to the output of `winml config --task image-to-text` (autoconf-faithful).
Gate 2 (baseline build on main): fails with `mgp-str doesn't support task image-to-text for the onnx backend.`, so this is a real engineering delta, not a catalog-only change.

Known gotchas

The HF model card declares the legacy architectures: ['MGPSTRModel'], but current transformers exports MgpstrModel (CamelCase rename). Without an explicit --task image-to-text, winml inspect/config/build fail with `Cannot import MGPSTRModel from transformers`. This is a CLI robustness gap, separate from this PR.
3 Einsum ops in a3_module heads are non-fatal on CPU.
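To make the architecture-name mismatch concrete, here is a small sketch of a case-insensitive fallback lookup. It is not part of this PR and not existing winml behaviour; `resolve_class_name` is a hypothetical helper, and the list of available names is supplied by hand rather than read from transformers.

```python
def resolve_class_name(declared, available):
    """Return the exported class name matching `declared`.

    Tries an exact match first, then a case-insensitive match.
    Returns None when nothing matches.
    """
    if declared in available:
        return declared
    by_folded = {name.casefold(): name for name in available}
    return by_folded.get(declared.casefold())


# Names taken from the gotcha above; the list itself is hand-written.
available = ["MgpstrModel", "MgpstrForSceneTextRecognition"]
resolved = resolve_class_name("MGPSTRModel", available)
```

Here `resolved` is `"MgpstrModel"`, the name current transformers exports, while an unknown name yields `None` so the caller can still report a clear error.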

Verification

uv run winml build -c examples/recipes/alibaba-damo_mgp-str-base/image-to-text_config.json -m alibaba-damo/mgp-str-base -o temp/mgp_build --ep cpu --device cpu --rebuild
uv run winml perf -m temp/mgp_build/model.onnx --ep cpu --device cpu --iterations 20
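As a rough guide to reading the perf numbers, the sketch below derives an average, a P90, and a throughput figure from a list of per-iteration latencies. It assumes a nearest-rank percentile and throughput computed as 1000 / average latency in ms; how `winml perf` actually computes these is not stated here, and the latency values are invented.

```python
import statistics


def summarize(latencies_ms):
    """Return (average ms, nearest-rank P90 ms, samples per second)."""
    ordered = sorted(latencies_ms)
    avg = statistics.fmean(ordered)
    # Nearest-rank P90: smallest value with at least 90% of samples at or below it.
    rank = (9 * len(ordered) + 9) // 10  # integer ceil(0.9 * n)
    p90 = ordered[rank - 1]
    # One sample per iteration, so throughput is the inverse of mean latency.
    samples_per_sec = 1000.0 / avg
    return avg, p90, samples_per_sec


# Invented latencies for illustration (not measured values).
latencies = [95.0, 98.0, 100.0, 102.0, 105.0, 97.0, 99.0, 101.0, 103.0, 130.0]
avg, p90, sps = summarize(latencies)
```

Under the same throughput assumption, the reported avg of 100.76 ms gives 1000 / 100.76 ≈ 9.92 samples/sec, which agrees with the L1 row.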


ssss141414 wants to merge 1 commit into main from shzhen/add-mgp-str-base
