Add MGP-STR (alibaba-damo/mgp-str-base) image-to-text task support #952
Draft
ssss141414 wants to merge 1 commit into
Adds Effort-L1-light registration so MGP-STR scene-text-recognition models resolve under the user-facing 'image-to-text' task label. The vendor MgpstrOnnxConfig (Optimum) already exposes the 3-head outputs (char_logits, bpe_logits, wp_logits) correctly but is registered only under feature-extraction. This PR adds a task-label alias plus MODEL_CLASS_MAPPING binding to MgpstrForSceneTextRecognition.

Files:
- src/winml/modelkit/models/hf/mgp_str.py: MgpstrImage2TextOnnxConfig subclass (58 lines)
- src/winml/modelkit/models/hf/__init__.py: 3-line wiring
- examples/recipes/alibaba-damo_mgp-str-base/image-to-text_config.json: recipe (49 lines)
- examples/recipes/README.md: catalog row
- research/adding-model-support/model_knowledge/mgp_str.json: mgp_str-004 finding

Goal-ladder (alibaba-damo/mgp-str-base @ image-to-text @ fp32 @ cpu):
- L0 PASS: build 83.7s, 374 nodes, 564.5 MB optimized
- L1 PASS: avg=100.76ms, P90=123.26ms, 9.92 samples/sec (20 iters)
- L2 PASS: cosine vs PyTorch reference all 3 heads >=0.999999 (max-abs <3e-4)
- L3 CLI-BLOCKED: image-to-text task has no default dataset (same as nlpconnect/vit-gpt2-image-captioning per known limitation)

Step 1b verification: baseline 'winml build' on main fails with 'mgp-str doesn't support task image-to-text' (real engineering delta, not catalog-only).
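The task-label alias described above can be pictured with a small registry sketch. This is an illustration only: the registry dictionaries, the `register_onnx_config` decorator, and `resolve` are hypothetical stand-ins, not the real Optimum or winml APIs, and the stub `MgpstrOnnxConfig` only mimics the three output names mentioned in this PR.

```python
from typing import Dict, Optional, Tuple, Type

# Hypothetical registries (assumption): (model_type, task) -> config class / model class name.
ONNX_CONFIG_REGISTRY: Dict[Tuple[str, str], Type] = {}
MODEL_CLASS_MAPPING: Dict[Tuple[str, str], str] = {}


def register_onnx_config(model_type: str, task: str):
    """Class decorator that files a config class under (model_type, task)."""
    def decorator(cls):
        ONNX_CONFIG_REGISTRY[(model_type, task)] = cls
        return cls
    return decorator


@register_onnx_config("mgp-str", "feature-extraction")
class MgpstrOnnxConfig:
    """Stub for the vendor config, which already declares the three heads."""

    @property
    def outputs(self):
        return ["char_logits", "bpe_logits", "wp_logits"]


@register_onnx_config("mgp-str", "image-to-text")
class MgpstrImage2TextOnnxConfig(MgpstrOnnxConfig):
    """Alias: inherits inputs/outputs unchanged, exposed under the user-facing task."""


# Bind the task to the head-bearing model class (by name, to keep the sketch import-free).
MODEL_CLASS_MAPPING[("mgp-str", "image-to-text")] = "MgpstrForSceneTextRecognition"


def resolve(model_type: str, task: str) -> Tuple[Type, Optional[str]]:
    """Look up the config class and model class name for a (model_type, task) pair."""
    key = (model_type, task)
    if key not in ONNX_CONFIG_REGISTRY:
        raise KeyError(f"{model_type} doesn't support task {task}")
    return ONNX_CONFIG_REGISTRY[key], MODEL_CLASS_MAPPING.get(key)
```

In this sketch, removing the second registration makes `resolve("mgp-str", "image-to-text")` raise, which mirrors the baseline failure reported for main.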
Summary
Adds Effort-L1-light registration so MGP-STR scene-text-recognition models resolve under the user-facing image-to-text task label. The vendor MgpstrOnnxConfig (Optimum) already exposes the 3-head outputs (char_logits, bpe_logits, wp_logits) correctly, but is registered ONLY under feature-extraction. This PR adds a task-label alias plus a MODEL_CLASS_MAPPING binding to MgpstrForSceneTextRecognition (the head-bearing class; MGP-STR is NOT a generic Vision2Seq).
Files changed (5)
- src/winml/modelkit/models/hf/mgp_str.py (NEW, 58 lines): MgpstrImage2TextOnnxConfig(MgpstrOnnxConfig) subclass
- src/winml/modelkit/models/hf/__init__.py: 3-line wiring
- examples/recipes/alibaba-damo_mgp-str-base/image-to-text_config.json (NEW, 49 lines): recipe
- examples/recipes/README.md: catalog row
- research/adding-model-support/model_knowledge/mgp_str.json: mgp_str-004 post-mortem finding
Goal-ladder verdict
alibaba-damo/mgp-str-base @ image-to-text @ fp32 @ cpu
- L3 CLI-BLOCKED: image-to-text task has no default dataset (same as vit-gpt2)
Step 1b verification: real engineering vs catalog-only
- winml config --task image-to-text (recipe is autoconf-faithful)
- Baseline error: 'mgp-str doesn't support task image-to-text for the onnx backend.' → real engineering delta, NOT catalog-only.
Known gotchas
- architectures: ['MGPSTRModel'] but current transformers exports MgpstrModel (CamelCase rename). Without an explicit --task image-to-text, winml inspect/config/build fail with 'Cannot import MGPSTRModel from transformers'. CLI robustness gap separate from this PR.
- a3_module heads are non-fatal on CPU.
Verification
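The L2 parity figures reported in this PR (cosine >= 0.999999 and max-abs < 3e-4 on all three heads) could be reproduced with a check along these lines. This is a minimal sketch, not the harness actually used: it assumes each head's logits have already been flattened into a plain list of floats, the function names are hypothetical, and the default thresholds simply reuse the reported numbers.

```python
import math
from typing import Dict, Sequence

# Output names taken from the PR description.
HEADS = ("char_logits", "bpe_logits", "wp_logits")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened, non-zero tensors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def check_parity(
    reference: Dict[str, Sequence[float]],
    candidate: Dict[str, Sequence[float]],
    min_cosine: float = 0.999999,  # assumption: reported figure reused as threshold
    max_abs: float = 3e-4,         # assumption: reported figure reused as threshold
) -> Dict[str, Dict[str, object]]:
    """Compare every head of the candidate (e.g. ONNX) against the reference (e.g. PyTorch)."""
    report = {}
    for head in HEADS:
        ref, out = reference[head], candidate[head]
        if len(ref) != len(out):
            raise ValueError(f"{head}: shape mismatch ({len(ref)} vs {len(out)})")
        cos = cosine_similarity(ref, out)
        diff = max_abs_diff(ref, out)
        report[head] = {
            "cosine": cos,
            "max_abs": diff,
            "ok": cos >= min_cosine and diff < max_abs,
        }
    return report
```

Checking both metrics per head is deliberate: cosine similarity is insensitive to uniform scaling, so the max-abs bound catches magnitude drift that cosine alone would miss.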