[model] support paddle_ocr v1.6 by Jintao-Huang · Pull Request #9464 · modelscope/ms-swift

Jintao-Huang · 2026-06-01T08:46:13Z

No description provided.

gemini-code-assist

Code Review

This pull request unifies the PaddleOCR-VL models (v1.0, v1.5, and v1.6) under a single model type paddleocr_vl and model architecture paddleocr_vl, removing the redundant paddle_ocr registration. Feedback points out that since these models are now unified, the registered model architecture keys for paddleocr_vl must be updated to support both v1.0 and v1.5/1.6 architectures, which use different paths for the language model, aligner, and vision tower.
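The point about architecture keys can be illustrated with a small sketch: a single registration lists several candidate paths per component, and the first one that exists on the loaded model is used. Everything below is an assumption for illustration. The dotted paths, `ARCH_KEYS`, `get_submodule`, and `resolve_keys` are hypothetical names, not the actual PaddleOCR-VL module layout or the ms-swift registration API.

```python
from types import SimpleNamespace

# Candidate dotted paths per component, tried in order. All paths here are
# hypothetical placeholders, not the real PaddleOCR-VL module names.
ARCH_KEYS = {
    'language_model': ['model.language_model', 'language_model'],
    'aligner': ['model.projector', 'projector'],
    'vision_tower': ['model.visual', 'visual'],
}


def get_submodule(root, dotted_path):
    """Walk a dotted attribute path; return None if any segment is missing."""
    obj = root
    for name in dotted_path.split('.'):
        obj = getattr(obj, name, None)
        if obj is None:
            return None
    return obj


def resolve_keys(root, arch_keys=ARCH_KEYS):
    """Map each component to the first candidate path that exists on root."""
    resolved = {}
    for component, candidates in arch_keys.items():
        for path in candidates:
            if get_submodule(root, path) is not None:
                resolved[component] = path
                break
        else:
            # No candidate matched, so this layout is not covered.
            raise AttributeError(f'no candidate path found for {component}')
    return resolved


# Two toy layouts standing in for the two checkpoint generations.
nested_layout = SimpleNamespace(
    model=SimpleNamespace(language_model='lm', projector='proj', visual='vit'))
flat_layout = SimpleNamespace(language_model='lm', projector='proj', visual='vit')

print(resolve_keys(nested_layout))
print(resolve_keys(flat_layout))
```

Listing candidates in one place keeps a single model type while still covering both layouts; a layout that matches none of the candidates fails loudly instead of silently skipping a component.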

Jintao-Huang · 2026-06-01T10:32:36Z

/gemini review

gemini-code-assist

Code Review

This pull request updates the integration of PaddleOCR-VL models, renaming the model type and architecture from paddle_ocr_1_5 to paddleocr_vl, adding support for PaddleOCR-VL-1.6, and introducing version-specific template configurations and transformer version requirements. It also refactors ModelLoader to support a configurable default_trust_remote_code attribute. However, two critical issues were identified: the template argument was incorrectly passed to ModelGroup instead of ModelMeta in swift/model/models/baidu.py, and use_model = True was accidentally removed from PaddleOCR1_5Template in swift/template/templates/baidu.py, which is required for its _post_encode method.
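To show why dropping a flag such as `use_model = True` matters, here is a toy sketch of a class-level opt-in that gates a post-encode hook. It assumes the flag controls whether `_post_encode` runs at all; `ToyTemplate`, `ToyOCRTemplate`, `BrokenOCRTemplate`, and the `encode` signature are hypothetical and do not reproduce the ms-swift template classes.

```python
class ToyTemplate:
    """Hypothetical stand-in for a template base class; not the ms-swift API."""

    use_model = False  # subclasses opt in when encoding needs the model

    def _post_encode(self, model, inputs):
        return inputs

    def encode(self, inputs, model=None):
        # The hook only runs when the subclass has opted in via use_model.
        if self.use_model:
            if model is None:
                raise ValueError('use_model=True requires a model')
            return self._post_encode(model, inputs)
        return inputs


class ToyOCRTemplate(ToyTemplate):
    use_model = True

    def _post_encode(self, model, inputs):
        # Needs the model to turn token ids into embeddings.
        return {**inputs, 'inputs_embeds': model(inputs['input_ids'])}


class BrokenOCRTemplate(ToyOCRTemplate):
    use_model = False  # simulates the flag being removed by accident


def toy_model(input_ids):
    """Toy 'embedding' function: doubles each id."""
    return [i * 2 for i in input_ids]


batch = {'input_ids': [1, 2, 3]}
print(ToyOCRTemplate().encode(batch, toy_model))
print(BrokenOCRTemplate().encode(batch, toy_model))
```

In this sketch the broken variant raises no error. It simply skips the hook, which is why a removal of this kind is easy to miss in review.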

Jintao-Huang · 2026-06-01T12:55:32Z

/gemini review

gemini-code-assist

Code Review

This pull request updates the integration of PaddleOCR-VL models. Specifically, it adds support for PaddleOCR-VL-1.6, updates model types and architectures (renaming paddle_ocr_1_5 to paddleocr_vl), and refactors the model loading and template processing logic. Key changes include adding version-specific transformer requirements, dynamically managing trust_remote_code settings in ModelLoader, and updating PaddleOCRTemplate to support custom image processor arguments and padding-free training. Feedback suggests dynamically retrieving the image_token_id in PaddleOCRTemplate rather than relying on a hardcoded value to prevent potential issues with customized tokenizers.

gemini-code-assist · 2026-06-01T12:57:17Z

        labels = encoded['labels']
        loss_scale = encoded.get('loss_scale', None)
-        idx_list = findall(input_ids, -100)
+        idx_list = findall(input_ids, self.image_token_id)


Hardcoding the image_token_id to 100295 can be fragile if the tokenizer vocabulary changes or if a customized tokenizer is used. It is safer to dynamically retrieve the token ID from the processor or tokenizer, and only fall back to the hardcoded value if it cannot be found.

        image_token_id = getattr(self.processor, 'image_token_id', None) or self.tokenizer.convert_tokens_to_ids(self.image_token)
        if image_token_id is None or image_token_id == getattr(self.tokenizer, 'unk_token_id', None):
            image_token_id = self.image_token_id
        idx_list = findall(input_ids, image_token_id)
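The fallback order suggested above can be tried out in isolation. The sketch below is a self-contained approximation, not the merged code: `ToyTokenizer`, `resolve_image_token_id`, the `'<image>'` token string, and the local `findall` helper are all hypothetical stand-ins, and only the fallback value 100295 comes from the review comment. It uses explicit `None` checks rather than `or`, so that a token ID of 0 would not be treated as missing.

```python
from types import SimpleNamespace

FALLBACK_IMAGE_TOKEN_ID = 100295  # hardcoded value quoted in the review


def findall(seq, value):
    """Return every index at which value occurs in seq (local stand-in)."""
    return [i for i, v in enumerate(seq) if v == value]


class ToyTokenizer:
    """Hypothetical tokenizer: unknown tokens map to unk_token_id."""

    def __init__(self, vocab, unk_token_id=0):
        self.vocab = vocab
        self.unk_token_id = unk_token_id

    def convert_tokens_to_ids(self, token):
        return self.vocab.get(token, self.unk_token_id)


def resolve_image_token_id(processor, tokenizer, image_token,
                           fallback=FALLBACK_IMAGE_TOKEN_ID):
    # 1. Prefer an ID exposed by the processor.
    token_id = getattr(processor, 'image_token_id', None)
    # 2. Otherwise ask the tokenizer.
    if token_id is None:
        token_id = tokenizer.convert_tokens_to_ids(image_token)
    # 3. Fall back to the hardcoded value if the token is unknown.
    if token_id is None or token_id == getattr(tokenizer, 'unk_token_id', None):
        token_id = fallback
    return token_id


custom_tok = ToyTokenizer({'<image>': 7})  # customized vocabulary
plain_tok = ToyTokenizer({})               # token missing from the vocabulary

image_id = resolve_image_token_id(SimpleNamespace(), custom_tok, '<image>')
print(image_id, findall([1, 7, 7, 2], image_id))
print(resolve_image_token_id(SimpleNamespace(), plain_tok, '<image>'))
print(resolve_image_token_id(SimpleNamespace(image_token_id=42), plain_tok, '<image>'))
```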

support paddle_ocr 1.6

c4431b7

gemini-code-assist Bot reviewed Jun 1, 2026

Comment thread swift/model/model_arch.py

Jintao-Huang added 4 commits June 1, 2026 17:00

fix

e7f2cbc

fix

3b43a75

fix

19f7b22

update

04687a4

Jintao-Huang mentioned this pull request Jun 1, 2026

Is PaddleOCR-VL-1.6 currently supported? #9462

Open

1 task

gemini-code-assist Bot reviewed Jun 1, 2026

Comment thread swift/model/models/baidu.py

Comment thread swift/template/templates/baidu.py

tastelikefeet approved these changes Jun 1, 2026

hjh0119 approved these changes Jun 1, 2026

Jintao-Huang added 3 commits June 1, 2026 20:43

fix

fdac025

fix

75b53bd

fix

19bb6c3

gemini-code-assist Bot reviewed Jun 1, 2026

Jintao-Huang merged commit 9cff47a into modelscope:main Jun 1, 2026
2 of 3 checks passed

Jintao-Huang merged 8 commits into modelscope:main from Jintao-Huang:support_padding_ocr_v1_6

3 participants
