[model] support paddle_ocr v1.6 #9464

Merged
Jintao-Huang merged 8 commits into
modelscope:main from
Jintao-Huang:support_padding_ocr_v1_6
Jun 1, 2026

Conversation

@Jintao-Huang
Collaborator

No description provided.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request unifies the PaddleOCR-VL models (v1.0, v1.5, and v1.6) under a single model type paddleocr_vl and model architecture paddleocr_vl, removing the redundant paddle_ocr registration. Feedback points out that since these models are now unified, the registered model architecture keys for paddleocr_vl must be updated to support both v1.0 and v1.5/1.6 architectures, which use different paths for the language model, aligner, and vision tower.
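One way to let a single registration cover both layouts is to list candidate module paths per component and keep whichever ones exist on the loaded model. The sketch below is a minimal illustration under that assumption; `ArchKeys`, `has_path`, `resolve_paths`, and every dotted path in it are hypothetical placeholders, not the names used in swift/model/model_arch.py or in the PaddleOCR-VL checkpoints.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, List


@dataclass
class ArchKeys:
    """Hypothetical container: candidate dotted paths for each component."""
    language_model: List[str] = field(default_factory=list)
    aligner: List[str] = field(default_factory=list)
    vision_tower: List[str] = field(default_factory=list)


def has_path(root: Any, dotted: str) -> bool:
    """Return True if every attribute along `dotted` exists on `root`."""
    obj = root
    for part in dotted.split('.'):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


def resolve_paths(root: Any, candidates: List[str]) -> List[str]:
    """Keep only the candidate paths that are present on this model."""
    return [path for path in candidates if has_path(root, path)]


# Placeholder paths for two layouts; the real module names are not given here.
keys = ArchKeys(
    language_model=['model.language_model', 'language_model'],
    aligner=['model.projector', 'projector'],
    vision_tower=['model.visual', 'visual'],
)

# Two toy "models" standing in for a flat layout and a nested layout.
flat_model = SimpleNamespace(
    language_model=object(), projector=object(), visual=object())
nested_model = SimpleNamespace(
    model=SimpleNamespace(
        language_model=object(), projector=object(), visual=object()))

print(resolve_paths(flat_model, keys.language_model))
print(resolve_paths(nested_model, keys.language_model))
```

Listing both candidates keeps one model type usable across versions, at the cost of a lookup step whenever a component has to be located.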

Comment thread swift/model/model_arch.py
@Jintao-Huang
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the integration of PaddleOCR-VL models, renaming the model type and architecture from paddle_ocr_1_5 to paddleocr_vl, adding support for PaddleOCR-VL-1.6, and introducing version-specific template configurations and transformer version requirements. It also refactors ModelLoader to support a configurable default_trust_remote_code attribute. However, two critical issues were identified: the template argument was incorrectly passed to ModelGroup instead of ModelMeta in swift/model/models/baidu.py, and use_model = True was accidentally removed from PaddleOCR1_5Template in swift/template/templates/baidu.py, which is required for its _post_encode method.
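To make the second issue concrete, the sketch below assumes that a class-level `use_model` flag decides whether the model-dependent `_post_encode` step runs at all. `BaseTemplate`, `OcrTemplate`, `BrokenOcrTemplate`, and `prepare` are hypothetical stand-ins, not the swift classes; the point is only that, under this assumption, dropping the flag silently skips the hook rather than raising an error.

```python
from typing import Any, Dict


class BaseTemplate:
    """Hypothetical base class; not the swift Template API."""
    use_model = False  # subclasses that need the model must opt in

    def _post_encode(self, model: Any, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return inputs

    def prepare(self, model: Any, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # The model-dependent step runs only when the subclass opts in.
        if self.use_model:
            return self._post_encode(model, inputs)
        return inputs


class OcrTemplate(BaseTemplate):
    use_model = True

    def _post_encode(self, model: Any, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Toy stand-in for building embeddings with the model.
        return {**inputs, 'inputs_embeds': f'embeds-from-{model}'}


class BrokenOcrTemplate(BaseTemplate):
    # use_model is not set here, so the base-class default (False) applies.
    _post_encode = OcrTemplate._post_encode


inputs = {'input_ids': [1, 2]}
print(OcrTemplate().prepare('m', inputs))        # includes 'inputs_embeds'
print(BrokenOcrTemplate().prepare('m', inputs))  # hook is skipped
```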

Comment thread swift/model/models/baidu.py
Comment thread swift/template/templates/baidu.py
@Jintao-Huang
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the integration of PaddleOCR-VL models. Specifically, it adds support for PaddleOCR-VL-1.6, updates model types and architectures (renaming paddle_ocr_1_5 to paddleocr_vl), and refactors the model loading and template processing logic. Key changes include adding version-specific transformer requirements, dynamically managing trust_remote_code settings in ModelLoader, and updating PaddleOCRTemplate to support custom image processor arguments and padding-free training. Feedback suggests dynamically retrieving the image_token_id in PaddleOCRTemplate rather than relying on a hardcoded value to prevent potential issues with customized tokenizers.

  labels = encoded['labels']
  loss_scale = encoded.get('loss_scale', None)
- idx_list = findall(input_ids, -100)
+ idx_list = findall(input_ids, self.image_token_id)
Contributor

medium

Hardcoding the image_token_id to 100295 can be fragile if the tokenizer vocabulary changes or if a customized tokenizer is used. It is safer to dynamically retrieve the token ID from the processor or tokenizer, and only fall back to the hardcoded value if it cannot be found.

        image_token_id = getattr(self.processor, 'image_token_id', None) or self.tokenizer.convert_tokens_to_ids(self.image_token)
        if image_token_id is None or image_token_id == getattr(self.tokenizer, 'unk_token_id', None):
            image_token_id = self.image_token_id
        idx_list = findall(input_ids, image_token_id)
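
The same fallback order can be tried outside the template as a standalone function. This is a minimal sketch, not the merged implementation: `StubTokenizer`, `resolve_image_token_id`, the local `findall`, and the `'<image>'` token string are hypothetical stand-ins, and only the fallback value 100295 comes from the comment above.

```python
from typing import Any, Dict, List


def findall(sequence: List[int], token: int) -> List[int]:
    """Return every index at which `token` occurs in `sequence`."""
    return [i for i, value in enumerate(sequence) if value == token]


class StubTokenizer:
    """Hypothetical tokenizer stand-in; unknown tokens map to unk_token_id."""

    def __init__(self, vocab: Dict[str, int], unk_token_id: int = 0):
        self.vocab = vocab
        self.unk_token_id = unk_token_id

    def convert_tokens_to_ids(self, token: str) -> int:
        return self.vocab.get(token, self.unk_token_id)


def resolve_image_token_id(processor: Any, tokenizer: Any,
                           image_token: str, fallback_id: int) -> int:
    """Prefer the processor's ID, then the tokenizer's, then the fallback."""
    token_id = getattr(processor, 'image_token_id', None)
    if token_id is None:
        token_id = tokenizer.convert_tokens_to_ids(image_token)
    # An unknown token resolves to unk_token_id, so treat that as "not found".
    if token_id is None or token_id == getattr(tokenizer, 'unk_token_id', None):
        token_id = fallback_id
    return token_id


FALLBACK_ID = 100295  # hardcoded value discussed in the review comment
tokenizer = StubTokenizer({'<image>': 7})  # '<image>' is a placeholder token
image_token_id = resolve_image_token_id(object(), tokenizer, '<image>', FALLBACK_ID)
idx_list = findall([1, 7, 3, 7], image_token_id)
print(image_token_id, idx_list)
```

Unlike the one-line `getattr(...) or ...` form, the explicit `is None` check does not treat a processor-supplied ID of 0 as missing; whether that matters depends on the vocabulary in use.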

@Jintao-Huang Jintao-Huang merged commit 9cff47a into modelscope:main Jun 1, 2026
2 of 3 checks passed
3 participants