fix(tf): dispatch --init-model to checkpoint pre-inspection by wanghan-iapcm · Pull Request #5718 · deepmodeling/deepmd-kit

wanghan-iapcm · 2026-07-03T06:24:07Z

Problem

Fixes #5679. RunOptions records dp train --init-model as init_mode == "init_from_model" (deepmd/tf/train/run_options.py), but DPTrainer.build() dispatched the init step on the literal "init_model". The two strings never matched, so _init_from_ckpt(...) was skipped for --init-model. That pre-inspection is the step that imports the source checkpoint's meta graph and, when the checkpoint is a compressed_model, sets self.ckpt_meta before the graph is built (the graph build later consumes ckpt_meta). With the mismatch, a compressed-checkpoint --init-model run builds the graph without its checkpoint metadata.

The bug was masked for the common case because uncompressed --init-model still works: the variables are restored later in _init_session, which uses the correct "init_from_model" literal and does not need ckpt_meta. So only compressed-checkpoint initialization was actually exposed, and no test exercised it.
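The masking effect can be illustrated with a toy model of the two stages. This is a minimal sketch, not deepmd-kit code: `simulate`, its parameters, and the placeholder metadata string are hypothetical, and only the mode strings and the roles of the two stages are taken from the description above.

```python
def simulate(dispatch_literal, init_mode, compressed):
    """Toy model of the two init stages (hypothetical, not deepmd-kit code).

    Returns (ckpt_meta seen by the graph build, whether variables get restored).
    """
    ckpt_meta = None
    # Stage 1: pre-build inspection, gated on the literal used in build().
    if init_mode == dispatch_literal and compressed:
        ckpt_meta = "meta-from-checkpoint"  # placeholder value
    # Stage 2: session init, which always compared against the correct literal.
    restored = init_mode == "init_from_model"
    return ckpt_meta, restored


# Old literal, uncompressed checkpoint: nothing is missing, so the bug is invisible.
print(simulate("init_model", "init_from_model", compressed=False))
# Old literal, compressed checkpoint: variables restore, but the build saw no metadata.
print(simulate("init_model", "init_from_model", compressed=True))
# Fixed literal, compressed checkpoint: metadata is available before the build.
print(simulate("init_from_model", "init_from_model", compressed=True))
```

In this toy model the first two calls differ only in whether the missing metadata matters, which is why the uncompressed path gave no hint of the problem.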

Fix

Correct the dispatch literal to "init_from_model". To make the dispatch unit-testable — the reason the mismatch went uncaught is that it lived inline in the heavyweight build() — the four-way init dispatch is extracted into a small _init_from_run_opt() helper. The restart, init-from-frozen-model, and finetune branches already used the correct literals and are unchanged in behavior.
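For readers without the diff at hand, the shape of the extracted helper might look roughly like the sketch below. It is an approximation under stated assumptions rather than the code in `deepmd/tf/train/trainer.py`: `StubTrainer`, `route`, and the recorded tuples are hypothetical, and the initializer names `_init_from_frz_model` and `_init_from_pretrained_model` are assumed (only `_init_from_ckpt` and `_init_from_run_opt` are named in this PR text).

```python
from types import SimpleNamespace


class StubTrainer:
    """Hypothetical stand-in for DPTrainer that records which initializer ran."""

    def __init__(self, run_opt):
        self.run_opt = run_opt
        self.calls = []

    # Assumed initializer names; only _init_from_ckpt appears in the PR text.
    def _init_from_frz_model(self):
        self.calls.append(("frz_model", None))

    def _init_from_ckpt(self, ckpt):
        self.calls.append(("ckpt", ckpt))

    def _init_from_pretrained_model(self):
        self.calls.append(("finetune", None))

    def _init_from_run_opt(self):
        """Four-way dispatch on run_opt.init_mode; other modes are a no-op."""
        mode = self.run_opt.init_mode
        if mode == "init_from_frz_model":
            self._init_from_frz_model()
        elif mode == "init_from_model":  # the literal that used to read "init_model"
            self._init_from_ckpt(self.run_opt.init_model)
        elif mode == "restart":
            self._init_from_ckpt(self.run_opt.restart)
        elif mode == "finetune":
            self._init_from_pretrained_model()


def route(init_mode, **paths):
    """Run the dispatch on a stub and return the recorded initializer calls."""
    trainer = StubTrainer(SimpleNamespace(init_mode=init_mode, **paths))
    trainer._init_from_run_opt()
    return trainer.calls


print(route("init_from_model", init_model="model.ckpt"))
print(route("init_from_scratch"))
```

Keeping the dispatch in a method with no TensorFlow dependencies is what allows a test to call it on a bare stub instead of constructing a full trainer.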

Test

Adds source/tests/tf/test_trainer_init_mode.py, which drives _init_from_run_opt on a stub trainer with the three concrete initializers mocked and asserts each init_mode routes correctly. On the old literal, init_from_model routes to nothing (the test fails); with the fix it reaches _init_from_ckpt(init_model). The test also covers restart, init_from_frz_model, finetune, and the scratch no-op. This dispatch previously had no coverage.
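The testing approach described above can be sketched as follows. This is not the contents of `test_trainer_init_mode.py`: `dispatch`, `make_stub`, `check_routes`, the checkpoint paths, and the `model_literal` parameter (used only to replay the old behavior) are hypothetical, and the two initializer names other than `_init_from_ckpt` are assumed.

```python
from types import SimpleNamespace
from unittest import mock


def dispatch(trainer, model_literal="init_from_model"):
    """Hypothetical copy of the dispatch; model_literal lets us replay the old bug."""
    mode = trainer.run_opt.init_mode
    if mode == "init_from_frz_model":
        trainer._init_from_frz_model()
    elif mode == model_literal:
        trainer._init_from_ckpt(trainer.run_opt.init_model)
    elif mode == "restart":
        trainer._init_from_ckpt(trainer.run_opt.restart)
    elif mode == "finetune":
        trainer._init_from_pretrained_model()


def make_stub(init_mode):
    """Build a stub trainer whose three initializers are mocks."""
    run_opt = SimpleNamespace(
        init_mode=init_mode, init_model="init.ckpt", restart="restart.ckpt"
    )
    return SimpleNamespace(
        run_opt=run_opt,
        _init_from_frz_model=mock.Mock(),
        _init_from_ckpt=mock.Mock(),
        _init_from_pretrained_model=mock.Mock(),
    )


def check_routes():
    # With the fixed literal, init_from_model reaches _init_from_ckpt(init_model).
    fixed = make_stub("init_from_model")
    dispatch(fixed)
    fixed._init_from_ckpt.assert_called_once_with("init.ckpt")

    # With the old literal, the same mode routes nowhere.
    old = make_stub("init_from_model")
    dispatch(old, model_literal="init_model")
    old._init_from_ckpt.assert_not_called()

    # restart also goes through _init_from_ckpt, with the restart path.
    restart = make_stub("restart")
    dispatch(restart)
    restart._init_from_ckpt.assert_called_once_with("restart.ckpt")

    # Scratch is a no-op: none of the initializers run.
    scratch = make_stub("init_from_scratch")
    dispatch(scratch)
    for name in ("_init_from_frz_model", "_init_from_ckpt", "_init_from_pretrained_model"):
        getattr(scratch, name).assert_not_called()
    return True


print(check_routes())
```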

Summary by CodeRabbit

Bug Fixes
- Improved training startup so model initialization now follows the selected setup mode more consistently, including restore, restart, fine-tuning, and frozen-model workflows.
- If no initialization mode is selected, training now proceeds without extra initialization steps.
Tests
- Added regression coverage for all supported initialization paths to help prevent future startup regressions.

RunOptions records `dp train --init-model` as init_mode == "init_from_model", but DPTrainer.build() dispatched on the literal "init_model", so the branch never matched and _init_from_ckpt was skipped for --init-model. That pre-inspection imports the source checkpoint's meta graph and, when the checkpoint is a compressed_model, sets self.ckpt_meta before the graph build consumes it. With the mismatch, a compressed-checkpoint --init-model run builds the graph without its checkpoint metadata. Uncompressed --init-model still worked, which masked the bug: variables are restored later in _init_session, which uses the correct "init_from_model" literal and needs no ckpt_meta.

Fix the literal to "init_from_model". The 4-way init dispatch is extracted from the heavyweight build() into a small _init_from_run_opt() helper so it can be unit-tested in isolation; the mismatch went uncaught because the dispatch lived inline in build().

Add a regression test that drives the dispatch with a stub trainer and mocked initializers: it fails on the old literal (init_from_model routes nowhere) and passes with the fix, and it also covers restart, init_from_frz_model, finetune, and scratch.

Fix deepmodeling#5679

coderabbitai · 2026-07-03T06:26:44Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between dd38b35 and bb30614.

📒 Files selected for processing (2)

deepmd/tf/train/trainer.py
source/tests/tf/test_trainer_init_mode.py

📝 Walkthrough

The inline run_opt.init_mode dispatch logic in DPTrainer.build() was extracted into a new _init_from_run_opt() method that routes to existing initializer methods based on init mode, with no fallback for unrecognized modes. A new test module validates the dispatch routing for all supported init modes.

Changes

Trainer init-mode refactor

| Layer / File(s) | Summary |
| --- | --- |
| Extract init-mode dispatch into helper method (`deepmd/tf/train/trainer.py`) | `build()` now calls a new `_init_from_run_opt()` method that routes `init_from_frz_model`, `init_from_model`, `restart`, and `finetune` modes to their respective initializers, silently skipping unrecognized modes. |
| Regression tests for dispatch routing (`source/tests/tf/test_trainer_init_mode.py`) | New test module bypasses `DPTrainer` construction, patches initializer methods, and asserts correct routing per init mode, including a no-op check for `init_from_scratch`. |

Estimated code review effort: 2 (Simple) | ~10 minutes

Possibly related issues

[Code scan] Inspect TensorFlow --init-model checkpoints before graph build #5679: Reports the same --init-model/restart pre-build inspection logic that this PR refactors into _init_from_run_opt() with matching tests.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main fix: routing --init-model to checkpoint pre-inspection in TensorFlow training. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

codecov · 2026-07-03T07:29:11Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.14%. Comparing base (dd38b35) to head (bb30614).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5718      +/-   ##
==========================================
- Coverage   81.26%   81.14%   -0.13%     
==========================================
  Files         988      988              
  Lines      110877   110877              
  Branches     4234     4234              
==========================================
- Hits        90103    89966     -137     
- Misses      19249    19383     +134     
- Partials     1525     1528       +3

njzjz-bot

Reviewed the RunOptions/DPTrainer dispatch path and the new regression test. The fix changes the pre-build checkpoint inspection dispatch to the RunOptions literal init_from_model, which matches the documented bug in #5679 and keeps the existing restart/frozen/finetune branches behaviorally unchanged.

The added test covers all supported init modes plus the scratch no-op, and it would fail on the previous init_model literal. CI is green, including the Python test matrix and CodeQL. I do not see a blocking issue.

One non-blocking cleanup: GitHub Advanced Security noted the mixed import unittest / from unittest import mock style in the new test file. It can be cleaned up, but I would not block this fix on it.
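As an illustration of the kind of cleanup meant here, one option is to import the package in a single form and reach `mock` through its qualified name. This is a hypothetical sketch, not the change made in the PR, and it assumes the scanner's complaint is about importing `unittest` both with `import` and with `from ... import`; `ExampleTest` and `run_example` are made-up names.

```python
import unittest
import unittest.mock  # one import form for the package, no "from unittest import mock"


class ExampleTest(unittest.TestCase):
    def test_mock_is_reachable(self):
        # The mock module is reached via its fully qualified name.
        fake = unittest.mock.Mock(return_value=3)
        self.assertEqual(fake(), 3)
        fake.assert_called_once_with()


def run_example():
    """Run the example test case and report whether it passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


print(run_example())
```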

Reviewed by OpenClaw 2026.6.8 (844f405) (model: custom-chat-jinzhezeng-group/gpt-5.5).

dosubot Bot added the bug label Jul 3, 2026

github-actions Bot added the Python label Jul 3, 2026

wanghan-iapcm requested a review from njzjz July 3, 2026 06:24

github-advanced-security AI found potential problems Jul 3, 2026

Comment thread source/tests/tf/test_trainer_init_mode.py

"""

import types

import unittest

njzjz-bot approved these changes Jul 3, 2026

njzjz approved these changes Jul 4, 2026

njzjz enabled auto-merge July 4, 2026 01:45

wanghan-iapcm wants to merge 1 commit into deepmodeling:master from wanghan-iapcm:fix-tf-init-model-mode
