fix: unify compute_output_stats across dp, pt, and pd backends by wanghan-iapcm · Pull Request #5267 · deepmodeling/deepmd-kit

wanghan-iapcm · 2026-02-25T07:51:08Z

Fix elif → if for index classification: A system can have both global and atomic labels simultaneously. The old elif logic meant a system with both find_atom_energy and find_energy would only be indexed for atomic, silently dropping its global label.
Changed to two independent if checks in all three backends.
Add global_sampled_idx/atomic_sampled_idx parameters: compute_output_stats_global and compute_output_stats_atomic in pt and pd backends now accept precomputed index dicts (matching dpmodel's signature) instead of re-scanning systems internally.
Support mixed type in dpmodel: compute_output_stats_global in dpmodel now checks for real_natoms_vec (previously hardcoded natoms_key = "natoms").
Apply atom_exclude_types mask to natoms in dpmodel: dpmodel was missing the exclude-type mask on natoms that pt/pd already had.
Fix in-place mutation of input data: All three backends were mutating sampled[i]["natoms"] (or real_natoms_vec) in-place when atom_exclude_types was present. Now the mask is applied to a local copy, leaving the caller's data untouched.
Add cross-backend consistency tests: New test file source/tests/consistent/utils/test_stat.py with 48 tests covering dp-vs-pt and dp-vs-pd consistency for compute_output_stats_global, compute_output_stats_atomic, and the top-level compute_output_stats, plus no-mutation verification.
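
The elif → if change described above can be illustrated with a minimal sketch. The helper name `classify_sampled` and the `> 0.0` test on the `find_*` flags are assumptions made for illustration; they are not taken from the diff.

```python
def classify_sampled(sampled, keys):
    """Hypothetical helper: record which systems carry global/atomic labels."""
    global_sampled_idx = {key: [] for key in keys}
    atomic_sampled_idx = {key: [] for key in keys}
    for i, system in enumerate(sampled):
        for key in keys:
            # Two independent `if` checks (not if/elif): a system that has
            # both label kinds is indexed in both dicts.
            if system.get(f"find_atom_{key}", 0.0) > 0.0:
                atomic_sampled_idx[key].append(i)
            if system.get(f"find_{key}", 0.0) > 0.0:
                global_sampled_idx[key].append(i)
    return global_sampled_idx, atomic_sampled_idx
```

With an `elif`, the first system in a list like `[{"find_energy": 1.0, "find_atom_energy": 1.0}, ...]` would appear only in the atomic index; with two `if` checks it appears in both.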

Summary by CodeRabbit

New Features
- Support excluding atom types from global natoms calculations.
- Allow a system to be counted in both global and atomic sampling simultaneously.
Refactor
- Switched statistics assembly to index-based gathering for robust mixed-type handling.
- Standardized numeric assembly/reshaping across backends for consistent merging.
Tests
- Added comprehensive cross-backend consistency tests (global, atomic, mixed types, exclusions).
Chores
- Minor unpacking/cleanup to remove unused fields.


…kends
1. elif → if in compute_output_stats: Systems with both find_atom_<key> and find_<key> now go into both atomic_sampled_idx and global_sampled_idx.
2. pt and pd: compute_output_stats_global updated to accept global_sampled_idx parameter (matching dpmodel's signature) and use precomputed indices instead of inline filtering. Also converts to numpy early via to_numpy_array during gathering, then uses np.concatenate instead of torch.cat/paddle.concat + to_numpy_array.
3. pt and pd: compute_output_stats_atomic updated to accept atomic_sampled_idx parameter (matching dpmodel's signature) with early return for empty indices, and same numpy-first gathering pattern.
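
The numpy-first gathering pattern mentioned in this commit message could look roughly like the sketch below. `gather_by_index` is a hypothetical name, and `np.asarray` stands in for the backend-specific `to_numpy_array` conversion, which is not reproduced here.

```python
import numpy as np


def gather_by_index(sampled, key, idx):
    """Hypothetical helper: concatenate `key` over the systems listed in `idx`."""
    if len(idx) == 0:
        # Nothing was indexed for this key; let the caller return early.
        return None
    # Convert each selected array to numpy first (stand-in for to_numpy_array),
    # then concatenate once along the frame axis.
    parts = [np.asarray(sampled[i][key]) for i in idx]
    return np.concatenate(parts, axis=0)
```

Converting before concatenating means the same gathering code can be shared regardless of whether the per-system arrays started out as torch or paddle tensors.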

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a53d7d5d50


deepmd/pt/utils/stat.py

deepmd/pd/utils/stat.py

coderabbitai · 2026-02-25T08:01:04Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f47c36b and 6173cf0.

📒 Files selected for processing (4)

deepmd/dpmodel/utils/stat.py
deepmd/pd/utils/stat.py
deepmd/pt/utils/stat.py
source/tests/consistent/utils/test_stat.py

Walkthrough

Refactors statistics computation across DP, PT, and PD backends to use index-based gathering, allow systems to be counted in both global and atomic paths, add per-sample atom exclusion masking (AtomExcludeMask) and mixed-natoms support, rename internal helpers, and add cross-backend tests for consistency and no-mutation.
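
As a rough illustration of the no-mutation masking, the sketch below applies an exclude-type mask to a copy of `natoms`. It assumes the per-frame `natoms` layout has two leading count columns followed by one column per atom type; the function name `masked_natoms` is hypothetical, and plain numpy is used in place of `AtomExcludeMask`.

```python
import numpy as np


def masked_natoms(natoms, ntypes, exclude_types):
    """Hypothetical helper: zero the per-type counts of excluded types on a copy."""
    # 1 for kept types, 0 for excluded types.
    type_mask = np.ones(ntypes, dtype=natoms.dtype)
    type_mask[np.asarray(list(exclude_types), dtype=int)] = 0
    # Work on a local copy so the caller's array is left untouched.
    out = natoms.copy()
    # Assumed layout: columns 0-1 are total counts, columns 2+ are per-type counts.
    out[:, 2:] = out[:, 2:] * type_mask
    return out
```

The key point is the `copy()`: the earlier behavior described in the PR wrote the masked values back into `sampled[i]["natoms"]`, which changed the caller's data.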

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| DP stats `deepmd/dpmodel/utils/stat.py` | Switch atomic/global branching to allow overlap (elif→if); import and apply `AtomExcludeMask` for per-sample exclusion; rename internal helpers to `_compute_output_stats_global` / `_compute_output_stats_atomic`; support mixed `real_natoms_vec`. |
| PD (Paddle) stats `deepmd/pd/utils/stat.py` | Refactor to index-based gathering using `global_sampled_idx` / `atomic_sampled_idx`; rename helpers to `_compute_output_stats_global` / `_compute_output_stats_atomic`; build outputs/natoms from indices; apply atom exclusion masking; replace paddle.concat with numpy flows and adjust reshaping/early-return logic. |
| PT (PyTorch) stats `deepmd/pt/utils/stat.py` | Mirror PD changes: index-based selection, new `_compute_output_stats_global` / `_compute_output_stats_atomic` signatures, per-sample exclusion masking for natoms, numpy concatenation and reshaping updates, adjusted early-return behavior. |
| EnvMatStat unpacking `deepmd/dpmodel/utils/env_mat_stat.py`, `deepmd/pd/utils/env_mat_stat.py` | Remove unused `natoms` from per-system unpacking in EnvMatStatSe.iter; no behavior change. |
| Tests & test utils `source/tests/consistent/utils/__init__.py`, `source/tests/consistent/utils/test_stat.py` | Add comprehensive cross-backend tests and data builders/converters verifying DP/PT/PD consistency (global, atomic, mixed, exclude-types) and input no-mutation. |
| Project metadata `manifest_file`, `requirements.txt`, `pyproject.toml` | Minor packaging/requirement adjustments referenced in diff summaries. |

Sequence Diagram(s)

sequenceDiagram
    participant DataGen as DataGenerator
    participant Backend as Backend (DP/PT/PD)
    participant Stats as StatsHelpers
    participant Mask as AtomExcludeMask
    participant Out as StatsOutput

    DataGen->>Backend: emit per-system payloads (+ sampled indices)
    Backend->>Stats: call compute_output_stats (collect indices)
    Stats->>Stats: build global_sampled_idx / atomic_sampled_idx
    Stats->>Mask: request per-sample exclusion mask (if types present)
    Mask-->>Stats: exclusion boolean mask per-sample
    Stats->>Stats: gather outputs/natoms by indices, apply mask
    Stats->>Out: produce bias/std and merged arrays (global/atomic)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

fix(pt): fix global bias stat with different natom #3944: related changes to global-stat natoms aggregation for frames with differing atom counts.
feat(pt_expt): atomic model #5220: overlaps with refactors/renames and AtomExcludeMask usage in dpmodel stat utilities.
pd: support dpa3 with paddle backend #4701: related refactors to PD stat functions and index/reshape handling.

Suggested reviewers

njzjz
iProzd

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main objective of the PR: unifying the compute_output_stats function across three different backends (dp, pt, and pd). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 93.33% which is sufficient. The required threshold is 80.00%. |


coderabbitai

🧹 Nitpick comments (1)

deepmd/pd/utils/stat.py (1)
372-389: Consider refactoring parameter order for defensive API design.

The new index parameters (global_sampled_idx, atomic_sampled_idx) were inserted before existing optional parameters, which introduces a pattern risk for positional argument callers. While all current call sites (including lines 372-389) use positional arguments correctly and the functions are internal to their modules, this pattern could cause silent breakage if these functions are ever made part of a public API.

For defensive programming, consider either making the index parameters keyword-only (using * separator) or appending them at the end of the parameter list.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pd/utils/stat.py` around lines 372 - 389, The new index parameters
were inserted before existing optional args, which risks silent breakage for
positional callers; update the function signatures for
compute_output_stats_global and compute_output_stats_atomic to either make the
index params keyword-only (add a '*' before their parameter names) or move those
index parameters to the end of the parameter list so all existing
optional/positional parameters keep their order, then update any internal call
sites if needed to use keywords for those index args.


ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65eea4b and a53d7d5.

📒 Files selected for processing (5)

deepmd/dpmodel/utils/stat.py
deepmd/pd/utils/stat.py
deepmd/pt/utils/stat.py
source/tests/consistent/utils/__init__.py
source/tests/consistent/utils/test_stat.py

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/pd/utils/stat.py`:
- Around line 429-433: The early return currently treats global_sampled_idx ==
None the same as explicitly empty, breaking callers that rely on internal index
discovery; change the condition so we only short-circuit when global_sampled_idx
is not None and all per-sample lists are empty (i.e., if global_sampled_idx is
None, proceed with the original discovery logic instead of returning {}, {}),
and apply the same fix to the second occurrence around lines 554–557; update the
checks that gate the "return {}, {}" to reference both "global_sampled_idx is
not None" and "all(len(v) == 0 for v in global_sampled_idx.values())" so prior
behavior is preserved.

In `@deepmd/pt/utils/stat.py`:
- Around line 429-433: The early return when global_sampled_idx is None (and
similarly for other *_sampled_idx usages around line 551) causes silent no-ops;
instead, when global_sampled_idx (or per-atom sampled idx params) is None, build
fallback index maps from the provided sampled data (e.g., the sampled
argument/variable used earlier) rather than returning empty dicts. Locate the
block that returns {}, {} and replace it with logic that constructs
global_sampled_idx (and the analogous per-type sampled_idx) by iterating over
sampled to collect indices for each key (preserving previous auto-discovery
behavior), and then continue with the existing stats computation; ensure
functions/variables referenced include global_sampled_idx, sampled, and any
per-type sampled_idx used at line ~551 so callers can omit the params safely.
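
The distinction both inline comments draw, `None` meaning "discover the indices" versus an explicitly empty dict meaning "nothing to do", can be sketched as follows. The helper names and the `find_<key> > 0.0` discovery rule are assumptions for illustration only.

```python
def resolve_global_idx(sampled, keys, global_sampled_idx=None):
    """Hypothetical helper: fall back to discovery only when no index dict is given."""
    if global_sampled_idx is None:
        # Caller omitted the parameter: rebuild the indices from the data.
        return {
            key: [i for i, s in enumerate(sampled) if s.get(f"find_{key}", 0.0) > 0.0]
            for key in keys
        }
    return global_sampled_idx


def should_return_early(global_sampled_idx):
    """Short-circuit only for an explicitly provided, all-empty index dict."""
    return global_sampled_idx is not None and all(
        len(v) == 0 for v in global_sampled_idx.values()
    )
```

With this split, `should_return_early(None)` is `False`, so a caller that omits the parameter still reaches the discovery path instead of silently getting empty results.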

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a53d7d5 and f47c36b.

📒 Files selected for processing (4)

deepmd/dpmodel/utils/env_mat_stat.py
deepmd/pd/utils/env_mat_stat.py
deepmd/pd/utils/stat.py
deepmd/pt/utils/stat.py

deepmd/pd/utils/stat.py

deepmd/pt/utils/stat.py

codecov · 2026-02-25T08:49:29Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.00%. Comparing base (65eea4b) to head (6173cf0).
⚠️ Report is 10 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #5267    +/-   ##
========================================
  Coverage   82.00%   82.00%            
========================================
  Files         750      750            
  Lines       75082    75213   +131     
  Branches     3615     3615            
========================================
+ Hits        61571    61679   +108     
- Misses      12347    12371    +24     
+ Partials     1164     1163     -1


Han Wang added 5 commits February 25, 2026 13:28

compute_output_stats_global now applies atom_exclude_types mask to na…

b78ecef

…toms before computing output bias

support mixed type in dpmodel, add consistency test for backends

1d1ca09

does not change input data

1fe4507

add consistency test between dp and pd

a53d7d5

wanghan-iapcm requested a review from njzjz February 25, 2026 07:51

dosubot bot added the bug label Feb 25, 2026

rm unused natoms.

f47c36b

chatgpt-codex-connector bot reviewed Feb 25, 2026

deepmd/pt/utils/stat.py (resolved)

deepmd/pd/utils/stat.py (resolved)

github-actions bot added the Python label Feb 25, 2026

coderabbitai bot reviewed Feb 25, 2026


wanghan-iapcm mentioned this pull request Feb 25, 2026

feat(pt_expt): add dipole, polar, dos, property and dp-zbl models with cross-backend consistency tests #5260

Open

coderabbitai bot reviewed Feb 25, 2026

deepmd/pd/utils/stat.py (resolved)

deepmd/pt/utils/stat.py (resolved)

njzjz approved these changes Feb 25, 2026


add _ for helpers

6173cf0

wanghan-iapcm enabled auto-merge February 26, 2026 01:57

wanghan-iapcm added this pull request to the merge queue Feb 26, 2026

Merged via the queue into deepmodeling:master with commit abf9575 Feb 26, 2026
70 checks passed

wanghan-iapcm deleted the refact-stats branch February 26, 2026 19:39

wanghan-iapcm merged 7 commits into deepmodeling:master from wanghan-iapcm:refact-stats