fix: unify compute_output_stats across dp, pt, and pd backends#5267

Merged: wanghan-iapcm merged 7 commits into deepmodeling:master from wanghan-iapcm:refact-stats on Feb 26, 2026

Conversation

@wanghan-iapcm
Collaborator

@wanghan-iapcm wanghan-iapcm commented Feb 25, 2026

  • Fix elif → if for index classification: A system can have both global and atomic labels simultaneously. The old elif logic meant a system with both find_atom_energy and find_energy would only be indexed for atomic, silently dropping its global label.
    Changed to two independent if checks in all three backends.
  • Add global_sampled_idx/atomic_sampled_idx parameters: compute_output_stats_global and compute_output_stats_atomic in pt and pd backends now accept precomputed index dicts (matching dpmodel's signature) instead of re-scanning systems internally.
  • Support mixed type in dpmodel: compute_output_stats_global in dpmodel now checks for real_natoms_vec (previously hardcoded natoms_key = "natoms").
  • Apply atom_exclude_types mask to natoms in dpmodel: dpmodel was missing the exclude-type mask on natoms that pt/pd already had.
  • Fix in-place mutation of input data: All three backends were mutating sampled[i]["natoms"] (or real_natoms_vec) in-place when atom_exclude_types was present. Now the mask is applied to a local copy, leaving the caller's data untouched.
  • Add cross-backend consistency tests: New test file source/tests/consistent/utils/test_stat.py with 48 tests covering dp-vs-pt and dp-vs-pd consistency for compute_output_stats_global, compute_output_stats_atomic, and the top-level
    compute_output_stats, plus no-mutation verification.
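
The elif → if change in the first bullet can be illustrated with a minimal sketch. The flag names `find_atom_<key>` and `find_<key>` come from the description above; the dict layout of each system and the function names `classify_elif` and `classify_if` are hypothetical and only stand in for the real classification loop.

```python
def classify_elif(sampled, key):
    """Old behavior (sketch): a system lands in at most one index list."""
    atomic_idx, global_idx = [], []
    for i, system in enumerate(sampled):
        if system.get(f"find_atom_{key}", 0.0) > 0.0:
            atomic_idx.append(i)
        elif system.get(f"find_{key}", 0.0) > 0.0:
            # Never reached for a system that also has the atomic label.
            global_idx.append(i)
    return atomic_idx, global_idx


def classify_if(sampled, key):
    """New behavior (sketch): the two checks are independent."""
    atomic_idx, global_idx = [], []
    for i, system in enumerate(sampled):
        if system.get(f"find_atom_{key}", 0.0) > 0.0:
            atomic_idx.append(i)
        if system.get(f"find_{key}", 0.0) > 0.0:
            global_idx.append(i)
    return atomic_idx, global_idx


# Hypothetical systems: global only, both labels, atomic only.
sampled = [
    {"find_energy": 1.0},
    {"find_atom_energy": 1.0, "find_energy": 1.0},
    {"find_atom_energy": 1.0},
]
old = classify_elif(sampled, "energy")
new = classify_if(sampled, "energy")
```

With these inputs, system 1 appears only in the atomic list under the elif version, and in both lists under the if version.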

Summary by CodeRabbit

  • New Features

    • Support excluding atom types from global natoms calculations.
    • Allow a system to be counted in both global and atomic sampling simultaneously.
  • Refactor

    • Switched statistics assembly to index-based gathering for robust mixed-type handling.
    • Standardized numeric assembly/reshaping across backends for consistent merging.
  • Tests

    • Added comprehensive cross-backend consistency tests (global, atomic, mixed types, exclusions).
  • Chores

    • Minor unpacking/cleanup to remove unused fields.

Han Wang added 5 commits February 25, 2026 13:28
…kends

  1. elif → if in compute_output_stats: Systems with both find_atom_<key> and find_<key> now go into both atomic_sampled_idx and global_sampled_idx
  2. pt and pd: compute_output_stats_global updated to accept global_sampled_idx parameter (matching dpmodel's signature) and use precomputed indices instead of inline filtering. Also converts to numpy early via to_numpy_array during gathering, then uses np.concatenate instead of torch.cat/paddle.concat + to_numpy_array.
  3. pt and pd: compute_output_stats_atomic updated to accept atomic_sampled_idx parameter (matching dpmodel's signature) with early return for empty indices, and same numpy-first gathering pattern.
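
The numpy-first gathering pattern described in items 2 and 3 might look roughly like the sketch below. It assumes the index dict maps each output key to a list of system indices; `gather_by_index` is a hypothetical helper, and `np.asarray` stands in for the backends' `to_numpy_array` conversion.

```python
import numpy as np


def gather_by_index(sampled, sampled_idx, key):
    """Collect sampled[i][key] for precomputed indices and merge along frames."""
    if len(sampled_idx) == 0:
        # Nothing to gather; the caller can return early.
        return None
    # Convert each per-system array to numpy first (assumed stand-in for
    # to_numpy_array), then concatenate once along the frame axis.
    arrays = [np.asarray(sampled[i][key]) for i in sampled_idx]
    return np.concatenate(arrays, axis=0)


# Hypothetical data: systems 0 and 2 carry a global "energy" label.
sampled = [
    {"energy": np.array([[1.0], [2.0]])},
    {"atom_energy": np.zeros((1, 3, 1))},
    {"energy": np.array([[3.0]])},
]
global_sampled_idx = {"energy": [0, 2]}
merged = gather_by_index(sampled, global_sampled_idx["energy"], "energy")
```

Here `merged` stacks the frames of systems 0 and 2, and an empty index list yields `None` instead of an error from concatenating nothing.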
@wanghan-iapcm wanghan-iapcm requested a review from njzjz February 25, 2026 07:51
@dosubot dosubot bot added the bug label Feb 25, 2026
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a53d7d5d50

@coderabbitai
Contributor

coderabbitai bot commented Feb 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f47c36b and 6173cf0.

📒 Files selected for processing (4)
  • deepmd/dpmodel/utils/stat.py
  • deepmd/pd/utils/stat.py
  • deepmd/pt/utils/stat.py
  • source/tests/consistent/utils/test_stat.py

📝 Walkthrough

Refactors statistics computation across DP, PT, and PD backends to use index-based gathering, allow systems to be counted in both global and atomic paths, add per-sample atom exclusion masking (AtomExcludeMask) and mixed-natoms support, rename internal helpers, and add cross-backend tests for consistency and no-mutation.
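
As a rough illustration of applying the exclusion mask without mutating the caller's data, the sketch below masks per-type atom counts on a local copy. The natoms layout (two leading total-count columns followed by per-type counts) is an assumption, and `masked_type_counts` is a hypothetical helper that uses a plain 0/1 vector in place of AtomExcludeMask.

```python
import numpy as np


def masked_type_counts(natoms, ntypes, exclude_types):
    """Return per-type counts with excluded types zeroed; natoms is not modified."""
    # 1 for kept types, 0 for excluded types.
    type_mask = np.array(
        [0 if t in exclude_types else 1 for t in range(ntypes)],
        dtype=natoms.dtype,
    )
    # Slicing gives a view into natoms, so copy before the in-place multiply.
    counts = natoms[:, 2:].copy()
    counts *= type_mask[None, :]
    return counts


# Hypothetical input: two frames, two atom types, assumed column layout
# [total, total, count_type0, count_type1].
natoms = np.array([[5, 5, 2, 3], [5, 5, 4, 1]])
before = natoms.copy()
counts = masked_type_counts(natoms, ntypes=2, exclude_types=[1])
```

Dropping the `.copy()` would make the in-place multiply write through the view into `natoms`, which is the kind of mutation the PR description says was removed.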

Changes

| Cohort / File(s) | Summary |
|---|---|
| DP stats: deepmd/dpmodel/utils/stat.py | Switch atomic/global branching to allow overlap (elif→if); import and apply AtomExcludeMask for per-sample exclusion; rename internal helpers to _compute_output_stats_global / _compute_output_stats_atomic; support mixed real_natoms_vec. |
| PD (Paddle) stats: deepmd/pd/utils/stat.py | Refactor to index-based gathering using global_sampled_idx / atomic_sampled_idx; rename helpers to _compute_output_stats_global / _compute_output_stats_atomic; build outputs/natoms from indices; apply atom exclusion masking; replace paddle.concat with numpy flows and adjust reshaping/early-return logic. |
| PT (PyTorch) stats: deepmd/pt/utils/stat.py | Mirror PD changes: index-based selection, new _compute_output_stats_global / _compute_output_stats_atomic signatures, per-sample exclusion masking for natoms, numpy concatenation and reshaping updates, adjusted early-return behavior. |
| EnvMatStat unpacking: deepmd/dpmodel/utils/env_mat_stat.py, deepmd/pd/utils/env_mat_stat.py | Remove unused natoms from per-system unpacking in EnvMatStatSe.iter; no behavior change. |
| Tests & test utils: source/tests/consistent/utils/__init__.py, source/tests/consistent/utils/test_stat.py | Add comprehensive cross-backend tests and data builders/converters verifying DP/PT/PD consistency (global, atomic, mixed, exclude-types) and input no-mutation. |
| Project metadata: manifest_file, requirements.txt, pyproject.toml | Minor packaging/requirement adjustments referenced in diff summaries. |

Sequence Diagram(s)

sequenceDiagram
    participant DataGen as DataGenerator
    participant Backend as Backend (DP/PT/PD)
    participant Stats as StatsHelpers
    participant Mask as AtomExcludeMask
    participant Out as StatsOutput

    DataGen->>Backend: emit per-system payloads (+ sampled indices)
    Backend->>Stats: call compute_output_stats (collect indices)
    Stats->>Stats: build global_sampled_idx / atomic_sampled_idx
    Stats->>Mask: request per-sample exclusion mask (if types present)
    Mask-->>Stats: exclusion boolean mask per-sample
    Stats->>Stats: gather outputs/natoms by indices, apply mask
    Stats->>Out: produce bias/std and merged arrays (global/atomic)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • njzjz
  • iProzd
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main objective of the PR: unifying the compute_output_stats function across three different backends (dp, pt, and pd). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 93.33% which is sufficient. The required threshold is 80.00%. |

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
deepmd/pd/utils/stat.py (1)

372-389: Consider refactoring parameter order for defensive API design.

The new index parameters (global_sampled_idx, atomic_sampled_idx) were inserted before existing optional parameters, which is risky for callers that pass arguments positionally. All current call sites (including lines 372-389) pass positional arguments correctly and the functions are internal to their modules, but this ordering could cause silent breakage if these functions are ever made part of a public API.

For defensive programming, consider either making the index parameters keyword-only (using * separator) or appending them at the end of the parameter list.
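
A minimal sketch of the keyword-only option follows. The function name `compute_stats` and its parameter list are hypothetical and are not the real signature in stat.py; only the `*` separator is the point.

```python
def compute_stats(sampled, keys, rcond=None, *, global_sampled_idx=None):
    """Hypothetical signature: the index dict can only be passed by keyword."""
    # Stand-in body that just echoes its arguments.
    return {"n_keys": len(keys), "rcond": rcond, "idx": global_sampled_idx}


# Passing the index dict by keyword works.
ok = compute_stats([], ["energy"], 1e-3, global_sampled_idx={"energy": [0]})

# Passing it positionally is rejected instead of silently binding
# to the wrong parameter.
try:
    compute_stats([], ["energy"], 1e-3, {"energy": [0]})
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With the `*` separator, a caller written against an older positional ordering fails loudly with a TypeError rather than passing an index dict where another argument was expected.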

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pd/utils/stat.py` around lines 372 - 389, The new index parameters
were inserted before existing optional args, which risks silent breakage for
positional callers; update the function signatures for
compute_output_stats_global and compute_output_stats_atomic to either make the
index params keyword-only (add a '*' before their parameter names) or move those
index parameters to the end of the parameter list so all existing
optional/positional parameters keep their order, then update any internal call
sites if needed to use keywords for those index args.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 65eea4b and a53d7d5.

📒 Files selected for processing (5)
  • deepmd/dpmodel/utils/stat.py
  • deepmd/pd/utils/stat.py
  • deepmd/pt/utils/stat.py
  • source/tests/consistent/utils/__init__.py
  • source/tests/consistent/utils/test_stat.py

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/pd/utils/stat.py`:
- Around line 429-433: The early return currently treats global_sampled_idx ==
None the same as explicitly empty, breaking callers that rely on internal index
discovery; change the condition so we only short-circuit when global_sampled_idx
is not None and all per-sample lists are empty (i.e., if global_sampled_idx is
None, proceed with the original discovery logic instead of returning {}, {}),
and apply the same fix to the second occurrence around lines 554–557; update the
checks that gate the "return {}, {}" to reference both "global_sampled_idx is
not None" and "all(len(v) == 0 for v in global_sampled_idx.values())" so prior
behavior is preserved.

In `@deepmd/pt/utils/stat.py`:
- Around line 429-433: The early return when global_sampled_idx is None (and
similarly for other *_sampled_idx usages around line 551) causes silent no-ops;
instead, when global_sampled_idx (or per-atom sampled idx params) is None, build
fallback index maps from the provided sampled data (e.g., the sampled
argument/variable used earlier) rather than returning empty dicts. Locate the
block that returns {}, {} and replace it with logic that constructs
global_sampled_idx (and the analogous per-type sampled_idx) by iterating over
sampled to collect indices for each key (preserving previous auto-discovery
behavior), and then continue with the existing stats computation; ensure
functions/variables referenced include global_sampled_idx, sampled, and any
per-type sampled_idx used at line ~551 so callers can omit the params safely.
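
The distinction both inline comments draw, between an omitted index dict (None) and an explicitly empty one, could be handled along these lines. This is a sketch of the reviewer's suggestion, not the merged code; `compute_global` is hypothetical, and the per-key count it returns is a stand-in for the real statistics.

```python
def compute_global(sampled, keys, global_sampled_idx=None):
    """Sketch: fall back to discovery on None, short-circuit only on empty."""
    if global_sampled_idx is None:
        # Caller omitted the indices: rebuild them from the find_<key> flags.
        global_sampled_idx = {
            k: [i for i, s in enumerate(sampled) if s.get(f"find_{k}", 0.0) > 0.0]
            for k in keys
        }
    elif all(len(v) == 0 for v in global_sampled_idx.values()):
        # Caller explicitly said there is nothing to process.
        return {}
    # Stand-in for the statistics: number of contributing systems per key.
    return {k: len(global_sampled_idx.get(k, [])) for k in keys}


# Hypothetical data: only system 0 has a global energy label.
sampled = [{"find_energy": 1.0}, {}]
discovered = compute_global(sampled, ["energy"])
skipped = compute_global(sampled, ["energy"], {"energy": []})
explicit = compute_global(sampled, ["energy"], {"energy": [0]})
```

Treating None as "discover for me" keeps callers that omit the parameter working, while an explicit empty dict still takes the early return.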

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between a53d7d5 and f47c36b.

📒 Files selected for processing (4)
  • deepmd/dpmodel/utils/env_mat_stat.py
  • deepmd/pd/utils/env_mat_stat.py
  • deepmd/pd/utils/stat.py
  • deepmd/pt/utils/stat.py

@codecov

codecov bot commented Feb 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.00%. Comparing base (65eea4b) to head (6173cf0).
⚠️ Report is 10 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #5267    +/-   ##
========================================
  Coverage   82.00%   82.00%            
========================================
  Files         750      750            
  Lines       75082    75213   +131     
  Branches     3615     3615            
========================================
+ Hits        61571    61679   +108     
- Misses      12347    12371    +24     
+ Partials     1164     1163     -1     

@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Feb 26, 2026
Merged via the queue into deepmodeling:master with commit abf9575 Feb 26, 2026
70 checks passed
@wanghan-iapcm wanghan-iapcm deleted the refact-stats branch February 26, 2026 19:39