fix: unify compute_output_stats across dp, pt, and pd backends#5267
fix: unify compute_output_stats across dp, pt, and pd backends#5267wanghan-iapcm merged 7 commits intodeepmodeling:masterfrom
Conversation
…toms before computing output bias
…kends 1. elif → if in compute_output_stats: Systems with both find_atom_<key> and find_<key> now go into both atomic_sampled_idx and global_sampled_idx 2. pt and pd: compute_output_stats_global updated to accept global_sampled_idx parameter (matching dpmodel's signature) and use precomputed indices instead of inline filtering. Also converts to numpy early via to_numpy_array during gathering, then uses np.concatenate instead of torch.cat/paddle.concat + to_numpy_array. 3. pt and pd: compute_output_stats_atomic updated to accept atomic_sampled_idx parameter (matching dpmodel's signature) with early return for empty indices, and same numpy-first gathering pattern.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a53d7d5d50
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: Repository UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughRefactors statistics computation across DP, PT, and PD backends to use index-based gathering, allow systems to be counted in both global and atomic paths, add per-sample atom exclusion masking (AtomExcludeMask) and mixed-natoms support, rename internal helpers, and add cross-backend tests for consistency and no-mutation. Changes
Sequence Diagram(s)sequenceDiagram
participant DataGen as DataGenerator
participant Backend as Backend (DP/PT/PD)
participant Stats as StatsHelpers
participant Mask as AtomExcludeMask
participant Out as StatsOutput
DataGen->>Backend: emit per-system payloads (+ sampled indices)
Backend->>Stats: call compute_output_stats (collect indices)
Stats->>Stats: build global_sampled_idx / atomic_sampled_idx
Stats->>Mask: request per-sample exclusion mask (if types present)
Mask-->>Stats: exclusion boolean mask per-sample
Stats->>Stats: gather outputs/natoms by indices, apply mask
Stats->>Out: produce bias/std and merged arrays (global/atomic)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
deepmd/pd/utils/stat.py (1)
372-389: Consider refactoring parameter order for defensive API design.The new index parameters (
global_sampled_idx,atomic_sampled_idx) were inserted before existing optional parameters, which introduces a pattern risk for positional argument callers. While all current call sites (including lines 372-389) use positional arguments correctly and the functions are internal to their modules, this pattern could cause silent breakage if these functions are ever made part of a public API.For defensive programming, consider either making the index parameters keyword-only (using
*separator) or appending them at the end of the parameter list.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/pd/utils/stat.py` around lines 372 - 389, The new index parameters were inserted before existing optional args, which risks silent breakage for positional callers; update the function signatures for compute_output_stats_global and compute_output_stats_atomic to either make the index params keyword-only (add a '*' before their parameter names) or move those index parameters to the end of the parameter list so all existing optional/positional parameters keep their order, then update any internal call sites if needed to use keywords for those index args.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@deepmd/pd/utils/stat.py`:
- Around line 372-389: The new index parameters were inserted before existing
optional args, which risks silent breakage for positional callers; update the
function signatures for compute_output_stats_global and
compute_output_stats_atomic to either make the index params keyword-only (add a
'*' before their parameter names) or move those index parameters to the end of
the parameter list so all existing optional/positional parameters keep their
order, then update any internal call sites if needed to use keywords for those
index args.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
deepmd/dpmodel/utils/stat.pydeepmd/pd/utils/stat.pydeepmd/pt/utils/stat.pysource/tests/consistent/utils/__init__.pysource/tests/consistent/utils/test_stat.py
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepmd/pd/utils/stat.py`:
- Around line 429-433: The early return currently treats global_sampled_idx ==
None the same as explicitly empty, breaking callers that rely on internal index
discovery; change the condition so we only short-circuit when global_sampled_idx
is not None and all per-sample lists are empty (i.e., if global_sampled_idx is
None, proceed with the original discovery logic instead of returning {}, {}),
and apply the same fix to the second occurrence around lines 554–557; update the
checks that gate the "return {}, {}" to reference both "global_sampled_idx is
not None" and "all(len(v) == 0 for v in global_sampled_idx.values())" so prior
behavior is preserved.
In `@deepmd/pt/utils/stat.py`:
- Around line 429-433: The early return when global_sampled_idx is None (and
similarly for other *_sampled_idx usages around line 551) causes silent no-ops;
instead, when global_sampled_idx (or per-atom sampled idx params) is None, build
fallback index maps from the provided sampled data (e.g., the sampled
argument/variable used earlier) rather than returning empty dicts. Locate the
block that returns {}, {} and replace it with logic that constructs
global_sampled_idx (and the analogous per-type sampled_idx) by iterating over
sampled to collect indices for each key (preserving previous auto-discovery
behavior), and then continue with the existing stats computation; ensure
functions/variables referenced include global_sampled_idx, sampled, and any
per-type sampled_idx used at line ~551 so callers can omit the params safely.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
deepmd/dpmodel/utils/env_mat_stat.pydeepmd/pd/utils/env_mat_stat.pydeepmd/pd/utils/stat.pydeepmd/pt/utils/stat.py
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #5267 +/- ##
========================================
Coverage 82.00% 82.00%
========================================
Files 750 750
Lines 75082 75213 +131
Branches 3615 3615
========================================
+ Hits 61571 61679 +108
- Misses 12347 12371 +24
+ Partials 1164 1163 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Changed to two independent if checks in all three backends.
compute_output_stats, plus no-mutation verification.
Summary by CodeRabbit
New Features
Refactor
Tests
Chores