Refactor/merge openmp #7446
You may first remove unnecessary files, then add tests to show the effects of code refactoring.
56538dc to b0fccfa
Done. Removed unnecessary files (Planners/, Results/, Test/, opt_logs/). The PR now contains only 26 source files across source/source_md/ and source/source_esolver/, plus unit tests in source/source_md/test/. Please let me know if additional performance tests are needed.
b0fccfa to f4ebb41
Add #pragma omp parallel for to major per-atom loops in MD module, enabling multi-threaded execution for NEP/DPMD potentials and thermostat/integrator operations.

Scope (23 files):
- source/source_md/: md_base, md_func, fire, msst, nhchain, verlet, run_md, md_statistics.h
- source/source_esolver/: esolver_nep, esolver_dp
- source/source_md/test/: 7 unit tests + md_test_fixture.h

Strategy: schedule(static) with if(nat>=256), reduction clauses, atomic/critical for shared accumulators.

LJ esolver excluded (upstream refactored to UnitCellLite API). Rebased onto deepmodeling/develop.

Co-Authored-By: Claude <noreply@anthropic.com>
4483678 to 72fa195
Updated the PR:
- Removed unnecessary non-code files (Planners/, Results/, Test/, opt_logs/)
- [...] esolver interfaces + unit tests.

All #pragma omp directives follow the same strategy: schedule(static) with if(nat>=256), reduction clauses for energy/virial, and atomic/critical for shared accumulators. Please let me know if any further changes are needed.
All 16 CI checks are now passing.
PR: OpenMP Parallel Optimization for ABACUS MD Module and ML Potential Interfaces (NEP/DPMD/LJ)
Reminder
Linked Issue
Fix #...
Unit Tests and/or Case Tests for my changes
Existing Unit Tests Pass:
- `MODULE_MD_LJ_pot` (6 tests)
- `MODULE_MD_func` (7 tests)
- `MODULE_MD_fire`
- `MODULE_MD_verlet`
- `MODULE_MD_nhc`
- `MODULE_MD_msst`
- `MODULE_MD_lgv`

Test Infrastructure:
- [...] `source/source_md/test/md_test_fixture.h`) to eliminate duplicated SetUp/TearDown across 6 test files.

Microbenchmark Verification:
- [...] `Test/openmp_nep_basic_benchmark.cpp` and companion scripts).
- [...] `max_abs_diff = 0`).
- [...] `1e-10` to `1e-8` level due to summation order changes; this is expected and acceptable for MD trajectories.

What's changed?
This PR integrates OpenMP parallelization from three feature branches (`refactor/md-factory`, `refactor/parallel-optimize`, `refactor/md-openmp-remainder`) into the ABACUS MD module and ML potential interfaces. 22 parallel loops or worksharing regions are added across 12 source files (+3934/−342 lines total).

1. MD Base Loops (`source/source_md/`)

- `MD_base::update_pos()` (`md_base.cpp`): `#pragma omp parallel for schedule(static)`
- `MD_base::update_vel()` (`md_base.cpp`): `#pragma omp parallel for schedule(static)`
- `kinetic_energy()` (`md_func.cpp`): `reduction(+:ke)`
- `force_virial()` force copy (`md_func.cpp`)
- `temp_vector()` (`md_func.cpp`)
- `rescale_vel()` (`md_func.cpp`): `schedule(static)`

All loops use `if (natom >= 256)` to skip parallel overhead for small systems.

2. NEP Interface (`source/source_esolver/esolver_nep.cpp/.h`)

- `atom_type_index` / `atom_local_index` index caches for flat `iat`-based parallel loops.
- `nep.compute()` external library call remains serial.

3. DPMD Interface (`source/source_esolver/esolver_dp.cpp/.h`)

- `iat → (it, ia)` index caches.
- [...] `dp_cell`, `dp_coord`, `dp_model_force`, `dp_model_virial`) to avoid repeated allocations.
- `dp.compute()` external library call and 3×3 virial copy-back remain serial.

4. Thermostat and Barostat (`source/source_md/`)

- Verlet `thermalize()`: velocity rescaling (`verlet.cpp`)
- MSST `rescale()`: shock-direction velocity scaling (`msst.cpp`)
- MSST `vel_sum()`: velocity norm reduction (`msst.cpp`)
- MSST `propagate_vel()`: per-atom velocity propagation (`msst.cpp`)
- NoseHoover `particle_thermo()`: final velocity scaling (`nhchain.cpp`)
- NoseHoover `vel_baro()`: barostat velocity update (`nhchain.cpp`)

Thermostat chain recurrence integration and cell dilation remain serial.
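The MD base loop pattern from section 1 (a static schedule, a `reduction(+:ke)` for the kinetic energy, and the `if (natom >= 256)` guard) can be illustrated with a minimal standalone sketch. The function names, the flat velocity layout, and the `kParallelThreshold` constant are assumptions made for illustration; they are not the ABACUS implementations.

```cpp
#include <cassert>
#include <vector>

// Hypothetical constant mirroring the threshold quoted above; not an ABACUS identifier.
constexpr int kParallelThreshold = 256;

// Kinetic energy 0.5 * sum_i m_i * |v_i|^2.
// vel is assumed to be a flat array laid out as [vx0, vy0, vz0, vx1, ...].
double kinetic_energy_sketch(const std::vector<double>& mass, const std::vector<double>& vel)
{
    assert(vel.size() == 3 * mass.size());
    const int natom = static_cast<int>(mass.size());
    double ke = 0.0;

    // Each thread accumulates into a private copy of ke; OpenMP sums the copies
    // when the loop ends. The if clause runs the loop serially for small systems.
#pragma omp parallel for schedule(static) reduction(+:ke) if (natom >= kParallelThreshold)
    for (int i = 0; i < natom; ++i)
    {
        const double vx = vel[3 * i];
        const double vy = vel[3 * i + 1];
        const double vz = vel[3 * i + 2];
        ke += 0.5 * mass[i] * (vx * vx + vy * vy + vz * vz);
    }
    return ke;
}

// Uniform velocity rescaling: iterations are independent, so no reduction is needed.
void rescale_vel_sketch(std::vector<double>& vel, double factor)
{
    const int n = static_cast<int>(vel.size());
#pragma omp parallel for schedule(static) if (n >= 3 * kParallelThreshold)
    for (int k = 0; k < n; ++k)
    {
        vel[k] *= factor;
    }
}
```

Compiled without OpenMP support (for example, without `-fopenmp` on GCC or Clang), the pragmas are ignored and both functions run serially. With OpenMP enabled, the reduction changes the summation order, which is the source of the small floating-point differences noted under Microbenchmark Verification.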
5. FIRE Algorithm (`source/source_md/fire.cpp`)

`FIRE::check_fire()` parallelized in three phases:
- [...] `P`, `sumforce`, `normvel`
- [...] `P <= 0` branch)

Scalar state updates (`alpha`, `negative_count`, `dt`) remain serial.

6. LJ Interface (`source/source_esolver/esolver_lj.cpp/.h`)

- `iat`-based loop.
- `schedule(dynamic, 32)` to handle neighbor-count imbalance.
- `atomic` (energy) and `critical` (virial) reduction at thread exit, with no per-neighbor locks.

7. Code Quality Refactors

- `calc_kinetic_state()` / `calc_stress_state()` (`md_func.h`, `md_statistics.h`).
- `new`/`delete` → `std::unique_ptr` (`run_md.cpp`).

Performance Summary (Microbenchmark, 8 threads, 2M atoms, Xeon Platinum 8163)

Kernels benchmarked: `update_pos`, `update_vel`, `kinetic_energy`, `temp_vector`, `coord_fill`, `energy_sum`, `force_fill`, `virial_sum`, `coord_fill`, `force_copy`, `thermalize`, `rescale`, `propagate_vel`, `particle_thermo`, `check_fire` (mix), `runner` core loop.

* NEP virial 14.24× includes loop reorganization benefits beyond pure 8-thread scaling.
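The LJ strategy from section 6 (a flat per-atom loop with `schedule(dynamic, 32)`, accumulators private to each thread, and a single `atomic`/`critical` merge per thread) might look like the sketch below. The harmonic pair term, the `PairTotals` struct, and the neighbor-list layout are hypothetical stand-ins chosen to keep the example short; they do not reproduce the code in `esolver_lj.cpp`.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical result type for this sketch.
struct PairTotals
{
    double energy = 0.0;
    std::array<double, 9> virial{}; // row-major 3x3, zero-initialized
};

// A toy pair term 0.5 * |r_ij|^2 stands in for the real LJ kernel.
// neighbors[i] lists the neighbors of atom i; each pair is visited from both sides.
PairTotals accumulate_pairs_sketch(const std::vector<Vec3>& pos,
                                   const std::vector<std::vector<int>>& neighbors)
{
    assert(pos.size() == neighbors.size());
    const int nat = static_cast<int>(pos.size());
    double energy = 0.0;
    std::array<double, 9> virial{};

#pragma omp parallel if (nat >= 256)
    {
        // Thread-local accumulators: no synchronization inside the neighbor loop.
        double e_local = 0.0;
        std::array<double, 9> w_local{};

        // Dynamic chunks of 32 atoms help balance uneven neighbor counts.
#pragma omp for schedule(dynamic, 32) nowait
        for (int i = 0; i < nat; ++i)
        {
            for (const int j : neighbors[i])
            {
                const Vec3 r = {pos[j][0] - pos[i][0], pos[j][1] - pos[i][1], pos[j][2] - pos[i][2]};
                // Half of the pair energy, since the pair is also visited from j.
                e_local += 0.25 * (r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
                for (int a = 0; a < 3; ++a)
                {
                    for (int b = 0; b < 3; ++b)
                    {
                        w_local[3 * a + b] += 0.5 * r[a] * r[b];
                    }
                }
            }
        }

        // One merge per thread at exit: atomic for the scalar, critical for the tensor.
#pragma omp atomic
        energy += e_local;

#pragma omp critical
        {
            for (int k = 0; k < 9; ++k)
            {
                virial[k] += w_local[k];
            }
        }
    }

    PairTotals totals;
    totals.energy = energy;
    totals.virial = virial;
    return totals;
}
```

Merging once per thread keeps the number of synchronized operations proportional to the thread count rather than to the number of neighbor pairs, which is the point of avoiding per-neighbor locks.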
Known Limitations & Future Work

- [...] `__NEP`, `deepmd`).
- `nat >= 256` is an empirical uniform threshold; per-kernel tuning (64/128/256/512) is recommended.
- `schedule(dynamic, 32)` vs `static` and optimal chunk size have not been systematically benchmarked across different neighbor distributions.

Any changes of core modules? (ignore if not applicable)
The MD ESolver interface layer (`esolver_nep.cpp`, `esolver_dp.cpp`, `esolver_lj.cpp`) is modified to add index caches and parallel worksharing constructs. The virtual function signatures of the ESolver base class are unchanged. All external library calls (`nep.compute()`, `dp.compute()`) remain serial, and their calling conventions are unchanged.
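As an illustration of the index caches mentioned above, the following sketch precomputes the `iat → (it, ia)` mapping once and then uses it in a flat parallel loop. The names `atom_type_index` and `atom_local_index` come from the NEP section; the `IndexCache` struct, the helper functions, and the per-type coordinate layout are assumptions for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the two lookup tables.
struct IndexCache
{
    std::vector<int> atom_type_index;  // iat -> it (atom type)
    std::vector<int> atom_local_index; // iat -> ia (index within its type)
};

// Build the cache once from the number of atoms of each type.
IndexCache build_index_cache_sketch(const std::vector<int>& atoms_per_type)
{
    IndexCache cache;
    for (std::size_t it = 0; it < atoms_per_type.size(); ++it)
    {
        assert(atoms_per_type[it] >= 0);
        for (int ia = 0; ia < atoms_per_type[it]; ++ia)
        {
            cache.atom_type_index.push_back(static_cast<int>(it));
            cache.atom_local_index.push_back(ia);
        }
    }
    return cache;
}

// Copy per-type coordinates (coords_by_type[it][3 * ia + d]) into a flat
// [x0, y0, z0, x1, ...] buffer. Each iat writes a disjoint slice, so the
// loop needs no synchronization.
void fill_coords_sketch(const IndexCache& cache,
                        const std::vector<std::vector<double>>& coords_by_type,
                        std::vector<double>& flat)
{
    const int nat = static_cast<int>(cache.atom_type_index.size());
    flat.resize(3 * static_cast<std::size_t>(nat));

#pragma omp parallel for schedule(static) if (nat >= 256)
    for (int iat = 0; iat < nat; ++iat)
    {
        const int it = cache.atom_type_index[iat];
        const int ia = cache.atom_local_index[iat];
        for (int d = 0; d < 3; ++d)
        {
            flat[3 * iat + d] = coords_by_type[it][3 * ia + d];
        }
    }
}
```

A flat loop over `iat` gives OpenMP a single iteration space to divide evenly, whereas a nested loop over types and atoms within each type would split work unevenly when type populations differ.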