diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md
index 466a4de4f2..1c8dd420ee 100644
--- a/doc/model/dpa2.md
+++ b/doc/model/dpa2.md
@@ -8,6 +8,52 @@ The DPA-2 model implementation. See [DPA-2 paper](https://doi.org/10.1038/s41524
 
 Training example: `examples/water/dpa2/input_torch_medium.json`, see [README](../../examples/water/dpa2/README.md) for inputs in different levels.
 
+## Theory
+
+DPA-2 is an attention-based descriptor architecture proposed for large atomic models (LAMs); see the [DPA-2 paper](https://doi.org/10.1038/s41524-024-01493-2).
+
+At a high level, DPA-2 builds local representations with three coupled channels (paper notation):
+
+- **Single-atom channel** $\mathbf{f}_\alpha$
+- **Rotationally invariant pair channel** $\mathbf{g}_{\alpha\beta}$
+- **Rotationally equivariant pair channel** $\mathbf{h}_{\alpha\beta}$
+
+for neighbors $\beta\in\mathcal{N}(\alpha)$ within the respective cutoff radii.
+
+### Descriptor pipeline
+
+The descriptor follows two main stages:
+
+1. **repinit (representation initializer)**
+   - Initializes and fuses type and geometry information from local environments.
+1. **repformer (representation transformer)**
+   - Stacked message-passing layers that update the single-atom channel $\mathbf{f}$ and the pair channels $\mathbf{g}$ and $\mathbf{h}$ through convolution, symmetrization, MLPs, and attention-style interactions.
+
+The final descriptor is formed from the learned single-atom representations and then passed to downstream fitting/model components.
+
+### Message-passing intuition
+
+DPA-2 updates local features layer by layer with residual connections.
+Conceptually, each layer performs neighborhood aggregation using geometry-conditioned interactions:
+
+```math
+\mathbf{f}_\alpha^{(l+1)} = \mathbf{f}_\alpha^{(l)} + \mathrm{MP}^{(l)}\left(\mathbf{f}_\alpha^{(l)}, \{\mathbf{f}_\beta^{(l)}\}_{\beta\in\mathcal{N}(\alpha)}, \{\mathbf{g}_{\alpha\beta}\}_{\beta\in\mathcal{N}(\alpha)}\right)
+```
+
+where $\mathrm{MP}^{(l)}$ denotes the layer-specific message-passing operator acting on the single-atom channel.
+
+### Physical properties
+
+Consistent with the DPA-2 design goals in the paper, the model family is built to satisfy:
+
+1. **Translational invariance** (the energy depends only on relative coordinates)
+1. **Rotational and permutational symmetry** (predictions transform consistently under rotations and atom relabeling)
+1. **Conservative formulation** when used in energy models (forces and virials are obtained as gradients of the energy)
+1. **Smoothness** up to second-order derivatives
+
+### Multi-task training context
+
+DPA-2 is designed for multi-task pre-training with a shared descriptor and task-specific downstream heads. See [Multi-task training](../train/multi-task-training.md) for workflow details.
+
 ## Requirements of installation {{ pytorch_icon }}
 
 If one wants to run the DPA-2 model on LAMMPS, the customized OP library for the Python interface must be installed when [freezing the model](../freeze/freeze.md).
@@ -18,7 +64,7 @@ If one runs LAMMPS with MPI, the customized OP library for the C++ interface sho
 If one runs LAMMPS with MPI and CUDA devices, it is recommended to compile the customized OP library for the C++ interface with a [CUDA-Aware MPI](https://developer.nvidia.com/mpi-solutions-gpus) library and CUDA, otherwise the communication between GPU cards falls back to the slower CPU implementation.
 
-## Limiations of the JAX backend with LAMMPS {{ jax_icon }}
+## Limitations of the JAX backend with LAMMPS {{ jax_icon }}
 
 When using the JAX backend, 2 or more MPI ranks are not supported. One must set `map` to `yes` using the [`atom_modify`](https://docs.lammps.org/atom_modify.html) command.
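
The residual message-passing update described in the theory section above can be sketched as a toy NumPy layer. This is an illustration only, not DeepMD-kit's implementation: the `smooth_cutoff` switching function, the weight shapes, and the `tanh` mixing are all invented for this sketch.

```python
import numpy as np

def smooth_cutoff(r, rcut):
    """Smooth switching weight: 1 at r = 0, 0 for r >= rcut.
    A hypothetical choice for illustration, not DeepMD-kit's exact switch."""
    x = np.clip(r / rcut, 0.0, 1.0)
    return (1.0 - x**2) ** 2

def message_passing_layer(f, coords, rcut, W):
    """One residual update f -> f + MP(f, {f_beta}, {g_ab}).

    f:      (n_atoms, d) invariant per-atom features
    coords: (n_atoms, 3) Cartesian coordinates
    W:      (2 * d, d) toy weight matrix standing in for the layer MLP
    """
    n, d = f.shape
    new_f = f.copy()
    for a in range(n):
        msg = np.zeros(d)
        for b in range(n):
            if a == b:
                continue
            # The pair weight depends only on the distance |r_a - r_b|,
            # so the layer is translation- and rotation-invariant.
            r = np.linalg.norm(coords[a] - coords[b])
            if r < rcut:
                pair = np.concatenate([f[a], f[b]])
                msg += smooth_cutoff(r, rcut) * np.tanh(pair @ W)
        new_f[a] = f[a] + msg  # residual connection
    return new_f
```

Because the geometry enters only through interatomic distances, translating or rotating `coords` leaves the output unchanged, matching the invariance requirements listed in the theory section; the cutoff switch keeps the update smooth as atoms cross the cutoff radius.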