From 308552a177076054b1f5fe690cae198bcb1ae5af Mon Sep 17 00:00:00 2001 From: OpenClaw Date: Tue, 24 Feb 2026 07:54:00 +0000 Subject: [PATCH 1/6] doc: add theory section for DPA-2 descriptor Authored by OpenClaw (model: gpt-5.3-codex) --- doc/model/dpa2.md | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md index 466a4de4f2..60e8f9a5f1 100644 --- a/doc/model/dpa2.md +++ b/doc/model/dpa2.md @@ -8,6 +8,42 @@ The DPA-2 model implementation. See [DPA-2 paper](https://doi.org/10.1038/s41524 Training example: `examples/water/dpa2/input_torch_medium.json`, see [README](../../examples/water/dpa2/README.md) for inputs in different levels. +## Theory + +DPA-2 is an attention-based descriptor designed to learn expressive local atomic representations while preserving the physical symmetries required by interatomic potentials. + +### Local environment and representation + +For each central atom $\alpha$, neighbors $\beta \in \mathcal{N}(\alpha)$ are selected within a cutoff radius. DPA-2 encodes each local environment through geometric features (relative coordinates and derived invariants) and element/type information. + +The descriptor is built hierarchically: + +1. **Initial embedding**: geometric and type features are projected into latent channels. +1. **Attention-based interaction**: stacked attention layers model neighbor-neighbor and center-neighbor correlations in the local environment. +1. **Output descriptor**: atom-wise latent features after the final layer are used as descriptor outputs for downstream fitting/model components. + +### Attention-based message passing + +DPA-2 uses attention to aggregate neighbor information with data-dependent weights. 
Conceptually, each layer computes: + +```math +\mathbf{h}_\alpha^{(l+1)} = \mathbf{h}_\alpha^{(l)} + \mathrm{Attn}^{(l)}\left(\mathbf{h}_\alpha^{(l)}, \{\mathbf{h}_\beta^{(l)}\}_{\beta\in\mathcal{N}(\alpha)}, \{\mathbf{g}_{\alpha\beta}\}_{\beta\in\mathcal{N}(\alpha)}\right) +``` + +where $\mathbf{h}$ denotes latent node features and $\mathbf{g}_{\alpha\beta}$ denotes geometry-conditioned pair features. Residual updates enable stable deep stacking. + +### Physical symmetries + +DPA-2 is constructed to satisfy key symmetry requirements of atomistic modeling: + +1. **Translational invariance**: only relative coordinates are used. +1. **Rotational behavior**: internal geometric constructions are designed so that final scalar descriptor channels used downstream are rotationally invariant. +1. **Permutational invariance**: atoms of the same species are treated identically under permutation (re-labeling) operations. + +### Multi-task training context + +DPA-2 is commonly used in a multi-task setting. The descriptor is shared, while task-specific heads/objectives are handled downstream. See [Multi-task training](../train/multi-task-training.md) for framework details. + ## Requirements of installation {{ pytorch_icon }} If one wants to run the DPA-2 model on LAMMPS, the customized OP library for the Python interface must be installed when [freezing the model](../freeze/freeze.md). From 5a63f28ae0bb8f442ee6579162b4fd9582d531bc Mon Sep 17 00:00:00 2001 From: OpenClaw Date: Tue, 24 Feb 2026 07:58:12 +0000 Subject: [PATCH 2/6] doc: align DPA-2 theory section with paper terminology Authored by OpenClaw (model: gpt-5.3-codex) --- doc/model/dpa2.md | 44 +++++++++++++++++++++++++++----------------- 1 file changed, 27 insertions(+), 17 deletions(-) diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md index 60e8f9a5f1..ecf5b93d2f 100644 --- a/doc/model/dpa2.md +++ b/doc/model/dpa2.md @@ -10,39 +10,49 @@ Training example: `examples/water/dpa2/input_torch_medium.json`, see [README](.. 
## Theory -DPA-2 is an attention-based descriptor designed to learn expressive local atomic representations while preserving the physical symmetries required by interatomic potentials. +DPA-2 is an attention-based descriptor architecture proposed for large atomic models (LAMs); see the [DPA-2 paper](https://doi.org/10.1038/s41524-024-01493-2). -### Local environment and representation +At a high level, DPA-2 builds local representations with three coupled channels (paper notation): -For each central atom $\alpha$, neighbors $\beta \in \mathcal{N}(\alpha)$ are selected within a cutoff radius. DPA-2 encodes each local environment through geometric features (relative coordinates and derived invariants) and element/type information. +- **Single-atom channel** $\mathbf{f}_lpha$ +- **Rotationally invariant pair channel** $\mathbf{g}_{lphaeta}$ +- **Rotationally equivariant pair channel** $\mathbf{h}_{lphaeta}$ -The descriptor is built hierarchically: +for neighbors $eta\in\mathcal{N}(lpha)$ within cutoffs. -1. **Initial embedding**: geometric and type features are projected into latent channels. -1. **Attention-based interaction**: stacked attention layers model neighbor-neighbor and center-neighbor correlations in the local environment. -1. **Output descriptor**: atom-wise latent features after the final layer are used as descriptor outputs for downstream fitting/model components. +### Descriptor pipeline -### Attention-based message passing +The descriptor follows two main stages: -DPA-2 uses attention to aggregate neighbor information with data-dependent weights. Conceptually, each layer computes: +1. **repinit (representation initializer)** + - Initializes and fuses type and geometry information from local environments. +2. **repformer (representation transformer)** + - Stacked message-passing layers that update $\mathbf{f}$ and $\mathbf{g}$ channels through convolution/symmetrization/MLP and attention-style interactions. 
+ +The final descriptor is formed from learned single-atom representations and then passed to downstream fitting/model components. + +### Message passing intuition + +DPA-2 updates local features layer-by-layer with residual connections. Conceptually, each layer performs neighborhood aggregation using geometry-conditioned interactions: ```math -\mathbf{h}_\alpha^{(l+1)} = \mathbf{h}_\alpha^{(l)} + \mathrm{Attn}^{(l)}\left(\mathbf{h}_\alpha^{(l)}, \{\mathbf{h}_\beta^{(l)}\}_{\beta\in\mathcal{N}(\alpha)}, \{\mathbf{g}_{\alpha\beta}\}_{\beta\in\mathcal{N}(\alpha)}\right) +\mathbf{h}_lpha^{(l+1)} = \mathbf{h}_lpha^{(l)} + \mathrm{MP}^{(l)}\left(\mathbf{h}_lpha^{(l)}, \{\mathbf{h}_eta^{(l)}\}_{eta\in\mathcal{N}(lpha)}, \{\mathbf{g}_{lphaeta}\}_{eta\in\mathcal{N}(lpha)} ight) ``` -where $\mathbf{h}$ denotes latent node features and $\mathbf{g}_{\alpha\beta}$ denotes geometry-conditioned pair features. Residual updates enable stable deep stacking. +where $\mathrm{MP}^{(l)}$ denotes the layer-specific message-passing operator. -### Physical symmetries +### Physical properties -DPA-2 is constructed to satisfy key symmetry requirements of atomistic modeling: +Consistent with the DPA-2 design goals in the paper, the model family is built to satisfy: -1. **Translational invariance**: only relative coordinates are used. -1. **Rotational behavior**: internal geometric constructions are designed so that final scalar descriptor channels used downstream are rotationally invariant. -1. **Permutational invariance**: atoms of the same species are treated identically under permutation (re-labeling) operations. +1. **Translational invariance** (depends on relative coordinates) +1. **Rotational and permutational symmetry requirements** +1. **Conservative formulation** when used in energy models (forces/virials from energy gradients) +1. **Smoothness up to second-order derivatives** ### Multi-task training context -DPA-2 is commonly used in a multi-task setting. 
The descriptor is shared, while task-specific heads/objectives are handled downstream. See [Multi-task training](../train/multi-task-training.md) for framework details. +DPA-2 is designed for multi-task pre-training with a shared descriptor and task-specific downstream heads. See [Multi-task training](../train/multi-task-training.md) for workflow details. ## Requirements of installation {{ pytorch_icon }} From 556759fa274007018952f35c90c2464a35fb3f3c Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 24 Feb 2026 08:00:07 +0000 Subject: [PATCH 3/6] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- doc/model/dpa2.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md index ecf5b93d2f..148a058cea 100644 --- a/doc/model/dpa2.md +++ b/doc/model/dpa2.md @@ -26,7 +26,7 @@ The descriptor follows two main stages: 1. **repinit (representation initializer)** - Initializes and fuses type and geometry information from local environments. -2. **repformer (representation transformer)** +1. **repformer (representation transformer)** - Stacked message-passing layers that update $\mathbf{f}$ and $\mathbf{g}$ channels through convolution/symmetrization/MLP and attention-style interactions. The final descriptor is formed from learned single-atom representations and then passed to downstream fitting/model components. @@ -36,7 +36,8 @@ The final descriptor is formed from learned single-atom representations and then DPA-2 updates local features layer-by-layer with residual connections. 
Conceptually, each layer performs neighborhood aggregation using geometry-conditioned interactions: ```math -\mathbf{h}_lpha^{(l+1)} = \mathbf{h}_lpha^{(l)} + \mathrm{MP}^{(l)}\left(\mathbf{h}_lpha^{(l)}, \{\mathbf{h}_eta^{(l)}\}_{eta\in\mathcal{N}(lpha)}, \{\mathbf{g}_{lphaeta}\}_{eta\in\mathcal{N}(lpha)} ight) +\mathbf{h}_lpha^{(l+1)} = \mathbf{h}_lpha^{(l)} + \mathrm{MP}^{(l)}\left(\mathbf{h}_lpha^{(l)}, \{\mathbf{h}_eta^{(l)}\}_{eta\in\mathcal{N}(lpha)}, \{\mathbf{g}_{lphaeta}\}_{eta\in\mathcal{N}(lpha)} +ight) ``` where $\mathrm{MP}^{(l)}$ denotes the layer-specific message-passing operator. From 31964aee5310f05d267d4548544efd26a2116e4b Mon Sep 17 00:00:00 2001 From: OpenClaw Date: Tue, 24 Feb 2026 08:08:37 +0000 Subject: [PATCH 4/6] doc: fix DPA-2 theory math rendering and minor heading typos Authored by OpenClaw (model: gpt-5.3-codex) --- doc/model/dpa2.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md index 148a058cea..eeef8451b2 100644 --- a/doc/model/dpa2.md +++ b/doc/model/dpa2.md @@ -14,11 +14,11 @@ DPA-2 is an attention-based descriptor architecture proposed for large atomic mo At a high level, DPA-2 builds local representations with three coupled channels (paper notation): -- **Single-atom channel** $\mathbf{f}_lpha$ -- **Rotationally invariant pair channel** $\mathbf{g}_{lphaeta}$ -- **Rotationally equivariant pair channel** $\mathbf{h}_{lphaeta}$ +- **Single-atom channel** $\mathbf{f}_\alpha$ +- **Rotationally invariant pair channel** $\mathbf{g}_{\alpha\beta}$ +- **Rotationally equivariant pair channel** $\mathbf{h}_{\alpha\beta}$ -for neighbors $eta\in\mathcal{N}(lpha)$ within cutoffs. +for neighbors $\beta\in\mathcal{N}(\alpha)$ within cutoffs. ### Descriptor pipeline @@ -26,18 +26,17 @@ The descriptor follows two main stages: 1. **repinit (representation initializer)** - Initializes and fuses type and geometry information from local environments. -1. 
**repformer (representation transformer)** +2. **repformer (representation transformer)** - Stacked message-passing layers that update $\mathbf{f}$ and $\mathbf{g}$ channels through convolution/symmetrization/MLP and attention-style interactions. The final descriptor is formed from learned single-atom representations and then passed to downstream fitting/model components. -### Message passing intuition +### Message-passing intuition DPA-2 updates local features layer-by-layer with residual connections. Conceptually, each layer performs neighborhood aggregation using geometry-conditioned interactions: ```math -\mathbf{h}_lpha^{(l+1)} = \mathbf{h}_lpha^{(l)} + \mathrm{MP}^{(l)}\left(\mathbf{h}_lpha^{(l)}, \{\mathbf{h}_eta^{(l)}\}_{eta\in\mathcal{N}(lpha)}, \{\mathbf{g}_{lphaeta}\}_{eta\in\mathcal{N}(lpha)} -ight) +\mathbf{h}_\alpha^{(l+1)} = \mathbf{h}_\alpha^{(l)} + \mathrm{MP}^{(l)}\left(\mathbf{h}_\alpha^{(l)}, \{\mathbf{h}_\beta^{(l)}\}_{\beta\in\mathcal{N}(\alpha)}, \{\mathbf{g}_{\alpha\beta}\}_{\beta\in\mathcal{N}(\alpha)}\right) ``` where $\mathrm{MP}^{(l)}$ denotes the layer-specific message-passing operator. @@ -65,7 +64,7 @@ If one runs LAMMPS with MPI, the customized OP library for the C++ interface sho If one runs LAMMPS with MPI and CUDA devices, it is recommended to compile the customized OP library for the C++ interface with a [CUDA-Aware MPI](https://developer.nvidia.com/mpi-solutions-gpus) library and CUDA, otherwise the communication between GPU cards falls back to the slower CPU implementation. -## Limiations of the JAX backend with LAMMPS {{ jax_icon }} +## Limitations of the JAX backend with LAMMPS {{ jax_icon }} When using the JAX backend, 2 or more MPI ranks are not supported. One must set `map` to `yes` using the [`atom_modify`](https://docs.lammps.org/atom_modify.html) command. 
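The residual message-passing update in the theory section above can be sketched numerically. The following is a minimal NumPy illustration of the update pattern only; the shapes, the concatenate-then-project message function, and the mean aggregator are hypothetical stand-ins, not the actual repformer implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms, n_neigh, dim = 4, 3, 8

# Per-atom latent features h^{(l)} and geometry-conditioned pair features g_{ab}
h = rng.normal(size=(n_atoms, dim))
g = rng.normal(size=(n_atoms, n_neigh, dim))
# Hypothetical neighbor list: indices of neighbors beta for each center alpha
neigh = rng.integers(0, n_atoms, size=(n_atoms, n_neigh))

# Projection weights for the per-pair message (toy stand-in for MP^{(l)})
W = rng.normal(size=(3 * dim, dim)) / np.sqrt(3 * dim)

def mp_layer(h, g, neigh, W):
    """One residual layer: h <- h + mean_beta tanh([h_a, h_b, g_ab] @ W)."""
    h_center = np.repeat(h[:, None, :], neigh.shape[1], axis=1)  # (n_atoms, n_neigh, dim)
    h_neigh = h[neigh]                                           # gather neighbor features
    msg = np.concatenate([h_center, h_neigh, g], axis=-1) @ W    # per-pair message
    return h + np.tanh(msg).mean(axis=1)                         # residual aggregation

h1 = mp_layer(h, g, neigh, W)
print(h1.shape)  # (4, 8)
```

The residual form means a layer with a zeroed message function is exactly the identity, which is what makes deep stacking of such layers stable.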
From a5a4922ef9b281210e6270204863547e99f5c62f Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 24 Feb 2026 08:11:46 +0000 Subject: [PATCH 5/6] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- doc/model/dpa2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md index eeef8451b2..d4c75e0509 100644 --- a/doc/model/dpa2.md +++ b/doc/model/dpa2.md @@ -26,7 +26,7 @@ The descriptor follows two main stages: 1. **repinit (representation initializer)** - Initializes and fuses type and geometry information from local environments. -2. **repformer (representation transformer)** +1. **repformer (representation transformer)** - Stacked message-passing layers that update $\mathbf{f}$ and $\mathbf{g}$ channels through convolution/symmetrization/MLP and attention-style interactions. The final descriptor is formed from learned single-atom representations and then passed to downstream fitting/model components. From 508a11c4e7267225b6498ef6336a3643ef08260d Mon Sep 17 00:00:00 2001 From: njzjz-bot <48687836+njzjz-bot@users.noreply.github.com> Date: Wed, 25 Feb 2026 05:30:25 +0000 Subject: [PATCH 6/6] doc: align repformer channel description with MP equation Authored by OpenClaw (model: gpt-5.3-codex) --- doc/model/dpa2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md index d4c75e0509..1c8dd420ee 100644 --- a/doc/model/dpa2.md +++ b/doc/model/dpa2.md @@ -27,7 +27,7 @@ The descriptor follows two main stages: 1. **repinit (representation initializer)** - Initializes and fuses type and geometry information from local environments. 1. **repformer (representation transformer)** - - Stacked message-passing layers that update $\mathbf{f}$ and $\mathbf{g}$ channels through convolution/symmetrization/MLP and attention-style interactions. 
- Stacked message-passing layers that update $\mathbf{f}$, $\mathbf{g}$, and the per-atom representations $\mathbf{h}$ of the message-passing equation through convolution, symmetrization, MLP, and attention-style interactions.

The final descriptor is formed from the learned single-atom representations and passed to the downstream fitting/model components.
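The symmetry requirements listed in the theory section (translational, rotational, and permutational) can be checked mechanically on a toy invariant descriptor. The sketch below uses sorted pairwise distances as a stand-in for the real DPA-2 channels, which it does not implement; it only demonstrates why building on relative coordinates and symmetric aggregation yields the required invariances:

```python
import numpy as np

def toy_descriptor(coords):
    """Toy per-atom invariant descriptor: sorted distances to all atoms.

    Relative coordinates give translational invariance, norms give
    rotational invariance, and sorting gives permutational invariance.
    """
    diff = coords[:, None, :] - coords[None, :, :]  # relative coordinates only
    dist = np.linalg.norm(diff, axis=-1)
    return np.sort(dist, axis=1)

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 3))
d0 = toy_descriptor(x)

# Translation: shifting all atoms by a constant vector changes nothing
assert np.allclose(d0, toy_descriptor(x + np.array([1.0, -2.0, 0.5])))

# Rotation: a rigid rotation about the z axis changes nothing
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
assert np.allclose(d0, toy_descriptor(x @ R.T))

# Permutation: relabeling atoms just relabels the per-atom rows
perm = np.array([2, 0, 1, 4, 3])
assert np.allclose(toy_descriptor(x[perm]), d0[perm])
```

In an energy model, the conservative-formulation requirement is then met by obtaining forces and virials as gradients of the predicted energy rather than as separate outputs.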