From 308552a177076054b1f5fe690cae198bcb1ae5af Mon Sep 17 00:00:00 2001 From: OpenClaw Date: Tue, 24 Feb 2026 07:54:00 +0000 Subject: [PATCH 1/6] doc: add theory section for DPA-2 descriptor Authored by OpenClaw (model: gpt-5.3-codex) --- doc/model/dpa2.md | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md index 466a4de4f2..60e8f9a5f1 100644 --- a/doc/model/dpa2.md +++ b/doc/model/dpa2.md @@ -8,6 +8,42 @@ The DPA-2 model implementation. See [DPA-2 paper](https://doi.org/10.1038/s41524 Training example: `examples/water/dpa2/input_torch_medium.json`, see [README](../../examples/water/dpa2/README.md) for inputs in different levels. +## Theory + +DPA-2 is an attention-based descriptor designed to learn expressive local atomic representations while preserving the physical symmetries required by interatomic potentials. + +### Local environment and representation + +For each central atom $\alpha$, neighbors $\beta \in \mathcal{N}(\alpha)$ are selected within a cutoff radius. DPA-2 encodes each local environment through geometric features (relative coordinates and derived invariants) and element/type information. + +The descriptor is built hierarchically: + +1. **Initial embedding**: geometric and type features are projected into latent channels. +1. **Attention-based interaction**: stacked attention layers model neighbor-neighbor and center-neighbor correlations in the local environment. +1. **Output descriptor**: atom-wise latent features after the final layer are used as descriptor outputs for downstream fitting/model components. + +### Attention-based message passing + +DPA-2 uses attention to aggregate neighbor information with data-dependent weights. 
Conceptually, each layer computes: + +```math +\mathbf{h}_\alpha^{(l+1)} = \mathbf{h}_\alpha^{(l)} + \mathrm{Attn}^{(l)}\left(\mathbf{h}_\alpha^{(l)}, \{\mathbf{h}_\beta^{(l)}\}_{\beta\in\mathcal{N}(\alpha)}, \{\mathbf{g}_{\alpha\beta}\}_{\beta\in\mathcal{N}(\alpha)}\right) +``` + +where $\mathbf{h}$ denotes latent node features and $\mathbf{g}_{\alpha\beta}$ denotes geometry-conditioned pair features. Residual updates enable stable deep stacking. + +### Physical symmetries + +DPA-2 is constructed to satisfy key symmetry requirements of atomistic modeling: + +1. **Translational invariance**: only relative coordinates are used. +1. **Rotational behavior**: internal geometric constructions are designed so that final scalar descriptor channels used downstream are rotationally invariant. +1. **Permutational invariance**: atoms of the same species are treated identically under permutation (re-labeling) operations. + +### Multi-task training context + +DPA-2 is commonly used in a multi-task setting. The descriptor is shared, while task-specific heads/objectives are handled downstream. See [Multi-task training](../train/multi-task-training.md) for framework details. + ## Requirements of installation {{ pytorch_icon }} If one wants to run the DPA-2 model on LAMMPS, the customized OP library for the Python interface must be installed when [freezing the model](../freeze/freeze.md). From 5a63f28ae0bb8f442ee6579162b4fd9582d531bc Mon Sep 17 00:00:00 2001 From: OpenClaw Date: Tue, 24 Feb 2026 07:58:12 +0000 Subject: [PATCH 2/6] doc: align DPA-2 theory section with paper terminology Authored by OpenClaw (model: gpt-5.3-codex) --- doc/model/dpa2.md | 44 +++++++++++++++++++++++++++----------------- 1 file changed, 27 insertions(+), 17 deletions(-) diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md index 60e8f9a5f1..ecf5b93d2f 100644 --- a/doc/model/dpa2.md +++ b/doc/model/dpa2.md @@ -10,39 +10,49 @@ Training example: `examples/water/dpa2/input_torch_medium.json`, see [README](.. 
## Theory -DPA-2 is an attention-based descriptor designed to learn expressive local atomic representations while preserving the physical symmetries required by interatomic potentials. +DPA-2 is an attention-based descriptor architecture proposed for large atomic models (LAMs); see the [DPA-2 paper](https://doi.org/10.1038/s41524-024-01493-2). -### Local environment and representation +At a high level, DPA-2 builds local representations with three coupled channels (paper notation): -For each central atom $\alpha$, neighbors $\beta \in \mathcal{N}(\alpha)$ are selected within a cutoff radius. DPA-2 encodes each local environment through geometric features (relative coordinates and derived invariants) and element/type information. +- **Single-atom channel** $\mathbf{f}_lpha$ +- **Rotationally invariant pair channel** $\mathbf{g}_{lphaeta}$ +- **Rotationally equivariant pair channel** $\mathbf{h}_{lphaeta}$ -The descriptor is built hierarchically: +for neighbors $eta\in\mathcal{N}(lpha)$ within cutoffs. -1. **Initial embedding**: geometric and type features are projected into latent channels. -1. **Attention-based interaction**: stacked attention layers model neighbor-neighbor and center-neighbor correlations in the local environment. -1. **Output descriptor**: atom-wise latent features after the final layer are used as descriptor outputs for downstream fitting/model components. +### Descriptor pipeline -### Attention-based message passing +The descriptor follows two main stages: -DPA-2 uses attention to aggregate neighbor information with data-dependent weights. Conceptually, each layer computes: +1. **repinit (representation initializer)** + - Initializes and fuses type and geometry information from local environments. +2. **repformer (representation transformer)** + - Stacked message-passing layers that update $\mathbf{f}$ and $\mathbf{g}$ channels through convolution/symmetrization/MLP and attention-style interactions. 
+ +The final descriptor is formed from learned single-atom representations and then passed to downstream fitting/model components. + +### Message passing intuition + +DPA-2 updates local features layer-by-layer with residual connections. Conceptually, each layer performs neighborhood aggregation using geometry-conditioned interactions: ```math -\mathbf{h}_\alpha^{(l+1)} = \mathbf{h}_\alpha^{(l)} + \mathrm{Attn}^{(l)}\left(\mathbf{h}_\alpha^{(l)}, \{\mathbf{h}_\beta^{(l)}\}_{\beta\in\mathcal{N}(\alpha)}, \{\mathbf{g}_{\alpha\beta}\}_{\beta\in\mathcal{N}(\alpha)}\right) +\mathbf{h}_lpha^{(l+1)} = \mathbf{h}_lpha^{(l)} + \mathrm{MP}^{(l)}\left(\mathbf{h}_lpha^{(l)}, \{\mathbf{h}_eta^{(l)}\}_{eta\in\mathcal{N}(lpha)}, \{\mathbf{g}_{lphaeta}\}_{eta\in\mathcal{N}(lpha)} ight) ``` -where $\mathbf{h}$ denotes latent node features and $\mathbf{g}_{\alpha\beta}$ denotes geometry-conditioned pair features. Residual updates enable stable deep stacking. +where $\mathrm{MP}^{(l)}$ denotes the layer-specific message-passing operator. -### Physical symmetries +### Physical properties -DPA-2 is constructed to satisfy key symmetry requirements of atomistic modeling: +Consistent with the DPA-2 design goals in the paper, the model family is built to satisfy: -1. **Translational invariance**: only relative coordinates are used. -1. **Rotational behavior**: internal geometric constructions are designed so that final scalar descriptor channels used downstream are rotationally invariant. -1. **Permutational invariance**: atoms of the same species are treated identically under permutation (re-labeling) operations. +1. **Translational invariance** (depends on relative coordinates) +1. **Rotational and permutational symmetry requirements** +1. **Conservative formulation** when used in energy models (forces/virials from energy gradients) +1. **Smoothness up to second-order derivatives** ### Multi-task training context -DPA-2 is commonly used in a multi-task setting. 
The descriptor is shared, while task-specific heads/objectives are handled downstream. See [Multi-task training](../train/multi-task-training.md) for framework details. +DPA-2 is designed for multi-task pre-training with a shared descriptor and task-specific downstream heads. See [Multi-task training](../train/multi-task-training.md) for workflow details. ## Requirements of installation {{ pytorch_icon }} From 556759fa274007018952f35c90c2464a35fb3f3c Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 24 Feb 2026 08:00:07 +0000 Subject: [PATCH 3/6] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- doc/model/dpa2.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md index ecf5b93d2f..148a058cea 100644 --- a/doc/model/dpa2.md +++ b/doc/model/dpa2.md @@ -26,7 +26,7 @@ The descriptor follows two main stages: 1. **repinit (representation initializer)** - Initializes and fuses type and geometry information from local environments. -2. **repformer (representation transformer)** +1. **repformer (representation transformer)** - Stacked message-passing layers that update $\mathbf{f}$ and $\mathbf{g}$ channels through convolution/symmetrization/MLP and attention-style interactions. The final descriptor is formed from learned single-atom representations and then passed to downstream fitting/model components. @@ -36,7 +36,8 @@ The final descriptor is formed from learned single-atom representations and then DPA-2 updates local features layer-by-layer with residual connections. 
Conceptually, each layer performs neighborhood aggregation using geometry-conditioned interactions: ```math -\mathbf{h}_lpha^{(l+1)} = \mathbf{h}_lpha^{(l)} + \mathrm{MP}^{(l)}\left(\mathbf{h}_lpha^{(l)}, \{\mathbf{h}_eta^{(l)}\}_{eta\in\mathcal{N}(lpha)}, \{\mathbf{g}_{lphaeta}\}_{eta\in\mathcal{N}(lpha)} ight) +\mathbf{h}_lpha^{(l+1)} = \mathbf{h}_lpha^{(l)} + \mathrm{MP}^{(l)}\left(\mathbf{h}_lpha^{(l)}, \{\mathbf{h}_eta^{(l)}\}_{eta\in\mathcal{N}(lpha)}, \{\mathbf{g}_{lphaeta}\}_{eta\in\mathcal{N}(lpha)} +ight) ``` where $\mathrm{MP}^{(l)}$ denotes the layer-specific message-passing operator. From 31964aee5310f05d267d4548544efd26a2116e4b Mon Sep 17 00:00:00 2001 From: OpenClaw Date: Tue, 24 Feb 2026 08:08:37 +0000 Subject: [PATCH 4/6] doc: fix DPA-2 theory math rendering and minor heading typos Authored by OpenClaw (model: gpt-5.3-codex) --- doc/model/dpa2.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md index 148a058cea..eeef8451b2 100644 --- a/doc/model/dpa2.md +++ b/doc/model/dpa2.md @@ -14,11 +14,11 @@ DPA-2 is an attention-based descriptor architecture proposed for large atomic mo At a high level, DPA-2 builds local representations with three coupled channels (paper notation): -- **Single-atom channel** $\mathbf{f}_lpha$ -- **Rotationally invariant pair channel** $\mathbf{g}_{lphaeta}$ -- **Rotationally equivariant pair channel** $\mathbf{h}_{lphaeta}$ +- **Single-atom channel** $\mathbf{f}_\alpha$ +- **Rotationally invariant pair channel** $\mathbf{g}_{\alpha\beta}$ +- **Rotationally equivariant pair channel** $\mathbf{h}_{\alpha\beta}$ -for neighbors $eta\in\mathcal{N}(lpha)$ within cutoffs. +for neighbors $\beta\in\mathcal{N}(\alpha)$ within cutoffs. ### Descriptor pipeline @@ -26,18 +26,17 @@ The descriptor follows two main stages: 1. **repinit (representation initializer)** - Initializes and fuses type and geometry information from local environments. -1. 
**repformer (representation transformer)** +2. **repformer (representation transformer)** - Stacked message-passing layers that update $\mathbf{f}$ and $\mathbf{g}$ channels through convolution/symmetrization/MLP and attention-style interactions. The final descriptor is formed from learned single-atom representations and then passed to downstream fitting/model components. -### Message passing intuition +### Message-passing intuition DPA-2 updates local features layer-by-layer with residual connections. Conceptually, each layer performs neighborhood aggregation using geometry-conditioned interactions: ```math -\mathbf{h}_lpha^{(l+1)} = \mathbf{h}_lpha^{(l)} + \mathrm{MP}^{(l)}\left(\mathbf{h}_lpha^{(l)}, \{\mathbf{h}_eta^{(l)}\}_{eta\in\mathcal{N}(lpha)}, \{\mathbf{g}_{lphaeta}\}_{eta\in\mathcal{N}(lpha)} -ight) +\mathbf{h}_\alpha^{(l+1)} = \mathbf{h}_\alpha^{(l)} + \mathrm{MP}^{(l)}\left(\mathbf{h}_\alpha^{(l)}, \{\mathbf{h}_\beta^{(l)}\}_{\beta\in\mathcal{N}(\alpha)}, \{\mathbf{g}_{\alpha\beta}\}_{\beta\in\mathcal{N}(\alpha)}\right) ``` where $\mathrm{MP}^{(l)}$ denotes the layer-specific message-passing operator. @@ -65,7 +64,7 @@ If one runs LAMMPS with MPI, the customized OP library for the C++ interface sho If one runs LAMMPS with MPI and CUDA devices, it is recommended to compile the customized OP library for the C++ interface with a [CUDA-Aware MPI](https://developer.nvidia.com/mpi-solutions-gpus) library and CUDA, otherwise the communication between GPU cards falls back to the slower CPU implementation. -## Limiations of the JAX backend with LAMMPS {{ jax_icon }} +## Limitations of the JAX backend with LAMMPS {{ jax_icon }} When using the JAX backend, 2 or more MPI ranks are not supported. One must set `map` to `yes` using the [`atom_modify`](https://docs.lammps.org/atom_modify.html) command. 
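The residual message-passing update in the theory section above can be sketched numerically. The following is a minimal NumPy illustration of the update pattern only; the shapes, the concatenate-then-project message function, and the mean aggregator are hypothetical stand-ins, not the actual repformer implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms, n_neigh, dim = 4, 3, 8

# Per-atom latent features h^{(l)} and geometry-conditioned pair features g_{ab}
h = rng.normal(size=(n_atoms, dim))
g = rng.normal(size=(n_atoms, n_neigh, dim))
# Hypothetical neighbor list: indices of neighbors beta for each center alpha
neigh = rng.integers(0, n_atoms, size=(n_atoms, n_neigh))

# Projection weights for the per-pair message (toy stand-in for MP^{(l)})
W = rng.normal(size=(3 * dim, dim)) / np.sqrt(3 * dim)

def mp_layer(h, g, neigh, W):
    """One residual layer: h <- h + mean_beta tanh([h_a, h_b, g_ab] @ W)."""
    h_center = np.repeat(h[:, None, :], neigh.shape[1], axis=1)  # (n_atoms, n_neigh, dim)
    h_neigh = h[neigh]                                           # gather neighbor features
    msg = np.concatenate([h_center, h_neigh, g], axis=-1) @ W    # per-pair message
    return h + np.tanh(msg).mean(axis=1)                         # residual aggregation

h1 = mp_layer(h, g, neigh, W)
print(h1.shape)  # (4, 8)
```

The residual form means a layer with a zeroed message function is exactly the identity, which is what makes deep stacking of such layers stable.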
From a5a4922ef9b281210e6270204863547e99f5c62f Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 24 Feb 2026 08:11:46 +0000 Subject: [PATCH 5/6] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- doc/model/dpa2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md index eeef8451b2..d4c75e0509 100644 --- a/doc/model/dpa2.md +++ b/doc/model/dpa2.md @@ -26,7 +26,7 @@ The descriptor follows two main stages: 1. **repinit (representation initializer)** - Initializes and fuses type and geometry information from local environments. -2. **repformer (representation transformer)** +1. **repformer (representation transformer)** - Stacked message-passing layers that update $\mathbf{f}$ and $\mathbf{g}$ channels through convolution/symmetrization/MLP and attention-style interactions. The final descriptor is formed from learned single-atom representations and then passed to downstream fitting/model components. From 508a11c4e7267225b6498ef6336a3643ef08260d Mon Sep 17 00:00:00 2001 From: njzjz-bot <48687836+njzjz-bot@users.noreply.github.com> Date: Wed, 25 Feb 2026 05:30:25 +0000 Subject: [PATCH 6/6] doc: align repformer channel description with MP equation Authored by OpenClaw (model: gpt-5.3-codex) --- doc/model/dpa2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md index d4c75e0509..1c8dd420ee 100644 --- a/doc/model/dpa2.md +++ b/doc/model/dpa2.md @@ -27,7 +27,7 @@ The descriptor follows two main stages: 1. **repinit (representation initializer)** - Initializes and fuses type and geometry information from local environments. 1. **repformer (representation transformer)** - - Stacked message-passing layers that update $\mathbf{f}$ and $\mathbf{g}$ channels through convolution/symmetrization/MLP and attention-style interactions. 
- Stacked message-passing layers that update $\mathbf{f}$, $\mathbf{g}$, and the per-atom representations $\mathbf{h}$ of the message-passing equation through convolution, symmetrization, MLP, and attention-style interactions.

The final descriptor is formed from the learned single-atom representations and passed to the downstream fitting/model components.
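The symmetry requirements listed in the theory section (translational, rotational, and permutational) can be checked mechanically on a toy invariant descriptor. The sketch below uses sorted pairwise distances as a stand-in for the real DPA-2 channels, which it does not implement; it only demonstrates why building on relative coordinates and symmetric aggregation yields the required invariances:

```python
import numpy as np

def toy_descriptor(coords):
    """Toy per-atom invariant descriptor: sorted distances to all atoms.

    Relative coordinates give translational invariance, norms give
    rotational invariance, and sorting gives permutational invariance.
    """
    diff = coords[:, None, :] - coords[None, :, :]  # relative coordinates only
    dist = np.linalg.norm(diff, axis=-1)
    return np.sort(dist, axis=1)

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 3))
d0 = toy_descriptor(x)

# Translation: shifting all atoms by a constant vector changes nothing
assert np.allclose(d0, toy_descriptor(x + np.array([1.0, -2.0, 0.5])))

# Rotation: a rigid rotation about the z axis changes nothing
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
assert np.allclose(d0, toy_descriptor(x @ R.T))

# Permutation: relabeling atoms just relabels the per-atom rows
perm = np.array([2, 0, 1, 4, 3])
assert np.allclose(toy_descriptor(x[perm]), d0[perm])
```

In an energy model, the conservative-formulation requirement is then met by obtaining forces and virials as gradients of the predicted energy rather than as separate outputs.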