From e5e6d32017a2634e2321de93bb55c0fe9145e2f6 Mon Sep 17 00:00:00 2001
From: njzjz-bot <njzjz-bot@users.noreply.github.com>
Date: Tue, 24 Feb 2026 06:13:42 +0000
Subject: [PATCH 1/8] doc: add theory section to DPA3 documentation

Add a comprehensive theory section to the DPA3 descriptor documentation,
including:
- Line Graph Series (LiGS) construction and geometric interpretation
- Message passing mechanism with residual connections
- Descriptor construction and energy prediction formulas
- Physical symmetries (translational, rotational, permutational invariance
  and energy conservation)
- Default configuration explanation

Authored by OpenClaw (model: glm-5)
---
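
Reviewer note (not part of the commit): the line graph transform described in
the new "Line Graph Series (LiGS)" section can be sketched in a few lines of
Python. This is a minimal illustration, assuming an undirected graph given as a
plain edge list; the names `line_graph` and `ligs` are hypothetical and do not
correspond to any DeePMD-kit API, and the real implementation works on neighbor
lists within the cutoff $r_c$.

```python
from itertools import combinations


def line_graph(edges):
    """Return the edges of the line graph of an undirected graph.

    Vertex i of the result stands for edges[i]; two result vertices are
    joined when the corresponding original edges share an endpoint.
    """
    edges = [tuple(e) for e in edges]
    return [
        (i, j)
        for i, j in combinations(range(len(edges)), 2)
        if set(edges[i]) & set(edges[j])
    ]


def ligs(edges, order):
    """Edge lists of G(1)..G(order), built by repeated line graph transforms."""
    series = [list(edges)]
    for _ in range(order - 1):
        series.append(line_graph(series[-1]))
    return series


# Three atoms in a chain 0-1-2: two bonds, which meet in one angle.
series = ligs([(0, 1), (1, 2)], order=3)
```

Here `series[0]` holds the two bonds, `series[1]` holds the single angle formed
by them, and `series[2]` is empty because one angle cannot form a dihedral.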
 doc/model/dpa3.md | 74 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/doc/model/dpa3.md b/doc/model/dpa3.md
index f62c71bb5e..e44250476c 100644
--- a/doc/model/dpa3.md
+++ b/doc/model/dpa3.md
@@ -15,6 +15,80 @@ Reference: [DPA3 paper](https://arxiv.org/abs/2506.01686).
 
 Training example: `examples/water/dpa3/input_torch.json`.
 
+## Theory
+
+DPA3 is a graph neural network operating on the Line Graph Series (LiGS) constructed from atomic configurations.
+
+### Line Graph Series (LiGS)
+
+Given an initial graph $G^{(1)}$ representing the atomic system, where atoms are vertices and pairs of neighboring atoms within a cutoff radius $r_c$ are edges, the line graph transform $\mathcal{L}$ constructs a new graph $G^{(2)} = \mathcal{L}(G^{(1)})$ by:
+1. Converting each edge in $G^{(1)}$ to a vertex in $G^{(2)}$
+2. Creating edges in $G^{(2)}$ between vertices whose corresponding edges in $G^{(1)}$ share a common vertex
+
+Recursively applying this transform generates a series of graphs $\{G^{(1)}, G^{(2)}, \ldots, G^{(K)}\}$, where $G^{(k)} = \mathcal{L}(G^{(k-1)})$. This sequence is called the Line Graph Series (LiGS) of order $K$.
+
+Geometrically, vertices in $G^{(1)}$, $G^{(2)}$, $G^{(3)}$, and $G^{(4)}$ correspond to atoms, bonds (pairs of atoms), angles (three atoms with two bonds sharing a common atom), and dihedral angles (four atoms with two angles sharing a common bond), respectively.
+
+### Message Passing on LiGS
+
+DPA3 performs message passing across all graphs in the LiGS. At layer $l$, the vertex and edge features on graph $G^{(k)}$ are denoted as $\mathbf{v}_\alpha^{(k,l)} \in \mathbb{R}^{d_v}$ and $\mathbf{e}_{\alpha\beta}^{(k,l)} \in \mathbb{R}^{d_e}$, where $\alpha$ and $\alpha\beta$ denote vertex and edge indices, and $d_v$, $d_e$ are feature dimensions.
+
+The feature update follows a recursive formulation with residual connections:
+
+**For $G^{(1)}$ (initial graph):**
+The vertex features are updated through self-message and symmetrization:
+```math
+\mathbf{v}_\alpha^{(1,l+1)} = \mathbf{v}_\alpha^{(1,l)} + \text{Update}^{(1)}\left(\mathbf{v}_\alpha^{(1,l)}, \{\mathbf{e}_{\alpha\beta}^{(1,l)}\}_{\beta \in \mathcal{N}(\alpha)}\right)
+```
+
+**For $G^{(k)}$ with $k > 1$:**
+The vertex feature of $G^{(k)}$ is identical to the edge feature of $G^{(k-1)}$. This identity eliminates redundant storage:
+```math
+\mathbf{v}_\alpha^{(k,l)} = \mathbf{e}_{\alpha}^{(k-1,l)}
+```
+
+The edge features are updated based on messages from connected vertices:
+```math
+\mathbf{e}_{\alpha\beta}^{(k,l+1)} = \mathbf{e}_{\alpha\beta}^{(k,l)} + \text{Update}^{(k)}\left(\mathbf{e}_{\alpha\beta}^{(k,l)}, \mathbf{v}_\alpha^{(k,l)}, \mathbf{v}_\beta^{(k,l)}\right)
+```
+
+### Descriptor Construction
+
+The final vertex features of $G^{(1)}$ serve as the descriptor representing the local environment of each atom:
+```math
+\mathcal{D}^i = \mathbf{v}_i^{(1,L)}
+```
+where $L$ is the total number of layers.
+
+For multi-task training, the descriptor is augmented with dataset encoding (typically a one-hot vector) and passed through a fitting network to predict atomic energies:
+```math
+E_i = \mathcal{N}_{\text{fit}}(\mathcal{D}^i \oplus \mathbf{d}_{\text{dataset}})
+```
+The total system energy is the sum of atomic contributions:
+```math
+E = \sum_i E_i
+```
+
+### Physical Symmetries
+
+DPA3 respects all physical symmetries of the potential energy surface:
+
+1. **Translational invariance**: The model depends only on relative coordinates $\mathbf{r}_{ij} = \mathbf{r}_j - \mathbf{r}_i$, not absolute positions.
+
+2. **Rotational invariance**: The descriptor is constructed from scalar features that are invariant under global rotations.
+
+3. **Permutational invariance**: Atoms of the same chemical species are treated identically, respecting quantum statistics.
+
+4. **Energy conservation**: Forces are derived from energy gradients:
+```math
+\mathbf{F}_i = -\frac{\partial E}{\partial \mathbf{r}_i}
+```
+Virials are similarly derived from cell tensor gradients, ensuring the model is conservative and suitable for molecular dynamics simulations.
+
+### Default Configuration
+
+Based on extensive hyperparameter tests, DPA3 uses LiGS order $K=2$ as the default, which provides optimal balance between accuracy and computational cost. The model supports scaling through increasing the number of layers $L$ (e.g., DPA3-L3, DPA3-L6, DPA3-L12, DPA3-L24).
+
 ## Hyperparameter tests
 
 We systematically conducted DPA3 training on six representative DFT datasets (available at [AIS-Square](https://www.aissquare.com/datasets/detail?pageType=datasets&name=DPA3_hyperparameter_search&id=316)):

From fe443f1dcd3b0cbccc3699c6d5c99672514e2a51 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Tue, 24 Feb 2026 06:21:25 +0000
Subject: [PATCH 2/8] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 doc/model/dpa3.md | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/doc/model/dpa3.md b/doc/model/dpa3.md
index e44250476c..e2bf22b9d9 100644
--- a/doc/model/dpa3.md
+++ b/doc/model/dpa3.md
@@ -22,8 +22,9 @@ DPA3 is a graph neural network operating on the Line Graph Series (LiGS) constru
 ### Line Graph Series (LiGS)
 
 Given an initial graph $G^{(1)}$ representing the atomic system, where atoms are vertices and pairs of neighboring atoms within a cutoff radius $r_c$ are edges, the line graph transform $\mathcal{L}$ constructs a new graph $G^{(2)} = \mathcal{L}(G^{(1)})$ by:
+
 1. Converting each edge in $G^{(1)}$ to a vertex in $G^{(2)}$
-2. Creating edges in $G^{(2)}$ between vertices whose corresponding edges in $G^{(1)}$ share a common vertex
+1. Creating edges in $G^{(2)}$ between vertices whose corresponding edges in $G^{(1)}$ share a common vertex
 
 Recursively applying this transform generates a series of graphs $\{G^{(1)}, G^{(2)}, \ldots, G^{(K)}\}$, where $G^{(k)} = \mathcal{L}(G^{(k-1)})$. This sequence is called the Line Graph Series (LiGS) of order $K$.
 
@@ -37,17 +38,20 @@ The feature update follows a recursive formulation with residual connections:
 
 **For $G^{(1)}$ (initial graph):**
 The vertex features are updated through self-message and symmetrization:
+
 ```math
 \mathbf{v}_\alpha^{(1,l+1)} = \mathbf{v}_\alpha^{(1,l)} + \text{Update}^{(1)}\left(\mathbf{v}_\alpha^{(1,l)}, \{\mathbf{e}_{\alpha\beta}^{(1,l)}\}_{\beta \in \mathcal{N}(\alpha)}\right)
 ```
 
 **For $G^{(k)}$ with $k > 1$:**
 The vertex feature of $G^{(k)}$ is identical to the edge feature of $G^{(k-1)}$. This identity eliminates redundant storage:
+
 ```math
 \mathbf{v}_\alpha^{(k,l)} = \mathbf{e}_{\alpha}^{(k-1,l)}
 ```
 
 The edge features are updated based on messages from connected vertices:
+
 ```math
 \mathbf{e}_{\alpha\beta}^{(k,l+1)} = \mathbf{e}_{\alpha\beta}^{(k,l)} + \text{Update}^{(k)}\left(\mathbf{e}_{\alpha\beta}^{(k,l)}, \mathbf{v}_\alpha^{(k,l)}, \mathbf{v}_\beta^{(k,l)}\right)
 ```
@@ -55,16 +59,21 @@ The edge features are updated based on messages from connected vertices:
 ### Descriptor Construction
 
 The final vertex features of $G^{(1)}$ serve as the descriptor representing the local environment of each atom:
+
 ```math
 \mathcal{D}^i = \mathbf{v}_i^{(1,L)}
 ```
+
 where $L$ is the total number of layers.
 
 For multi-task training, the descriptor is augmented with dataset encoding (typically a one-hot vector) and passed through a fitting network to predict atomic energies:
+
 ```math
 E_i = \mathcal{N}_{\text{fit}}(\mathcal{D}^i \oplus \mathbf{d}_{\text{dataset}})
 ```
+
 The total system energy is the sum of atomic contributions:
+
 ```math
 E = \sum_i E_i
 ```
@@ -75,14 +84,16 @@ DPA3 respects all physical symmetries of the potential energy surface:
 
 1. **Translational invariance**: The model depends only on relative coordinates $\mathbf{r}_{ij} = \mathbf{r}_j - \mathbf{r}_i$, not absolute positions.
 
-2. **Rotational invariance**: The descriptor is constructed from scalar features that are invariant under global rotations.
+1. **Rotational invariance**: The descriptor is constructed from scalar features that are invariant under global rotations.
 
-3. **Permutational invariance**: Atoms of the same chemical species are treated identically, respecting quantum statistics.
+1. **Permutational invariance**: Atoms of the same chemical species are treated identically, respecting quantum statistics.
+
+1. **Energy conservation**: Forces are derived from energy gradients:
 
-4. **Energy conservation**: Forces are derived from energy gradients:
 ```math
 \mathbf{F}_i = -\frac{\partial E}{\partial \mathbf{r}_i}
 ```
+
 Virials are similarly derived from cell tensor gradients, ensuring the model is conservative and suitable for molecular dynamics simulations.
 
 ### Default Configuration

From 9d6b27cc5369d91ffbfd739ba176568d1ebfa4f9 Mon Sep 17 00:00:00 2001
From: OpenClaw <njzjz2008@gmail.com>
Date: Tue, 24 Feb 2026 07:10:27 +0000
Subject: [PATCH 3/8] doc: address review comments in DPA3 theory section

Authored by OpenClaw (model: glm-5)
---
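
Reviewer note (not part of the commit): the invariances listed in the
"Physical Symmetries" section can be checked numerically on any energy function
that depends only on relative coordinates. The sketch below uses a toy pair
energy as a stand-in, not DPA3 itself; `toy_energy` and `numerical_forces` are
hypothetical helpers, and forces are taken by finite differences rather than by
automatic differentiation.

```python
import numpy as np


def toy_energy(pos):
    """Toy pair energy: sum over a<b of exp(-|r_b - r_a|)."""
    diff = pos[None, :, :] - pos[:, None, :]  # diff[a, b] = r_b - r_a
    dist = np.linalg.norm(diff, axis=-1)
    upper = np.triu_indices(len(pos), k=1)    # count each pair once
    return float(np.exp(-dist[upper]).sum())


def numerical_forces(pos, h=1e-5):
    """F_a = -dE/dr_a, approximated by central finite differences."""
    forces = np.zeros_like(pos)
    for a in range(pos.shape[0]):
        for c in range(3):
            step = np.zeros_like(pos)
            step[a, c] = h
            forces[a, c] = -(toy_energy(pos + step) - toy_energy(pos - step)) / (2 * h)
    return forces


rng = np.random.default_rng(0)
pos = rng.normal(size=(4, 3))

# Random proper rotation from a QR decomposition, plus a rigid shift.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1
moved = pos @ q.T + np.array([1.0, -2.0, 0.5])

# Relabeling of identical atoms.
relabeled = pos[[2, 0, 3, 1]]
```

The energy is unchanged for `moved` and `relabeled`, and the forces sum to zero
over all atoms, as translational invariance requires.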
 doc/model/dpa3.md | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/doc/model/dpa3.md b/doc/model/dpa3.md
index e2bf22b9d9..76e137cbea 100644
--- a/doc/model/dpa3.md
+++ b/doc/model/dpa3.md
@@ -32,7 +32,7 @@ Geometrically, vertices in $G^{(1)}$, $G^{(2)}$, $G^{(3)}$, and $G^{(4)}$ corres
 
 ### Message Passing on LiGS
 
-DPA3 performs message passing across all graphs in the LiGS. At layer $l$, the vertex and edge features on graph $G^{(k)}$ are denoted as $\mathbf{v}_\alpha^{(k,l)} \in \mathbb{R}^{d_v}$ and $\mathbf{e}_{\alpha\beta}^{(k,l)} \in \mathbb{R}^{d_e}$, where $\alpha$ and $\alpha\beta$ denote vertex and edge indices, and $d_v$, $d_e$ are feature dimensions.
+DPA3 performs message passing across all graphs in the LiGS. At layer $l$, the vertex and edge features on graph $G^{(k)}$ are denoted as $\mathbf{v}_\alpha^{(k,l)} \in \mathbb{R}^{d_v^{(k)}}$ and $\mathbf{e}_{\alpha\beta}^{(k,l)} \in \mathbb{R}^{d_e^{(k)}}$, where $\alpha$ and $\alpha\beta$ denote vertex and edge indices, and $d_v^{(k)}$, $d_e^{(k)}$ are per-graph feature dimensions (for example, in `RepFlowArgs`: $d_v^{(1)}=n_\text{dim}$, $d_e^{(1)}=e_\text{dim}$, $d_v^{(2)}=e_\text{dim}$, and $d_e^{(2)}=a_\text{dim}$).
 
 The feature update follows a recursive formulation with residual connections:
 
@@ -45,9 +45,10 @@ The vertex features are updated through self-message and symmetrization:
 
 **For $G^{(k)}$ with $k > 1$:**
 The vertex feature of $G^{(k)}$ is identical to the edge feature of $G^{(k-1)}$. This identity eliminates redundant storage:
+Here $(\alpha,\beta)$ denotes the edge in $G^{(k-1)}$ corresponding to vertex $\alpha$ in $G^{(k)}$.
 
 ```math
-\mathbf{v}_\alpha^{(k,l)} = \mathbf{e}_{\alpha}^{(k-1,l)}
+\mathbf{v}_\alpha^{(k,l)} = \mathbf{e}_{\alpha\beta}^{(k-1,l)}
 ```
 
 The edge features are updated based on messages from connected vertices:
@@ -55,27 +56,29 @@ The edge features are updated based on messages from connected vertices:
 ```math
 \mathbf{e}_{\alpha\beta}^{(k,l+1)} = \mathbf{e}_{\alpha\beta}^{(k,l)} + \text{Update}^{(k)}\left(\mathbf{e}_{\alpha\beta}^{(k,l)}, \mathbf{v}_\alpha^{(k,l)}, \mathbf{v}_\beta^{(k,l)}\right)
 ```
+The same update mechanism applies to $G^{(1)}$ edge features $\mathbf{e}_{\alpha\beta}^{(1,l)}$, so they also evolve across layers and, via the $\mathbf{v}^{(2,l)}$-$\mathbf{e}^{(1,l)}$ identity, drive the updates on $G^{(2)}$.
+
 
 ### Descriptor Construction
 
 The final vertex features of $G^{(1)}$ serve as the descriptor representing the local environment of each atom:
 
 ```math
-\mathcal{D}^i = \mathbf{v}_i^{(1,L)}
+\mathcal{D}^\alpha = \mathbf{v}_\alpha^{(1,L)}
 ```
 
 where $L$ is the total number of layers.
 
-For multi-task training, the descriptor is augmented with dataset encoding (typically a one-hot vector) and passed through a fitting network to predict atomic energies:
+For multi-task training, the descriptor is augmented with dataset encoding (typically a one-hot vector) and passed through a fitting network to predict atomic energies ($\oplus$ denotes concatenation):
 
 ```math
-E_i = \mathcal{N}_{\text{fit}}(\mathcal{D}^i \oplus \mathbf{d}_{\text{dataset}})
+E_\alpha = \mathcal{N}_{\text{fit}}(\mathcal{D}^\alpha \oplus \mathbf{d}_{\text{dataset}})
 ```
 
 The total system energy is the sum of atomic contributions:
 
 ```math
-E = \sum_i E_i
+E = \sum_\alpha E_\alpha
 ```
 
 ### Physical Symmetries
@@ -84,9 +87,9 @@ DPA3 respects all physical symmetries of the potential energy surface:
 
 1. **Translational invariance**: The model depends only on relative coordinates $\mathbf{r}_{ij} = \mathbf{r}_j - \mathbf{r}_i$, not absolute positions.
 
-1. **Rotational invariance**: The descriptor is constructed from scalar features that are invariant under global rotations.
+1. **Rotational invariance**: The final descriptor is rotationally invariant; intermediate equivariant representations are used internally and contracted to produce invariant atomic features.
 
-1. **Permutational invariance**: Atoms of the same chemical species are treated identically, respecting quantum statistics.
+1. **Permutational invariance**: Atoms of the same chemical species are treated identically under permutation symmetry operations (re-labeling) of identical atoms.
 
 1. **Energy conservation**: Forces are derived from energy gradients:
 
@@ -98,7 +101,7 @@ Virials are similarly derived from cell tensor gradients, ensuring the model is
 
 ### Default Configuration
 
-Based on extensive hyperparameter tests, DPA3 uses LiGS order $K=2$ as the default, which provides optimal balance between accuracy and computational cost. The model supports scaling through increasing the number of layers $L$ (e.g., DPA3-L3, DPA3-L6, DPA3-L12, DPA3-L24).
+DPA3 uses LiGS order $K=2$ as the default configuration, which was found effective in prior work ([DPA3 paper](https://arxiv.org/abs/2506.01686)). The model supports scaling through increasing the number of layers $L$ (e.g., DPA3-L3, DPA3-L6, DPA3-L12, DPA3-L24).
 
 ## Hyperparameter tests
 

From 7790cd4210376023511fa834ed9769eef69b2d7f Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Tue, 24 Feb 2026 07:13:00 +0000
Subject: [PATCH 4/8] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 doc/model/dpa3.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/model/dpa3.md b/doc/model/dpa3.md
index 76e137cbea..052f50e356 100644
--- a/doc/model/dpa3.md
+++ b/doc/model/dpa3.md
@@ -56,8 +56,8 @@ The edge features are updated based on messages from connected vertices:
 ```math
 \mathbf{e}_{\alpha\beta}^{(k,l+1)} = \mathbf{e}_{\alpha\beta}^{(k,l)} + \text{Update}^{(k)}\left(\mathbf{e}_{\alpha\beta}^{(k,l)}, \mathbf{v}_\alpha^{(k,l)}, \mathbf{v}_\beta^{(k,l)}\right)
 ```
-The same update mechanism applies to $G^{(1)}$ edge features $\mathbf{e}_{\alpha\beta}^{(1,l)}$, so they also evolve across layers and, via the $\mathbf{v}^{(2,l)}$-$\mathbf{e}^{(1,l)}$ identity, drive the updates on $G^{(2)}$.
 
+The same update mechanism applies to $G^{(1)}$ edge features $\mathbf{e}_{\alpha\beta}^{(1,l)}$, so they also evolve across layers and, via the $\mathbf{v}^{(2,l)}$-$\mathbf{e}^{(1,l)}$ identity, drive the updates on $G^{(2)}$.
 
 ### Descriptor Construction
 

From b9e8c4b25d9e0250f8bd3a41286886dcaeb73e7c Mon Sep 17 00:00:00 2001
From: OpenClaw <njzjz2008@gmail.com>
Date: Tue, 24 Feb 2026 07:17:36 +0000
Subject: [PATCH 5/8] doc: polish DPA3 text and move edge-index clarification

Authored by OpenClaw (model: glm-5)
---
 doc/model/dpa3.md | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/doc/model/dpa3.md b/doc/model/dpa3.md
index 052f50e356..0f655b8b71 100644
--- a/doc/model/dpa3.md
+++ b/doc/model/dpa3.md
@@ -4,12 +4,12 @@
 **Supported backends**: PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, DP {{ dpmodel_icon }}
 :::
 
-DPA3 is an advanced interatomic potential leveraging the message passing architecture.
-Designed as a large atomic model (LAM), DPA3 is tailored to integrate and simultaneously train on datasets from various disciplines,
-encompassing diverse chemical and materials systems across different research domains.
-Its model design ensures exceptional fitting accuracy and robust generalization both within and beyond the training domain.
-Furthermore, DPA3 maintains energy conservation and respects the physical symmetries of the potential energy surface,
-making it a dependable tool for a wide range of scientific applications.
+DPA3 is an advanced interatomic potential based on message passing.
+As a large atomic model (LAM), it is designed to integrate and jointly train on datasets from different domains,
+covering diverse chemical and materials systems.
+Its architecture provides high fitting accuracy and robust generalization both within and beyond the training domain.
+DPA3 also preserves energy conservation and the physical symmetries of the potential energy surface,
+making it a reliable model for a wide range of scientific applications.
 
 Reference: [DPA3 paper](https://arxiv.org/abs/2506.01686).
 
@@ -44,20 +44,22 @@ The vertex features are updated through self-message and symmetrization:
 ```
 
 **For $G^{(k)}$ with $k > 1$:**
-The vertex feature of $G^{(k)}$ is identical to the edge feature of $G^{(k-1)}$. This identity eliminates redundant storage:
-Here $(\alpha,\beta)$ denotes the edge in $G^{(k-1)}$ corresponding to vertex $\alpha$ in $G^{(k)}$.
+The vertex feature of $G^{(k)}$ is identical to the edge feature of $G^{(k-1)}$:
 
 ```math
 \mathbf{v}_\alpha^{(k,l)} = \mathbf{e}_{\alpha\beta}^{(k-1,l)}
 ```
 
+where $(\alpha,\beta)$ denotes the edge in $G^{(k-1)}$ corresponding to vertex $\alpha$ in $G^{(k)}$. This identity eliminates redundant storage.
+
 The edge features are updated based on messages from connected vertices:
 
 ```math
 \mathbf{e}_{\alpha\beta}^{(k,l+1)} = \mathbf{e}_{\alpha\beta}^{(k,l)} + \text{Update}^{(k)}\left(\mathbf{e}_{\alpha\beta}^{(k,l)}, \mathbf{v}_\alpha^{(k,l)}, \mathbf{v}_\beta^{(k,l)}\right)
 ```
+The same update mechanism also applies to $G^{(1)}$ edge features $\mathbf{e}_{\alpha\beta}^{(1,l)}$. Therefore, these features evolve across layers and, via the $\mathbf{v}^{(2,l)}$-$\mathbf{e}^{(1,l)}$ identity, drive the updates on $G^{(2)}$.
+
 
-The same update mechanism applies to $G^{(1)}$ edge features $\mathbf{e}_{\alpha\beta}^{(1,l)}$, so they also evolve across layers and, via the $\mathbf{v}^{(2,l)}$-$\mathbf{e}^{(1,l)}$ identity, drive the updates on $G^{(2)}$.
 
 ### Descriptor Construction
 
@@ -105,13 +107,12 @@ DPA3 uses LiGS order $K=2$ as the default configuration, which was found effecti
 
 ## Hyperparameter tests
 
-We systematically conducted DPA3 training on six representative DFT datasets (available at [AIS-Square](https://www.aissquare.com/datasets/detail?pageType=datasets&name=DPA3_hyperparameter_search&id=316)):
-metallic systems (`Alloy`, `AlMgCu`, `W`), covalent material (`Boron`), molecular system (`Drug`), and liquid water (`Water`).
-Under consistent training conditions (0.5M training steps, batch_size "auto:128"),
-we rigorously evaluated the impacts of some critical hyperparameters on validation accuracy.
+We systematically trained DPA3 on six representative DFT datasets (available at [AIS-Square](https://www.aissquare.com/datasets/detail?pageType=datasets&name=DPA3_hyperparameter_search&id=316)): metallic systems (`Alloy`, `AlMgCu`, `W`), a covalent material (`Boron`), a molecular system (`Drug`), and liquid water (`Water`).
+Under consistent training conditions (0.5M training steps, `batch_size` = `auto:128`),
+we evaluated the impact of key hyperparameters on validation accuracy.
 
-The comparative analysis focused on average RMSEs (Root Mean Square Error) for both energy, force and virial predictions across all six systems,
-with results tabulated below to guide scenario-specific hyperparameter selection:
+The comparative analysis focused on average RMSEs (Root Mean Square Error) for energy, force, and virial predictions across the six systems.
+The results are summarized below to guide scenario-specific hyperparameter selection:
 
 | Model            | comment         | nlayers | n_dim   | e_dim  | a_dim | e_sel   | a_sel  | start_lr | stop_lr  | loss prefactors           | rmse_e (meV/atom) | rmse_f (meV/Å) | rmse_v (meV/atom) | Training wall time (h) |
 | ---------------- | --------------- | ------- | ------- | ------ | ----- | ------- | ------ | -------- | -------- | ------------------------- | ----------------- | -------------- | ----------------- | ---------------------- |
@@ -123,10 +124,10 @@ with results tabulated below to guide scenario-specific hyperparameter selection
 |                  | Large sel       | 6       | 256     | 128    | 32    | **154** | **48** | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 4.76              | 78.4           | 40.2              | 31.8                   |
 | DPA2-L6 (medium) | Default         | 6       | -       | -      | -     | -       | -      | 1e-3     | 3.51e-08 | 0.02\|1, 1000\|1, 0.02\|1 | 12.12             | 109.3          | 83.1              | 12.2                   |
 
-The loss prefactors (0.2|20, 100|60, 0.02|1) correspond to (`start_pref_e`|`limit_pref_e`, `start_pref_f`|`limit_pref_f`, `start_pref_v`|`limit_pref_v`) respectively.
+The loss prefactors (0.2|20, 100|60, 0.02|1) correspond to (`start_pref_e`|`limit_pref_e`, `start_pref_f`|`limit_pref_f`, `start_pref_v`|`limit_pref_v`), respectively.
 Virial RMSEs were averaged exclusively for systems containing virial labels (`Alloy`, `AlMgCu`, `W`, and `Boron`).
 
-Note that we set `float32` in all DPA3 models, while `float64` in other models by default.
+Note that all DPA3 models use `float32`, while other models use `float64` by default.
 
 ## Requirements of installation from source code {{ pytorch_icon }} {{ paddle_icon }}
 

From e108287fb1b36befe12204104fee6d3a3bcc4479 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Tue, 24 Feb 2026 07:21:10 +0000
Subject: [PATCH 6/8] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 doc/model/dpa3.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/doc/model/dpa3.md b/doc/model/dpa3.md
index 0f655b8b71..81c9647fab 100644
--- a/doc/model/dpa3.md
+++ b/doc/model/dpa3.md
@@ -57,9 +57,8 @@ The edge features are updated based on messages from connected vertices:
 ```math
 \mathbf{e}_{\alpha\beta}^{(k,l+1)} = \mathbf{e}_{\alpha\beta}^{(k,l)} + \text{Update}^{(k)}\left(\mathbf{e}_{\alpha\beta}^{(k,l)}, \mathbf{v}_\alpha^{(k,l)}, \mathbf{v}_\beta^{(k,l)}\right)
 ```
-The same update mechanism also applies to $G^{(1)}$ edge features $\mathbf{e}_{\alpha\beta}^{(1,l)}$. Therefore, these features evolve across layers and, via the $\mathbf{v}^{(2,l)}$-$\mathbf{e}^{(1,l)}$ identity, drive the updates on $G^{(2)}$.
-
 
+The same update mechanism also applies to $G^{(1)}$ edge features $\mathbf{e}_{\alpha\beta}^{(1,l)}$. Therefore, these features evolve across layers and, via the $\mathbf{v}^{(2,l)}$-$\mathbf{e}^{(1,l)}$ identity, drive the updates on $G^{(2)}$.
 
 ### Descriptor Construction
 

From cb42a70f0fec1f5249873208d04f0e857b783ea8 Mon Sep 17 00:00:00 2001
From: OpenClaw <njzjz2008@gmail.com>
Date: Tue, 24 Feb 2026 07:36:17 +0000
Subject: [PATCH 7/8] doc: address latest review nits on notation and symmetry
 section

Authored by OpenClaw (model: gpt-5.3-codex)
---
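
Reviewer note (not part of the commit): the residual structure of the
$\text{Update}_E$ and $\text{Update}_V$ equations on $G^{(1)}$ can be
illustrated with NumPy. This is only a sketch under strong assumptions: the two
update modules are replaced by single dense layers with `tanh`, the weights
`w_v` and `w_e` are random, and all function names are hypothetical. The actual
DPA3 update modules are more elaborate.

```python
import numpy as np


def update_e(e, v, src, dst, w_e):
    """Toy Update_E: one dense layer on [e_ab, v_a, v_b]."""
    return np.tanh(np.concatenate([e, v[src], v[dst]], axis=1) @ w_e)


def update_v(v, e, src, w_v):
    """Toy Update_V: sum e_ab over the neighbors b of a, then one dense layer."""
    agg = np.zeros((v.shape[0], e.shape[1]))
    np.add.at(agg, src, e)  # agg[a] += e_ab for every directed edge (a, b)
    return np.tanh(np.concatenate([v, agg], axis=1) @ w_v)


def layer(v, e, src, dst, w_v, w_e):
    """One residual layer; both updates read the layer-l features."""
    return v + update_v(v, e, src, w_v), e + update_e(e, v, src, dst, w_e)


rng = np.random.default_rng(0)
n_dim, e_dim = 4, 3                 # toy feature sizes
src = np.array([0, 1, 1, 2])        # directed edges (a, b) of a 0-1-2 chain
dst = np.array([1, 0, 2, 1])
v0 = rng.normal(size=(3, n_dim))    # G(1) vertex features
e0 = rng.normal(size=(4, e_dim))    # G(1) edge features, i.e. G(2) vertex features
w_v = 0.1 * rng.normal(size=(n_dim + e_dim, n_dim))
w_e = 0.1 * rng.normal(size=(e_dim + 2 * n_dim, e_dim))

v1, e1 = layer(v0, e0, src, dst, w_v, w_e)
```

Because each update is added to its input, a module that outputs zero leaves
the features unchanged, which is the usual motivation for residual connections.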
 doc/model/dpa3.md | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/doc/model/dpa3.md b/doc/model/dpa3.md
index 81c9647fab..d4693b6fab 100644
--- a/doc/model/dpa3.md
+++ b/doc/model/dpa3.md
@@ -34,31 +34,32 @@ Geometrically, vertices in $G^{(1)}$, $G^{(2)}$, $G^{(3)}$, and $G^{(4)}$ corres
 
 DPA3 performs message passing across all graphs in the LiGS. At layer $l$, the vertex and edge features on graph $G^{(k)}$ are denoted as $\mathbf{v}_\alpha^{(k,l)} \in \mathbb{R}^{d_v^{(k)}}$ and $\mathbf{e}_{\alpha\beta}^{(k,l)} \in \mathbb{R}^{d_e^{(k)}}$, where $\alpha$ and $\alpha\beta$ denote vertex and edge indices, and $d_v^{(k)}$, $d_e^{(k)}$ are per-graph feature dimensions (for example, in `RepFlowArgs`: $d_v^{(1)}=n_\text{dim}$, $d_e^{(1)}=e_\text{dim}$, $d_v^{(2)}=e_\text{dim}$, and $d_e^{(2)}=a_\text{dim}$).
 
-The feature update follows a recursive formulation with residual connections:
+The feature update follows a recursive formulation with residual connections. We write $\text{Update}_V$ and $\text{Update}_E$ for the vertex and edge update modules, respectively.
 
-**For $G^{(1)}$ (initial graph):**
-The vertex features are updated through self-message and symmetrization:
+**Edge updates (all graphs $G^{(k)}$):**
+Edge features are updated based on messages from connected vertices:
 
 ```math
-\mathbf{v}_\alpha^{(1,l+1)} = \mathbf{v}_\alpha^{(1,l)} + \text{Update}^{(1)}\left(\mathbf{v}_\alpha^{(1,l)}, \{\mathbf{e}_{\alpha\beta}^{(1,l)}\}_{\beta \in \mathcal{N}(\alpha)}\right)
+\mathbf{e}_{\alpha\beta}^{(k,l+1)} = \mathbf{e}_{\alpha\beta}^{(k,l)} + \text{Update}_E^{(k)}\left(\mathbf{e}_{\alpha\beta}^{(k,l)}, \mathbf{v}_\alpha^{(k,l)}, \mathbf{v}_\beta^{(k,l)}\right)
 ```
 
-**For $G^{(k)}$ with $k > 1$:**
-The vertex feature of $G^{(k)}$ is identical to the edge feature of $G^{(k-1)}$:
+**For $G^{(1)}$ (initial graph, vertex update):**
+Vertex features are updated through self-message and symmetrization:
 
 ```math
-\mathbf{v}_\alpha^{(k,l)} = \mathbf{e}_{\alpha\beta}^{(k-1,l)}
+\mathbf{v}_\alpha^{(1,l+1)} = \mathbf{v}_\alpha^{(1,l)} + \text{Update}_V^{(1)}\left(\mathbf{v}_\alpha^{(1,l)}, \{\mathbf{e}_{\alpha\beta}^{(1,l)}\}_{\beta \in \mathcal{N}(\alpha)}\right)
 ```
 
-where $(\alpha,\beta)$ denotes the edge in $G^{(k-1)}$ corresponding to vertex $\alpha$ in $G^{(k)}$. This identity eliminates redundant storage.
-
-The edge features are updated based on messages from connected vertices:
+**For $G^{(k)}$ with $k > 1$ (vertex identity):**
+The vertex feature of $G^{(k)}$ is identical to the edge feature of $G^{(k-1)}$:
 
 ```math
-\mathbf{e}_{\alpha\beta}^{(k,l+1)} = \mathbf{e}_{\alpha\beta}^{(k,l)} + \text{Update}^{(k)}\left(\mathbf{e}_{\alpha\beta}^{(k,l)}, \mathbf{v}_\alpha^{(k,l)}, \mathbf{v}_\beta^{(k,l)}\right)
+\mathbf{v}_\alpha^{(k,l)} = \mathbf{e}_{\alpha\beta}^{(k-1,l)}
 ```
 
-The same update mechanism also applies to $G^{(1)}$ edge features $\mathbf{e}_{\alpha\beta}^{(1,l)}$. Therefore, these features evolve across layers and, via the $\mathbf{v}^{(2,l)}$-$\mathbf{e}^{(1,l)}$ identity, drive the updates on $G^{(2)}$.
+where, with a slight abuse of notation, the vertex $\alpha$ of $G^{(k)}$ on the left-hand side corresponds to the edge $(\alpha,\beta)$ of $G^{(k-1)}$ on the right-hand side. This identity eliminates redundant storage.
+
+The same edge update rule also applies to $G^{(1)}$ edge features $\mathbf{e}_{\alpha\beta}^{(1,l)}$ (i.e., with $k=1$ in $\text{Update}_E^{(k)}$). Therefore, these features evolve across layers and, via the $\mathbf{v}^{(2,l)}$-$\mathbf{e}^{(1,l)}$ identity, drive the updates on $G^{(2)}$.
 
 ### Descriptor Construction
 
@@ -82,20 +83,20 @@ The total system energy is the sum of atomic contributions:
 E = \sum_\alpha E_\alpha
 ```
 
-### Physical Symmetries
+### Physical Symmetries and Conservative Forces
 
-DPA3 respects all physical symmetries of the potential energy surface:
+DPA3 respects the physical symmetries of the potential energy surface:
 
-1. **Translational invariance**: The model depends only on relative coordinates $\mathbf{r}_{ij} = \mathbf{r}_j - \mathbf{r}_i$, not absolute positions.
+1. **Translational invariance**: The model depends only on relative coordinates $\mathbf{r}_{\alpha\beta} = \mathbf{r}_\beta - \mathbf{r}_\alpha$, not absolute positions.
 
 1. **Rotational invariance**: The final descriptor is rotationally invariant; intermediate equivariant representations are used internally and contracted to produce invariant atomic features.
 
 1. **Permutational invariance**: Atoms of the same chemical species are treated identically under permutation symmetry operations (re-labeling) of identical atoms.
 
-1. **Energy conservation**: Forces are derived from energy gradients:
+In addition, DPA3 is conservative by construction. Forces are derived as negative energy gradients:
 
 ```math
-\mathbf{F}_i = -\frac{\partial E}{\partial \mathbf{r}_i}
+\mathbf{F}_\alpha = -\frac{\partial E}{\partial \mathbf{r}_\alpha}
 ```
 
 Virials are similarly derived from cell tensor gradients, ensuring the model is conservative and suitable for molecular dynamics simulations.

From 77eae8fdb6ada9941327d767abb455d6c638f8eb Mon Sep 17 00:00:00 2001
From: OpenClaw <njzjz2008@gmail.com>
Date: Tue, 24 Feb 2026 07:42:28 +0000
Subject: [PATCH 8/8] doc: keep descriptor section focused; remove
 fitting/model equations

Authored by OpenClaw (model: gpt-5.3-codex)
---
 doc/model/dpa3.md | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/doc/model/dpa3.md b/doc/model/dpa3.md
index d4693b6fab..85d59033b5 100644
--- a/doc/model/dpa3.md
+++ b/doc/model/dpa3.md
@@ -71,17 +71,7 @@ The final vertex features of $G^{(1)}$ serve as the descriptor representing the
 
 where $L$ is the total number of layers.
 
-For multi-task training, the descriptor is augmented with dataset encoding (typically a one-hot vector) and passed through a fitting network to predict atomic energies ($\oplus$ denotes concatenation):
-
-```math
-E_\alpha = \mathcal{N}_{\text{fit}}(\mathcal{D}^\alpha \oplus \mathbf{d}_{\text{dataset}})
-```
-
-The total system energy is the sum of atomic contributions:
-
-```math
-E = \sum_\alpha E_\alpha
-```
+The descriptor is then passed to the downstream fitting and model components for property prediction (e.g., energy). See the fitting and model documentation for the corresponding equations and training objectives.
 
 ### Physical Symmetries and Conservative Forces