deepmodeling · Silver-Moon-Over-Snow · Mar 28, 2026 · Mar 28, 2026 · Apr 7, 2026 · Apr 8, 2026
diff --git a/docs/advanced/input_files/input-main.md b/docs/advanced/input_files/input-main.md
@@ -980,7 +980,7 @@
 ### pw_diag_thr
 
 - **Type**: Real
-- **Description**: Only used when you use ks_solver = cg/dav/dav_subspace/bpcg. It indicates the threshold for the first electronic iteration, from the second iteration the pw_diag_thr will be updated automatically. For nscf calculations with planewave basis set, pw_diag_thr should be &lt;= 1e-3.
+- **Description**: Only used when ks_solver = cg/dav/dav_subspace/bpcg/ppcg. It sets the diagonalization threshold for the first electronic iteration; from the second iteration onward, pw_diag_thr is updated automatically. For nscf calculations with a plane-wave basis set, pw_diag_thr should be &lt;= 1e-3.
 - **Default**: 0.01
 
 ### diago_smooth_ethr
@@ -999,14 +999,15 @@
 ### pw_diag_nmax
 
 - **Type**: Integer
-- **Availability**: *basis_type==pw, ks_solver==cg/dav/dav_subspace/bpcg*
-- **Description**: Only useful when you use ks_solver = cg/dav/dav_subspace/bpcg. It indicates the maximal iteration number for cg/david/dav_subspace/bpcg method.
+- **Availability**: *basis_type==pw, ks_solver==cg/dav/dav_subspace/bpcg/ppcg*
+- **Description**: Only useful when you use ks_solver = cg/dav/dav_subspace/bpcg/ppcg. It sets the maximum number of iterations for the cg/dav/dav_subspace/bpcg/ppcg methods.
 - **Default**: 50
 
 ### pw_diag_ndim
 
 - **Type**: Integer
-- **Description**: Only useful when you use ks_solver = dav or ks_solver = dav_subspace. It indicates dimension of workspace(number of wavefunction packets, at least 2 needed) for the Davidson method. A larger value may yield a smaller number of iterations in the algorithm but uses more memory and more CPU time in subspace diagonalization.
+- **Availability**: *basis_type==pw, ks_solver==dav/dav_subspace/ppcg*
+- **Description**: Only useful when you use ks_solver = dav, dav_subspace, or ppcg. For the Davidson methods, it sets the dimension of the workspace (number of wavefunction packets, at least 2 needed); for the PPCG method, it sets the block size/Rayleigh-Ritz interval. A larger value may reduce the number of iterations but uses more memory and more CPU time in subspace diagonalization.
 - **Default**: 4
 
 ### diago_cg_prec
@@ -1115,6 +1116,7 @@
   - bpcg: The BPCG method, which is a block-parallel Conjugate Gradient (CG) method, typically exhibits higher acceleration in a GPU environment.
   - dav: The Davidson algorithm.
   - dav_subspace: The Davidson algorithm without orthogonalization operation, this method is the most recommended for efficiency. pw_diag_ndim can be set to 2 for this method.
+  - ppcg: The projected preconditioned conjugate-gradient (PPCG) method, currently available for CPU plane-wave calculations.
 
   For numerical atomic orbitals basis,
 

diff --git a/docs/advanced/scf/hsolver.md b/docs/advanced/scf/hsolver.md
@@ -4,7 +4,7 @@
 
 Method of explicit solving KS-equation can be chosen by variable "ks_solver" in INPUT file.
 
-When "basis_type = pw", `ks_solver` can be `cg`, `bpcg` or `dav`. The default setting `cg` is recommended, which is band-by-band conjugate gradient diagonalization method. There is a large probability that the use of setting of `dav` , which is block Davidson diagonalization method, can be tried to improve performance.  
+When "basis_type = pw", `ks_solver` can be `cg`, `bpcg`, `dav`, `dav_subspace`, or `ppcg`. The default, `cg`, is a band-by-band conjugate-gradient diagonalization method and is recommended. The `dav` and `dav_subspace` settings use Davidson-style subspace diagonalization and can be tried to improve performance. The `ppcg` setting uses the projected preconditioned conjugate-gradient (PPCG) method and is currently available for CPU plane-wave calculations.
 
 When "basis_type = lcao", `ks_solver` can be `genelpa` or `scalapack_gvx`. The default setting `genelpa` is recommended, which is based on ELPA (EIGENVALUE SOLVERS FOR PETAFLOP APPLICATIONS) (https://elpa.mpcdf.mpg.de/) and the kernel is auto choosed by GENELPA(https://github.com/pplab/GenELPA), usually faster than the setting of "scalapack_gvx", which is based on ScaLAPACK(Scalable Linear Algebra PACKage)  
 

diff --git a/docs/parameters.yaml b/docs/parameters.yaml
@@ -521,6 +521,7 @@ parameters:
       * bpcg: The BPCG method, which is a block-parallel Conjugate Gradient (CG) method, typically exhibits higher acceleration in a GPU environment.
       * dav: The Davidson algorithm.
       * dav_subspace: The Davidson algorithm without orthogonalization operation, this method is the most recommended for efficiency. `pw_diag_ndim` can be set to 2 for this method.
+      * ppcg: The projected preconditioned conjugate-gradient (PPCG) method, currently available for CPU plane-wave calculations.
 
       For numerical atomic orbitals basis,
 
@@ -942,7 +943,7 @@ parameters:
     category: Plane wave related variables
     type: Real
     description: |
-      Only used when you use ks_solver = cg/dav/dav_subspace/bpcg. It indicates the threshold for the first electronic iteration, from the second iteration the pw_diag_thr will be updated automatically. For nscf calculations with planewave basis set, pw_diag_thr should be <= 1e-3.
+      Only used when ks_solver = cg/dav/dav_subspace/bpcg/ppcg. It sets the diagonalization threshold for the first electronic iteration; from the second iteration onward, pw_diag_thr is updated automatically. For nscf calculations with a plane-wave basis set, pw_diag_thr should be <= 1e-3.
     default_value: "0.01"
     unit: ""
     availability: ""
@@ -966,18 +967,18 @@ parameters:
     category: Plane wave related variables
     type: Integer
     description: |
-      Only useful when you use ks_solver = cg/dav/dav_subspace/bpcg. It indicates the maximal iteration number for cg/david/dav_subspace/bpcg method.
+      Only useful when you use ks_solver = cg/dav/dav_subspace/bpcg/ppcg. It sets the maximum number of iterations for the cg/dav/dav_subspace/bpcg/ppcg methods.
     default_value: "50"
     unit: ""
-    availability: "basis_type==pw, ks_solver==cg/dav/dav_subspace/bpcg"
+    availability: "basis_type==pw, ks_solver==cg/dav/dav_subspace/bpcg/ppcg"
   - name: pw_diag_ndim
     category: Plane wave related variables
     type: Integer
     description: |
-      Only useful when you use ks_solver = dav or ks_solver = dav_subspace. It indicates dimension of workspace(number of wavefunction packets, at least 2 needed) for the Davidson method. A larger value may yield a smaller number of iterations in the algorithm but uses more memory and more CPU time in subspace diagonalization.
+      Only useful when you use ks_solver = dav, dav_subspace, or ppcg. For the Davidson methods, it sets the dimension of the workspace (number of wavefunction packets, at least 2 needed); for the PPCG method, it sets the block size/Rayleigh-Ritz interval. A larger value may reduce the number of iterations but uses more memory and more CPU time in subspace diagonalization.
     default_value: "4"
     unit: ""
-    availability: ""
+    availability: "basis_type==pw, ks_solver==dav/dav_subspace/ppcg"
   - name: diago_cg_prec
     category: Plane wave related variables
     type: Integer

diff --git a/source/source_hsolver/CMakeLists.txt b/source/source_hsolver/CMakeLists.txt
@@ -4,6 +4,7 @@ list(APPEND objects
     diago_david.cpp
     diago_dav_subspace.cpp
     diago_bpcg.cpp
+    diago_ppcg.cpp
     para_linear_transform.cpp
     hsolver_pw.cpp
     hsolver_lcaopw.cpp

diff --git a/source/source_hsolver/diago_iter_assist.h b/source/source_hsolver/diago_iter_assist.h
@@ -20,6 +20,7 @@ class DiagoIterAssist
   public:
     static Real PW_DIAG_THR;
     static int PW_DIAG_NMAX;
+    static int PW_DIAG_NDIM;
 
     static Real LCAO_DIAG_THR;
     static int LCAO_DIAG_NMAX;
@@ -153,6 +154,9 @@ typename DiagoIterAssist<T, Device>::Real DiagoIterAssist<T, Device>::avg_iter =
 template <typename T, typename Device>
 int DiagoIterAssist<T, Device>::PW_DIAG_NMAX = 30;
 
+template <typename T, typename Device>
+int DiagoIterAssist<T, Device>::PW_DIAG_NDIM = 4;
+
 template <typename T, typename Device>
 typename DiagoIterAssist<T, Device>::Real DiagoIterAssist<T, Device>::PW_DIAG_THR = 1.0e-2;
 
@@ -175,4 +179,4 @@ template <typename T, typename Device>
 T DiagoIterAssist<T, Device>::zero = static_cast<T>(0.0);
 } // namespace hsolver
 
-#endif
+#endif
diff --git a/source/source_hsolver/diago_params.cpp b/source/source_hsolver/diago_params.cpp
@@ -15,6 +15,7 @@ void setup_diago_params_pw(const int istep,
     DiagoIterAssist<T, Device>::need_subspace = ((istep == 0 || istep == 1) && iter == 1) ? false : true;
     DiagoIterAssist<T, Device>::SCF_ITER = iter;
     DiagoIterAssist<T, Device>::PW_DIAG_THR = ethr;
+    DiagoIterAssist<T, Device>::PW_DIAG_NDIM = inp.pw_diag_ndim;
 
     if (inp.calculation != "nscf")
     {
@@ -41,6 +42,7 @@ void setup_diago_params_sdft(const int istep,
 
     DiagoIterAssist<T, Device>::PW_DIAG_THR = ethr;
     DiagoIterAssist<T, Device>::PW_DIAG_NMAX = inp.pw_diag_nmax;
+    DiagoIterAssist<T, Device>::PW_DIAG_NDIM = inp.pw_diag_ndim;
 }
 
 /// Template instantiation for CPU
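
Note on the `PW_DIAG_NDIM` plumbing above: a static data member of a class template exists once per instantiation, so the value written by the setup function is only visible to solvers that use the same `<T, Device>` pair. A minimal sketch of that behaviour follows; `IterAssistSketch`, `InputSketch`, and `setup_params_sketch` are hypothetical stand-ins, not the real ABACUS types, and the `Device` parameter is dropped for brevity.

```cpp
#include <cassert>
#include <complex>

// Hypothetical stand-in for DiagoIterAssist: only the static knob is modeled.
template <typename T>
struct IterAssistSketch
{
    static int PW_DIAG_NDIM; // declared in the class, defined once below
};

// Out-of-class definition; the initial value 4 mirrors the default in this patch.
template <typename T>
int IterAssistSketch<T>::PW_DIAG_NDIM = 4;

// Hypothetical stand-in for the parsed INPUT parameters.
struct InputSketch
{
    int pw_diag_ndim = 4;
};

// Mirrors the assignment added to the setup functions in this patch.
template <typename T>
void setup_params_sketch(const InputSketch& inp)
{
    // Sanity check assumed for this sketch only; the patch itself adds no check.
    assert(inp.pw_diag_ndim > 0);
    IterAssistSketch<T>::PW_DIAG_NDIM = inp.pw_diag_ndim;
}
```

Calling `setup_params_sketch<std::complex<double>>` with `pw_diag_ndim = 8` changes only the `std::complex<double>` copy; the `std::complex<float>` copy keeps its initial value until its own setup call runs.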

diff --git a/source/source_hsolver/diago_ppcg.cpp b/source/source_hsolver/diago_ppcg.cpp
@@ -0,0 +1,19 @@
+#include "diago_ppcg.h"
+
+#include "ppcg/diago_ppcg_reduce.hpp"
+#include "ppcg/diago_ppcg_small_eigen.hpp"
+#include "ppcg/diago_ppcg_ops.hpp"
+#include "ppcg/diago_ppcg_subspace.hpp"
+#include "ppcg/diago_ppcg_orth.hpp"
+#include "ppcg/diago_ppcg_cg.hpp"
+#include "ppcg/diago_ppcg_diag.hpp"
+
+namespace hsolver {
+
+// =============================================================================
+// Explicit template instantiation (CPU only; extend for GPU as needed)
+// =============================================================================
+template class DiagoPPCG<std::complex<float>,  base_device::DEVICE_CPU>;
+template class DiagoPPCG<std::complex<double>, base_device::DEVICE_CPU>;
+
+} // namespace hsolver
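
Note on the explicit instantiations above: because the member definitions are pulled into `diago_ppcg.cpp` through the `ppcg/*.hpp` includes, only the listed `<T, Device>` combinations get compiled symbols. The single-file sketch below illustrates the pattern under that assumption; `SolverSketch`, `CpuTag`, and `default_block_size_sketch` are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <complex>

// Hypothetical miniature of the header: declaration only.
template <typename T, typename Device>
class SolverSketch
{
public:
    int block_size() const;
};

struct CpuTag {}; // stand-in for base_device::DEVICE_CPU

// In a split layout this definition would live in the implementation file,
// so callers that include only the header cannot instantiate it implicitly.
template <typename T, typename Device>
int SolverSketch<T, Device>::block_size() const
{
    return 4;
}

// Explicit instantiation: emits code for exactly these combinations.
template class SolverSketch<std::complex<float>, CpuTag>;
template class SolverSketch<std::complex<double>, CpuTag>;

// Caller side: relies on one of the instantiated combinations.
inline int default_block_size_sketch()
{
    SolverSketch<std::complex<double>, CpuTag> solver;
    const int n = solver.block_size();
    assert(n > 0);
    return n;
}
```

Under this layout, supporting another device or scalar type means adding one more `template class` line next to the existing ones, which is what the "extend for GPU as needed" comment suggests.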
diff --git a/source/source_hsolver/diago_ppcg.h b/source/source_hsolver/diago_ppcg.h
@@ -0,0 +1,219 @@
+#ifndef DIAGO_PPCG_H
+#define DIAGO_PPCG_H
+
+#include "source_base/module_device/types.h"
+
+#include <vector>
+#include <functional>
+#include <cmath>
+#include <algorithm>
+#include <limits>
+#include <stdexcept>
+#include <numeric>
+#include <complex>
+#include <type_traits>
+
+namespace hsolver {
+
+// -----------------------------------------------------------------------------
+// DiagoPPCG: Projected Preconditioned Conjugate Gradient solver
+// -----------------------------------------------------------------------------
+//
+// Supports two algorithmic strategies:
+//   CONJUGATE_GRADIENT: band-by-band Polak-Ribiere CG with line minimization
+//     (see the conjugate-gradient strategy helpers below).
+//   BLOCK_SUBSPACE: block subspace diagonalization (see the block-subspace helpers below).
+//
+// BLOCK_SUBSPACE is the production path used by ks_solver=ppcg.
+// CONJUGATE_GRADIENT is kept as an explicit fallback strategy.
+// -----------------------------------------------------------------------------
+
+enum class PpcgStrategy { BLOCK_SUBSPACE, CONJUGATE_GRADIENT };
+
+namespace base_device = ::base_device;
+
+template <typename T, typename Device>
+class DiagoPPCG
+{
+public:
+    // -------------------------------------------------------------------------
+    // Type aliases
+    // -------------------------------------------------------------------------
+    using Real = typename std::conditional<
+        std::is_same<T, std::complex<double>>::value, double,
+        float>::type;
+    using HPsiFunc = std::function<void(T*, T*, int, int)>;
+    using SPsiFunc = std::function<void(T*, T*, int, int)>;
+
+    // -------------------------------------------------------------------------
+    // Constructor
+    // -------------------------------------------------------------------------
+    DiagoPPCG(const Real& diag_thr,
+              const int& diag_iter_max,
+              const int& sbsize,
+              const int& rr_step,
+              const bool gamma_g0_real,
+              const PpcgStrategy strategy = PpcgStrategy::BLOCK_SUBSPACE);
+
+    // -------------------------------------------------------------------------
+    // Main entry point
+    //
+    // Returns average number of subspace iterations per band.
+    // -------------------------------------------------------------------------
+    double diag(const HPsiFunc& hpsi_func,
+                const SPsiFunc& spsi_func,
+                int ld_psi,
+                int nband,
+                int dim,
+                T* psi_in,
+                Real* eigenvalue_in,
+                const std::vector<double>& ethr_band,
+                const Real* prec);
+
+private:
+    // -------------------------------------------------------------------------
+    // Data members
+    // -------------------------------------------------------------------------
+    int maxiter_;
+    int sbsize_;
+    int rr_step_;
+    Real diag_thr_;
+    bool gamma_g0_real_;
+    PpcgStrategy strategy_;
+
+    // Problem dimensions (set in diag())
+    int ld_psi_ = 0;
+    int n_band_ = 0;
+    int n_dim_ = 0;
+
+    // Cached S-operator (null if identity).
+    SPsiFunc spsi_func_;
+
+    // Working storage (column-major: ld_psi_ rows, n_band_ columns).
+    std::vector<T> hpsi_;
+    std::vector<T> spsi_;
+    std::vector<T> w_;       // residual / preconditioned residual
+    std::vector<T> sw_;      // S * w
+    std::vector<T> hw_;      // H * w
+    std::vector<T> rr_psi_;  // Rayleigh-Ritz rotation workspace
+    std::vector<T> rr_spsi_;
+    std::vector<T> rr_hpsi_;
+    std::vector<T> rr_hsub_;
+    std::vector<T> rr_ssub_;
+    std::vector<Real> rr_eval_;
+
+    // Polak-Ribiere state (CONJUGATE_GRADIENT strategy)
+    std::vector<T> z_old_;      // previous preconditioned residual
+    std::vector<Real> beta_denom_;
+
+    // -------------------------------------------------------------------------
+    // Internal helpers
+    // -------------------------------------------------------------------------
+    static inline int idx(int row, int col, int ld)
+    {
+        return row + col * ld;
+    }
+
+    void validate_input(const HPsiFunc& hpsi_func,
+                        const T* psi_in, const Real* eigenvalue_in,
+                        const std::vector<double>& ethr_band,
+                        const Real* prec) const;
+
+    void force_g0_real(T* x, int ncol) const;
+
+    // H- and S-application (S falls back to the identity if spsi_func is null).
+    void apply_h(const HPsiFunc& hpsi_func, T* psi_in, T* hpsi_out,
+                 int ncol) const;
+    void apply_s(const SPsiFunc& spsi_func, T* psi_in, T* spsi_out,
+                 int ncol) const;
+    void apply_s_current(T* psi_in, T* spsi_out, int ncol) const;
+
+    // Inner products <x|y>: gamma_dot returns the real part only, complex_dot the full complex value.
+    Real gamma_dot(const T* x, const T* y) const;
+    T complex_dot(const T* x, const T* y) const;
+
+    // Gram matrix: out[i, j] = <a_i | b_j>.
+    void gram(const T* a, const T* b,
+              int ncol_a, int ncol_b,
+              std::vector<T>& out, int ld_out) const;
+
+    // Gather / scatter columns.
+    void copy_cols(const T* src, const std::vector<int>& cols,
+                   std::vector<T>& dst) const;
+    void scatter_cols(T* dst, const std::vector<int>& cols,
+                      const std::vector<T>& src) const;
+
+    // Project x onto vectors orthogonal to the S-orthonormal basis.
+    void project_against(const T* basis, const T* sbasis,
+                         const std::vector<int>& basis_cols,
+                         std::vector<T>& x, std::vector<T>& sx,
+                         const std::vector<int>& x_cols) const;
+
+    // x[c] /= max(prec, eps)  for each active column c.
+    void divide_by_preconditioner(const std::vector<int>& active_cols,
+                                  const Real* prec,
+                                  std::vector<T>& x) const;
+
+    // -------------------------------------------------------------------------
+    // Block-subspace strategy helpers
+    // -------------------------------------------------------------------------
+    struct SmallSubspace
+    {
+        std::vector<T> k;        // K matrix (projected H)
+        std::vector<T> m;        // M matrix (projected S)
+        std::vector<Real> eval;  // eigenvalues
+        std::vector<Real> w_scale;
+    };
+
+    void lock_epairs(const std::vector<T>& residual,
+                     const std::vector<double>& ethr_band,
+                     std::vector<int>& active_cols) const;
+
+    void build_small_subspace(const T* psi,
+                              const std::vector<int>& cols,
+                              SmallSubspace& subspace) const;
+
+    void solve_small_generalized(int dim, SmallSubspace& subspace) const;
+
+    void update_one_block(T* psi,
+                          const std::vector<int>& cols,
+                          int l,
+                          const SmallSubspace& subspace);
+
+    bool is_s_orthonormal(const T* psi, const T* spsi, int ncol) const;
+
+    void s_gram_schmidt(T* psi, T* hpsi, T* spsi, int ncol) const;
+
+    void rayleigh_ritz(T* psi, Real* eigenvalue,
+                       std::vector<int>& active_cols,
+                       const std::vector<double>& ethr_band);
+
+    // -------------------------------------------------------------------------
+    // Conjugate-gradient strategy helpers
+    // -------------------------------------------------------------------------
+    void calc_gradient(const Real* prec,
+                       const T* hpsi,
+                       const T* spsi,
+                       const T* psi,
+                       const Real* eigenvalue,
+                       std::vector<T>& grad) const;
+
+    void orth_gradient(const T* psi, const T* spsi,
+                       std::vector<T>& grad) const;
+
+    void update_polak_ribiere(const std::vector<T>& grad,
+                              std::vector<T>& p,
+                              std::vector<T>& z_old,
+                              std::vector<Real>& beta_denom,
+                              const Real* prec) const;
+
+    void line_minimize(T* psi, T* hpsi, T* spsi,
+                       const T* p, const T* hp, const T* sp,
+                       int ncol) const;
+
+    void orth_cholesky(T* psi, T* hpsi, T* spsi, int ncol) const;
+};
+
+} // namespace hsolver
+
+#endif // DIAGO_PPCG_H
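
Note on the storage convention declared in this header: the working arrays are column-major with leading dimension `ld_psi_`, and `gram` is documented as `out[i, j] = <a_i | b_j>`. The sketch below shows one way those two pieces could fit together. It is not the implementation from the `ppcg/*.hpp` files (which this diff does not show); the `*_sketch` functions and the extra `ld`/`n_dim` arguments are hypothetical, and conjugating the first argument of the inner product is an assumption.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Cplx = std::complex<double>;

// Column-major offset, same formula as DiagoPPCG::idx in the header.
inline int idx(int row, int col, int ld)
{
    return row + col * ld;
}

// <x|y> = sum_k conj(x_k) * y_k over the first n_dim entries (assumed convention).
inline Cplx complex_dot_sketch(const Cplx* x, const Cplx* y, int n_dim)
{
    Cplx acc(0.0, 0.0);
    for (int k = 0; k < n_dim; ++k)
    {
        acc += std::conj(x[k]) * y[k];
    }
    return acc;
}

// Real part of the inner product, as the header describes for gamma_dot.
inline double gamma_dot_sketch(const Cplx* x, const Cplx* y, int n_dim)
{
    return complex_dot_sketch(x, y, n_dim).real();
}

// Gram matrix out[i, j] = <a_i | b_j>; out is column-major with leading dimension ld_out.
// Rows n_dim..ld-1 of each column are padding and are never read.
inline void gram_sketch(const Cplx* a, const Cplx* b,
                        int ld, int n_dim,
                        int ncol_a, int ncol_b,
                        std::vector<Cplx>& out, int ld_out)
{
    assert(ld >= n_dim && ld_out >= ncol_a);
    out.assign(static_cast<std::size_t>(ld_out) * ncol_b, Cplx(0.0, 0.0));
    for (int j = 0; j < ncol_b; ++j)
    {
        for (int i = 0; i < ncol_a; ++i)
        {
            out[idx(i, j, ld_out)]
                = complex_dot_sketch(a + idx(0, i, ld), b + idx(0, j, ld), n_dim);
        }
    }
}
```

For two orthonormal columns stored with `ld = 3` and `n_dim = 2`, `gram_sketch` applied to the block and itself yields the 2x2 identity, which is the kind of check `is_s_orthonormal` presumably performs when S is the identity.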