Add Schur complement and its mat-free mode #35
Added a section on future plans, including a new backend for a distributed solver.
Code Review
This pull request introduces high-performance Triton kernels for sparse BSR operations, including matrix-vector multiplication, matrix-matrix multiplication, and transposition. It also implements a matrix-free NormalMatVec operator and a new Schur complement-based optimizer to improve the efficiency of bundle adjustment tasks. The bundle adjustment example was updated with CUDA memory snapshotting and Warp mempool reporting. Review feedback highlights a critical issue where in-place diagonal modifications in the LM and Schur optimizers cause damping factors to accumulate incorrectly during step rejections. Additionally, the reviewer recommends removing performance-hindering torch.cuda.empty_cache() calls, addressing potential divisions by zero in the Conjugate Gradient solver, and cleaning up redundant or commented-out code.
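The damping issue raised in the review can be illustrated with a small, self-contained sketch. It does not reproduce the optimizer code in this pull request: the helper names (damp_in_place, damp_from_backup) are hypothetical, and Marquardt-style multiplicative damping is an assumption. The same compounding would occur with additive damping.

```python
# Hypothetical illustration of damping accumulating across rejected steps.
# Assumes Marquardt-style scaling diag(A) * (1 + lam); not the PR's actual code.

def damp_in_place(diag, lam):
    """Buggy pattern: scales the stored diagonal itself, so repeated calls compound."""
    for i in range(len(diag)):
        diag[i] *= (1.0 + lam)
    return diag


def damp_from_backup(diag_backup, lam):
    """Safer pattern: always derive the damped diagonal from the undamped copy."""
    return [d * (1.0 + lam) for d in diag_backup]


diag = [2.0, 4.0]
backup = list(diag)          # undamped diagonal, kept untouched
lams = [0.5, 1.0]            # the second value simulates a retry after a rejected step

buggy = list(diag)
for lam in lams:
    damp_in_place(buggy, lam)            # 2.0 * 1.5 * 2.0 = 6.0 instead of 2.0 * 2.0

fixed = None
for lam in lams:
    fixed = damp_from_backup(backup, lam)  # always starts from the original diagonal

print("in-place:", buggy)
print("from backup:", fixed)
```

Keeping an undamped copy costs one extra vector of the diagonal's size. An alternative is to undo the previous damping before applying the new one, which avoids the copy but is more sensitive to rounding.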
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Profile Summary
Enabled: This corresponds to the repeated matrix-free Schur matvec in optimizer.py, especially the …
Disabled: That maps to explicit Schur construction at optimizer.py: …
…date function in TrustRegion to make it run in LM

This pull request introduces significant improvements to the optimizer infrastructure, focusing on enhanced memory profiling, a new Schur complement optimizer, and better support for matrix-free operations.
Optimizer Enhancements
Added a new Schur optimizer class in bae.optim.optimizer, implementing the Schur complement method with support for both standard and matrix-free normal equations, block Jacobi preconditioning, and efficient memory usage.
Updated the LM optimizer to support a matrix_free_normal mode, allowing for more efficient computation and memory usage in large-scale problems.
Added a custom TrustRegion class that supports Warp, especially for use with the Schur optimizer.

Sparse Matrix and PyOps Improvements
… inv_op for correct tensor creation and a new test block in py_ops.py for diagonal operations on CUDA.
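
As a rough illustration of the matrix-free Schur mode described under Optimizer Enhancements, the sketch below applies S = B - E C^{-1} E^T to a vector without ever forming S, then solves the reduced system with a Jacobi-preconditioned Conjugate Gradient loop that guards its divisions, as the review suggests. This is a pure-Python toy under stated assumptions, not the bae implementation: every function name is hypothetical, the point block C is taken to be a plain diagonal (standing in for the block-diagonal matrix found in bundle adjustment), and a scalar Jacobi preconditioner replaces block Jacobi.

```python
# Toy matrix-free Schur complement solve. All names are hypothetical;
# C is assumed diagonal and the preconditioner is scalar Jacobi.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y without forming A^T."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def schur_matvec(B, E, c_diag, x):
    """Return S x = B x - E C^{-1} E^T x without building S."""
    t = rmatvec(E, x)                              # E^T x
    t = [ti / ci for ti, ci in zip(t, c_diag)]     # C^{-1} (E^T x)
    return [b - e for b, e in zip(matvec(B, x), matvec(E, t))]


def schur_diag(B, E, c_diag):
    """Diagonal of S, used here as a scalar Jacobi preconditioner."""
    return [B[i][i] - sum(E[i][j] ** 2 / c_diag[j] for j in range(len(c_diag)))
            for i in range(len(B))]


def pcg(apply_A, b, m_inv, tol=1e-12, max_iter=50, eps=1e-30):
    """Preconditioned CG with guards against division by (near) zero."""
    x = [0.0] * len(b)
    r = list(b)
    z = [mi * ri for mi, ri in zip(m_inv, r)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = apply_A(p)
        pAp = dot(p, Ap)
        if abs(pAp) < eps or abs(rz) < eps:        # would divide by ~0: stop
            break
        alpha = rz / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = [mi * ri for mi, ri in zip(m_inv, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


B = [[4.0, 1.0], [1.0, 3.0]]            # camera block
E = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # camera-point coupling
c_diag = [2.0, 2.0, 4.0]                # point block (diagonal by assumption)

g = schur_matvec(B, E, c_diag, [1.0, 2.0])          # reduced RHS for a known solution
m_inv = [1.0 / d for d in schur_diag(B, E, c_diag)]
x = pcg(lambda v: schur_matvec(B, E, c_diag, v), g, m_inv)
zero = pcg(lambda v: schur_matvec(B, E, c_diag, v), [0.0, 0.0], m_inv)
print(x, zero)
```

The trade-off mirrors the profile summary: the matrix-free path repeats the E and E^T products on every CG iteration but never stores S, while explicit construction pays once for building S and then for holding it in memory.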