Add Schur complement and its mat-free mode by zitongzhan · Pull Request #35 · pypose/bae

zitongzhan · 2026-05-24T03:07:51Z

This pull request introduces significant improvements to the optimizer infrastructure, focusing on enhanced memory profiling, a new Schur complement optimizer, and better support for matrix-free operations.

Optimizer Enhancements

Added a new Schur optimizer class in bae.optim.optimizer, implementing the Schur complement method with support for both standard and matrix-free normal equations, block Jacobi preconditioning, and efficient memory usage.
Updated the LM optimizer to support a matrix_free_normal mode, allowing for more efficient computation and memory usage in large-scale problems.
Added a custom TrustRegion class that supports Warp, intended mainly for use with the Schur optimizer.
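To illustrate what a matrix-free normal mode buys, here is a minimal NumPy sketch, not the code in this PR. It applies the damped normal matrix to a vector with two Jacobian matvecs instead of forming J^T J. The helper name `normal_matvec`, its arguments, and the dense toy Jacobian are assumptions made for illustration; the real operator works on sparse BSR Jacobians.

```python
import numpy as np

def normal_matvec(J, x, damping=0.0, diag=None):
    """Return (J^T J + damping * diag(d)) @ x without forming J^T J."""
    y = J.T @ (J @ x)              # two matvecs replace one explicit product
    if damping:
        d = np.ones_like(x) if diag is None else diag
        y = y + damping * d * x    # out-of-place: J and diag stay untouched
    return y

rng = np.random.default_rng(0)
J = rng.standard_normal((12, 5))   # toy dense Jacobian (12 residuals, 5 params)
x = rng.standard_normal(5)
lam = 1e-2
diag = np.einsum("ij,ij->j", J, J) # diag(J^T J), computed column by column

explicit = (J.T @ J + lam * np.diag(diag)) @ x
matrix_free = normal_matvec(J, x, damping=lam, diag=diag)
print(np.allclose(explicit, matrix_free))
```

The explicit path needs storage for J^T J, while the matrix-free path only keeps J and a few vectors, which is where the memory saving on large problems would come from.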

Sparse Matrix and PyOps Improvements

Improved sparse matrix operations, including fixes to inv_op for correct tensor creation and a new test block in py_ops.py for diagonal operations on CUDA.

Added a README section on future plans, including a new backend for a distributed solver.

gemini-code-assist

Code Review

This pull request introduces high-performance Triton kernels for sparse BSR operations, including matrix-vector multiplication, matrix-matrix multiplication, and transposition. It also implements a matrix-free NormalMatVec operator and a new Schur complement-based optimizer to improve the efficiency of bundle adjustment tasks. The bundle adjustment example was updated with CUDA memory snapshotting and Warp mempool reporting. Review feedback highlights a critical issue where in-place diagonal modifications in the LM and Schur optimizers cause damping factors to accumulate incorrectly during step rejections. Additionally, the reviewer recommends removing performance-hindering torch.cuda.empty_cache() calls, addressing potential divisions by zero in the Conjugate Gradient solver, and cleaning up redundant or commented-out code.
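The damping issue raised in the review can be reproduced in a few lines. The following is a toy NumPy sketch under assumed names (`damp_in_place`, `damp_from_saved`), not the optimizer code from this PR: when a rejected step is retried with a larger damping factor, adding it to an already damped diagonal stacks the factors, while rebuilding the diagonal from a saved undamped copy does not.

```python
import numpy as np

def damp_in_place(H, lam):
    """Problematic pattern: adds lam to the diagonal of H itself."""
    H[np.diag_indices_from(H)] += lam
    return H

def damp_from_saved(H, diag0, lam):
    """Safer pattern: rebuild the diagonal from a saved undamped copy."""
    H[np.diag_indices_from(H)] = diag0 + lam
    return H

H_bug = np.eye(3)
H_fix = np.eye(3)
diag0 = H_fix.diagonal().copy()    # undamped diagonal, saved once per linearization

lam = 1.0
for _ in range(3):                 # pretend three consecutive steps are rejected
    damp_in_place(H_bug, lam)
    damp_from_saved(H_fix, diag0, lam)
    lam *= 2.0                     # a common reaction to a rejected step

print(H_bug.diagonal())  # 1 + 1 + 2 + 4 = 8 on every entry
print(H_fix.diagonal())  # 1 + 4 = 5: only the latest damping is applied
```

Applying the damping out-of-place inside the matvec is another way to avoid the accumulation, at the cost of one extra vector operation per product.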

zitongzhan · 2026-05-28T01:15:31Z

Profile Summary
Profiled the current ba_example.py on Venice problem-1778-993923-pre: 5,001,946 observations, 1,778 cameras, 993,923 points. I ran it with matrix_free_normal=True and with matrix_free_normal=False; the current ba_example.py leaves it disabled by default.

| Mode | Steady wall time | Main slow operators |
| --- | --- | --- |
| `matrix_free_normal=True` | 1.53 s | Warp BSR MV kernels inside `linear.cg`: ~1.19 s CUDA, ~84% |
| `matrix_free_normal=False` | 1.83 s | Split between explicit Schur `warp_bsr_mm`: ~0.66 s, and CG BSR MV: ~0.68 s |

Enabled
With matrix-free enabled, the bottleneck is still BSR matvec, now inside Warp CG. The hottest kernels were:

| Kernel / scope | CUDA time |
| --- | --- |
| `bsr_mv_transpose_kernel...` | 801 ms |
| `bsr_mv_kernel_acf84b96...` | 230 ms |
| `bsr_mv_kernel_0d4f3dc9...` | 163 ms |
| `jacobian` | 123 ms |

This corresponds to the repeated matrix-free Schur matvec in optimizer.py, especially the sparse.bsr_mv chain at lines 145-152 and the CG calls at lines 180 and 200.
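For readers who have not seen the matrix-free Schur matvec written out, here is a small dense NumPy sketch of the idea, not the `sparse.bsr_mv` chain in optimizer.py. It applies S = U - W V^{-1} W^T to a vector using matvecs only and feeds that operator to a plain conjugate gradient loop. The names `schur_matvec` and `cg`, the toy sizes, and the dense inverse of V are assumptions; in bundle adjustment V is block diagonal, so its inverse is cheap. The CG loop also guards its denominators, in the spirit of the review comment about divisions by zero.

```python
import numpy as np

def schur_matvec(U, W, V_inv, x):
    """Apply S = U - W V^{-1} W^T to x using matvecs only."""
    return U @ x - W @ (V_inv @ (W.T @ x))

def cg(matvec, b, tol=1e-10, max_iter=200, eps=1e-30):
    """Plain conjugate gradient with guards against zero denominators."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol:     # converged (also covers b == 0)
            break
        Ap = matvec(p)
        pAp = p @ Ap
        if abs(pAp) < eps:         # avoid dividing by a vanishing curvature term
            break
        alpha = rs / pAp
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p  # rs > tol**2 > 0 here, so this is safe
        rs = rs_new
    return x

rng = np.random.default_rng(0)
n_cam, n_pt = 4, 9                          # toy sizes, one scalar per block
J = rng.standard_normal((40, n_cam + n_pt))
H = J.T @ J + 1e-2 * np.eye(n_cam + n_pt)   # damped normal matrix, SPD
U, W, V = H[:n_cam, :n_cam], H[:n_cam, n_cam:], H[n_cam:, n_cam:]
V_inv = np.linalg.inv(V)
b = rng.standard_normal(n_cam)

x_cg = cg(lambda v: schur_matvec(U, W, V_inv, v), b)
x_ref = np.linalg.solve(U - W @ V_inv @ W.T, b)   # explicit S, for checking only
print(np.allclose(x_cg, x_ref, atol=1e-6))
```

Each CG iteration costs one multiply by W^T, one by V^{-1}, one by W, and one by U, which matches the picture of transposed and non-transposed BSR matvecs dominating the profile.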

Disabled
With matrix-free disabled, the cost shifts: explicit Schur construction becomes about as expensive as CG matvecs.

| Kernel / scope | CUDA time |
| --- | --- |
| `warp_bsr_mm` scope | 658 ms |
| `_bsr_mm_compute_values...` | 611 ms |
| `bsr_mv_tiled_kernel...` | 645 ms |
| `jacobian` | 122 ms |

That maps to the explicit Schur construction in optimizer.py: `WV_i = sparse.bsr_mm(W, V_i)` followed by `WVi_Wt = sparse.bsr_mm(WV_i, Wt)`.
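The explicit path can be sketched with dense NumPy arrays as follows. This is an assumption-laden toy, not the BSR code in optimizer.py: it reuses the variable names quoted above, forms the reduced camera system with two matrix products, solves it, and back-substitutes for the point update. The partition sizes and the gradient vector `g` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cam, n_pt = 3, 6                          # toy sizes, one scalar per block
J = rng.standard_normal((30, n_cam + n_pt))
H = J.T @ J + 1e-2 * np.eye(n_cam + n_pt)   # damped normal matrix, SPD
g = rng.standard_normal(n_cam + n_pt)       # right-hand side of H dx = g

# Partition H = [[U, W], [W^T, V]] into camera and point blocks.
U, W, V = H[:n_cam, :n_cam], H[:n_cam, n_cam:], H[n_cam:, n_cam:]
g_c, g_p = g[:n_cam], g[n_cam:]

V_i = np.linalg.inv(V)          # block diagonal in real BA, dense here
WV_i = W @ V_i                  # first matrix product
WVi_Wt = WV_i @ W.T             # second matrix product
S = U - WVi_Wt                  # reduced camera system

dx_c = np.linalg.solve(S, g_c - WV_i @ g_p)   # camera update
dx_p = V_i @ (g_p - W.T @ dx_c)               # back-substitute point update

dx_full = np.linalg.solve(H, g)               # reference: solve the full system
print(np.allclose(np.concatenate([dx_c, dx_p]), dx_full))
```

The two products are paid once per linearization, whereas the matrix-free variant pays matvecs on every CG iteration, which is consistent with the cost shifting between `warp_bsr_mm` and BSR matvec kernels in the two profiles.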

SEOKWOOPARK · 2026-05-29T21:47:32Z

Runtime with 20 iterations based on Schur

=> trafalgar (problem-257-65132-pre) without Matrix-Free + 257 images + 65132 points + 225911 observations

=> trafalgar (problem-257-65132-pre) with Matrix-Free + 257 images + 65132 points + 225911 observations

=> ladybug (problem-1723-156502-pre) without Matrix-Free + 1723 images + 156502 points + 678718 observations

=> ladybug (problem-1723-156502-pre) with Matrix-Free + 1723 images + 156502 points + 678718 observations

=> dubrovnik (problem-356-226730-pre) without Matrix-Free + 356 images + 226730 points + 1255268 observations

=> dubrovnik (problem-356-226730-pre) with Matrix-Free + 356 images + 226730 points + 1255268 observations

=> venice (problem-1778-993923-pre) without Matrix-Free + 1778 images + 993923 points + 5001946 observations

=> venice (problem-1778-993923-pre) with Matrix-Free + 1778 images + 993923 points + 5001946 observations

=> final (problem-13682-4456117-pre) with Matrix-Free + 13682 images + 4456117 points + 28987644 observations

zitongzhan and others added 25 commits December 15, 2025 03:12

add normal matvec and memory profiler

9490ff8

print peak cuda allocation

9c90aca

add warp memory pool report

6256e79

use A._get_Jt when matrix_free_normal

3a5ce9b

add back schur by warp's matmul

0064146

safely import cudss

acd1b3c

Add future plans section to README

91c8ade

Added a section for future plans including a new backend for distributed solver.

add normal matvec and memory profiler

19774c3

print peak cuda allocation

4ca9c86

add warp memory pool report

b71f1a3

use A._get_Jt when matrix_free_normal

d678867

add back schur by warp's matmul

d127b88

Merge branch 'schur-matmul' of github.com:zitongzhan/bae_private into…

fa9ab70

… schur-matmul

Merge remote-tracking branch 'upstream/release' into schur-matmul

6619808

Preventing TrustRegion from accepting diverging steps

3e4761d

fix(optimizer/LM): Remove redundant solver calls so matrix_free_norma…

5d9e2b2

…l path runs

feat(optim/Schur): Add Matrix-Free path and matrix_free_normal branch

e34bea2

Resolving conflict with release branch in README

5f4f093

Version up to 0.2.1

f64d00b

Fix deprecated function in Warp

40798f1

Replace Warp with Triton kernels and adjust corresponding codes

165104d

Remove codes relevant to Chunk

b305f81

Merge branch 'release' into memory-issue-swp

3a97f9e

Remove ba_helpers.py

a0b4b8b

Fix a conflict in ba_example.py

f46fb74

github-code-quality Bot found potential problems May 24, 2026

Comment thread bae/sparse/warp_wrappers.py Fixed

Comment thread bae/optim/optimizer.py Fixed

Comment thread bae/sparse/py_ops.py Fixed

gemini-code-assist Bot reviewed May 24, 2026

zitongzhan and others added 3 commits May 23, 2026 20:35

Potential fix for pull request finding 'Variable defined multiple times'

48ad787

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Unused local variable'

8cc6eb3

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

minimize diff

074b931

github-code-quality Bot found potential problems May 27, 2026

Comment thread ba_example.py Fixed

Comment thread ba_example.py Fixed

Comment thread ba_example.py Fixed

Comment thread ba_example.py Fixed

Comment thread ba_example.py Fixed

Add 'final' and 'venice' dataset in ba_example.py and remove unnecess…

733c0a6

…ary import line

github-code-quality Bot found potential problems May 27, 2026

Comment thread ba_example.py Fixed

Comment thread ba_example.py Fixed

Comment thread ba_example.py Fixed

Comment thread ba_example.py Fixed

Potential fix for pull request finding 'Unused import'

c94f27a

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

zitongzhan commented May 28, 2026

Comment thread ba_example.py Outdated

SEOKWOOPARK added 4 commits May 29, 2026 21:57

Free PyTorch cache before Warp handoff in Schur.step

aaa7e5b

Rollback unnecessary changes in gitignore

dff7575

Rollback single quote to double quote in ba_example.py

2bf5e62

Rollback the optimizer's default class from Schur to LM and adjust up…

50075ec

…date function in TrustRegion to make it run in LM

github-code-quality Bot found potential problems May 30, 2026

Comment thread ba_example.py Fixed

SEOKWOOPARK added 7 commits May 30, 2026 23:43

Remove class Reproj and use Residual

0ced1df

Rollback single to double quote in Time print

ba6dc25

Remove overlapped torch.synchronize

13dc2a2

Single quote -> double quote in dataset's key

ec3262e

Rollback dataset's declaration in main

f5c1751

Remove rotate_quat

90b14f8

Rollback least_square_error's parameter name

6bb75ee

github-code-quality Bot found potential problems May 31, 2026

Comment thread ba_example.py Fixed

SEOKWOOPARK added 4 commits May 31, 2026 00:25

Remove unused variable 'USE_QUATERNIONS'

a55f607

Retrieve print for Initial loss

d7f9ebc

Retrieve the decorator

077dce4

Remove overlapped import

fb41830

github-code-quality Bot found potential problems May 31, 2026

Comment thread ba_example.py Dismissed

SEOKWOOPARK added 3 commits May 31, 2026 03:04

Remove empty_cache()

60a9530

Rollback variable names in class LM

7adb46e

Transfer TrustRegion and Adaptive into strategy.py in the optim direc…

49a18f1

…tory

github-code-quality Bot found potential problems Jun 1, 2026

Comment thread ba_example.py Fixed

Potential fix for pull request finding 'Unused import'

8196c03

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
