
Add KLUWrapper #296

Open
jd-lara wants to merge 31 commits into main from
claude/klu-sparse-rhs-wrapper-jLAmX
Conversation

@jd-lara
Member

@jd-lara jd-lara commented Apr 28, 2026

This PR is important to reduce issues with KLU, allowing for multi-entry (multi-RHS) solves.

claude and others added 8 commits April 25, 2026 17:54
Adds a small allocation-aware wrapper over libklu (provided via
SuiteSparse_jll) tailored to the access patterns of this package:

- KLULinSolveCache splits symbolic and numeric factorization, supports
  numeric_refactor! that reuses the analysis and the prior numeric struct,
  and is non-allocating after construction.
- solve!/tsolve! call libklu's native dense multi-RHS routines directly.
- solve_sparse!/solve_sparse provide a sparse-RHS path: B's columns are
  scattered into a small dense scratch in chunks instead of densifying the
  full N x M RHS up front. Has a skip_empty option for RHSs where most
  columns are structurally empty (Ward reduction, Woodbury kernel).
- Real (Float64) and complex (ComplexF64) paths share the same struct via
  type-dispatched ccall helpers (klu_l_* and klu_zl_*).

This replaces the use of KLU.jl elsewhere in the package. KLU.jl is removed
from Project.toml; SuiteSparse_jll is added in its place.
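The chunked scatter-solve described above can be sketched with stdlib tools alone. The sketch below uses SparseArrays plus a plain sparse `lu` factorization in place of the package's KLU bindings; the function name and `chunk` keyword are illustrative, not the PR's actual API:

```julia
using SparseArrays, LinearAlgebra

# Solve A * X = B for a sparse RHS B without densifying all of B up front:
# each chunk of columns is scattered into a small dense scratch, then one
# multi-RHS solve is dispatched per chunk.
function solve_sparse_chunked(F, B::SparseMatrixCSC; chunk::Int = 8)
    n, m = size(B)
    X = zeros(n, m)
    scratch = zeros(n, chunk)                 # bounded working set
    for j0 in 1:chunk:m
        j1 = min(j0 + chunk - 1, m)
        w = j1 - j0 + 1
        fill!(scratch, 0.0)
        for (k, j) in enumerate(j0:j1)        # scatter stored entries only
            for p in nzrange(B, j)
                scratch[rowvals(B)[p], k] = nonzeros(B)[p]
            end
        end
        X[:, j0:j1] = F \ scratch[:, 1:w]     # one dense multi-RHS solve
    end
    return X
end

A = sparse([2.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 2.0])
B = sparse([1.0 0.0; 0.0 0.0; 0.0 1.0])
X = solve_sparse_chunked(lu(A), B; chunk = 1)
```

The scratch stays `n × chunk` regardless of how many RHS columns B has, which is the point of the chunked path.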

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
Replaces KLU.KLUFactorization with the new KLULinSolveCache across the
matrix types and uses solve_sparse! for the multi-RHS, structurally
sparse code paths:

- ABA_Matrix.K is now KLULinSolveCache{Float64}; factorize/is_factorized
  updated accordingly. DC_ABA_Matrix_Factorized type alias updated.
- _calculate_PTDF_matrix_KLU uses solve_sparse! on BA[valid_ix, :],
  scattering the structurally-sparse RHS columns instead of densifying
  the full (buscount-nref) x linecount matrix.
- _calculate_LODF_matrix_KLU passes transpose(a)[valid_ix, :] directly
  to solve_sparse!, removing the original zeros(buscount, buscount)
  intermediate.
- VirtualPTDF.K, VirtualLODF.K, VirtualMODF.K all hold a
  KLULinSolveCache{Float64}; per-row solve! calls remain non-allocating.
- DC_vPTDF_Matrix is left unconstrained on K so AppleAccelerate
  factorizations still satisfy the alias.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
Ward reduction:
- The boundary-bus identity solve becomes a single solve_sparse! call on
  a sparse identity matrix instead of a per-bus loop with one RHS each.
- The y_eq computation switches to solve_sparse(...; skip_empty=true) on
  y_eb. Most external buses are not adjacent to any boundary bus, so the
  RHS columns are largely empty and the skip-empty short-circuit avoids
  redundant solves.
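The skip-empty short-circuit amounts to: a structurally empty RHS column has the zero vector as its solution, so no factorized solve is needed for it. A minimal sketch with a stdlib `lu` stand-in (the function name is illustrative; in the PR this logic lives inside solve_sparse):

```julia
using SparseArrays, LinearAlgebra

# A structurally empty RHS column has the zero vector as its solution, so
# the factorized solve can be skipped for it entirely.
function solve_skip_empty(F, B::SparseMatrixCSC)
    n, m = size(B)
    X = zeros(n, m)
    for j in 1:m
        isempty(nzrange(B, j)) && continue    # empty column: leave zeros
        X[:, j] = F \ Vector(B[:, j])
    end
    return X
end

A = sparse([2.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 2.0])
B = sparse([1, 3], [1, 3], [1.0, 2.0], 3, 3)  # column 2 structurally empty
X = solve_skip_empty(lu(A), B)
```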

Tests:
- Real and complex round-trip and refactor coverage.
- solve_sparse! agrees with the dense-RHS path.
- skip_empty produces zero columns for empty RHS columns.
- solve_sparse! into a view writes only the targeted rows.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
Simplifications:
- Drop redundant n field from KLULinSolveCache; derive from colptr.
- Drop _decrement!/_increment! helpers in favour of broadcast `.+= 1` /
  `.-= 1`.
- Rename finalize! -> Base.finalize so it composes with stdlib instead of
  shadowing Base.finalize through the module's exports.
- Drop solve_w_refinement (no in-tree caller); add `Base.:\` on the cache
  for symmetry with KLU.KLUFactorization.
- Unify the two tsolve! methods via a _tsolve_call helper that ignores
  the conjugate flag on the real path.
- Drop block= and skip_empty= keywords from solve_sparse!. Always pack
  non-empty columns into a single dense scratch and dispatch one libklu
  multi-RHS solve. Empty RHS columns are zeroed in the output without a
  solve, in all cases.
- Throw stdlib exception types (SingularException, OutOfMemoryError,
  ArgumentError, OverflowError) from klu_throw to match the conventions
  used in SparseArrays.CHOLMOD and elsewhere in PNM.
- Refactor _calculate_PTDF_matrix_KLU: collapse the duplicated dist_slack
  branches into a single solve plus a slack-distribution post-step.
- Strip migration-narrative comments from the consumer call sites.
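The `.+= 1` / `.-= 1` broadcasts refer to index-base conversion at the C boundary: KLU's C API expects 0-based colptr/rowval, while SparseMatrixCSC stores 1-based indices. A minimal illustration (the actual ccall site is elided):

```julia
using SparseArrays

# KLU's C API takes 0-based colptr/rowval; SparseMatrixCSC stores them
# 1-based. In-place broadcasts replace the old _decrement!/_increment!
# helper loops.
A = sparse([1.0 0.0; 2.0 3.0])
colptr = copy(A.colptr)   # 1-based: [1, 3, 4]
rowval = copy(A.rowval)   # 1-based: [1, 2, 2]
colptr .-= 1              # shift to 0-based for the C side
rowval .-= 1
# ... ccall into libklu would receive these buffers here ...
colptr .+= 1              # restore 1-based if the buffers are reused
rowval .+= 1
```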

New: KLULinSolvePool
- A small pool of independent KLULinSolveCache workers, each holding its
  own factorization of the same matrix. KLU's numeric struct and Common
  status field are mutated by klu_solve, so safe parallel use needs one
  factor copy per worker.
- API: KLULinSolvePool(A; nworkers), with_worker(f, pool) -> result,
  acquire!/release!, numeric_refactor!(pool, A), Base.finalize.
- Plumbing into VirtualMODF (per-worker scratch + cache locks) is staged
  for the next commit.
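The pool described above can be sketched with Base primitives: a buffered Channel of worker indices serves as the free list, and with_worker takes the function first so callers get do-block syntax. Everything here (MiniPool, the scratch vectors standing in for per-worker factorizations) is illustrative, not the PR's implementation:

```julia
# A minimal pool in the spirit of KLULinSolvePool: each worker owns its own
# mutable state, and acquire! blocks when every worker is busy.
struct MiniPool{T}
    workers::Vector{T}
    free::Channel{Int}
end

function MiniPool(make_worker, n::Int)
    free = Channel{Int}(n)
    foreach(i -> put!(free, i), 1:n)
    return MiniPool([make_worker() for _ in 1:n], free)
end

acquire!(pool::MiniPool) = take!(pool.free)
release!(pool::MiniPool, i::Int) = put!(pool.free, i)

# Taking `f` first is what enables `with_worker(pool) do ... end`.
function with_worker(f, pool::MiniPool)
    i = acquire!(pool)
    try
        return f(pool.workers[i], i)
    finally
        release!(pool, i)   # worker index goes back even if f throws
    end
end

pool = MiniPool(() -> zeros(4), 2)
result = with_worker(pool) do scratch, i
    scratch .= 1.0          # worker-local: no other task touches this scratch
    sum(scratch)
end
```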

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
VirtualMODF.K is now a KLULinSolvePool{Float64} sized at construction time
to nworkers (defaults to Threads.nthreads()). Per-worker scratch buffers
(work_ba_col, temp_data) are stored as Vector{Vector{Float64}} indexed by
the worker handle returned from with_worker.

Woodbury kernel refactor:
- Split _compute_woodbury_factors / _apply_woodbury_correction into
  pure-data _impl functions that take cache + scratch + data arrays
  explicitly, plus mat-typed outer wrappers.
- VirtualPTDF wrapper uses its single shared scratch (existing single-cache
  behavior).
- VirtualMODF wrapper uses with_worker to acquire a per-worker cache and
  scratch.

Cache locking:
- Add ReentrantLocks for woodbury_cache and row_caches.
- _get_woodbury_factors and _get_or_create_row_cache use double-checked
  locking so concurrent miss-then-fill is correct.
- clear_caches! / clear_all_caches! / Base.getindex acquire the locks.
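The double-checked locking above can be sketched with a ReentrantLock and a Dict. Names are illustrative stand-ins for the PR's row_caches / _get_or_create_row_cache; the key property is that the expensive compute runs outside the lock and a racing insert resolves to a single winner:

```julia
struct RowCache
    lock::ReentrantLock
    rows::Dict{Int, Vector{Float64}}
end
RowCache() = RowCache(ReentrantLock(), Dict{Int, Vector{Float64}}())

# compute_row comes first so callers can use do-block syntax.
function get_or_compute!(compute_row, cache::RowCache, key::Int)
    cached = lock(cache.lock) do
        get(cache.rows, key, nothing)   # first check, under the lock
    end
    cached !== nothing && return cached
    row = compute_row(key)              # expensive solve, lock released
    return lock(cache.lock) do
        get!(cache.rows, key, row)      # second check: insert-race winner sticks
    end
end

cache = RowCache()
calls = Ref(0)
compute(k) = (calls[] += 1; fill(Float64(k), 3))
r1 = get_or_compute!(compute, cache, 7)
r2 = get_or_compute!(compute, cache, 7)  # hit: compute not called again
```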

Tests:
- New test/test_virtual_threaded.jl: uses Threads.@spawn for VirtualPTDF,
  VirtualLODF, and VirtualMODF. The VirtualMODF case spawns one task per
  (monitored, contingency) work item and validates parallel results
  match a serial reference. VirtualPTDF/VirtualLODF use a single-task
  @spawn since their K and scratch are still shared (not yet pool-backed);
  the comments call this out explicitly.
- Tests in test/test_klu_wrapper.jl exercise KLULinSolvePool: basic
  with_worker, concurrent solves via @threads on a 4-worker pool, and
  numeric_refactor!.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
@jd-lara jd-lara requested review from Copilot and luke-kiernan April 28, 2026 00:15
@github-actions
Contributor

github-actions Bot commented Apr 28, 2026

Performance Results

Precompile Time

| Main | This Branch | Delta |
| --- | --- | --- |
| 2.196 s | 2.196 s | -0.0% |

Execution Time

| Test | Main | This Branch | Delta |
| --- | --- | --- | --- |
| matpower_ACTIVSg2000_sys-Build PTDF First | 2.176 s | 1.85 s | -15.0% |
| matpower_ACTIVSg2000_sys-Build PTDF Second | 116.7 ms | 499.3 ms | +327.7% |
| matpower_ACTIVSg2000_sys-Build Ybus First | 15.3 ms | 13.6 ms | -11.2% |
| matpower_ACTIVSg2000_sys-Build Ybus Second | 14.3 ms | 12.3 ms | -13.8% |
| matpower_ACTIVSg2000_sys-Build LODF First | 168.9 ms | 165.8 ms | -1.8% |
| matpower_ACTIVSg2000_sys-Build LODF Second | 236.0 ms | 163.2 ms | -30.8% |
| matpower_ACTIVSg2000_sys-Build VirtualMODF First | 4.102 s | 4.944 s | +20.5% |
| matpower_ACTIVSg2000_sys-Build VirtualMODF Second | 203.6 ms | 672.7 ms | +230.4% |
| matpower_ACTIVSg2000_sys-VirtualMODF Query 10 rows | 477.1 ms | 505.4 ms | +5.9% |
| matpower_ACTIVSg2000_sys-Radial network reduction First | 449.6 ms | 451.5 ms | +0.4% |
| matpower_ACTIVSg2000_sys-Radial network reduction Second | 0.6 ms | 0.8 ms | +17.6% |
| matpower_ACTIVSg2000_sys-Degree two network reduction First | 1.665 s | 1.674 s | +0.5% |
| matpower_ACTIVSg2000_sys-Degree two network reduction Second | 1.1 ms | 1.1 ms | -1.9% |
| Base_Eastern_Interconnect_515GW-Build Ybus First | 3.373 s | 3.298 s | -2.2% |
| Base_Eastern_Interconnect_515GW-Build Ybus Second | 3.417 s | 3.175 s | -7.1% |
| Base_Eastern_Interconnect_515GW-Radial network reduction First | 1.29 s | 40.8 ms | -96.8% |
| Base_Eastern_Interconnect_515GW-Radial network reduction Second | 33.6 ms | 40.3 ms | +19.9% |
| Base_Eastern_Interconnect_515GW-Degree two network reduction First | 361.3 ms | 346.6 ms | -4.1% |
| Base_Eastern_Interconnect_515GW-Degree two network reduction Second | 45.4 ms | 37.6 ms | -17.3% |

Contributor

Copilot AI left a comment

Pull request overview

This PR replaces the dependency on KLU.jl with an in-repo KLUWrapper built on SuiteSparse_jll and wires it through PTDF/LODF/MODF/Ward workflows, adding a worker-pool abstraction to support thread-safe parallel solves and new tests to validate correctness under concurrency.

Changes:

  • Introduce src/KLUWrapper/ with KLULinSolveCache (cached factorization) and KLULinSolvePool (thread-safe parallel solves) plus dense/sparse RHS solve helpers.
  • Refactor PTDF/LODF/Virtual* and Ward reduction codepaths to use klu_factorize, solve!, and solve_sparse!/solve_sparse instead of KLU.jl.
  • Add tests covering wrapper correctness, sparse RHS behavior, pool behavior, and threaded Virtual* access; bump package version and switch dependency to SuiteSparse_jll.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| test/test_virtual_threaded.jl | Adds multithreaded regression tests for VirtualPTDF/VirtualLODF (single-task) and VirtualMODF (pool-backed parallelism). |
| test/test_klu_wrapper.jl | Adds comprehensive unit tests for cache/pool factorization, dense + sparse RHS solves, refactor/reset paths, and allocation expectations. |
| src/woodbury_kernel.jl | Refactors Woodbury computation into pure-data impl functions and adds dispatchers for VirtualPTDF vs pool-backed VirtualMODF. |
| src/ward_reduction.jl | Switches Ward reduction solves to KLUWrapper and uses sparse-RHS solve for boundary columns. |
| src/virtual_ptdf_calculations.jl | Updates VirtualPTDF to accept KLULinSolveCache and routes solves through the wrapper abstraction. |
| src/virtual_modf_calculations.jl | Makes VirtualMODF pool-backed for parallel getindex, adds locks around caches, and threads Woodbury calls through with_worker. |
| src/virtual_lodf_calculations.jl | Migrates VirtualLODF factorization/solves to KLULinSolveCache and uses in-place solve!. |
| src/ptdf_calculations.jl | Replaces dense RHS KLU.solve! usage with solve_sparse! over selected BA rows. |
| src/lodf_calculations.jl | Replaces dense RHS KLU.solve! usage with solve_sparse! for incidence-transpose RHS and updates denom solve. |
| src/PowerflowMatrixTypes.jl | Updates type aliases to use KLULinSolveCache and relaxes VirtualPTDF's factorization type parameter. |
| src/PowerNetworkMatrices.jl | Includes/imports KLUWrapper APIs into the main module. |
| src/KLUWrapper/solve_sparse_rhs.jl | Implements sparse RHS packing + block-chunked solve to bound working set. |
| src/KLUWrapper/solve_dense.jl | Implements in-place dense solve and transpose solve; adds `\` for allocating solve. |
| src/KLUWrapper/pool.jl | Adds KLULinSolvePool with Channel-based worker acquisition and refactor/reset logic. |
| src/KLUWrapper/klu_jll_bindings.jl | Adds low-level ccall bindings for SuiteSparse_long KLU entry points and error mapping. |
| src/KLUWrapper/klu_cache.jl | Implements KLULinSolveCache lifecycle, symbolic/numeric refactor, pattern checks, and scratch management. |
| src/KLUWrapper/KLUWrapper.jl | Defines the KLUWrapper module and exports wrapper APIs. |
| src/BA_ABA_matrices.jl | Updates ABA factorization storage and constructors to use klu_factorize. |
| Project.toml | Bumps version and replaces KLU dependency with SuiteSparse_jll. |


Comment thread src/KLUWrapper/pool.jl Outdated
Comment thread src/KLUWrapper/pool.jl Outdated
Comment thread src/KLUWrapper/pool.jl Outdated
Collaborator

@luke-kiernan luke-kiernan left a comment

One comment, will look more later.

Comment thread src/KLUWrapper/klu_cache.jl Outdated
@jd-lara
Member Author

jd-lara commented May 4, 2026

There is still something odd with the Windows version, which fails in a non-deterministic way.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 27 changed files in this pull request and generated 4 comments.



Comment thread src/virtual_ptdf_calculations.jl
Comment thread src/virtual_ptdf_calculations.jl Outdated
Comment thread src/solver_dispatch.jl Outdated
Comment thread src/virtual_lodf_calculations.jl Outdated
Comment on lines +232 to +235
Number of parallel workers in the underlying KLU pool. Defaults to
`_default_pool_workers()` — `max(1, Threads.nthreads() - 1)` on
Mac/Linux and `1` on Windows (where the KLU pool path is serialized
through `solver_lock` to work around a libklu thread-safety issue).
@jd-lara
Member Author

jd-lara commented May 4, 2026

I've looked at Copilot's comments and addressed the ones that look real.

Collaborator

@luke-kiernan luke-kiernan left a comment

I took a look at the backend multi-threading stuff; the matrix math itself I'm less familiar with.

Comment thread src/KLUWrapper/solve_sparse_rhs.jl
Comment thread src/KLUWrapper/solve_sparse_rhs.jl Outdated
Comment thread src/KLUWrapper/klu_cache.jl Outdated
Comment thread src/KLUWrapper/pool.jl Outdated
Comment thread src/solver_dispatch.jl
# outside the Virtual{PTDF, LODF, MODF} files so all three matrices share
# the same `with_solver` seam and the same KLU/AppleAccelerate factory.
#
# Windows libklu thread-safety: the MinGW build of `SuiteSparse_jll`'s
Collaborator

It can also error in non-multi-threaded situations on Windows, for seemingly inexplicable reasons. Do you have a minimum reproducible example of that? The only two examples I know of are both quite complex.

Member Author

We are still investigating what's going on in Windows. The latest exploration showed that the issue only happens on Windows when both the PTDF and the LODF are used. We still don't have a good explanation of what's going on.

Comment thread src/row_cache.jl Outdated
to test for a hit, runs `compute_row` outside the lock on a miss
(KLU solves dominate the cost), then takes the lock again to insert. A
concurrent producer that wins the insert race wins; the other side
returns the winner's row. The signature places `compute_row` first so
Collaborator
The reason will be displayed to describe this comment to others. Learn more.

The signature places compute_row first so callers can use do … end block syntax.

explain?

Member Author

explain what?

Collaborator

Why does placing compute_row first matter for do...end block syntax?
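For context on the question: Julia lowers `f(args...) do x ... end` to `f(x -> ..., args...)`, i.e. the trailing block becomes the *first* positional argument, so only a function whose first parameter is the callback can be called this way. A minimal illustration (`get_row` is hypothetical, not the PR's function):

```julia
# `f(args...) do x ... end` lowers to f(x -> ..., args...): the trailing
# block is passed as the FIRST positional argument of the call.
get_row(compute_row, key) = compute_row(key)   # callback parameter first

row = get_row(42) do key
    key * 2                                    # this block is compute_row
end
```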

Comment thread src/solver_dispatch.jl
Comment thread src/KLUWrapper/pool.jl
Comment thread src/KLUWrapper/pool.jl Outdated
# snapshot is taken under `state_lock` for consistency; the `take!` loop
# blocks outside the lock so concurrent `release!` calls can land.
function _drain_available!(pool::KLULinSolvePool)
n_to_drain = @lock pool.state_lock count(pool.valid)
Collaborator

Nitpick: there's a function for this, n_valid(pool).

I'm also struggling to convince myself that there's no race condition here. Can a worker go from valid to invalid between the count call and the take! call? And if it can happen, does it cause issues?

Member Author

@jd-lara jd-lara May 5, 2026

If you'd like to verify empirically, flip const KLU_POOL_DEBUG = true in KLUWrapper.jl and re-run the threaded testsets in test_virtual_threaded.jl; the per-worker atomic collision counter in with_worker
(pool.jl:168-180) will log an @error if any task ever observes a worker held by another task concurrently.

Collaborator

Hmm. Relying on empirical tests feels weird, because compiler optimizations and undefined behavior. But if you're confident this is ok, I'll take your word for it.

Comment thread src/KLUWrapper/pool.jl
# channel. Drain them so future `acquire!` calls consume valid
# indices instead.
_drain_sentinels!(pool)

Collaborator

After _drain_sentinels! and drain_sentinels!, should the pool be empty? I suspect so (in which case I'd add something to check said invariant)

Member Author

I added more tests
