Conversation
Adds a small allocation-aware wrapper over libklu (provided via SuiteSparse_jll) tailored to the access patterns of this package:
- KLULinSolveCache splits symbolic and numeric factorization, supports numeric_refactor! that reuses the analysis and the prior numeric struct, and is non-allocating after construction.
- solve!/tsolve! call libklu's native dense multi-RHS routines directly.
- solve_sparse!/solve_sparse provide a sparse-RHS path: B's columns are scattered into a small dense scratch in chunks instead of densifying the full N x M RHS up front. A skip_empty option handles RHSs where most columns are structurally empty (Ward reduction, Woodbury kernel).
- Real (Float64) and complex (ComplexF64) paths share the same struct via type-dispatched ccall helpers (klu_l_* and klu_zl_*).

This replaces the use of KLU.jl elsewhere in the package. KLU.jl is removed from Project.toml; SuiteSparse_jll is added in its place.
https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
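A hedged usage sketch of the cache lifecycle described above. `KLULinSolveCache`, `solve!`, and `numeric_refactor!` are named in the commit message; the exact constructor signature is an assumption.

```julia
using SparseArrays, LinearAlgebra

A = sprand(100, 100, 0.05) + 10I    # sparse, well-conditioned test matrix
cache = KLULinSolveCache(A)         # symbolic analysis + numeric factorization
b = rand(100)
solve!(cache, b)                    # in-place dense solve; non-allocating

# Same sparsity pattern, new values: reuse the analysis and the prior
# numeric struct instead of refactorizing from scratch.
A.nzval .*= 2.0
numeric_refactor!(cache, A)
solve!(cache, b)
```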
Replaces KLU.KLUFactorization with the new KLULinSolveCache across the
matrix types and uses solve_sparse! for the multi-RHS, structurally
sparse code paths:
- ABA_Matrix.K is now KLULinSolveCache{Float64}; factorize/is_factorized
updated accordingly. DC_ABA_Matrix_Factorized type alias updated.
- _calculate_PTDF_matrix_KLU uses solve_sparse! on BA[valid_ix, :],
scattering the structurally-sparse RHS columns instead of densifying
the full (buscount-nref) x linecount matrix.
- _calculate_LODF_matrix_KLU passes transpose(a)[valid_ix, :] directly
to solve_sparse!, removing the original zeros(buscount, buscount)
intermediate.
- VirtualPTDF.K, VirtualLODF.K, VirtualMODF.K all hold a
KLULinSolveCache{Float64}; per-row solve! calls remain non-allocating.
- DC_vPTDF_Matrix is left unconstrained on K so AppleAccelerate
factorizations still satisfy the alias.
https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
Ward reduction:
- The boundary-bus identity solve becomes a single solve_sparse! call on a sparse identity matrix instead of a per-bus loop with one RHS each.
- The y_eq computation switches to solve_sparse(...; skip_empty=true) on y_eb. Most external buses are not adjacent to any boundary bus, so the RHS columns are largely empty and the skip-empty short-circuit avoids redundant solves.

Tests:
- Real and complex round-trip and refactor coverage.
- solve_sparse! agrees with the dense-RHS path.
- skip_empty produces zero columns for empty RHS columns.
- solve_sparse! into a view writes only the targeted rows.
https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
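The boundary-bus change above can be sketched as follows; matrix sizes and index values are illustrative, and `KLULinSolveCache`/`solve_sparse` are the wrapper names from the commit messages.

```julia
using SparseArrays, LinearAlgebra

A = sprand(50, 50, 0.1) + 10I                # stand-in for the reduced system
cache = KLULinSolveCache(A)
boundary = [3, 17, 42]                       # hypothetical boundary-bus indices
# Sparse "identity" whose columns select the boundary buses:
E = sparse(boundary, 1:3, ones(3), 50, 3)
X = solve_sparse(cache, E)                   # one multi-RHS call, no per-bus loop
```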
Simplifications:
- Drop redundant n field from KLULinSolveCache; derive from colptr.
- Drop _decrement!/_increment! helpers in favour of broadcast `.+= 1` / `.-= 1`.
- Rename finalize! -> Base.finalize so it composes with stdlib instead of shadowing Base.finalize through the module's exports.
- Drop solve_w_refinement (no in-tree caller); add Base.\\ on the cache for symmetry with KLU.KLUFactorization.
- Unify the two tsolve! methods via a _tsolve_call helper that ignores the conjugate flag on the real path.
- Drop block= and skip_empty= keywords from solve_sparse!. Always pack non-empty columns into a single dense scratch and dispatch one libklu multi-RHS solve. Empty RHS columns are zeroed in the output without a solve, in all cases.
- Throw stdlib exception types (SingularException, OutOfMemoryError, ArgumentError, OverflowError) from klu_throw to match the conventions used in SparseArrays.CHOLMOD and elsewhere in PNM.
- Refactor _calculate_PTDF_matrix_KLU: collapse the duplicated dist_slack branches into a single solve plus a slack-distribution post-step.
- Strip migration-narrative comments from the consumer call sites.

New: KLULinSolvePool
- A small pool of independent KLULinSolveCache workers, each holding its own factorization of the same matrix. KLU's numeric struct and Common status field are mutated by klu_solve, so safe parallel use needs one factor copy per worker.
- API: KLULinSolvePool(A; nworkers), with_worker(f, pool) -> result, acquire!/release!, numeric_refactor!(pool, A), Base.finalize.
- Plumbing into VirtualMODF (per-worker scratch + cache locks) is staged for the next commit.
https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
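A hedged sketch of the pool API listed above; the names come from the commit message, while the usage details are assumptions.

```julia
using SparseArrays, LinearAlgebra

A = sprand(100, 100, 0.05) + 10I
pool = KLULinSolvePool(A; nworkers = Threads.nthreads())

# with_worker(f, pool) acquires an idle worker, passes its cache to f,
# and releases it on the way out, so `do` syntax applies:
x = with_worker(pool) do cache
    b = rand(100)
    solve!(cache, b)    # uses this worker's private numeric struct
    b
end

# After the matrix values change (same sparsity pattern), refresh every
# worker's numeric factorization:
numeric_refactor!(pool, A)
```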
VirtualMODF.K is now a KLULinSolvePool{Float64} sized at construction time
to nworkers (defaults to Threads.nthreads()). Per-worker scratch buffers
(work_ba_col, temp_data) are stored as Vector{Vector{Float64}} indexed by
the worker handle returned from with_worker.
Woodbury kernel refactor:
- Split _compute_woodbury_factors / _apply_woodbury_correction into
pure-data _impl functions that take cache + scratch + data arrays
explicitly, plus mat-typed outer wrappers.
- VirtualPTDF wrapper uses its single shared scratch (existing single-cache
behavior).
- VirtualMODF wrapper uses with_worker to acquire a per-worker cache and
scratch.
Cache locking:
- Add ReentrantLocks for woodbury_cache and row_caches.
- _get_woodbury_factors and _get_or_create_row_cache use double-checked
locking so concurrent miss-then-fill is correct.
- clear_caches! / clear_all_caches! / Base.getindex acquire the locks.
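A minimal sketch of the double-checked locking pattern described above, assuming a Dict-backed row cache; the names are illustrative, not the package's actual fields.

```julia
function get_or_create_row!(compute_row, cache::Dict{Int, Vector{Float64}},
        cache_lock::ReentrantLock, row::Int)
    # First check under the lock: fast path on a hit.
    hit = lock(cache_lock) do
        get(cache, row, nothing)
    end
    hit !== nothing && return hit
    # Miss: compute outside the lock, since the KLU solve dominates the cost.
    value = compute_row(row)
    # Second check under the lock: a concurrent producer may have inserted
    # meanwhile; whichever side wins the race, both callers return the same row.
    lock(cache_lock) do
        get!(cache, row) do
            value
        end
    end
end
```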
Tests:
- New test/test_virtual_threaded.jl: uses Threads.@spawn for VirtualPTDF,
VirtualLODF, and VirtualMODF. The VirtualMODF case spawns one task per
(monitored, contingency) work item and validates that parallel results
match a serial reference. VirtualPTDF/VirtualLODF use a single-task
@spawn since their K and scratch are still shared (not yet pool-backed);
the comments call this out explicitly.
- Tests in test/test_klu_wrapper.jl exercise KLULinSolvePool: basic
with_worker, concurrent solves via @threads on a 4-worker pool, and
numeric_refactor!.
https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
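The per-work-item pattern the VirtualMODF test uses can be sketched as below; `vmodf`, `vmodf_serial`, and `work_items` are stand-ins for the test's actual objects.

```julia
# One task per (monitored, contingency) work item:
tasks = [Threads.@spawn vmodf[m, c] for (m, c) in work_items]
parallel = fetch.(tasks)

# Serial reference computed on an independent matrix:
serial = [vmodf_serial[m, c] for (m, c) in work_items]
@assert all(parallel .≈ serial)
```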
Pull request overview
This PR replaces the dependency on KLU.jl with an in-repo KLUWrapper built on SuiteSparse_jll and wires it through PTDF/LODF/MODF/Ward workflows, adding a worker-pool abstraction to support thread-safe parallel solves and new tests to validate correctness under concurrency.
Changes:
- Introduce `src/KLUWrapper/` with `KLULinSolveCache` (cached factorization) and `KLULinSolvePool` (thread-safe parallel solves) plus dense/sparse RHS solve helpers.
- Refactor PTDF/LODF/Virtual* and Ward reduction codepaths to use `klu_factorize`, `solve!`, and `solve_sparse!`/`solve_sparse` instead of KLU.jl.
- Add tests covering wrapper correctness, sparse RHS behavior, pool behavior, and threaded Virtual* access; bump package version and switch dependency to SuiteSparse_jll.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/test_virtual_threaded.jl | Adds multithreaded regression tests for VirtualPTDF/VirtualLODF (single-task) and VirtualMODF (pool-backed parallelism). |
| test/test_klu_wrapper.jl | Adds comprehensive unit tests for cache/pool factorization, dense + sparse RHS solves, refactor/reset paths, and allocation expectations. |
| src/woodbury_kernel.jl | Refactors Woodbury computation into pure-data impl functions and adds dispatchers for VirtualPTDF vs pool-backed VirtualMODF. |
| src/ward_reduction.jl | Switches Ward reduction solves to KLUWrapper and uses sparse-RHS solve for boundary columns. |
| src/virtual_ptdf_calculations.jl | Updates VirtualPTDF to accept KLULinSolveCache and routes solves through the wrapper abstraction. |
| src/virtual_modf_calculations.jl | Makes VirtualMODF pool-backed for parallel getindex, adds locks around caches, and threads Woodbury calls through with_worker. |
| src/virtual_lodf_calculations.jl | Migrates VirtualLODF factorization/solves to KLULinSolveCache and uses in-place solve!. |
| src/ptdf_calculations.jl | Replaces dense RHS KLU.solve! usage with solve_sparse! over selected BA rows. |
| src/lodf_calculations.jl | Replaces dense RHS KLU.solve! usage with solve_sparse! for incidence-transpose RHS and updates denom solve. |
| src/PowerflowMatrixTypes.jl | Updates type aliases to use KLULinSolveCache and relaxes VirtualPTDF’s factorization type parameter. |
| src/PowerNetworkMatrices.jl | Includes/imports KLUWrapper APIs into the main module. |
| src/KLUWrapper/solve_sparse_rhs.jl | Implements sparse RHS packing + block-chunked solve to bound working set. |
| src/KLUWrapper/solve_dense.jl | Implements in-place dense solve and transpose solve; adds \\ for allocating solve. |
| src/KLUWrapper/pool.jl | Adds KLULinSolvePool with Channel-based worker acquisition and refactor/reset logic. |
| src/KLUWrapper/klu_jll_bindings.jl | Adds low-level ccall bindings for SuiteSparse_long KLU entry points and error mapping. |
| src/KLUWrapper/klu_cache.jl | Implements KLULinSolveCache lifecycle, symbolic/numeric refactor, pattern checks, and scratch management. |
| src/KLUWrapper/KLUWrapper.jl | Defines the KLUWrapper module and exports wrapper APIs. |
| src/BA_ABA_matrices.jl | Updates ABA factorization storage and constructors to use klu_factorize. |
| Project.toml | Bumps version and replaces KLU dependency with SuiteSparse_jll. |
luke-kiernan left a comment:
One comment, will look more later.
There is still something odd with the Windows version, which fails in a nondeterministic way.
Pull request overview
Copilot reviewed 26 out of 27 changed files in this pull request and generated 4 comments.
> Number of parallel workers in the underlying KLU pool. Defaults to
> `_default_pool_workers()` — `max(1, Threads.nthreads() - 1)` on
> Mac/Linux and `1` on Windows (where the KLU pool path is serialized
> through `solver_lock` to work around a libklu thread-safety issue).
I've looked at Copilot's comments and addressed the ones that look real.
luke-kiernan left a comment:
I took a look at the backend multithreading stuff; the matrix math itself I'm less familiar with.
> # outside the Virtual{PTDF, LODF, MODF} files so all three matrices share
> # the same `with_solver` seam and the same KLU/AppleAccelerate factory.
> #
> # Windows libklu thread-safety: the MinGW build of `SuiteSparse_jll`'s
It can also error in non-multithreaded situations on Windows, for seemingly inexplicable reasons. Do you have a minimal reproducible example of that? The only two examples I know of are both quite complex.
We are still investigating what's going on on Windows. The latest exploration showed that the issue only happens on Windows when both the PTDF and the LODF are used. We still don't have a good explanation for it.
> to test for a hit, runs `compute_row` outside the lock on a miss
> (KLU solves dominate the cost), then takes the lock again to insert. A
> concurrent producer that wins the insert race wins; the other side
> returns the winner's row. The signature places `compute_row` first so
> The signature places `compute_row` first so callers can use `do … end` block syntax.
explain?
Why does placing compute_row first matter for do...end block syntax?
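In Julia, `f(args...) do x; body; end` is syntactic sugar for `f(x -> body, args...)`: the do block is passed as the *first* positional argument. A hypothetical call shape (the names are illustrative, not the package's actual signature):

```julia
# Because the do block becomes the FIRST positional argument,
# `compute_row` must come first in the signature for this to work:
row = get_or_create_row!(cache, cache_lock, idx) do i
    expensive_klu_solve(i)    # runs only on a cache miss
end

# The call above is exactly equivalent to:
row = get_or_create_row!(i -> expensive_klu_solve(i), cache, cache_lock, idx)
```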
> # snapshot is taken under `state_lock` for consistency; the `take!` loop
> # blocks outside the lock so concurrent `release!` calls can land.
> function _drain_available!(pool::KLULinSolvePool)
>     n_to_drain = @lock pool.state_lock count(pool.valid)
Nitpick: there's a function for this, `n_valid(pool)`.
I'm also struggling to convince myself that there's no race condition here. Can a worker go from valid to invalid between the `count` call and the `take!` call? And if it can, does it cause issues?
If you'd like to verify empirically, flip `const KLU_POOL_DEBUG = true` in KLUWrapper.jl and re-run the threaded testsets in test_virtual_threaded.jl; the per-worker atomic collision counter in `with_worker` (pool.jl:168-180) will log an `@error` if any task ever observes a worker held by another task concurrently.
Hmm. Relying on empirical tests feels weird, because of compiler optimizations and undefined behavior. But if you're confident this is OK, I'll take your word for it.
> # channel. Drain them so future `acquire!` calls consume valid
> # indices instead.
> _drain_sentinels!(pool)
After `_drain_sentinels!` and `drain_sentinels!`, should the pool be empty? I suspect so (in which case I'd add a check for that invariant).
This PR is important to reduce issues with KLU by allowing for multi-entry access.