Manual sparse pattern for Jacobian and Hessian #348
Conversation
…omputed one due to lagrange cost state
The relevant code with the sparse structure for the Jacobian/Hessian is in the files quoted below.
test/docs/AD_backend.md
Outdated
> - :zygote : Zygote
> # Benchmark for different AD backends
> The backend for ADNLPModels can be set in transcription / solve calls with the option `adnlp_backend=`. Possible values include the predefined(*) backends for ADNLPModels:
> - `:optimized`* Default for CTDirect. Forward mode for Jacobian, reverse for Gradient and Hessian.
Suggested change:
old: - `:optimized`* Default for CTDirect. Forward mode for Jacobian, reverse for Gradient and Hessian.
new: - `:optimized`* Default for CTDirect. Forward mode for Jacobian, reverse for Gradient and forward over reverse for Hessian.
test/docs/AD_backend.md
Outdated
> Takeaways:
> - the `:optimized` backend (with reverse mode for Hessian) is much better than full forward mode.
Suggested change:
old: - the `:optimized` backend (with reverse mode for Hessian) is much better than full forward mode.
new: - the `:optimized` backend (with forward over reverse mode for Hessian) is much better than full forward mode.
test/docs/AD_backend.md
Outdated
> Takeaways:
> - the `:optimized` backend (with reverse mode for Hessian) is much better than full forward mode.
> - manual sparse pattern seems to give even better performance for larger problems. This is likely due to the increasing cost of computing the Hessian sparsity in terms of allocations and time. This observation is consistent with the comparison with Jump that seems to use a different, less sparse but faster method for the Hessian.
Suggested change:
old: - manual sparse pattern seems to give even better performance for larger problems. This is likely due to the increasing cost of computing the Hessian sparsity in terms of allocations and time. This observation is consistent with the comparison with Jump that seems to use a different, less sparse but faster method for the Hessian.
new: - manual sparse pattern seems to give even better performance for larger problems. This is likely due to the increasing cost of computing the Hessian sparsity with SparseConnectivityTracer.jl in terms of allocations and time.
     This observation is consistent with the comparison with JuMP that seems to use a different, less sparse but faster method for the Hessian.
     The sparsity pattern detection in JuMP relies on the expression tree of the objective and constraints built from its DSL.
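For context, a minimal sketch of what the automatic detection mentioned above does, using SparseConnectivityTracer.jl through the ADTypes detector interface. The objective `f` below is just a stand-in example, not a CTDirect function:

```julia
using SparseConnectivityTracer, ADTypes

# stand-in objective with a banded structure plus one corner coupling term
f(x) = sum(abs2, diff(x)) + x[1] * x[end]

detector = TracerSparsityDetector()
H = ADTypes.hessian_sparsity(f, rand(10), detector)  # sparse Bool matrix: the Hessian pattern
```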
test/docs/AD_backend.md
Outdated
> - redo tests on algal_bacterial problem, including Jump
> - add some tests for different backends in test_misc
> - try to disable some unused (?) parts such as hprod ? (according to show_time info the impact may be small)
> - reuse ADNLPModels functions to get block sparsity patterns then rebuild full patterns ?
I recently added these functions for you:
https://jso.dev/ADNLPModels.jl/dev/sparse/#Extracting-sparsity-patterns
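For reference, a small sketch of how these helpers can be called, assuming the `get_sparsity_pattern(nlp, :jacobian)` / `get_sparsity_pattern(nlp, :hessian)` form described on the linked docs page; the toy model below is illustrative, not a CTDirect problem:

```julia
using ADNLPModels

# toy constrained model, just to have something to query
f(x) = (x[1] - 1)^2 + 100 * (x[2] - x[1]^2)^2
c(x) = [x[1]^2 + x[2]^2]
nlp = ADNLPModel(f, [-1.2, 1.0], c, [-Inf], [1.0])

# sparsity patterns detected by the AD backends (sparse Bool matrices)
J = get_sparsity_pattern(nlp, :jacobian)
H = get_sparsity_pattern(nlp, :hessian)
```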
Edit: yes, thanks, these were quite useful to spot missing nonzeros in the manual pattern!
src/disc/trapeze.jl
Outdated
> function DOCP_Hessian_pattern(docp::DOCP{Trapeze})
>     # NB. need to provide full pattern for coloring, not just upper/lower part
>     H = zeros(Bool, docp.dim_NLP_variables, docp.dim_NLP_variables)
Suggested change:
old: H = zeros(Bool, docp.dim_NLP_variables, docp.dim_NLP_variables)
new: H = BitMatrix(undef, docp.dim_NLP_variables, docp.dim_NLP_variables)
     fill!(H, false)
It should reduce the allocations by a factor of 8 😃
-> a Bool takes 1 byte (octet) = 8 bits, whereas a BitMatrix packs 1 bit per entry.
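A quick way to check the factor-of-8 claim (the size `n` below is just an example):

```julia
n = 1000
dense_bool = zeros(Bool, n, n)   # Array{Bool,2}: one byte per entry
bit_mat    = falses(n, n)        # BitMatrix: one bit per entry, already filled with false

Base.summarysize(dense_bool)     # ≈ 1_000_000 bytes
Base.summarysize(bit_mat)        # ≈   125_000 bytes, i.e. 8x smaller
```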
@amontoison Nice, I did not know a Bool would take a full byte! In the meantime I switched to the vector format (Is, Js, Vs), so there is no more matrix allocation. Do you think computing the number of nonzeros and allocating the vectors directly at full size would be noticeably better than allocating at size 0 and using push! to add elements?
Using push! is slower because it involves dynamic allocations. I expect allocating the BitMatrix at the beginning to speed up the operations.
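A sketch of the two assembly strategies being compared here; `pattern_indices` and `nnz_estimate` are hypothetical placeholders, not CTDirect code:

```julia
# hypothetical list of (row, col) positions of the structural nonzeros
pattern_indices = [(1, 1), (1, 2), (2, 2)]
nnz_estimate = length(pattern_indices)

# 1) grow-as-you-go: simple, but push! triggers dynamic reallocations
Is, Js, Vs = Int[], Int[], Bool[]
for (i, j) in pattern_indices
    push!(Is, i); push!(Js, j); push!(Vs, true)
end

# 2) allocate at full size once, then fill by index
Is2 = Vector{Int}(undef, nnz_estimate)
Js2 = Vector{Int}(undef, nnz_estimate)
Vs2 = trues(nnz_estimate)                 # BitVector, 1 bit per entry
for (k, (i, j)) in enumerate(pattern_indices)
    Is2[k] = i; Js2[k] = j
end

# middle ground: sizehint!(Is, nnz_estimate) before the push! loop reserves
# capacity while keeping the simpler code
```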
src/disc/trapeze.jl
Outdated
> function DOCP_Jacobian_pattern(docp::DOCP{Trapeze})
>     J = zeros(Bool, docp.dim_NLP_constraints, docp.dim_NLP_variables)
Suggested change:
old: J = zeros(Bool, docp.dim_NLP_constraints, docp.dim_NLP_variables)
new: J = BitMatrix(undef, docp.dim_NLP_constraints, docp.dim_NLP_variables)
     fill!(J, false)
src/solve.jl
Outdated
> jtprod_backend = ADNLPModels.ReverseDiffADJtprod,
> jacobian_backend = J_backend,
> hessian_backend = H_backend,
> show_time = show_time
If you don't use the following backends:
- hprod
- jprod
- jtprod
- ghjvprod
I recommend setting them to an EmptyBackend.
I should add an easy way for the user to list the backends they need in ADNLPModels.jl (JuliaSmoothOptimizers/ADNLPModels.jl#324).
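A hedged sketch of what that could look like in the model construction, assuming the `ADNLPModels.EmptyADbackend` type (the exact name of the "EmptyBackend" mentioned above may differ); `f`, `c!`, `x0`, `J_backend`, `H_backend` below are toy placeholders, not the CTDirect objects:

```julia
using ADNLPModels

# toy placeholders standing in for the CTDirect objects
f(x) = sum(abs2, x)
c!(cx, x) = (cx[1] = x[1] + x[2]; cx)
x0, lcon, ucon = zeros(2), [0.0], [0.0]
J_backend = ADNLPModels.SparseADJacobian
H_backend = ADNLPModels.SparseADHessian

nlp = ADNLPModels.ADNLPModel!(
    f, x0, c!, lcon, ucon;
    jacobian_backend = J_backend,
    hessian_backend  = H_backend,
    # products not used by the solver: disable them with the empty backend
    hprod_backend    = ADNLPModels.EmptyADbackend,
    jprod_backend    = ADNLPModels.EmptyADbackend,
    jtprod_backend   = ADNLPModels.EmptyADbackend,
    ghjvprod_backend = ADNLPModels.EmptyADbackend,
)
```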
> | 1000 | 926.0 | 7.1  |
> | 2500 |       | 31.8 |
> | 5000 |       |      |
> *** building the hessian is one third of the total solve time...
I will pass show_time to the constructor of the sparse Jacobians and Hessians so that you can know how much time is spent in the sparsity detection (JuliaSmoothOptimizers/ADNLPModels.jl#325).
@PierreMartinon How do you determine the sparsity pattern on the objective?
test/docs/AD_backend.md
Outdated
> - `:enzyme`* Enzyme (not working).
> - `:zygote`* Zygote (not working).
For Enzyme.jl, Michel is working on it:
JuliaSmoothOptimizers/ADNLPModels.jl#322
For Zygote.jl, I will probably remove the support in the next major release because we only have dense AD backends for this package and the Hessian is only for the objective...
We can't easily support the Hessian of the Lagrangian, which is what you need here.
Ok, noted. I did not push the tests far for these two, I basically just passed the option to see what would happen.
cc @jbcaillau
@PierreMartinon very nice 👍🏽👍🏽👍🏽 Regarding the tests, do we agree that, apart from the current issue with artifacts, the only failing tests are those involving RK schemes other than trapezoidal? The case being, since the default trapezoidal scheme can handle 99% of use cases, is it OK to set the default to trapezoidal + manual sparsity, and switch to automatic sparsity for the other schemes?
@amontoison The gradient of the objective is expected to be very sparse. A typical Mayer problem (into which, internally, every problem is transformed AFAIR - @PierreMartinon ?) has a cost depending only on the value of the state at the last grid point; for (n + m) * N unknowns, that's only n nonzeros.
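A tiny illustration of that count, with a hypothetical variable layout [x_1; u_1; …; x_N; u_N] (not necessarily the actual CTDirect ordering):

```julia
n, m, N = 3, 2, 100                       # example sizes: states, controls, grid points
nvar = (n + m) * N

grad_pattern = falses(nvar)               # gradient sparsity pattern of a pure Mayer cost
final_state = (N - 1) * (n + m) .+ (1:n)  # indices of the state at the last grid point
grad_pattern[final_state] .= true

count(grad_pattern)                       # == n nonzeros out of (n + m) * N entries
```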
Yes, mostly - see below.
Indeed, the 'manual' mode for the sparsity pattern is a crude one that makes no assumption about the problem's specific functions, i.e. it assumes that all functions involved have 'full' (nonzero) derivatives. What it does is eliminate the blocks that we know for sure are zero. For the objective, we have 2 possible cases (a Mayer cost or a Lagrange cost, the latter handled through the additional state component for the running-cost integral).
The resulting code is rather ugly, although I tried to improve the readability. We can see the limitations of this approach, as the J/H patterns are less sparse than the ones computed with the usual backend. On the other hand, the startup cost is much smaller. All in all, this seems to be a tradeoff between the time taken to build the derivatives and the time spent in the optimization itself (number of iterations and also derivative evaluations). I expect, for instance, the 'manual' derivatives to be more expensive to evaluate since they are less sparse. A potential improvement would be to determine the sparsity patterns of the problem's functions and use them to fill the Jacobian / Hessian patterns, instead of putting full dense blocks. Oh, I should try a BitVector for Vs :D
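A simplified sketch of this block approach in the (Is, Js, Vs) format mentioned above; the names, dimensions and index ranges are illustrative, not the actual `DOCP_Jacobian_pattern` code:

```julia
using SparseArrays

# mark a dense rectangular block of the pattern as structurally nonzero
function add_block!(Is, Js, Vs, rows, cols)
    for j in cols, i in rows
        push!(Is, i); push!(Js, j); push!(Vs, true)
    end
    return nothing
end

# toy dimensions: 4 constraints, 6 variables, with two coupled blocks
ncon, nvar = 4, 6
Is, Js, Vs = Int[], Int[], Bool[]
add_block!(Is, Js, Vs, 1:2, 1:3)   # e.g. state equation block at one step
add_block!(Is, Js, Vs, 3:4, 3:6)   # e.g. block coupling to the next step

# Bool pattern passed on for coloring; `|` merges any duplicate entries
J_pattern = sparse(Is, Js, Vs, ncon, nvar, |)
```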
Seems to be fine now :-)
I think we can do a bit more testing, but basically yes, something like that for the moment. Doing midpoint after trapeze and IRK should not take long, I'll get started. The harder part will be to try to improve the manual mode, which currently has quite a few excess nonzero elements. A first direction is simply to use the finest possible granularity for the 'full' nnz blocks: for instance, in a Lagrange problem we add a final component to the state for the integral of the running cost, which is never passed to the original OCP functions. So we know that the derivatives wrt these particular variables are zero, however it can be a bit involved to 'skip' these. This is more involved in the IRK schemes, where the stage variables and equations may also include this additional component. Another example: the time steps h_i depend on v in the variable time case, but not in the fixed time case... Ideally we would add nnz blocks down to each individual variable present in each constraint / function. A second direction is to exploit the sparsity of the OCP-specific functions instead of assuming full nnz blocks. Here we would need to apply the AD to these functions and reinject the sub-patterns into the Jacobian / Hessian somehow. Probably the most effective, but it will require some work since I never tried that ;-)
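A sketch of the 'finer granularity' point about the Lagrange cost component, with a hypothetical per-step layout (not the actual CTDirect indexing):

```julia
# hypothetical layout at grid point i: [x_i (n components); running-cost integral; u_i (m components)]
n, m = 3, 2
n_aug = n + 1                                  # state augmented with the running-cost integral

step_offset(i)  = (i - 1) * (n_aug + m)
x_cols(i)       = step_offset(i) .+ (1:n_aug)  # full augmented state block
x_cols_ocp(i)   = step_offset(i) .+ (1:n)      # what the original OCP functions actually see

# a dense block for an OCP constraint at step i should use x_cols_ocp(i):
# the column of the cost component can stay structurally zero.
```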
Did anyone do something? @ocots, are you aware of that?
@PierreMartinon @ocots Some breakage tests seem to pass, some do not. Looks like there is a new error there: any clue?
Don't know, but it probably does not work with v4 without changing the code of the action.
@PierreMartinon Ready to merge in spite of the (broken 😬) breakages above? If yes, this should close #183.
The Breakage task only checks if […]. The actual tests for […]. After this one is done I'll start preparing release 0.13.1, as well as the update for the tutorials / docs in […].

Providing the block structure to ADNLPModels leads to less sparse matrices, since there is no analysis of the problem-specific functions, but it is faster and requires less memory. This method scales better than the forward/reverse optimized backend, and appears to be faster for large enough problems.
See test/docs/AD_backend.md