Remove Optim.jl interface + minor tidying up of src/optimisation/Optimisation #2708
Conversation
Force-pushed from 6409018 to 02b1f48.
Turing.jl documentation for PR #2708 is available at:
I checked and the base optimisation tests are a superset of these tests, so no need to incorporate any of these deleted tests into the main suite.
```julia
function OptimLogDensity(
    model::DynamicPPL.Model,
    getlogdensity::Function,
    vi::DynamicPPL.AbstractVarInfo;
    adtype::ADTypes.AbstractADType=Turing.DEFAULT_ADTYPE,
)
    ldf = DynamicPPL.LogDensityFunction(model, getlogdensity, vi; adtype=adtype)
    return new{typeof(ldf)}(ldf)
end
function OptimLogDensity(
    model::DynamicPPL.Model,
    getlogdensity::Function;
    adtype::ADTypes.AbstractADType=Turing.DEFAULT_ADTYPE,
)
    # No varinfo
    return OptimLogDensity(
        model,
        getlogdensity,
        DynamicPPL.ldf_default_varinfo(model, getlogdensity);
        adtype=adtype,
    )
end
```
This is just duplicating `LogDensityFunction` code, so I thought it simpler to construct the LDF and then wrap it.
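As a rough illustration of the "construct then wrap" idea (a hypothetical simplified sketch, not the PR's actual code; the struct name and constructor here are illustrative):

```julia
# Hypothetical simplified sketch of wrapping a pre-built LogDensityFunction
# instead of duplicating its construction logic in every constructor.
struct MyOptimLogDensity{L}
    ldf::L
end

# All keyword handling (varinfo, adtype, ...) is delegated to the LDF
# constructor; the wrapper only stores the result.
function MyOptimLogDensity(model, getlogdensity, vi; adtype)
    ldf = DynamicPPL.LogDensityFunction(model, getlogdensity, vi; adtype=adtype)
    return MyOptimLogDensity(ldf)
end
```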
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff            @@
##          breaking    #2708   +/-   ##
============================================
- Coverage     86.86%   86.33%   -0.54%
============================================
  Files            22       21       -1
  Lines          1447     1383      -64
============================================
- Hits           1257     1194      -63
+ Misses          190      189       -1
============================================
```

☔ View full report in Codecov by Sentry.
```markdown
## Optimisation interface

The Optim.jl interface has been removed (so you cannot call `Optim.optimize` directly on Turing models).
```
I suggest that we define our own `optimize` function on top of `DynamicPPL.LogDensityFunction` to avoid the breaking change. The `optimize` function provides a natural alternative to `AbstractMCMC.sample`.
I don't really agree:

- What is the purpose of avoiding the breaking change? The functionality is already present in `maximum_likelihood(model)` or `maximum_a_posteriori(model)`, so this would just be duplicate functionality.
- A `Turing.optimize` function would still be a breaking change, because it is not the same as `Optim.optimize`.
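For reference, the existing entry points can be used like this (a minimal sketch; the `@model` definition is illustrative, not from this PR):

```julia
using Turing

# Illustrative model, not from this PR.
@model function gdemo(x)
    σ ~ truncated(Normal(0, 1); lower=0)
    μ ~ Normal(0, 1)
    x .~ Normal(μ, σ)
end

model = gdemo([1.5, 2.0])

# Mode estimation without any Optim.jl interface:
mle_estimate = maximum_likelihood(model)
map_estimate = maximum_a_posteriori(model)
```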
Thanks, Penny, for clarifying.
My main point is not the breaking change; rather, `optimize` mirrors the `Turing.sample` API well and provides a generic interface for optimisation and VI algorithms. Sorry for the confusion. That said, I agree we ought to remove the Optim interface and define our own `optimize` method.
Sure, in that case we could start by renaming `estimate_mode` to `optimize` and exporting it. That would give us, out of the box:

```julia
optimize(model, MLE(); kwargs...)
optimize(model, MAP(); kwargs...)
```

and then the VI arguments could be bundled into a struct that would be passed as the second argument.
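The rename being proposed here might look roughly like this (hypothetical sketch; `estimate_mode`, `ModeEstimator`, `MLE`, and `MAP` are the names used in the discussion, not a committed API):

```julia
# Hypothetical sketch: expose the existing internal entry point under an
# `optimize` name, dispatching on the estimator type.
optimize(model::DynamicPPL.Model, estimator::ModeEstimator; kwargs...) =
    estimate_mode(model, estimator; kwargs...)

# Usage, as suggested above:
# optimize(model, MLE(); kwargs...)
# optimize(model, MAP(); kwargs...)
```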
Force-pushed from 33ecb77 to 6448372.
Force-pushed from fa46a89 to 9068a70.
```diff
-    log_density = OptimLogDensity(model, getlogdensity, vi)
+    # Note that we don't need adtype here, because it's specified inside the
+    # OptimizationProblem
+    ldf = DynamicPPL.LogDensityFunction(model, getlogdensity, vi)
+    log_density = OptimLogDensity(ldf)
     prob = Optimization.OptimizationProblem(log_density, adtype, constraints)
```
I'm actually unsure if this is 'best practice', although I've kept it this way since that's how it was written prior to this PR. What this effectively does is tell Optimization.jl to use `adtype` to differentiate through the `LogDensityFunction`. It seems to me that it would be more appropriate to construct an LDF with the appropriate `adtype`, and then tell `OptimizationProblem` to use the implementation of `logdensity_and_gradient` when it needs a gradient.
That does feel like a better approach. It looks like `OptimizationFunction` supports both ways of doing it, though I don't see a way to make use of `logdensity_and_gradient` also computing the log density: https://docs.sciml.ai/Optimization/v4.8/API/optimization_function/#optfunction
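The alternative being discussed (reusing the LDF's own gradient rather than letting Optimization.jl differentiate the objective) might look roughly like this hedged sketch; `ldf` and `u0` are assumed to exist, and this is not what the PR implements:

```julia
using Optimization, LogDensityProblems

# Assumes `ldf` implements the LogDensityProblems interface with a
# gradient-capable adtype, and `u0` holds initial parameter values.
# We minimise the negative log density.
f(u, p) = -LogDensityProblems.logdensity(ldf, u)

# Reuse the LDF's prepared gradient instead of having Optimization.jl
# differentiate `f` itself.
function grad!(G, u, p)
    _, g = LogDensityProblems.logdensity_and_gradient(ldf, u)
    G .= .-g
    return nothing
end

optf = Optimization.OptimizationFunction(f; grad=grad!)
prob = Optimization.OptimizationProblem(optf, u0)
```

Note the drawback raised in this thread: `logdensity_and_gradient` computes the log density as well, but Optimization.jl calls `f` and `grad!` separately, so that value is discarded.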
Hmmmm, but if it then wants something like Hessians (for, er, ... full Newton's method?) then we can't provide that, unless we rewrite OptimLogDensity to also cache hessian preparation and stuff. Maybe best to just let Optimization do its thing then.
That's very slightly disappointing, but I suppose also it's probably quite unlikely that this would be a performance pain point (compared to the actual cost of evaluating the gradient), so happy to drop it.
```julia
    ModeEstimator

An abstract type to mark whether mode estimation is to be done with maximum a posteriori
(MAP) or maximum likelihood estimation (MLE). This is only needed for the Optim.jl interface.
```
I'm a bit confused by the comment that is being removed. Wouldn't we want these types to exist regardless? Or was the intention of the comment to say that we should move to using distinct functions for MLE and MAP and not have this type? I appreciate you didn't write the comment that is being removed, and for extra irony, I may have written it.
Ah yes, I was confused too. I actually started this PR by deleting them! 😄 then I realised we can't delete them. No big deal, this PR will fix the confusion...
(I think in general these types are good to have, I'm quite happy to have these kind of enums)
notes in comments.
Further refactoring will happen, but in separate PRs.
Closes #2635.