
Conversation

@penelopeysm (Member) commented Nov 3, 2025

Notes in comments.

Further refactoring will happen, but in separate PRs.

Closes #2635.

@penelopeysm changed the base branch from main to breaking, November 3, 2025 10:40
@github-actions bot commented Nov 3, 2025

Turing.jl documentation for PR #2708 is available at:
https://TuringLang.github.io/Turing.jl/previews/PR2708/

Member Author:

I checked and the base optimisation tests are a superset of these tests, so no need to incorporate any of these deleted tests into the main suite.

Comment on lines -89 to -111
function OptimLogDensity(
    model::DynamicPPL.Model,
    getlogdensity::Function,
    vi::DynamicPPL.AbstractVarInfo;
    adtype::ADTypes.AbstractADType=Turing.DEFAULT_ADTYPE,
)
    ldf = DynamicPPL.LogDensityFunction(model, getlogdensity, vi; adtype=adtype)
    return new{typeof(ldf)}(ldf)
end
function OptimLogDensity(
    model::DynamicPPL.Model,
    getlogdensity::Function;
    adtype::ADTypes.AbstractADType=Turing.DEFAULT_ADTYPE,
)
    # No varinfo
    return OptimLogDensity(
        model,
        getlogdensity,
        DynamicPPL.ldf_default_varinfo(model, getlogdensity);
        adtype=adtype,
    )
end
Member Author:

This is just duplicating LogDensityFunction code; I thought it simpler to just construct the LDF and then wrap it.
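
For reference, a minimal sketch of the replacement pattern (the names `model`, `getlogdensity`, `vi`, and `adtype` come from the deleted constructor above):

# Construct the LogDensityFunction once and wrap it, rather than duplicating
# its construction logic inside OptimLogDensity's own inner constructors.
ldf = DynamicPPL.LogDensityFunction(model, getlogdensity, vi; adtype=adtype)
log_density = OptimLogDensity(ldf)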

@codecov bot commented Nov 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.33%. Comparing base (b59947a) to head (9068a70).
⚠️ Report is 1 commit behind head on breaking.

Additional details and impacted files
@@             Coverage Diff              @@
##           breaking    #2708      +/-   ##
============================================
- Coverage     86.86%   86.33%   -0.54%     
============================================
  Files            22       21       -1     
  Lines          1447     1383      -64     
============================================
- Hits           1257     1194      -63     
+ Misses          190      189       -1     

☔ View full report in Codecov by Sentry.

## Optimisation interface

The Optim.jl interface has been removed (so you cannot call `Optim.optimize` directly on Turing models).
Member:

I suggest that we define our own optimize function on top of DynamicPPL.LogDensityFunction to avoid the breaking change. The optimize function provides a natural alternative to AbstractMCMC.sample.

Member Author:

I don't really agree:

  1. What is the purpose of avoiding the breaking change? The functionality is already present in maximum_likelihood(model) or maximum_a_posteriori(model) (see the sketch below), so this is just duplicate functionality.

  2. A Turing.optimize function would still be a breaking change, because it is not the same as Optim.optimize.
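
A minimal usage sketch of that existing functionality (the `demo` model below is a made-up example; `maximum_likelihood` and `maximum_a_posteriori` are the exported entry points referred to in point 1):

using Turing

@model function demo(x)
    μ ~ Normal(0, 1)
    x ~ Normal(μ, 1)
end

# Mode estimation without going through the removed Optim.jl interface.
mle_result = maximum_likelihood(demo(1.5))
map_result = maximum_a_posteriori(demo(1.5))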

@yebai (Member) commented Nov 4, 2025:

Thanks, Penny, for clarifying.

My main point is not the breaking change; rather, optimize mirrors the Turing.sample API well and provides a generic interface for optimisation and VI algorithms. Sorry for the confusion. I do agree, though, that we ought to remove the Optim interface and define our own optimize method.

Member Author:

Sure, in that case we could start by renaming estimate_mode to optimize and exporting it. That would give us, out of the box,

optimize(model, MLE(); kwargs...)
optimize(model, MAP(); kwargs...)

and then the VI arguments could be bundled into a struct that would be passed as the second argument.
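
Purely as a hypothetical sketch of that API shape (none of the names below exist in Turing yet, other than MLE and MAP):

# Hypothetical: VI options bundled into a struct and passed in the same slot
# as MLE() or MAP().
struct VIOptions
    max_iters::Int
    samples_per_step::Int
end

# optimize(model, MLE(); kwargs...)
# optimize(model, MAP(); kwargs...)
# optimize(model, VIOptions(1000, 10); kwargs...)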

@penelopeysm force-pushed the py/no-optim branch 2 times, most recently from 33ecb77 to 6448372, November 7, 2025 22:06
Comment on lines -571 to 511

log_density = OptimLogDensity(model, getlogdensity, vi)
# Note that we don't need adtype here, because it's specified inside the
# OptimizationProblem
ldf = DynamicPPL.LogDensityFunction(model, getlogdensity, vi)
log_density = OptimLogDensity(ldf)

prob = Optimization.OptimizationProblem(log_density, adtype, constraints)
Member Author:

I'm actually unsure whether this is 'best practice', although I've kept it this way since that's how it was written prior to this PR. What this effectively does is tell Optimization.jl to use adtype to differentiate through the LogDensityFunction. It seems to me that it would be more appropriate to construct an LDF with the appropriate adtype, and then tell the OptimizationProblem to use the implementation of logdensity_and_gradient when it needs a gradient.
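
A rough sketch of that alternative (not what this PR does; the names `model`, `getlogdensity`, `vi`, and `adtype` are assumed from the diff above, and `x0` is an assumed initial parameter vector):

using Optimization
import DynamicPPL, LogDensityProblems

# Build the LDF with the desired adtype so that it owns the AD machinery.
ldf = DynamicPPL.LogDensityFunction(model, getlogdensity, vi; adtype=adtype)

# Optimization.jl minimises, so negate the log density and its gradient.
f = (x, _p) -> -LogDensityProblems.logdensity(ldf, x)
function grad!(G, x, _p)
    _, g = LogDensityProblems.logdensity_and_gradient(ldf, x)
    G .= .-g
    return nothing
end

optf = Optimization.OptimizationFunction(f; grad=grad!)
prob = Optimization.OptimizationProblem(optf, x0)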

Member:

That does feel like a better approach. It looks like OptimizationFunction supports both ways of doing it, though I don't see a way to take advantage of logdensity_and_gradient also computing the log density: https://docs.sciml.ai/Optimization/v4.8/API/optimization_function/#optfunction

Member Author:

Hmmm, but if it then wants something like Hessians (for, say, full Newton's method), we can't provide that unless we rewrite OptimLogDensity to also cache Hessian preparation and so on. Maybe best to just let Optimization do its thing, then.

Member Author:

That's very slightly disappointing, but I suppose it's also probably quite unlikely that this would be a performance pain point (compared to the actual cost of evaluating the gradient), so I'm happy to drop it.

@penelopeysm requested a review from mhauru, December 1, 2025 09:39
ModeEstimator
An abstract type to mark whether mode estimation is to be done with maximum a posteriori
(MAP) or maximum likelihood estimation (MLE). This is only needed for the Optim.jl interface.
Member:

I'm a bit confused by the comment that is being removed. Wouldn't we want these types to exist regardless? Or was the intention of the comment to say that we should move to using distinct functions for MLE and MAP and not have this type? I appreciate you didn't write the comment that is being removed, and for extra irony, I may have written it.

Member Author:

Ah yes, I was confused too. I actually started this PR by deleting them! 😄 Then I realised we can't delete them. No big deal, this PR will fix the confusion...

Member Author:

(I think these types are good to have in general; I'm quite happy to have this kind of enum.)
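
For context, a small usage sketch of these estimator types (hypothetical call shape: it assumes the unexported estimate_mode mentioned earlier takes the estimator as its second argument, and a model instance named `model`):

# MLE() and MAP() are instances of the two ModeEstimator subtypes; the choice
# determines whether the prior is included in the objective.
Turing.Optimisation.estimate_mode(model, MLE())
Turing.Optimisation.estimate_mode(model, MAP())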

@penelopeysm merged commit 1d10d2d into breaking, Dec 1, 2025
26 of 28 checks passed
@penelopeysm deleted the py/no-optim branch, December 1, 2025 18:04