---
pagetitle: Embedded Laplace Approximation
---

# Embedded Laplace Approximation

The embedded Laplace approximation can be used to approximate certain marginal
and conditional distributions that arise in latent Gaussian models.
The embedded Laplace approximation replaces explicit sampling of high-dimensional Gaussian latent
variables with a local Gaussian approximation.
In doing so, it marginalizes out the latent Gaussian variables.
Inference can then be performed on the remaining, often low-dimensional,
parameters. The embedded Laplace approximation in Stan is best suited for
latent Gaussian models when jointly sampling over all model parameters is
expensive and the conditional posterior of the Gaussian latent variables is
reasonably close to Gaussian.

For observed data $y$, latent Gaussian variables $\theta$, and hyperparameters $\phi$, a latent Gaussian model has the following hierarchical structure:
\begin{eqnarray}
\phi &\sim& p(\phi), \\
\theta &\sim& \text{MultiNormal}(0, K(\phi)), \\
y &\sim& p(y \mid \theta, \phi).
\end{eqnarray}
In this formulation, $p(y \mid \theta, \phi)$ is the likelihood function that
specifies how observations are generated conditional on $\theta$ and $\phi$.
$K(\phi)$ denotes the prior covariance matrix for the latent Gaussian variables
$\theta$ and is parameterized by $\phi$. The prior on $\theta$ is centered at 0;
however, an offset can always be added when specifying the likelihood function
$p(y \mid \theta, \phi)$.

To sample from the joint posterior $p(\phi, \theta \mid y)$, we can either
use a standard method, such as Markov chain Monte Carlo, or we can follow
are typically available in closed form and so they must be approximated.
The marginal posterior can be written as $p(\phi \mid y) \propto p(y \mid \phi) p(\phi)$,
where $p(y \mid \phi) = \int p(y \mid \phi, \theta) p(\theta) \text{d}\theta$
is called the marginal likelihood. The Laplace method approximates
$p(y \mid \phi, \theta) p(\theta)$ with a normal distribution centered at the mode,
$$
\theta^* = \underset{\theta}{\text{argmax}} \ \log p(\theta \mid y, \phi),
$$
using one of Stan's algorithms. The marginal posterior is lower
dimensional and likely to have a simpler geometry leading to more
efficient inference. On the other hand, each marginal likelihood
computation is more costly, and the combined change in efficiency
depends on the application.

To obtain posterior draws for $\theta$, we sample from the normal
approximation to $p(\theta \mid y, \phi)$ in `generated quantities`.
then $p(\theta \mid y, \phi)$ produces samples from the joint posterior
$p(\theta, \phi \mid y)$.

The Laplace approximation is especially useful if $p(y \mid \phi, \theta)$ is
log-concave, e.g., Poisson, binomial, negative-binomial, and Bernoulli.
(The normal distribution is also log-concave; however, when the likelihood is
normal, marginalization can be performed exactly and does not require an
approximation.)
Stan's embedded Laplace approximation is restricted to the case
where the prior $p(\theta \mid \phi)$ is multivariate normal.
Furthermore, the likelihood $p(y \mid \phi, \theta)$ must be computed using
only operations which support higher-order derivatives
In the `model` block, we increment `target` with `laplace_marginal`, a function
that approximates the log marginal likelihood $\log p(y \mid \phi)$.
The signature of the function is:

\index{{\tt \bfseries laplace\_marginal }!{\tt (function likelihood\_function, tuple(...) likelihood\_arguments, int hessian\_block\_size, function covariance\_function, tuple(...) covariance\_arguments): real}|hyperpage}

<!-- real; laplace_marginal; (function likelihood_function, tuple(...) likelihood_arguments, int hessian_block_size, function covariance_function, tuple(...) covariance_arguments); -->
`real` **`laplace_marginal`**`(function likelihood_function, tuple(...) likelihood_arguments, int hessian_block_size, function covariance_function, tuple(...) covariance_arguments)`


which returns an approximation to the log marginal likelihood $p(y \mid \phi)$.
{{< since 2.37 >}}

The embedded Laplace functions accept two functors whose user-defined arguments are passed to `laplace_marginal` as tuples.

1. `likelihood_function` - the user-specified log likelihood, whose first argument is the vector of latent Gaussian variables $\theta$.
The subsequent arguments are user defined.
   - `real likelihood_function(vector theta, likelihood_argument_1, likelihood_argument_2, ...)`
2. `likelihood_arguments` - a tuple of arguments whose internal members are passed to the log likelihood function.
This tuple does NOT include the latent variable $\theta$.
3. `hessian_block_size` - the block size of the Hessian of the log likelihood, $\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$.
4. `covariance_function` - a function that returns the covariance matrix of the multivariate normal prior on $\theta$.
   - `matrix covariance_function(covariance_argument_1, covariance_argument_2, ...)`
5. `covariance_arguments` - a tuple of arguments whose internal members are passed to the covariance function.

Below we go over each argument in more detail.
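Before the detailed walkthrough, a minimal sketch may help show how these pieces fit together. All function and variable names below are hypothetical, and the Poisson likelihood and squared-exponential covariance are illustrative choices, not requirements:

```stan
functions {
  // Hypothetical log likelihood: y[i] depends only on theta[i]
  real poisson_log_lik(vector theta, array[] int y, vector log_offset) {
    return poisson_log_lpmf(y | theta + log_offset);
  }
  // Hypothetical prior covariance: squared-exponential kernel over inputs x
  matrix se_cov(array[] real x, real alpha, real rho) {
    return gp_exp_quad_cov(x, alpha, rho);
  }
}
model {
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  // hessian_block_size = 1 because, for this likelihood, the Hessian
  // of the log likelihood with respect to theta is diagonal
  target += laplace_marginal(poisson_log_lik, (y, log_offset), 1,
                             se_cov, (x, alpha, rho));
}
```

Note that the tuples `(y, log_offset)` and `(x, alpha, rho)` supply the trailing arguments of the two functors; $\theta$ itself is never passed explicitly.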

## Specifying the log likelihood function {#laplace-likelihood_spec}

The first step to use the embedded Laplace approximation is to write down a
function in the `functions` block which returns the log likelihood
$\log p(y \mid \theta, \phi)$.

There are a few constraints on this function:

1. The function return type must be `real`.

2. The first argument must be the latent Gaussian variable $\theta$ and must
have type `vector`.
as data or parameter.

The tuple after `likelihood_function` contains the arguments that get passed
to `likelihood_function` *excluding $\theta$*. For instance, if a user defined
likelihood uses a real and a matrix, the likelihood function's signature would
first have a vector and then a real and matrix argument.

```stan
real likelihood_function(vector theta, real a, matrix X)
```

Data-only arguments can be declared with the `data` qualifier,
for example,

```stan
real likelihood_function(vector theta, data vector x, ...)
```

In addition to the likelihood function, users must specify the block size
of the Hessian, $\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$.
The Hessian is often block diagonal, and this structure can be exploited for fast computation.
For example, if $y_i$ only depends on $\theta_i$, then the Hessian is diagonal and `hessian_block_size=1`.
On the other hand, if the Hessian is not block diagonal, we can always set
`hessian_block_size=n` where $n$ is the size of $\theta$.
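As an illustration (with hypothetical names), a Bernoulli-logit likelihood in which each $y_i$ depends only on $\theta_i$ has a diagonal Hessian, so `hessian_block_size=1` is appropriate:

```stan
functions {
  // Each y[i] depends only on theta[i], so the Hessian
  // of the log likelihood with respect to theta is diagonal:
  // pass hessian_block_size = 1 to the Laplace functions.
  real bernoulli_logit_lik(vector theta, array[] int y, vector x_beta) {
    return bernoulli_logit_lpmf(y | theta + x_beta);
  }
}
```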

## Specifying the covariance function

The argument `covariance_function` returns the prior covariance matrix
Its return type must be a matrix of size $n \times n$, where $n$ is the size of $\theta$:

```stan
matrix covariance_function(...)
```
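For instance, a squared-exponential covariance function might be written as follows; the function name is hypothetical and the diagonal jitter is an assumption added for numerical stability, not a requirement:

```stan
functions {
  matrix se_cov(array[] real x, real alpha, real rho) {
    int n = size(x);
    // Built-in squared-exponential (exponentiated quadratic) kernel
    matrix[n, n] K = gp_exp_quad_cov(x, alpha, rho);
    // Small jitter on the diagonal for numerical stability
    for (i in 1:n) {
      K[i, i] += 1e-8;
    }
    return K;
  }
}
```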


The `...` represents a set of optional
variadic arguments. There are no type restrictions on the variadic arguments
`...`, and each argument can be passed as data or parameter. The variables
It is also possible to specify control parameters, which can help improve the
optimization that underlies the Laplace approximation, using `laplace_marginal_tol`
with the following signature:


<!-- real; laplace_marginal_tol; (function likelihood_function, tuple(...) likelihood_arguments, int hessian_block_size, function covariance_function, tuple(...) covariance_arguments, tuple(...) laplace_ops); -->
\index{{\tt \bfseries laplace\_marginal\_tol }!{\tt (function likelihood\_function, tuple(...) likelihood\_arguments, int hessian\_block\_size, function covariance\_function, tuple(...) covariance\_arguments, tuple(...) laplace\_ops): real}|hyperpage}

```stan
real laplace_marginal_tol(function likelihood_function, tuple(...) likelihood_arguments,
                          int hessian_block_size,
                          function covariance_function, tuple(...) covariance_arguments,
                          tuple(vector theta_init, real tol, int max_steps, int solver,
                                int max_steps_linesearch, int allow_fallback) laplace_ops)
```

Returns an approximation to the log marginal likelihood $p(y \mid \phi)$
and allows the user to tune the control parameters of the approximation.

* `theta_init`: the initial guess for a Newton solver when finding the mode
of $p(\theta \mid y, \phi)$. By default, it is a zero-vector.

* `tol`: the tolerance $\epsilon$ of the optimizer. Specifically, the optimizer
stops when $||\nabla \log p(\theta \mid y, \phi)|| \le \epsilon$. By default,
the value is $\epsilon \approx 1.49 \times 10^{-8}$, the square root of machine precision.

* `max_num_steps`: the maximum number of steps taken by the optimizer before
it gives up (in which case the Metropolis proposal gets rejected). The default
is 500 steps.

* `hessian_block_size`: the size of the blocks, assuming the Hessian
$\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$ is block-diagonal.
The structure of the Hessian is determined by the dependence structure of $y$
on $\theta$. By default, the Hessian is treated as diagonal
(`hessian_block_size=1`). If the Hessian is not block diagonal, then set
`hessian_block_size=n`, where `n` is the size of $\theta$.

* `solver`: choice of Newton solver. The optimizer underlying the
Laplace approximation does one of three matrix decompositions to compute a
Newton step. The problem determines which decomposition is numerically stable.
By default (`solver=1`), the solver attempts a Cholesky decomposition of the
negative Hessian of the log likelihood,
$- \partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$.
This operation is legal if the negative Hessian is positive-definite,
which will always be true when the likelihood is log concave.
If `solver=2`, the solver makes a Cholesky decomposition of the covariance matrix $K(\phi)$.
Since a covariance matrix is always positive-definite, computing its
Cholesky decomposition is always a legal operation, at least in theory.
In practice, we may not be able to compute the Cholesky decomposition of the
negative Hessian or the covariance matrix, either because it does not exist or
because of numerical issues.
In that case, we can use `solver=3` which uses a more expensive but less
specialized approach to compute a Newton step.

* `max_steps_linesearch`: maximum number of steps in linesearch. The linesearch
method adjusts the step size to ensure that a Newton step leads to an increase in
the objective function (i.e., $f(\theta) = p(\theta \mid \phi, y)$).
If a standard Newton step does not improve the objective function,
the step is adjusted iteratively until the objective function increases
or the maximum number of steps in the linesearch is reached.
By default, `max_steps_linesearch=1000`.
Setting `max_steps_linesearch=0` results in no linesearch.

* `allow_fallback`: if the user-specified solver fails, this flag determines whether to fall back to the next solver. For example, if the user specifies `solver=1` but the Cholesky decomposition of the negative Hessian $- \partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$ fails, the optimizer will try `solver=2` instead.
By default, `allow_fallback = 1` (TRUE).

The embedded Laplace approximation's options have a helper function, `generate_laplace_options(int theta_size)`, which generates the options tuple with default values. This can be useful for setting up the control parameters once in the `transformed data` block and reusing them within the model.

```stan
tuple(vector[theta_size], real, int, int, int, int, int) laplace_ops = generate_laplace_options(theta_size);
// Modify solver type
laplace_ops.5 = 2;
// Turn off solver fallback
laplace_ops.7 = 0;
```

The arguments stored in the `laplace_ops` tuple are:
```
laplace_ops = {theta_init,
tol,
max_num_steps,
hessian_block_size,
solver,
max_steps_linesearch,
allow_fallback}
```
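Putting this together, here is a sketch of how `generate_laplace_options` and `laplace_marginal_tol` might be used, assuming hypothetical user-defined functions `poisson_log_lik` and `se_cov` in the `functions` block and data `y`, `log_offset`, and `x` of size `N`:

```stan
transformed data {
  tuple(vector[N], real, int, int, int, int, int) laplace_ops
      = generate_laplace_options(N);
  laplace_ops.2 = 1e-7;  // tighter optimizer tolerance
  laplace_ops.5 = 2;     // Cholesky-decompose K(phi) rather than the negative Hessian
}
model {
  target += laplace_marginal_tol(poisson_log_lik, (y, log_offset), 1,
                                 se_cov, (x, alpha, rho), laplace_ops);
}
```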

{{< since 2.37 >}}

Expand All @@ -253,18 +285,18 @@ approximation of $p(\theta \mid \phi, y)$ using `laplace_latent_rng`.
The signature for `laplace_latent_rng` follows closely
the signature for `laplace_marginal`:

<!-- vector; laplace_latent_rng; (function likelihood_function, tuple(...), int hessian_block_size, function covariance_function, tuple(...)); -->
\index{{\tt \bfseries laplace\_latent\_rng }!{\tt (function likelihood\_function, tuple(...), int hessian\_block\_size, function covariance\_function, tuple(...)): vector}|hyperpage}

`vector` **`laplace_latent_rng`**`(function likelihood_function, tuple(...), int hessian_block_size, function covariance_function, tuple(...))`<br>\newline

Draws samples from the Laplace approximation to the conditional posterior $p(\theta \mid y, \phi)$.
{{< since 2.37 >}}
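For example, given a hypothetical user-defined log likelihood `poisson_log_lik` and covariance function `se_cov`, approximate posterior draws of $\theta$ can be generated as:

```stan
generated quantities {
  // One approximate draw from p(theta | y, phi) per posterior draw of phi
  vector[N] theta = laplace_latent_rng(poisson_log_lik, (y, log_offset), 1,
                                       se_cov, (x, alpha, rho));
}
```

Because the arguments mirror those of `laplace_marginal`, the same functors and tuples can be reused in both the `model` and `generated quantities` blocks.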

Once again, it is possible to specify control parameters:
\index{{\tt \bfseries laplace\_latent\_tol\_rng }!{\tt (function likelihood\_function, tuple(...), int hessian\_block\_size, function covariance\_function, tuple(...), tuple(...) laplace\_ops): vector}|hyperpage}

`vector` **`laplace_latent_tol_rng`**`(function likelihood_function, tuple(...), int hessian_block_size, function covariance_function, tuple(...), tuple(...) laplace_ops)`<br>\newline
Draws approximate samples from the conditional posterior $p(\theta \mid y, \phi)$
and allows the user to tune the control parameters of the approximation.
{{< since 2.37 >}}