diff --git a/src/functions-reference/embedded_laplace.qmd b/src/functions-reference/embedded_laplace.qmd
index 63346dbfb..247d19b44 100644
--- a/src/functions-reference/embedded_laplace.qmd
+++ b/src/functions-reference/embedded_laplace.qmd
@@ -4,23 +4,29 @@ pagetitle: Embedded Laplace Approximation
# Embedded Laplace Approximation
-The embedded Laplace approximation can be used to approximate certain
-marginal and conditional distributions that arise in latent Gaussian models.
-A latent Gaussian model observes the following hierarchical structure:
+The embedded Laplace approximation can be used to approximate certain marginal
+and conditional distributions that arise in latent Gaussian models.
+The embedded Laplace approximation replaces explicit sampling of the
+high-dimensional Gaussian latent variables with a local Gaussian approximation.
+In doing so, it marginalizes out the latent Gaussian variables.
+Inference can then be performed on the remaining, often low-dimensional,
+parameters. The embedded Laplace approximation in Stan is best suited for
+latent Gaussian models when jointly sampling over all model parameters is
+expensive and the conditional posterior of the Gaussian latent variables is
+reasonably close to Gaussian.
+
+For observed data $y$, latent Gaussian variables $\theta$, and hyperparameters $\phi$, a latent Gaussian model has the following hierarchical structure:
\begin{eqnarray}
\phi &\sim& p(\phi), \\
\theta &\sim& \text{MultiNormal}(0, K(\phi)), \\
y &\sim& p(y \mid \theta, \phi).
\end{eqnarray}
-In this formulation, $y$ represents the
-observed data, and $p(y \mid \theta, \phi)$ is the likelihood function that
-specifies how observations are generated conditional on the latent Gaussian
-variables $\theta$ and hyperparameters $\phi$.
+In this formulation, $p(y \mid \theta, \phi)$ is the likelihood function that
+specifies how observations are generated conditional on $\theta$ and $\phi$.
$K(\phi)$ denotes the prior covariance matrix for the latent Gaussian variables
-$\theta$ and is parameterized by $\phi$.
-The prior $p(\theta \mid \phi)$ is restricted to be a multivariate normal
-centered at 0. That said, we can always pick a likelihood that offsets $\theta$,
-which is equivalently to specifying a prior mean.
+$\theta$ and is parameterized by $\phi$. The prior on $\theta$ is centered at 0;
+however, an offset can always be added when specifying the likelihood function
+$p(y \mid \theta, \phi)$.
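+For instance, a known offset $\mu$ can be folded into the likelihood, which is
+equivalent to giving $\theta$ a prior mean of $\mu$. A hypothetical sketch with
+a Poisson likelihood (the function and argument names are illustrative):
+
+```stan
+real likelihood_function(vector theta, data array[] int y, data vector mu) {
+  // shifting theta by mu is equivalent to a prior mean of mu
+  return poisson_log_lpmf(y | theta + mu);
+}
+```
+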
To sample from the joint posterior $p(\phi, \theta \mid y)$, we can either
use a standard method, such as Markov chain Monte Carlo, or we can follow
@@ -34,7 +40,7 @@ are typically available in closed form and so they must be approximated.
The marginal posterior can be written as $p(\phi \mid y) \propto p(y \mid \phi) p(\phi)$,
where $p(y \mid \phi) = \int p(y \mid \phi, \theta) p(\theta) \text{d}\theta$
is called the marginal likelihood. The Laplace method approximates
-$p(y \mid \phi, \theta) p(\theta)$ with a normal distribution centered at
+$p(y \mid \phi, \theta) p(\theta)$ with a normal distribution centered at the mode,
$$
\theta^* = \underset{\theta}{\text{argmax}} \ \log p(\theta \mid y, \phi),
$$
@@ -53,7 +59,7 @@ using one of Stan's algorithms. The marginal posterior is lower
dimensional and likely to have a simpler geometry leading to more
efficient inference. On the other hand each marginal likelihood
computation is more costly, and the combined change in efficiency
-depends on the case.
+depends on the application.
To obtain posterior draws for $\theta$, we sample from the normal
approximation to $p(\theta \mid y, \phi)$ in `generated quantities`.
@@ -62,7 +68,11 @@ then $p(\theta \mid y, \phi)$ produces samples from the joint posterior
$p(\theta, \phi \mid y)$.
The Laplace approximation is especially useful if $p(y \mid \phi, \theta)$ is
-log-concave. Stan's embedded Laplace approximation is restricted to the case
+log-concave, e.g., Poisson, binomial, negative-binomial, and Bernoulli.
+(The normal distribution is also log-concave; however, when the likelihood is
+normal, marginalization can be performed exactly and does not require an
+approximation.)
+Stan's embedded Laplace approximation is restricted to the case
where the prior $p(\theta \mid \phi)$ is multivariate normal.
Furthermore, the likelihood $p(y \mid \phi, \theta)$ must be computed using
only operations which support higher-order derivatives
@@ -74,33 +84,36 @@ In the `model` block, we increment `target` with `laplace_marginal`, a function
that approximates the log marginal likelihood $\log p(y \mid \phi)$.
The signature of the function is:
-\index{{\tt \bfseries laplace\_marginal\_tol }!{\tt (function likelihood\_function, tuple(...) likelihood\_arguments, function covariance\_function, tuple(...), vector theta\_init covariance\_arguments): real}|hyperpage}
+\index{{\tt \bfseries laplace\_marginal }!{\tt (function likelihood\_function, tuple(...) likelihood\_arguments, int hessian\_block\_size, function covariance\_function, tuple(...) covariance\_arguments): real}|hyperpage}
-
+`real` **`laplace_marginal`**`(function likelihood_function, tuple(...) likelihood_arguments, int hessian_block_size, function covariance_function, tuple(...) covariance_arguments)`
-`real` **`laplace_marginal`**`(function likelihood_function, tuple(...) likelihood_arguments, function covariance_function, tuple(...) covariance_arguments)`
-
-Which returns an approximation to the log marginal likelihood $p(y \mid \phi)$.
+which returns an approximation to the log marginal likelihood $\log p(y \mid \phi)$.
{{< since 2.37 >}}
-This function takes in the following arguments.
+The embedded Laplace functions accept two functors whose user-defined arguments are passed to `laplace_marginal` as tuples.
-1. `likelihood_function` - user-specified log likelihood whose first argument is the vector of latent Gaussian variables `theta`
-2. `likelihood_arguments` - A tuple of the log likelihood arguments whose internal members will be passed to the covariance function
-3. `covariance_function` - Prior covariance function
-4. `covariance_arguments` A tuple of the arguments whose internal members will be passed to the the covariance function
+1. `likelihood_function` - user-specified log likelihood whose first argument is the vector of latent Gaussian variables $\theta$.
+The subsequent arguments are user-defined.
+ - `real likelihood_function(vector theta, likelihood_argument_1, likelihood_argument_2, ...)`
+2. `likelihood_arguments` - A tuple of arguments whose internal members are passed to the log likelihood function.
+This tuple does NOT include the latent variable $\theta$.
+3. `hessian_block_size` - the block size of the Hessian of the log likelihood, $\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$.
+4. `covariance_function` - A function that returns the covariance matrix of the multivariate normal prior on $\theta$.
+ - `matrix covariance_function(covariance_argument_1, covariance_argument_2, ...)`
+5. `covariance_arguments` - A tuple of the arguments whose internal members will be passed to the covariance function.
Below we go over each argument in more detail.
## Specifying the log likelihood function {#laplace-likelihood_spec}
The first step to use the embedded Laplace approximation is to write down a
-function in the `functions` block which returns the log joint likelihood
+function in the `functions` block which returns the log likelihood
$\log p(y \mid \theta, \phi)$.
There are a few constraints on this function:
-1. The function return type must be `real`
+1. The function return type must be `real`.
2. The first argument must be the latent Gaussian variable $\theta$ and must
have type `vector`.
@@ -124,7 +137,7 @@ as data or parameter.
The tuple after `likelihood_function` contains the arguments that get passed
to `likelihood_function` *excluding $\theta$*. For instance, if a user defined
-likelihood uses a real and a matrix the likelihood function's signature would
+likelihood uses a real and a matrix, the likelihood function's signature would
first have a vector and then a real and matrix argument.
```stan
@@ -149,6 +162,13 @@ for example,
real likelihood_function(vector theta, data vector x, ...)
```
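+
+As a concrete illustration, a Poisson count likelihood where each observation
+$y_i$ depends only on $\theta_i$ might be written as follows (a sketch; the
+function and data names are hypothetical):
+
+```stan
+functions {
+  real poisson_log_likelihood(vector theta, data array[] int y,
+                              data vector log_exposure) {
+    // theta is the vector of latent Gaussian variables;
+    // log_exposure acts as a known offset
+    return poisson_log_lpmf(y | theta + log_exposure);
+  }
+}
+```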
+In addition to the likelihood function, users must specify the block size
+of the Hessian, $\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$.
+The Hessian is often block diagonal, and the solver can exploit this structure for fast computation.
+For example, if $y_i$ depends only on $\theta_i$, then the Hessian is diagonal and `hessian_block_size=1`.
+On the other hand, if the Hessian is not block diagonal, we can always set
+`hessian_block_size=n` where $n$ is the size of $\theta$.
+
## Specifying the covariance function
The argument `covariance_function` returns the prior covariance matrix
@@ -159,20 +179,6 @@ It's return type must be a matrix of size $n \times n$ where $n$ is the size of
matrix covariance_function(...)
```
-
-
The `...` represents a set of optional
variadic arguments. There is no type restrictions for the variadic arguments
`...` and each argument can be passed as data or parameter. The variables
@@ -198,51 +204,77 @@ It also possible to specify control parameters, which can help improve the
optimization that underlies the Laplace approximation, using `laplace_marginal_tol`
with the following signature:
-\index{{\tt \bfseries laplace\_marginal\_tol }!{\tt (function likelihood\_function, tuple(...), function covariance\_function, tuple(...), vector theta\_init, real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): real}|hyperpage}
-
-
-\index{{\tt \bfseries laplace\_marginal\_tol }!{\tt (function likelihood\_function, tuple(...), function covariance\_function, tuple(...), vector theta\_init, real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): real}|hyperpage}
-
-`real` **`laplace_marginal_tol`**`(function likelihood_function, tuple(...), function covariance_function, tuple(...), vector theta_init, real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch)`
\newline
+```stan
+```stan
+real laplace_marginal_tol(function likelihood_function, tuple(...),
+                          int hessian_block_size,
+                          function covariance_function, tuple(...),
+                          tuple(vector theta_init, real tol, int max_num_steps,
+                                int solver, int max_steps_linesearch,
+                                int allow_fallback))
+```
Returns an approximation to the log marginal likelihood $p(y \mid \phi)$
and allows the user to tune the control parameters of the approximation.
-* `theta_init`: the initial guess for the Newton solver when finding the mode
+* `theta_init`: the initial guess for a Newton solver when finding the mode
of $p(\theta \mid y, \phi)$. By default, it is a zero-vector.
* `tol`: the tolerance $\epsilon$ of the optimizer. Specifically, the optimizer
stops when $||\nabla \log p(\theta \mid y, \phi)|| \le \epsilon$. By default,
-the value is $\epsilon = 10^{-6}$.
+the value is $\epsilon \approx 1.49 \times 10^{-8}$, which is the square-root of machine precision.
* `max_num_steps`: the maximum number of steps taken by the optimizer before
it gives up (in which case the Metropolis proposal gets rejected). The default
-is 100 steps.
+is 500 steps.
-* `hessian_block_size`: the size of the blocks, assuming the Hessian
-$\partial \log p(y \mid \theta, \phi) \ \partial \theta$ is block-diagonal.
-The structure of the Hessian is determined by the dependence structure of $y$
-on $\theta$. By default, the Hessian is treated as diagonal
-(`hessian_block_size=1`). If the Hessian is not block diagonal, then set
-`hessian_block_size=n`, where `n` is the size of $\theta$.
-
-* `solver`: choice of Newton solver. The optimizer used to compute the
+* `solver`: choice of Newton solver. The optimizer underlying the
Laplace approximation does one of three matrix decompositions to compute a
-Newton step. The problem determines which decomposition is numerical stable.
-By default (`solver=1`), the solver makes a Cholesky decomposition of the
-negative Hessian, $- \partial \log p(y \mid \theta, \phi) / \partial \theta$.
-If `solver=2`, the solver makes a Cholesky decomposition of the covariance
-matrix $K(\phi)$.
-If the Cholesky decomposition cannot be computed for neither the negative
-Hessian nor the covariance matrix, use `solver=3` which uses a more expensive
-but less specialized approach.
+Newton step. The problem determines which decomposition is numerically stable.
+By default (`solver=1`), the solver attempts a Cholesky decomposition of the
+negative Hessian of the log likelihood,
+$- \partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$.
+This operation is valid if the negative Hessian is positive-definite,
+which is always true when the likelihood is log-concave.
+If `solver=2`, the solver makes a Cholesky decomposition of the covariance matrix $K(\phi)$.
+Since a covariance matrix is always positive-definite, computing its
+Cholesky decomposition is always a legal operation, at least in theory.
+In practice, we may not be able to compute the Cholesky decomposition of the
+negative Hessian or the covariance matrix, either because it does not exist or
+because of numerical issues.
+In that case, we can use `solver=3` which uses a more expensive but less
+specialized approach to compute a Newton step.
* `max_steps_linesearch`: maximum number of steps in linesearch. The linesearch
-method tries to insure that the Newton step leads to a decrease in the
-objective function. If the Newton step does not improve the objective function,
-the step is repeatedly halved until the objective function decreases or the
-maximum number of steps in the linesearch is reached. By default,
-`max_steps_linesearch=0`, meaning no linesearch is performed.
+adjusts the step size to ensure that a Newton step leads to an increase in
+the objective function (i.e., $f(\theta) = \log p(\theta \mid \phi, y)$).
+If a standard Newton step does not improve the objective function,
+the step is adjusted iteratively until the objective function increases
+or the maximum number of steps in the linesearch is reached.
+By default, `max_steps_linesearch=1000`.
+Setting `max_steps_linesearch=0` results in no linesearch.
+
+* `allow_fallback`: If the user-specified solver fails, this flag determines whether to fall back to the next solver. For example, if the user specifies `solver=1` but the Cholesky decomposition of the negative Hessian $- \partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$ fails, the optimizer will try `solver=2` instead.
+By default, `allow_fallback = 1` (TRUE).
+
+The helper function `generate_laplace_options(int theta_size)` generates this tuple of control parameters with default values. This can be useful for quickly setting up the control parameters in the `transformed data` block for reuse within the model.
+
+```stan
+tuple(vector[theta_size], real, int, int, int, int, int) laplace_ops = generate_laplace_options(theta_size);
+// Modify solver type
+laplace_ops.5 = 2;
+// Turn off fallback
+laplace_ops.7 = 0;
+```
+
+The arguments stored in the `laplace_ops` tuple are:
+```
+laplace_ops = {theta_init,
+ tol,
+ max_num_steps,
+ hessian_block_size,
+ solver,
+ max_steps_linesearch,
+ allow_fallback}
+```
{{< since 2.37 >}}
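+
+Putting the pieces together, a complete model using `laplace_marginal` might
+look as follows. This is a sketch under assumed names: the likelihood,
+covariance function, and data declarations are illustrative, not part of any
+fixed interface.
+
+```stan
+functions {
+  real poisson_log_likelihood(vector theta, data array[] int y,
+                              data vector log_exposure) {
+    return poisson_log_lpmf(y | theta + log_exposure);
+  }
+  matrix se_covariance(data array[] real x, real alpha, real rho) {
+    // squared exponential kernel with jitter for numerical stability
+    return add_diag(gp_exp_quad_cov(x, alpha, rho), 1e-8);
+  }
+}
+data {
+  int<lower=1> n;
+  array[n] int<lower=0> y;
+  vector[n] log_exposure;
+  array[n] real x;
+}
+parameters {
+  real<lower=0> alpha;
+  real<lower=0> rho;
+}
+model {
+  alpha ~ normal(0, 1);
+  rho ~ inv_gamma(5, 5);
+  // hessian_block_size=1: the Hessian is diagonal because
+  // y[i] depends only on theta[i]
+  target += laplace_marginal(poisson_log_likelihood, (y, log_exposure),
+                             1, se_covariance, (x, alpha, rho));
+}
+```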
@@ -253,18 +285,18 @@ approximation of $p(\theta \mid \phi, y)$ using `laplace_latent_rng`.
The signature for `laplace_latent_rng` follows closely
the signature for `laplace_marginal`:
-
-\index{{\tt \bfseries laplace\_latent\_rng }!{\tt (function likelihood\_function, tuple(...), function covariance\_function, tuple(...), vector theta\_init): vector}|hyperpage}
+
+\index{{\tt \bfseries laplace\_latent\_rng }!{\tt (function likelihood\_function, tuple(...), int hessian\_block\_size, function covariance\_function, tuple(...)): vector}|hyperpage}
-`vector` **`laplace_latent_rng`**`(function likelihood_function, tuple(...), function covariance_function, tuple(...))`
\newline
+`vector` **`laplace_latent_rng`**`(function likelihood_function, tuple(...), int hessian_block_size, function covariance_function, tuple(...))`
\newline
-Draws approximate samples from the conditional posterior $p(\theta \mid y, \phi)$.
+Draws samples from the Laplace approximation to the conditional posterior $p(\theta \mid y, \phi)$.
{{< since 2.37 >}}
Once again, it is possible to specify control parameters:
-\index{{\tt \bfseries laplace\_latent\_tol\_rng }!{\tt (function likelihood\_function, tuple(...), function covariance\_function, tuple(...), vector theta\_init, real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): vector}|hyperpage}
+\index{{\tt \bfseries laplace\_latent\_tol\_rng }!{\tt (function likelihood\_function, tuple(...), int hessian\_block\_size, function covariance\_function, tuple(...), tuple(...) laplace\_ops): vector}|hyperpage}
-`vector` **`laplace_latent_tol_rng`**`(function likelihood_function, tuple(...), function covariance_function, tuple(...), vector theta_init, real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch)`
\newline
+`vector` **`laplace_latent_tol_rng`**`(function likelihood_function, tuple(...), int hessian_block_size, function covariance_function, tuple(...), tuple(...) laplace_ops)`
\newline
Draws approximate samples from the conditional posterior $p(\theta \mid y, \phi)$
and allows the user to tune the control parameters of the approximation.
{{< since 2.37 >}}
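+
+For example, approximate posterior draws of $\theta$ can be generated in the
+`generated quantities` block. The sketch below uses hypothetical likelihood and
+covariance functions and data names, which must match the arguments passed to
+`laplace_marginal` in the `model` block:
+
+```stan
+generated quantities {
+  // draws from the Laplace approximation to p(theta | y, phi)
+  vector[n] theta = laplace_latent_rng(poisson_log_likelihood,
+                                       (y, log_exposure), 1,
+                                       se_covariance, (x, alpha, rho));
+}
+```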
diff --git a/src/functions-reference/functions_index.qmd b/src/functions-reference/functions_index.qmd
index 870df9d52..ad18c3d42 100644
--- a/src/functions-reference/functions_index.qmd
+++ b/src/functions-reference/functions_index.qmd
@@ -1713,7 +1713,7 @@ pagetitle: Alphabetical Index
**laplace_latent_rng**:
- -