diff --git a/src/_quarto.yml b/src/_quarto.yml
index 05e773c6c..0b02f9264 100644
--- a/src/_quarto.yml
+++ b/src/_quarto.yml
@@ -154,6 +154,7 @@ website:
         - stan-users-guide/poststratification.qmd
         - stan-users-guide/decision-analysis.qmd
         - stan-users-guide/bootstrap.qmd
+        - stan-users-guide/multiple-imputation.qmd
     - section: "Appendices"
       contents:
         - stan-users-guide/using-stanc.qmd
diff --git a/src/bibtex/all.bib b/src/bibtex/all.bib
index 02b7c034b..86735af1e 100644
--- a/src/bibtex/all.bib
+++ b/src/bibtex/all.bib
@@ -1352,3 +1352,35 @@ @article{Henrich2024
   url={https://doi.org/10.3758/s13428-023-02179-1}
 }
 
+@book{carpenter-etal:2023,
+  title = {Multiple Imputation and its Application},
+  author = {Carpenter, James R. and Bartlett, Jonathan W. and
+            Morris, Tim P. and Wood, Angela M. and Quartagno, Matteo
+            and Kenward, Michael G.},
+  year = {2023},
+  edition = {2nd},
+  publisher = {John Wiley \& Sons, Ltd},
+  isbn = {9781119756118},
+  doi = {10.1002/9781119756118},
+}
+
+@book{vanbuuren:2018,
+  author = {{van Buuren}, Stef},
+  title = {Flexible Imputation of Missing Data},
+  edition = {2nd},
+  publisher = {Chapman and Hall/CRC},
+  year = {2018},
+  doi = {10.1201/9780429492259},
+  url = {https://doi.org/10.1201/9780429492259}
+}
+
+@article{plummer:2015,
+  author = {Plummer, Martyn},
+  title = {Cuts in Bayesian graphical models},
+  journal = {Statistics and Computing},
+  volume = {25},
+  pages = {37--43},
+  year = {2015},
+  doi = {10.1007/s11222-014-9503-z},
+  url = {https://doi.org/10.1007/s11222-014-9503-z}
+}
\ No newline at end of file
diff --git a/src/stan-users-guide/_quarto.yml b/src/stan-users-guide/_quarto.yml
index f8b25f9d8..9f90c971d 100644
--- a/src/stan-users-guide/_quarto.yml
+++ b/src/stan-users-guide/_quarto.yml
@@ -75,6 +75,7 @@ book:
     - poststratification.qmd
     - decision-analysis.qmd
     - bootstrap.qmd
+    - multiple-imputation.qmd
   - part: "Appendices"
     chapters:
       - using-stanc.qmd
diff --git a/src/stan-users-guide/multiple-imputation.qmd b/src/stan-users-guide/multiple-imputation.qmd
new file mode 100644
index 000000000..d4601e12d
--- /dev/null
+++ b/src/stan-users-guide/multiple-imputation.qmd
@@ -0,0 +1,381 @@
+---
+pagetitle: Multiple Imputation
+---
+
+# Multiple imputation
+
+Missing data is common in applied data analysis. Ignoring the
+missingness can bias posterior inferences and reduce their
+precision, whereas modeling the missing values can improve both.
+Chapter 18 of @GelmanEtAl:2013 offers a Bayesian perspective on
+analyzing data with missing values.
+
+Multiple imputation fills in the missing values several times to
+produce several completed versions of the data set. The model of
+interest is then fitted to each completed data set separately, and
+the draws from the resulting fits are combined into a single sample
+that accounts for the uncertainty in the missing values.
+
+## Motivation and outline
+
+Suppose that we have a matrix $x$ with columns
+$x_{\cdot, 1}, \ldots, x_{\cdot, K}$ that we want to use to estimate
+a vector of quantities of interest $\theta$. With completely
+observed data, $x^\text{comp}$, we could sample from the posterior
+distribution $p(\theta \mid x^\text{comp})$. But with missing data,
+our matrix is split into $x^{\text{obs}}$ (the observed values of
+$x$) and $x^{\text{mis}}$ (the missing values of $x$). This is
+problematic because, in general, our model is specified only for
+complete data, that is, for $p(\theta \mid x^\text{comp})$.
+
+Fortunately, we can treat $x^{\text{mis}}$ as additional nuisance
+parameters and estimate them along with $\theta$ through
+$p(\theta, x^{\text{mis}} \mid x^{\text{obs}})$.
+So, we can express the marginal distribution of $\theta$ given only
+the observed values $x^{\text{obs}}$ as
+\begin{align*}
+p(\theta \mid x^{\text{obs}})
+&= \int
+  p(\theta, x^{\text{mis}} \mid x^{\text{obs}})
+  \, \mathrm{d}x^{\text{mis}} \\
+&= \int
+  p(\theta \mid x^{\text{obs}}, x^{\text{mis}})
+  \, p(x^{\text{mis}} \mid x^{\text{obs}})
+  \, \mathrm{d}x^{\text{mis}} \\
+&= \int
+  p(\theta \mid x^{\text{imp}})
+  \, p(x^{\text{mis}} \mid x^{\text{obs}})
+  \, \mathrm{d}x^{\text{mis}},
+\end{align*}
+where $x^{\text{imp}}$ is a data set that has been completed with
+imputed values of $x^{\text{mis}}$.
+
+The equations above show that we do not need to describe
+$p(\theta \mid x^{\text{obs}})$ directly. Instead, we can first
+estimate $x^{\text{mis}}$ based on $x^{\text{obs}}$, and then use
+these estimates to fit the model $p(\theta \mid x^\text{imp})$,
+treating $x^\text{imp}$ as if it were $x^\text{comp}$. Thus, the
+general outline for multiple imputation with $M$ imputations is:
+
+1. Draw $M$ random samples $x^{\text{mis}}_1, \ldots,
+x^{\text{mis}}_M$ from the posterior predictive distribution
+$p(x^{\text{mis}} \mid x^{\text{obs}})$.
+2. Use these samples to build $M$ complete data sets
+$x_1^\text{imp}, \ldots, x_M^\text{imp}$.
+3. For each of the $M$ imputed data sets, sample from the model of
+interest $p(\theta \mid x_m^\text{imp})$, each time treating
+$x_m^\text{imp}$ as if it were $x^\text{comp}$.
+4. Combine the draws for $\theta$ from all $M$ fits.
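As a concrete illustration of these four steps, here is a minimal runnable sketch in Python with NumPy. The toy data, the normal imputation model, and the flat-prior closed-form posterior for the mean are all illustrative assumptions, not part of the chapter:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data set: we want the posterior of theta, the mean of x, but
# some entries of x are missing.  The data, the imputation model,
# and the flat-prior posterior below are illustrative stand-ins.
x_full = rng.normal(5.0, 1.0, size=40)
missing = np.zeros(x_full.size, dtype=bool)
missing[:10] = True          # pretend the first 10 entries are missing
x_obs = x_full[~missing]

M = 10                       # number of imputations
n_draws = 500                # posterior draws per imputed data set
n = x_full.size

theta_draws = []
for m in range(M):
    # Step 1: draw x_mis from a simple predictive distribution
    # p(x_mis | x_obs): a normal fitted to the observed values.
    x_mis = rng.normal(x_obs.mean(), x_obs.std(ddof=1),
                       size=int(missing.sum()))
    # Step 2: build the completed data set x_imp.
    x_imp = x_full.copy()
    x_imp[missing] = x_mis
    # Step 3: sample theta from p(theta | x_imp); with a flat prior,
    # theta | x_imp ~ Normal(mean(x_imp), sd(x_imp) / sqrt(n)).
    theta_draws.append(
        rng.normal(x_imp.mean(), x_imp.std(ddof=1) / np.sqrt(n),
                   size=n_draws))

# Step 4: combine the draws for theta from all M fits.
theta = np.concatenate(theta_draws)
print(theta.shape)   # (M * n_draws,) = (5000,)
```

In the chapter's setting, step 3 is a separate Stan fit per completed data set rather than a closed-form posterior; the combined sample in step 4 is wider than any single fit's sample because it carries the between-imputation variability.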
+
+### Pseudocode for outline
+
+Following the notation above, we can express the outline with the
+pseudocode below:
+```
+theta_estimates <- []  // Prepare to store draws from imputed data sets
+for (m in 1:M) {
+  x_mis <- get_imputations(x_obs)     // Draw random values for x_mis
+  x_imp <- add_imputations(x, x_mis)  // Complete the data with imputations
+  imputation_fits <- my_model.fit(x_imp).draws()  // Get posterior draws
+  theta_estimates <- theta_estimates.append(imputation_fits)
+}
+```
+Each of the functions in this pseudocode can encapsulate simple or
+complex procedures. `get_imputations()`, for example, may impute
+missing data by resampling values directly from `x_obs`; or it may
+fit a regression model in which the variable with missing values is
+the outcome and the fully observed variables are the predictors.
+
+## Imputing one variable in Stan
+
+Imagine our data set is composed of two continuous numerical
+variables $x$ and $y$, and one binary variable $z$. We want to
+estimate the parameters that govern the conditional association
+$p(y \mid x, z)$, but several values of $x$ are missing. So, before
+we fit this model for $y$, we impute the missing values in $x$.
+
+We first find all the fully observed data points and denote their
+values $x^{\text{obs}}$, $y^{\text{obs}}$, and $z^{\text{obs}}$.
+Then we use these data to fit the model
+$$
+x^{\text{obs}} \sim \text{Normal}(\gamma_0 + \gamma_1 y^{\text{obs}}
+    + \gamma_2 z^{\text{obs}}, \lambda).
+$$
+
+This model can then give us estimates for the missing values
+$x^{\text{mis}}$ conditional on the corresponding values of $y$ and
+$z$, which we call $y^{\text{aux}}$ and $z^{\text{aux}}$.
+The Stan code for this imputation is:
+```stan
+// MODEL TO IMPUTE x
+data {
+  int<lower=0> N_obs;      // number of fully observed x's
+  vector[N_obs] x_obs;     // observed values of x
+  vector[N_obs] y_obs;     // values of y that correspond to x_obs
+  array[N_obs] int<lower=0, upper=1> z_obs;  // values of z that
+                                             //   correspond to x_obs
+  int<lower=0> N_mis;      // number of rows with missing x's
+  vector[N_mis] y_aux;     // y corresponding to the missing x's
+  array[N_mis] int<lower=0, upper=1> z_aux;  // z corresponding to
+                                             //   the missing x's
+}
+parameters {
+  real gamma_0;            // intercept
+  real gamma_1;            // coefficient for y_obs
+  real gamma_2;            // coefficient for z_obs
+  real<lower=0> lambda;    // standard deviation for x_obs
+}
+model {
+  // Likelihood
+  x_obs ~ normal(gamma_0 + gamma_1 * y_obs
+                 + gamma_2 * to_vector(z_obs), lambda);
+}
+generated quantities {
+  vector[N_mis] x_imp;     // Random draws to impute missing x's
+  for (n in 1:N_mis) {
+    x_imp[n] = normal_rng(gamma_0 + gamma_1 * y_aux[n]
+                          + gamma_2 * z_aux[n], lambda);
+  }
+}
+```
+The `generated quantities` block is evaluated once per posterior
+draw of $\gamma_0$, $\gamma_1$, $\gamma_2$, and $\lambda$. So, the
+random draws of `x_imp` incorporate uncertainty from the estimated
+parameters as well as the sampling variation in $x^{\text{mis}}$.
+
+Multiple posterior draws of $x^{\text{mis}}$ give us multiple
+completed data sets. With these data sets we can model
+$p(y \mid x, z)$ as
+$$
+y \sim \text{Normal}(\beta_0 + \beta_1 x' + \beta_2 z, \sigma),
+$$
+where $x'$ contains observed and imputed values.
+The Stan code for this model is:
+```stan
+// MODEL FOR y
+data {
+  int<lower=0> N;          // number of observations
+  vector[N] y;             // response variable
+  vector[N] x;             // predictor variable x
+  vector[N] z;             // predictor variable z
+}
+parameters {
+  real beta_0;             // intercept
+  real beta_1;             // coefficient for x
+  real beta_2;             // coefficient for z
+  real<lower=0> sigma;     // error standard deviation
+}
+model {
+  // Likelihood
+  y ~ normal(beta_0 + beta_1 * x + beta_2 * z, sigma);
+}
+```
+
+With multiple imputation, the model for $p(y \mid x, z)$ does not
+need to distinguish between observed and imputed values of $x$.
+Instead, it can treat every imputed data set as complete, because we
+combine the posterior draws from the multiple fits afterwards.
+
+## Imputing two or more variables
+
+A more general scenario involves an outcome variable $y$ and several
+explanatory variables $x_1, \ldots, x_K$, each of which can have
+missing values that we want to impute.
+
+One solution is to do multiple imputation by chained equations,
+often called "MICE"[^mice]. The MICE procedure in this scenario is:
+
+[^mice]: Section 5 of chapter 4 in @vanbuuren:2018 details the
+MICE procedure in a frequentist context.
+
+0. **Initialize missing values in all $x_i$**. For each variable
+$x_i$ with missing entries, fill its missing values with random
+samples from its observed values (or another simple initialization
+rule).
+1. **Update each $x_i$ given $y$ and the other $x$'s**.
+For $i = 1, \ldots, K$, fit a model for the *observed* $x_i$
+conditional on the current versions (with observed and imputed
+values) of $y$ and $x_{-i} = \{x_j : j \neq i\}$. Use this model to
+draw imputations from the predictive distribution of the *missing*
+$x_i$. Completing this step for all $i = 1, \ldots, K$ constitutes
+a single imputation cycle.
+2. **Burn-in period**. Repeat the imputation cycle in step 1 several
+times as burn-in to let the imputations stabilize.
+3. **Create $M$ completed data sets**.
+After burn-in, record the current completed data set as one imputed
+data set. Then either *restart* from step 0 and repeat steps 1–3
+until you have $M$ completed data sets, or do a *single long run*:
+continue iterating steps 1–2 and save the completed data set at $M$
+well-spaced iterations (e.g., every $S$ cycles), without restarting
+from step 0 each time.
+4. **Fit the target model** $p(y \mid x_1, \ldots, x_K)$ separately
+to each of the $M$ completed data sets and save the posterior draws
+for the parameters of interest.
+5. **Combine the draws** (or other inferential summaries) across
+the $M$ fits.
+
+The random samples from step 0 kickstart the imputation cycle, but
+we do not use them to fit $p(y \mid x_1, \ldots, x_K)$. This
+kickstart is especially useful when the original data contain few
+complete data points.
+
+### Imputing two variables in Stan
+
+Imagine that we again want to use the numerical variables $x$ and
+$y$, and the binary variable $z$, to estimate the parameters in
+$p(y \mid x, z)$. But now both $x$ and $z$ have missing values that
+we want to impute.
+
+To do MICE in this example, we can reuse the models for $x$ and $y$
+that we defined above. We also need a new model to impute
+$z^{\text{mis}}$. We use the logistic regression
+$$
+z^{\text{obs}} \sim \text{Bernoulli}(
+    \text{logit}^{-1}(
+      \alpha_0 + \alpha_1 y'^{\text{obs}} + \alpha_2 x'^{\text{obs}})
+  ).
+$$
+Here, $z^{\text{obs}}$ contains only the completely observed values
+of $z$ from our original data set, while $x'^{\text{obs}}$ and
+$y'^{\text{obs}}$ contain all the values that correspond to
+$z^{\text{obs}}$. Thus, $x'^{\text{obs}}$ can include observed and
+imputed values, and $y'^{\text{obs}}$ need not be the same as
+$y^{\text{obs}}$ in the model for $x^{\text{obs}}$.
+
+The Stan code to impute $z^{\text{mis}}$ is:
+```stan
+// MODEL TO IMPUTE z (binary)
+data {
+  int<lower=0> N_obs;      // number of fully observed z's
+  array[N_obs] int<lower=0, upper=1> z_obs;  // observed values of z
+  vector[N_obs] y_obs;     // values of y that correspond to z_obs
+  vector[N_obs] x_obs;     // values of x that correspond to z_obs
+  int<lower=0> N_mis;      // number of rows with missing z's
+  vector[N_mis] y_aux;     // y corresponding to the missing z's
+  vector[N_mis] x_aux;     // x corresponding to the missing z's
+}
+parameters {
+  real alpha_0;            // intercept
+  real alpha_1;            // coefficient for y_obs
+  real alpha_2;            // coefficient for x_obs
+}
+model {
+  // Likelihood (logistic regression)
+  z_obs ~ bernoulli_logit(alpha_0 + alpha_1 * y_obs
+                          + alpha_2 * x_obs);
+}
+generated quantities {
+  // Random draws to impute missing z's
+  array[N_mis] int z_imp;
+  for (n in 1:N_mis) {
+    z_imp[n] = bernoulli_logit_rng(alpha_0 + alpha_1 * y_aux[n]
+                                   + alpha_2 * x_aux[n]);
+  }
+}
+```
+
+The pseudocode for this example of MICE, restarting the imputations
+after each completed data set, is shown below. Note that we do not
+need to initialize $x^{\text{mis}}$ in step 0 because we can impute
+it directly with the model for $x^{\text{obs}}$.
+```
+completed_datasets <- []
+FOR m = 1 TO M:
+  // Step 0: Initialize missing z
+  data_m <- copy(data_orig)
+  data_m.z[missing_idz] <- random_sample(data_m.z[observed_idz])
+  // Begin burn-in imputation cycles
+  FOR cycle = 1 TO n_burn:
+    // Step 1: imputation cycle
+    // Update x given y, z
+    stanmod_x <- build_model("model_for_x.stan")
+    fit_x <- stanmod_x.fit(
+      obs_data=data_m[observed_idx],
+      mis_data=data_m[missing_idx]
+    )
+    data_m.x[missing_idx] <- get_imputations(fit_x, "x_imp")
+    // NOTE: get_imputations() takes ONE random draw for "x_imp" from
+    // the generated quantities block in the model fit.
+    // Update z given y, x (fit on observed z)
+    stanmod_z <- build_model("model_for_z.stan")
+    fit_z <- stanmod_z.fit(
+      obs_data=data_m[observed_idz],
+      mis_data=data_m[missing_idz]
+    )
+    data_m.z[missing_idz] <- get_imputations(fit_z, "z_imp")
+  END FOR
+  // Step 3 (restart option): Save the latest completed data set,
+  // then restart for the next imputation
+  completed_datasets.append(data_m)
+END FOR
+
+all_draws <- []
+FOR dataset IN completed_datasets:
+  // Step 4: Fit the target model to each completed data set
+  stanmod_y <- build_model("model_for_y.stan")
+  fit_y <- stanmod_y.fit(dataset)  // fits p(y | x, z)
+  all_draws.append(extract_draws(fit_y))
+END FOR
+// Step 5: Combine posterior draws across imputations
+all_results <- combine_results(all_draws)
+```
+
+## Combining posterior draws
+
+With Stan's MCMC sampler, we can treat the chains of posterior draws
+from imputed data sets as if they were chains based on complete
+data. There is one important difference: multiple imputation
+expresses uncertainty about the missing values as systematic
+differences between the estimates obtained from different imputed
+data sets. This means that chains obtained from the same imputed
+data set should converge, but chains obtained from different data
+sets do not have to. So, we need not worry if diagnostics[^rhat]
+signal that chains from different data sets are not converging to
+the same distribution. If only pooled point estimates and standard
+errors are needed, classical multiple-imputation pooling
+rules[^rubin] can be applied to the per-fit summaries instead.
+
+[^rubin]: Chapter 3, page 45 of @carpenter-etal:2023 summarizes one
+such set of pooling rules.
+
+[^rhat]: Such as $\hat{R}$. See [*Split R-hat for detecting
+non-stationarity*](https://mc-stan.org/docs/reference-manual/analysis.html#split-r-hat-for-detecting-non-stationarity)
+in the *Reference Manual*.
+
+## Cut models
+
+A full Bayesian probability model includes a feedback flow of
+information between all parameters and all data.
+Cut models separate some parts of this feedback flow so that
+different subsets of data influence only some parameters in the
+model. From @plummer:2015, p. 37:
+
+> Cut models arise in applications with multiple data sources that
+provide information about different parameters in the model [...]
+Interest lies in parameters $\theta$, which are informed by data $x$.
+The likelihood for $\theta$ includes nuisance parameters $\phi$ that
+are estimated using auxiliary data $Z$. In a full probability model,
+information from $x$ "feeds back" [...] to influence the posterior
+distribution of $\phi$. There are circumstances in which
+this feedback may be considered unhelpful [...]
+
+Multiple imputation interrupts this flow of information from data to
+parameters. In our scenario above, for example, the imputations
+influence the distribution of the parameters in
+$p(y \mid x_1, \ldots, x_K)$, but these parameters do not influence
+the imputations. So, we can use multiple imputation to implement a
+cut model.
+
+Following Plummer's notation above, the outline for a cut model with
+multiple imputation is:
+
+1. Draw $M$ random samples $\phi^1, \ldots, \phi^M$ from a model
+$p(\phi \mid Z)$.
+2. For each of the $M$ samples, fit a model
+$p(\theta \mid x, \phi^j)$, treating $\phi^j$ as if it were a fixed
+value, and obtain multiple draws for $\theta$ from each fit.
+3. Combine the draws for $\theta$ from all $M$ fits.
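This outline can be sketched numerically. Below is a minimal Python (NumPy) toy in which the auxiliary data $Z$ inform a noise scale $\phi$ and the primary data $x$ inform a mean $\theta$. The data, the bootstrap stand-in for $p(\phi \mid Z)$, and the flat-prior normal posterior for $\theta$ are all hypothetical choices for illustration, not part of the chapter:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy cut model: auxiliary data Z inform a noise scale phi, and
# primary data x inform a mean theta.  Everything here is an
# illustrative stand-in.
Z = rng.normal(0.0, 2.0, size=50)     # auxiliary data
x = rng.normal(1.0, 2.0, size=30)     # primary data

M = 20          # number of samples of phi
n_draws = 200   # posterior draws of theta per fixed phi

# Step 1: draw M samples of phi from p(phi | Z).  Here we use the
# bootstrap distribution of the sample SD as a cheap stand-in.
phi_draws = np.array([
    np.std(rng.choice(Z, size=Z.size, replace=True), ddof=1)
    for _ in range(M)
])

# Step 2: for each phi^j, fit p(theta | x, phi^j) with phi^j held
# fixed; with a flat prior on theta this posterior is
# Normal(mean(x), phi / sqrt(n)).
all_theta = []
for phi in phi_draws:
    all_theta.append(
        rng.normal(x.mean(), phi / np.sqrt(x.size), size=n_draws))

# Step 3: combine the draws for theta from all M fits.
theta = np.concatenate(all_theta)
print(theta.shape)   # (M * n_draws,) = (4000,)
```

No information flows from $x$ back to $\phi$: the $\phi^j$ are exactly draws from $p(\phi \mid Z)$, which is what makes the model "cut".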