Add chapter on multiple imputation. #935

Open
abner-hb wants to merge 8 commits into stan-dev:master from abner-hb:master


@abner-hb abner-hb commented Mar 12, 2026

Submission Checklist

  • Builds locally YES
  • New functions marked with <<{ since VERSION }>> YES (no new functions)
  • Declare copyright holder and open-source license: see below

Summary

Add a chapter on multiple imputation to the Stan User's Guide.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Abner Heredia Bustos

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@WardBrian (Member)

Hi @abner-hb, thanks for this! I just made a couple commits to remove the changes to the built docs -- we let our Jenkins jobs build those for us during releases.

I'll ask @bob-carpenter to take a look at the contents when he gets a chance

@WardBrian WardBrian requested a review from bob-carpenter March 12, 2026 15:35
@bob-carpenter (Member)

I can review this.

@bob-carpenter (Member) left a comment

Thanks so much for contributing this.

I am really sorry that I left 73 comments on such a short chapter. It was meant pedagogically and I hope it helps other things you write. It took a couple years of this kind of back-and-forth with Gelman and Vehtari and Goodrich before they stopped marking up everything I wrote this way. Gelman and Vehtari are excellent role models for writing clarity.

If you'd rather not do this, I'm happy to make all the changes I suggested myself.

their precision. So, it is often necessary to account explicitly for
the missing data when fitting a model of interest[^bda].

[^bda]: Chapter 18 in @GelmanEtAl:2013 offers a Bayesian perspective

Member

I just skimmed Chapter 18. Gelman et al. do not provide a fully Bayesian perspective; they instead use multiple imputation. The fully Bayesian perspective is given in the User's Guide chapter on missing data.

I would also put this into the main text. Only use footnotes in the doc as a last resort.

Author

It's not fully Bayesian but isn't it partially Bayesian?

[^bda]: Chapter 18 in @GelmanEtAl:2013 offers a Bayesian perspective
for analyzing data with missing values.

Multiple imputation replaces missing values with different estimates

Member

The missing values are not values---they're missing, so nothing is getting replaced.

There's also no estimation going on in multiple imputation itself. Instead, we're drawing from the posterior of a model of the missing data and using that in downstream inference.

The result is not Bayesian. It gives you the same thing you get using "cut" in BUGS.

Author

I'm not sure how to rewrite this part. "Replace" and "substitute" are very common terms in the stuff I read about multiple imputation. How would you suggest describing this?

to obtain different versions of a complete data set. Then, the model
of interest can be fitted to each of these complete versions
separately. Combining the results from these different fits produces
estimates that account for the uncertainty in the missing values.

Member

The result is a number of draws that form a sample---there's still no estimation going on.

One way to create an estimate would be to average the complete set of draws.
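
As an illustration of that last point, here is a minimal sketch of pooling draws across imputed data sets and then averaging them. It is not the chapter's code. The function names, the toy data, and the deliberately simple normal model (standard deviation fixed at its sample value, flat prior on the mean) are all assumptions made for this example.

```python
import random
import statistics


def impute_once(observed, n_missing, rng):
    """Draw one set of values for the missing entries.

    Hypothetical imputation model: draw a mean from its approximate
    posterior, then draw the missing values given that mean.
    """
    ybar = statistics.fmean(observed)
    s = statistics.stdev(observed)
    mu_draw = rng.gauss(ybar, s / len(observed) ** 0.5)
    return [rng.gauss(mu_draw, s) for _ in range(n_missing)]


def posterior_draws_of_mean(data, n_draws, rng):
    """Draws of the mean for one completed data set.

    Assumes mean ~ normal(ybar, s / sqrt(n)), with s held fixed.
    """
    ybar = statistics.fmean(data)
    se = statistics.stdev(data) / len(data) ** 0.5
    return [rng.gauss(ybar, se) for _ in range(n_draws)]


def pooled_draws(observed, n_missing, n_imputations, n_draws, seed=0):
    """Fit the second-stage model to each completed data set and pool the draws."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_imputations):
        completed = observed + impute_once(observed, n_missing, rng)
        pooled.extend(posterior_draws_of_mean(completed, n_draws, rng))
    return pooled


# Toy data: six observed values and two missing ones (made up for illustration).
observed = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9]
draws = pooled_draws(observed, n_missing=2, n_imputations=20, n_draws=200)

# The pooled draws form a sample; averaging them is one way to get an estimate.
estimate = statistics.fmean(draws)
```

Nothing in the second stage changes how the imputations were drawn, which is the sense in which the result matches "cut" rather than a joint Bayesian fit.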

Author

Is the new version clearer?

<!-- there is no need to use any special pooling
rules[^rubin] to account for the uncertainty in the imputation. -->

[^rubin]: Chapter 3, page 45 of @carpenter-etal:2023 summarizes one

Member

Remove---this seems to be an unmoored comment. At least it doesn't show up when I render the html.


## Cut models

[**NOTE**: I would greatly appreciate any comments or changes to

Member

Remove this note.

If you're not comfortable writing it, you can reduce this section to something really simple.

The point is just that using ad hoc multiple imputation like this is equivalent to doing cut in BUGS as described by Plummer, because there's no information flow from the second-stage inference back to the multiple imputation as you would get in the fully Bayesian model described in the chapter on missing data earlier in the user's guide.
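
One way to write that distinction in symbols, offered as a sketch rather than as the chapter's or Plummer's notation. The symbols $\theta$ (second-stage parameters), $y^{\text{obs}}$, $y^{\text{mis}}$, and the subscripts "imp" and "cut" are assumptions introduced here:

```latex
% Fully Bayesian: missing data and parameters share one joint posterior.
p(\theta, y^{\text{mis}} \mid y^{\text{obs}})
  \propto p(y^{\text{obs}}, y^{\text{mis}} \mid \theta) \, p(\theta)

% Multiple imputation: two stages, with the first stage fixed before the second.
y^{\text{mis}(m)} \sim p_{\text{imp}}(y^{\text{mis}} \mid y^{\text{obs}}),
  \qquad m = 1, \dots, M
\theta^{(m)} \sim p(\theta \mid y^{\text{obs}}, y^{\text{mis}(m)})

% Distribution targeted by the pooled draws of \theta.
p_{\text{cut}}(\theta \mid y^{\text{obs}})
  = \int p(\theta \mid y^{\text{obs}}, y^{\text{mis}}) \,
         p_{\text{imp}}(y^{\text{mis}} \mid y^{\text{obs}}) \, \mathrm{d}y^{\text{mis}}
```

In the joint posterior, the distribution of $y^{\text{mis}}$ depends on the second-stage model through $\theta$. In the two-stage version, $p_{\text{imp}}$ is set before the second stage runs and is never updated by it.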

[**NOTE**: I would greatly appreciate any comments or changes to
improve this subsection.]

A full bayesian probability model includes a feedback flow of

Member

bayesian -> Bayesian

The model doesn't have any feedback or flow per se---it's just how joint distributions work.

influence only some parameters in the model. From @plummer:2015,
p. 37:

> Cut models arise in applications with multiple data sources that

Member

Remove comments.

p. 37:

> Cut models arise in applications with multiple data sources that
provide information about different parameters in the model [...]

Member

OK, rather than filling this all in, I'd just write a one-line summary and point to Plummer's article.

3 participants