Add chapter on multiple imputation. #935
Hi @abner-hb, thanks for this! I just made a couple commits to remove the changes to the built docs -- we let our Jenkins jobs build those for us during releases. I'll ask @bob-carpenter to take a look at the contents when he gets a chance.
I can review this.
Thanks so much for contributing this.
I am really sorry that I left 73 comments on such a short chapter. It was meant pedagogically and I hope it helps other things you write. It took a couple years of this kind of back-and-forth with Gelman and Vehtari and Goodrich before they stopped marking up everything I wrote this way. Gelman and Vehtari are excellent role models for writing clarity.
If you'd rather not do this, I'm happy to make all the changes I suggested myself.
| their precision. So, it is often necessary to account explicitly for
| the missing data when fitting a model of interest[^bda].
|
| [^bda]: Chapter 18 in @GelmanEtAl:2013 offers a Bayesian perspective
I just skimmed Chapter 18. Gelman et al. do not provide a fully Bayesian perspective; they instead use multiple imputation. The fully Bayesian perspective is given in the User's Guide chapter on missing data.
I would also put this into the main text. Only use footnotes in the doc as a last resort.
It's not fully Bayesian but isn't it partially Bayesian?
| [^bda]: Chapter 18 in @GelmanEtAl:2013 offers a Bayesian perspective
| for analyzing data with missing values.
|
| Multiple imputation replaces missing values with different estimates
The missing values are not values---they're missing, so nothing is getting replaced.
There's also no estimation going on in multiple imputation itself. Instead, we're drawing from the posterior of a model of the missing data and using that in downstream inference.
The result is not Bayesian. It gives you the same thing you get using "cut" in BUGS.
I'm not sure how to rewrite this part. "Replace" and "substitute" are very common terms in the stuff I read about multiple imputation. How would you suggest describing this?
| to obtain different versions of a complete data set. Then, the model
| of interest can be fitted to each of these complete versions
| separately. Combining the results from these different fits produces
| estimates that account for the uncertainty in the missing values.
The result is a number of draws that form a sample---there's still no estimation going on.
One way to create an estimate would be to average the complete set of draws.
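To make the distinction between draws and estimates concrete, here is a minimal sketch of the two-stage procedure. It is a toy setup I am assuming for illustration, not anything from the chapter: the outcome is normal with its scale fixed at the observed-data standard deviation, the prior on the mean is flat, and `pooled_draws` is a hypothetical helper name.

```python
import random
import statistics


def pooled_draws(y_obs, n_missing, n_imputations=20, draws_per_fit=50, seed=0):
    """Pool second-stage draws of the mean across imputed data sets.

    Assumed toy model: y ~ normal(mu, sigma), flat prior on mu, and
    sigma fixed at the observed-data standard deviation.
    """
    rng = random.Random(seed)
    n_obs = len(y_obs)
    mean_obs = statistics.mean(y_obs)
    sigma = statistics.stdev(y_obs)
    n_total = n_obs + n_missing
    pooled = []
    for _ in range(n_imputations):
        # Stage 1: draw mu from its posterior given the observed data,
        # then draw the missing values given that mu.
        mu_draw = rng.gauss(mean_obs, sigma / n_obs ** 0.5)
        y_mis = [rng.gauss(mu_draw, sigma) for _ in range(n_missing)]
        y_full = list(y_obs) + y_mis
        # Stage 2: draw mu from its posterior given the completed data.
        # Nothing from this stage feeds back into stage 1.
        post_mean = statistics.mean(y_full)
        post_sd = sigma / n_total ** 0.5
        pooled.extend(rng.gauss(post_mean, post_sd) for _ in range(draws_per_fit))
    return pooled


y_obs = [1.2, 0.7, 2.3, 1.9, 1.1, 1.6]
draws = pooled_draws(y_obs, n_missing=3)  # a sample, not an estimate
estimate = statistics.mean(draws)         # averaging the draws gives an estimate
```

The pooled list is just a sample of second-stage draws. An estimate only appears in the last line, where the draws are averaged.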
| <!-- there is no need to use any special pooling
| rules[^rubin] to account for the uncertainty in the imputation. -->
|
| [^rubin]: Chapter 3, page 45 of @carpenter-etal:2023 summarizes one
Remove---this seems to be an unmoored comment. At least it doesn't show up when I render the HTML.
| ## Cut models
|
| [**NOTE**: I would greatly appreciate any comments or changes to
Remove this note.
If you're not comfortable writing it, you can reduce this section to something really simple.
The point is just that ad hoc multiple imputation like this is equivalent to doing cut in BUGS as described by Plummer. There's no information flow from the second-stage inference back to the multiple imputation, as there would be in the fully Bayesian model described in the chapter on missing data earlier in the User's Guide.
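One way to write this down, as a sketch in notation I am assuming here (neither the chapter nor Plummer's article uses it): let $y^{\text{obs}}$ and $y^{\text{mis}}$ be the observed and missing data, $\theta$ the parameters of the model of interest, and $p_{\text{imp}}$ the imputation model. The fully Bayesian posterior is

$$
p(\theta, y^{\text{mis}} \mid y^{\text{obs}}) \propto p(y^{\text{obs}}, y^{\text{mis}} \mid \theta) \, p(\theta),
$$

whereas two-stage multiple imputation draws from

$$
q(\theta, y^{\text{mis}} \mid y^{\text{obs}}) = p_{\text{imp}}(y^{\text{mis}} \mid y^{\text{obs}}) \, p(\theta \mid y^{\text{obs}}, y^{\text{mis}}).
$$

In the first expression the marginal for $y^{\text{mis}}$ depends on the model of interest. In the second it is fixed by $p_{\text{imp}}$ alone, so the two agree only when $p_{\text{imp}}(y^{\text{mis}} \mid y^{\text{obs}})$ equals the marginal $p(y^{\text{mis}} \mid y^{\text{obs}})$ implied by the joint model.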
| [**NOTE**: I would greatly appreciate any comments or changes to
| improve this subsection.]
|
| A full bayesian probability model includes a feedback flow of
bayesian -> Bayesian
The model doesn't have any feedback or flow per se---it's just how joint distributions work.
| influence only some parameters in the model. From @plummer:2015,
| p. 37:
|
| > Cut models arise in applications with multiple data sources that
| provide information about different parameters in the model [...]
OK, rather than filling this all in, I'd just write a one-line summary and point to Plummer's article.
Submission Checklist
<<{ since VERSION }>> YES (no new functions)
Summary
Add a chapter on multiple imputation to the Stan User's Guide.
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Abner Heredia Bustos
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: