feat(examples): Add end-to-end reproduction of Boag et al. 2018 mistrust pipeline on MIMIC-III #962
Open
vtewari2 wants to merge 1 commit into sunlabuiuc:master
Add examples/mistrust_prediction/mistrust_mimic3_logistic_regression.py,
a complete reproduction of the Boag et al. 2018 mistrust classifier
pipeline using PyHealth:
- Loads MIMIC-III via MIMIC3Dataset (CHARTEVENTS + NOTEEVENTS tables)
- Builds interpersonal itemid map from D_ITEMS via
  build_interpersonal_itemids()
- Runs MistrustNoncomplianceMIMIC3 and MistrustAutopsyMIMIC3 tasks
- Trains LogisticRegression with L1 regularisation (l1_lambda matched
  to sklearn C=0.1: 2.62e-4 for noncompliance, 1.43e-2 for autopsy)
- Evaluates AUC-ROC via pyhealth.trainer.Trainer
- --synthetic flag for smoke-test without PhysioNet data access
- Documents expected AUC targets (0.667 noncompliance, 0.531 autopsy)
  and paper-equivalent hyperparameter derivation
Paper: arXiv:1808.03827 | Data: MIMIC-III v1.4 (PhysioNet)
Co-Authored-By: Varun Tewari <vtewari2@illinois.edu>
Summary
Adds a complete, runnable end-to-end example that reproduces the medical
mistrust classification pipeline from Boag et al. 2018 (arXiv:1808.03827).
The example ties together all PyHealth components introduced in PRs #960 and
#961 into a single runnable script, and serves as a reference implementation
for reproducing the paper's supervised mistrust classifiers using modern
Python tooling.
Background
Boag et al. 2018 establishes that racial disparities in aggressive
end-of-life ICU care are better explained by medical mistrust than by
race alone. The paper trains L1-regularised logistic regression models on
interpersonal CHARTEVENTS features to produce continuous mistrust scores,
then uses those scores to stratify treatment duration disparities.
Key findings reproduced by this pipeline:
patients (p=0.009)
LogisticRegression + all mistrust features improves mortality AUC from 0.629 (baseline demographics) to 0.661
New:
examples/mistrust_prediction/mistrust_mimic3_logistic_regression.py
What it does
MIMIC-III v1.4
  └── MIMIC3Dataset(tables=["CHARTEVENTS", "NOTEEVENTS"])
        │
        ├── build_interpersonal_itemids(D_ITEMS.csv.gz)
        │     → {itemid: label}  (~168 entries)
        │
        ├── MistrustNoncomplianceMIMIC3.set_task()
        │     → interpersonal_features (sequence) + noncompliance (binary)
        │     → LogisticRegression(l1_lambda=2.62e-4)
        │     → Trainer.train() + evaluate() → AUC-ROC
        │
        └── MistrustAutopsyMIMIC3.set_task()
              → interpersonal_features (sequence) + autopsy_consent (binary)
              → LogisticRegression(l1_lambda=1.43e-2)
              → Trainer.train() + evaluate() → AUC-ROC
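Both branches of the pipeline end in an AUC-ROC evaluation. As a reminder of what that number measures, here is a minimal pure-Python sketch of the pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score. The function name `auc_roc` is illustrative only. This is not how `pyhealth.trainer.Trainer` computes the metric internally, and the O(n²) loop is meant for reading, not for large evaluation sets.

```python
from typing import Sequence


def auc_roc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # tie: half credit
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, not MIMIC-III data.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(auc_roc(labels, scores))
```

Under this reading, a score of 0.5 means the classifier ranks positives no better than chance, which is a useful reference point when comparing against the expected AUC targets quoted in the commit message.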
L1 lambda derivation (paper-equivalent)
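The conversion this section relies on, l1_lambda = 1 / (C × n_train), comes from rescaling sklearn's L1 objective: sklearn minimises ||w||_1 + C × (sum of per-sample losses), and dividing through by C × n_train gives (mean loss) + (1 / (C × n_train)) × ||w||_1. A minimal sketch of that arithmetic follows. It assumes the PyHealth training loss is averaged over samples; the helper names `l1_lambda_from_sklearn_c` and `implied_n_train` are hypothetical and not part of PyHealth or this PR.

```python
def l1_lambda_from_sklearn_c(c: float, n_train: int) -> float:
    """Per-sample L1 weight matching sklearn's inverse regularisation strength C.

    Assumes the loss being regularised is the mean (not the sum) over samples.
    """
    if c <= 0 or n_train <= 0:
        raise ValueError("c and n_train must be positive")
    return 1.0 / (c * n_train)


def implied_n_train(c: float, l1_lambda: float) -> float:
    """Invert the formula: the training-set size implied by a given l1_lambda."""
    if c <= 0 or l1_lambda <= 0:
        raise ValueError("c and l1_lambda must be positive")
    return 1.0 / (c * l1_lambda)


if __name__ == "__main__":
    # n_train=10_000 is an illustrative size only, not a figure from the PR.
    print(l1_lambda_from_sklearn_c(0.1, 10_000))

    # Back-solve the two l1_lambda values quoted in the PR as a sanity check
    # against the size of your own training split.
    print(implied_n_train(0.1, 2.62e-4))
    print(implied_n_train(0.1, 1.43e-2))
```

Because l1_lambda depends on n_train, the value has to be recomputed whenever the training split changes size, for example when running on a subset of the data.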
The original paper uses sklearn LogisticRegression(C=0.1, penalty='l1'). The PyHealth equivalent uses l1_lambda = 1 / (C × n_train):
noncompliance: l1_lambda = 2.62e-4
autopsy: l1_lambda = 1.43e-2
Modes