feat(tasks): add medical mistrust tasks for MIMIC-III (Boag et al. 2018) #961

Open
vtewari2 wants to merge 1 commit into sunlabuiuc:master from vtewari2:pr/uiuccs598dlh/mistrust-tasks/interpersonal-features-mimic3

vtewari2 commented Apr 11, 2026

Description:

Summary

Adds two binary classification tasks and a helper function that implement
two of the three computational mistrust proxies from:

Boag et al. "Racial Disparities and Mistrust in End-of-Life Care."
MLHC 2018. arXiv:1808.03827

Both tasks extract interpersonal interaction features from CHARTEVENTS
(structured, ~168 binary features covering agitation scales, restraints,
education readiness, family communication, pain assessments, etc.) and derive
binary labels from free-text NOTEEVENTS, one admission at a time.

Background

The paper identifies medical mistrust — a historically grounded institutional
skepticism prevalent in minority communities — as a primary driver of racial
disparities in aggressive end-of-life care. It quantifies mistrust through
three algorithmic proxies; this PR implements the two supervised classifiers:

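┌─────────────────┬──────────────────────────────────────┬──────────────────────────────────────┐
│ Proxy           │ Label source                         │ Signal                               │
├─────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Noncompliance   │ "noncompliant" substring in any note │ Active refusal of care               │
├─────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Autopsy consent │ Consent/decline keywords in notes    │ Post-mortem distrust of care quality │
└─────────────────┴──────────────────────────────────────┴──────────────────────────────────────┘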

Both tasks use the same interpersonal_features input: a deduplicated sequence
of normalised CHARTEVENTS feature-key strings, compatible with
LogisticRegression and any other PyHealth model that accepts sequence inputs.
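One way to build that input for a single admission, assuming the per-event keys are already normalised, is an order-preserving deduplication. The helper name `dedupe_feature_keys` and the example keys are illustrative assumptions, not the PR's code:

```python
from typing import Iterable, List


def dedupe_feature_keys(keys: Iterable[str]) -> List[str]:
    """Drop repeated feature keys, keeping the first occurrence of each.

    Hypothetical helper: the PR may deduplicate differently (e.g. by sorting).
    dict preserves insertion order (Python 3.7+), so the result is order-stable.
    """
    return list(dict.fromkeys(keys))


# Illustrative keys: one admission's charted events often repeat a value.
admission_keys = [
    "restraint location||none",
    "bath||partial",
    "restraint location||none",
    "restraint device||sitter",
]
print(dedupe_feature_keys(admission_keys))
# ['restraint location||none', 'bath||partial', 'restraint device||sitter']
```

After deduplication each key acts as a present/absent indicator, which matches describing the inputs as binary features.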

New: pyhealth/tasks/mistrust_mimic3.py

build_interpersonal_itemids(d_items_path)

Helper that reads D_ITEMS.csv.gz and returns {itemid: label} for all
CHARTEVENTS items whose label matches the ~40 interpersonal keywords from
the paper's trust.ipynb. Produces ~168 matched ITEMIDs on MIMIC-III v1.4.
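A dependency-free sketch of the keyword-matching idea, assuming D_ITEMS uses the standard MIMIC-III ITEMID, LABEL, and LINKSTO columns and that matching is a case-insensitive substring test on LABEL. Both function names are hypothetical (the `_sketch` suffix marks it as not the PR's implementation), and the keyword list must be supplied by the caller; the paper's ~40 keywords are not reproduced here:

```python
import csv
import gzip
from typing import Dict, Iterable, Mapping


def match_interpersonal_items(
    rows: Iterable[Mapping[str, str]], keywords: Iterable[str]
) -> Dict[int, str]:
    """Map ITEMID -> LABEL for CHARTEVENTS rows whose label contains a keyword."""
    lowered = [k.lower() for k in keywords]
    matched: Dict[int, str] = {}
    for row in rows:
        # Assumption: only items stored in CHARTEVENTS are relevant here.
        if (row.get("LINKSTO") or "").lower() != "chartevents":
            continue
        label = row.get("LABEL") or ""
        if any(k in label.lower() for k in lowered):
            matched[int(row["ITEMID"])] = label
    return matched


def build_interpersonal_itemids_sketch(
    d_items_path: str, keywords: Iterable[str]
) -> Dict[int, str]:
    """Read a gzipped D_ITEMS CSV and apply the keyword filter."""
    # gzip.open in text mode lets csv.DictReader parse the file directly.
    with gzip.open(d_items_path, "rt", newline="") as fh:
        return match_interpersonal_items(csv.DictReader(fh), keywords)


# Toy rows (made-up ITEMIDs) showing the filter on its own.
toy_rows = [
    {"ITEMID": "1", "LABEL": "Restraint Location", "LINKSTO": "chartevents"},
    {"ITEMID": "2", "LABEL": "Heart Rate", "LINKSTO": "chartevents"},
    {"ITEMID": "3", "LABEL": "Restraint Order", "LINKSTO": "labevents"},
]
print(match_interpersonal_items(toy_rows, ["restraint"]))
# {1: 'Restraint Location'}
```

Splitting the row filter from the file reader keeps the matching logic testable without a MIMIC-III download.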

```python
from pyhealth.tasks import build_interpersonal_itemids

itemid_to_label = build_interpersonal_itemids("/path/to/D_ITEMS.csv.gz")
# {720: 'Ventilator Mode', 228096: 'Riker-SAS Scale', ...}  (~168 entries)
```

MistrustNoncomplianceMIMIC3

```python
input_schema  = {"interpersonal_features": "sequence"}
output_schema = {"noncompliance": "binary"}
```

- Label 1 if any NOTEEVENTS note for the admission contains "noncompliant",
  else 0. Base rate ~0.88% in MIMIC-III v1.4.
- Every admission with at least one interpersonal CHARTEVENTS feature receives
  a label (default 0, i.e. trusting), mirroring the original paper's
  labelling strategy.
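The two rules above can be sketched as follows. Case-insensitive matching is an assumption (the PR only says "substring"), and `build_noncompliance_sample` is a hypothetical helper, not the task's actual `__call__`:

```python
from typing import Iterable, List, Optional


def noncompliance_label(notes: Iterable[str]) -> int:
    """1 if any note mentions "noncompliant", else 0 (the trusting default).

    Assumption: case-insensitive matching; the real task may be case-sensitive.
    """
    return int(any("noncompliant" in note.lower() for note in notes))


def build_noncompliance_sample(
    features: List[str], notes: List[str]
) -> Optional[dict]:
    """Hypothetical sample builder: skip admissions without interpersonal features."""
    if not features:
        return None
    return {
        "interpersonal_features": features,
        "noncompliance": noncompliance_label(notes),
    }
```

A plain substring test does not match spelling variants such as "non-compliant", so those admissions fall into the default 0 class.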

MistrustAutopsyMIMIC3                             

```python
input_schema  = {"interpersonal_features": "sequence"}
output_schema = {"autopsy_consent": "binary"}
```

- Label 1 (consent, mistrustful) if the notes contain consent/agree/request
  near "autopsy"; label 0 (decline, trusting) if they contain
  decline/refuse/denied.
- Admissions where both signals appear are excluded as ambiguous.
- Only admissions with an explicit autopsy signal receive a label (~1,009
  in MIMIC-III v1.4; Black patients consent at ~39% vs ~26% for White
  patients).
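A sketch of that three-way outcome (consent, decline, or no label). The PR does not say how "near" is measured, so the 50-character window on each side of "autopsy" and the case-insensitive matching are hypothetical choices:

```python
import re
from typing import Iterable, Optional

CONSENT_TERMS = ("consent", "agree", "request")
DECLINE_TERMS = ("decline", "refuse", "denied")
WINDOW = 50  # hypothetical: characters on each side of "autopsy"


def autopsy_label(notes: Iterable[str], window: int = WINDOW) -> Optional[int]:
    """1 = consent, 0 = decline, None = ambiguous or no autopsy signal."""
    consent = decline = False
    for note in notes:
        text = note.lower()
        for match in re.finditer(r"autopsy", text):
            context = text[max(0, match.start() - window): match.end() + window]
            consent = consent or any(t in context for t in CONSENT_TERMS)
            decline = decline or any(t in context for t in DECLINE_TERMS)
    if consent and decline:
        return None  # both signals: excluded as ambiguous
    if consent:
        return 1
    if decline:
        return 0
    return None  # no explicit autopsy signal: admission gets no sample
```

Because the terms are plain substrings, "requested" and "refused" also match, and a phrase like "declined autopsy consent" trips both lists and is dropped as ambiguous.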

Feature normalisation

Both tasks apply the full normalisation pipeline from trust.ipynb cell 7:                              

┌─────────────────────────────────────┬───────────────────────────────────────────┐
│ Label pattern                       │ Normalised to                             │
├─────────────────────────────────────┼───────────────────────────────────────────┤
│ reason for restraint                │ 6 buckets: none / threat of harm /        │
│                                     │ confusion-delirium / presence of          │
│                                     │ violence / treatment interference /       │
│                                     │ risk for falls                            │
├─────────────────────────────────────┼───────────────────────────────────────────┤
│ restraint location                  │ none / 4 point restraint / some restraint │
├─────────────────────────────────────┼───────────────────────────────────────────┤
│ restraint device                    │ sitter / limb / (raw)                     │
├─────────────────────────────────────┼───────────────────────────────────────────┤
│ bath                                │ partial / self / refused / shave /        │
│                                     │ hair / none / done                        │
├─────────────────────────────────────┼───────────────────────────────────────────┤
│ behavior, behavioral state          │ skipped                                   │
├─────────────────────────────────────┼───────────────────────────────────────────┤
│ pain management/type/cause/location │ skipped                                   │
├─────────────────────────────────────┼───────────────────────────────────────────┤
│ pain level*, education topic*,      │ kept as-is                                │
│ safety measures*, side rails*,      │                                           │
│ status and comfort*, *informed*     │                                           │
├─────────────────────────────────────┼───────────────────────────────────────────┤
│ all others                          │ "label||value"                            │
└─────────────────────────────────────┴───────────────────────────────────────────┘
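A partial sketch of the table as a per-event function, assuming labels are matched as case-insensitive substrings and that keys are lowercased. It covers the skip rules, restraint location, restraint device, bath, and the "all others" default. The "reason for restraint" buckets and the "kept as-is" row are omitted because the table does not spell out their matching rules, and every keyword test below is a guess, not the notebook's code:

```python
from typing import Optional

# Label substrings whose events are dropped (the "skipped" rows).
SKIP_PATTERNS = (
    "behavior",  # also covers "behavioral state"
    "pain management",
    "pain type",
    "pain cause",
    "pain location",
)
BATH_CATEGORIES = ("partial", "self", "refused", "shave", "hair", "none", "done")


def normalise_event(label: str, value: str) -> Optional[str]:
    """Map one charted (label, value) pair to a 'category||value' key, or None."""
    lab = label.strip().lower()
    val = value.strip().lower()
    if any(pattern in lab for pattern in SKIP_PATTERNS):
        return None
    if "restraint location" in lab:
        if val in ("", "none"):
            coarse = "none"
        elif "4 point" in val:
            coarse = "4 point restraint"
        else:
            coarse = "some restraint"
        return f"restraint location||{coarse}"
    if "restraint device" in lab:
        for kind in ("sitter", "limb"):
            if kind in val:
                return f"restraint device||{kind}"
        return f"restraint device||{val}"  # fall back to the raw value
    if lab.startswith("bath"):
        for category in BATH_CATEGORIES:
            if category in val:
                return f"bath||{category}"
        return f"bath||{val}"  # hypothetical fallback for unlisted values
    return f"{lab}||{val}"  # "all others" row: label||value
```

Coarsening free-text values into a few buckets keeps the vocabulary small, so each feature is observed often enough to get a stable weight.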

Feature keys have the form "category||normalised_value" and are collected
into a vocabulary automatically by PyHealth's tokeniser during set_task().
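PyHealth handles this step internally. The dependency-free sketch below only illustrates the idea: each distinct key gets an integer index, and an admission becomes a binary presence vector of the kind a logistic-regression model consumes. The helper names are hypothetical and this is not PyHealth's tokeniser:

```python
from typing import Dict, List, Sequence


def build_vocab(samples: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct feature key (sorted for determinism)."""
    keys = sorted({key for sample in samples for key in sample})
    return {key: idx for idx, key in enumerate(keys)}


def multi_hot(sample: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Binary presence vector; keys missing from the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for key in sample:
        idx = vocab.get(key)
        if idx is not None:
            vec[idx] = 1
    return vec


train = [
    ["bath||partial", "restraint location||none"],
    ["bath||refused"],
]
vocab = build_vocab(train)
print(vocab)
# {'bath||partial': 0, 'bath||refused': 1, 'restraint location||none': 2}
print(multi_hot(["bath||refused", "bath||hair"], vocab))
# [0, 1, 0]
```

A key never seen while building the vocabulary (here `bath||hair`) has no index; this sketch simply drops it.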

Updated: pyhealth/tasks/__init__.py               

Exports MistrustNoncomplianceMIMIC3, MistrustAutopsyMIMIC3, and                                        
build_interpersonal_itemids.                      

Usage

```python
from pyhealth.datasets import MIMIC3Dataset
from pyhealth.tasks import (
    MistrustNoncomplianceMIMIC3,
    MistrustAutopsyMIMIC3,
    build_interpersonal_itemids,
)
from pyhealth.models import LogisticRegression

# Build the ITEMID -> label map from D_ITEMS
itemid_to_label = build_interpersonal_itemids("/path/to/D_ITEMS.csv.gz")

# Load the dataset (requires CHARTEVENTS + NOTEEVENTS)
base_dataset = MIMIC3Dataset(
    root="/path/to/mimic-iii/1.4",
    tables=["CHARTEVENTS", "NOTEEVENTS"],
)

# Noncompliance task
nc_dataset = base_dataset.set_task(
    MistrustNoncomplianceMIMIC3(itemid_to_label=itemid_to_label)
)

# Autopsy task
au_dataset = base_dataset.set_task(
    MistrustAutopsyMIMIC3(itemid_to_label=itemid_to_label)
)

# Train with L1 regularisation (requires PR #960)
model = LogisticRegression(dataset=nc_dataset, l1_lambda=2.6e-4)
```
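The `l1_lambda` argument comes from PR #960, whose exact formulation is not reproduced here. Assuming it follows the standard L1-penalised logistic regression objective (mean binary cross-entropy plus λ·Σ|w|, bias unpenalised), a dependency-free sketch of that loss is:

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def l1_penalised_bce(
    weights: Sequence[float],
    bias: float,
    x: Sequence[Sequence[float]],
    y: Sequence[int],
    l1_lambda: float = 2.6e-4,
) -> float:
    """Mean binary cross-entropy plus l1_lambda * sum(|w|).

    Assumption: the bias is not penalised; PR #960 may differ.
    """
    eps = 1e-12  # keeps log() finite when p saturates at 0 or 1
    total = 0.0
    for row, label in zip(x, y):
        p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, row)))
        p = min(max(p, eps), 1.0 - eps)
        total += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return total / len(y) + l1_lambda * sum(abs(w) for w in weights)
```

The penalty's gradient has magnitude λ for every non-zero weight regardless of its size, which is what pushes weights of uninformative feature keys toward zero on a bag of sparse binary features like these.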

Dependencies                                      

- Requires PR #960 (l1_lambda in LogisticRegression) for
paper-equivalent training. The tasks themselves are model-agnostic and
work with any PyHealth sequence model.
- MIMIC-III v1.4 with PhysioNet credentialed access.                                                   

Related PRs                                       

This is PR 2 of 3 in the Boag et al. 2018 mistrust pipeline series. 
- PR #960 pr/uiuccs598dlh/logistic-regression/l1-regularization ← merge first                           
- PR #962 pr/uiuccs598dlh/paper-pipeline/eol-mistrust-boag-2018 ← merge after this

Add two binary classification tasks and a helper that reproduce the
interpersonal-feature mistrust classifiers from:

  Boag et al. "Racial Disparities and Mistrust in End-of-Life Care."
  MLHC 2018. arXiv:1808.03827

New file: pyhealth/tasks/mistrust_mimic3.py
  - build_interpersonal_itemids(d_items_path): reads D_ITEMS.csv.gz and
    returns {itemid: label} for ~168 interpersonal CHARTEVENTS items
    matched against the keyword list from trust.ipynb.
  - MistrustNoncomplianceMIMIC3: predicts a "noncompliant" label (derived
    from NOTEEVENTS) from interpersonal CHARTEVENTS features supplied as a
    sequence input. Label 1 = noncompliant (mistrustful), 0 = compliant.
  - MistrustAutopsyMIMIC3: predicts autopsy consent from the same
    features. Label 1 = consent (mistrustful), 0 = decline (trusting).
    Admissions with both consent and decline signals are excluded.
  - Full feature normalisation mirroring trust.ipynb cell 7 (restraint
    coarsening, bath categories, skip rules for pain mgmt/type/cause).

Updated: pyhealth/tasks/__init__.py
  - Export MistrustNoncomplianceMIMIC3, MistrustAutopsyMIMIC3,
    build_interpersonal_itemids.

Co-Authored-By: Varun Tewari <vtewari2@illinois.edu>