Cardiology Multilabel Dataset and Classification Tasks #898
jiama843 wants to merge 11 commits into sunlabuiuc:master from
…ead-ablation Add the spatial ablation study (12-lead vs single-lead)
jhnwu3 left a comment
Just some small bits of feedback:
It seems this dataset already has a defunct variant here: https://github.com/sunlabuiuc/PyHealth/blob/master/pyhealth/datasets/cardiology.py. It would be better to replace that class than to add a functional duplicate alongside it.
Looks like the CI fails from some other bug here. Will need to look into that myself.
@jhnwu3 That sounds good. My only concern is that some other examples use the old dataset (I believe a section in We just wanted to avoid cascading those two changes in one PR. If needed (not sure if it's orthodox), I can try putting up a separate PR to deprecate/replace the old
Authors: John Ma (jm119@illinois.edu), Jia Lin Cheoh (jcheoh2@illinois.edu), Leo Yoon (byoon7@illinois.edu)
Dataset: PhysioNet/CinC Challenge 2020 (v1.0.2)
Protocol Paper: Nonaka and Seita (2021) - Towards Improving Multi-label ECG Classification
Overview
This PR contributes a new dataset class and a multilabel classification task for the PhysioNet/Computing in Cardiology Challenge 2020 data. It includes support for 12-lead ECG signals and a specialized task parameter for lead ablation studies.
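To make the lead ablation idea concrete, here is a minimal sketch of selecting a subset of leads from a 12-lead recording. It is not the PR's implementation: the function name `select_leads`, the `STANDARD_LEADS` ordering, and the list-of-lists signal layout are all assumptions made for illustration.

```python
# Conventional 12-lead ordering; assumed here, not confirmed from the PR.
STANDARD_LEADS = ["I", "II", "III", "aVR", "aVL", "aVF",
                  "V1", "V2", "V3", "V4", "V5", "V6"]


def select_leads(signal, leads=None):
    """Return only the requested leads from a (n_leads, n_samples) signal.

    `signal` is a list of per-lead sample lists in STANDARD_LEADS order.
    `leads=None` keeps all 12 leads (the clinical configuration).
    """
    if leads is None:
        return [list(row) for row in signal]
    unknown = [name for name in leads if name not in STANDARD_LEADS]
    if unknown:
        raise ValueError(f"Unknown lead name(s): {unknown}")
    # Output rows follow the order in which the leads were requested.
    return [list(signal[STANDARD_LEADS.index(name)]) for name in leads]


# Toy signal: 12 leads, 4 samples each; lead k is filled with the value k.
toy_signal = [[float(k)] * 4 for k in range(12)]
clinical = select_leads(toy_signal)          # all 12 leads
wearable = select_leads(toy_signal, ["I"])   # single-lead configuration
print(len(clinical), len(wearable))          # prints "12 1"
```

A model trained on the `wearable` output sees one input channel instead of twelve, which is the comparison the ablation script is described as making.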
Dataset (`cardiology2.py` and `cardiology.yaml`)
- `chosen_dataset` binary mask.
- Parses `.hea` header files to extract SNOMED-CT diagnosis codes, patient sex, and age, generating a flat metadata CSV for efficient loading.
- Built on `BaseDataset`, with integrated caching to speed up subsequent loads.
Tasks (`cardiology_multilabel_classification.py`)
- `CardiologyMultilabelClassification`.
- `epoch_sec` (window length) and `shift` (stride).
- `leads` parameter allowing users to select specific ECG leads (e.g., Lead I only) to simulate wearable device constraints vs. clinical 12-lead setups.
Examples & Ablation Studies
- `cardiology_multilabel.ipynb`: A complete walkthrough from data installation to a sanity check on a `SparcNet` model.
- `cardiology_multilabel_resnet_lead_ablation.py`: A specialized script comparing clinical (12-lead) vs. wearable (1-lead) configurations using a native PyHealth `ResNet`.
Unit Tests (`test_cardiology_multilabel.py`)
- `multilabel` mode and output dimensions based on the dataset schema.
Review Note (Dataset Access)
The PhysioNet 2020 dataset is open access and does not require a signed Data Use Agreement (DUA) for the primary challenge files. The implementation expects the data to be downloaded locally via `wget` or `gsutil` as outlined in the provided notebook.
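The header-parsing step described under Dataset could look roughly like the sketch below. It assumes the challenge's `.hea` files carry metadata in comment lines such as `#Age:`, `#Sex:`, and `#Dx:` (with comma-separated SNOMED-CT codes). The helper names, the sample header, and the CSV column layout are hypothetical and not taken from `cardiology2.py`.

```python
import csv
import io
import re

# Matches comment lines such as "#Age: 74" or "# Dx: 426783006,59118001".
_FIELD = re.compile(r"^#\s*(Age|Sex|Dx)\s*:\s*(.*)$")


def parse_header(text):
    """Extract age, sex, and diagnosis codes from the text of one header."""
    meta = {"age": None, "sex": None, "dx": []}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip()
        if key == "Age":
            # Non-numeric ages (e.g. "NaN") are stored as None.
            meta["age"] = int(value) if value.isdigit() else None
        elif key == "Sex":
            meta["sex"] = value
        else:
            meta["dx"] = [c.strip() for c in value.split(",") if c.strip()]
    return meta


def to_metadata_csv(records):
    """Flatten (record_id, meta) pairs into CSV text, one row per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["record_id", "age", "sex", "dx"])
    for record_id, meta in records:
        # Multiple codes share one cell, joined by ";" (an arbitrary choice).
        writer.writerow([record_id, meta["age"], meta["sex"], ";".join(meta["dx"])])
    return buffer.getvalue()


# Hypothetical header content in the assumed layout.
SAMPLE_HEADER = """A0001 12 500 7500
A0001.mat 16+24 1000/mV 16 0 28 -1716 0 I
#Age: 74
#Sex: Male
#Dx: 426783006,59118001
"""

sample_meta = parse_header(SAMPLE_HEADER)
metadata_csv = to_metadata_csv([("A0001", sample_meta)])
print(metadata_csv)
```

Writing the result once to a flat CSV means later loads read a single table rather than reopening every header file, which is the efficiency gain the description points to.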
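The `epoch_sec` (window length) and `shift` (stride) parameters listed under Tasks suggest a sliding-window segmentation. The sketch below shows one way such windows could be computed. It assumes both values are given in seconds and that incomplete trailing windows are dropped. The function name `window_starts` and the 500 Hz sampling rate are illustrative assumptions, not the task's actual behavior.

```python
def window_starts(n_samples, sampling_rate, epoch_sec, shift):
    """Return the start index of every full window in a recording.

    Windows are `epoch_sec` seconds long and start `shift` seconds apart.
    A trailing window that would run past the end is dropped.
    """
    window = int(epoch_sec * sampling_rate)
    step = int(shift * sampling_rate)
    if window <= 0 or step <= 0:
        raise ValueError("epoch_sec and shift must yield at least one sample")
    return list(range(0, n_samples - window + 1, step))


# A 10-second recording at an assumed 500 Hz, cut into 4 s windows every 2 s.
starts = window_starts(n_samples=5000, sampling_rate=500, epoch_sec=4, shift=2)
print(starts)  # prints "[0, 1000, 2000, 3000]"
```

With `shift` smaller than `epoch_sec`, consecutive windows overlap, so one recording yields several training samples. A recording shorter than one window yields none.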