Open: 4 of 6 issues completed
Description
Starting from 2.0a11, PyHealth uses a disk-based, memory-efficient dataset to reduce memory usage for large datasets such as MIMIC4.
This issue tracks potential bugs in, and improvements required for, the new memory-efficient dataset.
Improvements
- Better docs for the cache behaviour. #763
- Add option to cache transformed data from processors and skip pipeline entirely #774
  - Add option to cache transformed data from processors and skip pipeline entirely #783
- `.set_task` writes the cache to the same directory for the same task with different configurations #764
  - Update the default task cache path to include task parameter names and values #766
- Batched processing to speed up task transformation.
  - Further optimization on task transformation. #750
- Support multi-worker task transformation.
  - Multiprocess task transformation #748
- Support configuring `n_worker` for dask.
  - Add num_workers to BaseDataset #743
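To illustrate the cache-path collision described in #764, here is a minimal sketch of deriving a cache directory from a task's name and parameters. It is not PyHealth's implementation: `task_cache_dir`, `cache_root`, and the example parameter `window_hours` are hypothetical names, and #766 describes putting parameter names and values in the path, whereas this sketch swaps in a short hash of them.

```python
import hashlib
import json
from pathlib import Path


def task_cache_dir(cache_root, task_name, task_params):
    """Hypothetical helper (not a PyHealth API): one cache dir per task configuration."""
    # Serialize parameters deterministically so that key order does not matter.
    canonical = json.dumps(task_params, sort_keys=True, default=str)
    # A short digest keeps the directory name compact and filesystem-safe.
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return Path(cache_root) / f"{task_name}_{digest}"


# Same task, different configuration: the two paths differ.
dir_24h = task_cache_dir("cache", "mortality", {"window_hours": 24})
dir_48h = task_cache_dir("cache", "mortality", {"window_hours": 48})
print(dir_24h != dir_48h)
```

A hash hides the parameter values from anyone browsing the cache directory, so a readable path of names and values, as #766 proposes, may be preferable when the parameters are few and short.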
Bugs
- Add `.clear_cache` and `.clear_task_cache` methods to avoid the need to manually delete the cache. #765
  - Add clear_cache and clear_task_cache methods to BaseDataset #770
- The code hangs at `set_task` if any worker has 0 samples written. #782
  - Fix the code will hang at set_task if any of the worker have 0 sample written #784
- The temporary folder for the dataset is not properly cleaned up after dataset processing.
  - Clean up tmpdir correctly, cache task transformation result, and better notebook support. #753
- Cached data is not cleaned up if the program crashes midway, which may leave a corrupted cache file.
  - Clean up tmpdir correctly, cache task transformation result, and better notebook support. #753
- Incorrect null handling for patient_id and timestamp.
  - Fix incorrect null handling for patient_id and timestamp #746
- Time-series processor: `process()` doesn't seem to properly set `self.n_features` or `self.size()` #742
  - Fix/processors fit process #744
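One common way to avoid the corrupted-cache failure mode listed above is to build the cache in a temporary directory and rename it into place only after writing succeeds. The sketch below shows that general pattern under stated assumptions; it is not taken from #753, and `write_cache_atomically` and `write_fn` are hypothetical names. It assumes the final directory does not exist yet.

```python
import os
import shutil
import tempfile
from pathlib import Path


def write_cache_atomically(final_dir, write_fn):
    """Hypothetical helper: write into a temp dir, then rename it to final_dir."""
    final_dir = Path(final_dir)
    final_dir.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp dir next to the target so the rename stays on one filesystem.
    tmp_dir = Path(
        tempfile.mkdtemp(dir=final_dir.parent, prefix=final_dir.name + ".tmp-")
    )
    try:
        write_fn(tmp_dir)  # caller writes all cache files here
        # Assumes final_dir does not exist yet; the rename publishes the cache.
        os.replace(tmp_dir, final_dir)
    except BaseException:
        # On any failure (including KeyboardInterrupt), drop the partial output.
        shutil.rmtree(tmp_dir, ignore_errors=True)
        raise
    return final_dir
```

With this pattern a reader either sees a complete cache directory or none at all. A hard kill such as SIGKILL can still leave a stale `.tmp-` directory behind, so a startup sweep for such leftovers would still be needed.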