Out-of-core overhaul and new event data support

@antheas antheas released this 21 Aug 13:06
· 363 commits to master since this release

This release overhauls and standardizes Pasteur's API to prepare it for multi-modal data synthesis. It also smooths out some rough edges: Encodings, Transformations, and Metrics are now fitted out-of-core through a map-reduce architecture.
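As a rough illustration of the map-reduce idea, the sketch below fits simple column statistics over chunks that are never held in memory together. It is not Pasteur's API: `Stats`, `map_chunk`, `reduce_stats`, and `fit_out_of_core` are hypothetical names used only for this example, and it assumes every chunk is non-empty.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List


@dataclass(frozen=True)
class Stats:
    """Partial fit result for one numeric column (hypothetical)."""
    count: int
    total: float
    minimum: float
    maximum: float


def map_chunk(chunk: List[float]) -> Stats:
    # Map step: summarize one chunk on its own (assumes a non-empty chunk).
    return Stats(len(chunk), sum(chunk), min(chunk), max(chunk))


def reduce_stats(a: Stats, b: Stats) -> Stats:
    # Reduce step: merge two partial results without revisiting the raw rows.
    return Stats(
        a.count + b.count,
        a.total + b.total,
        min(a.minimum, b.minimum),
        max(a.maximum, b.maximum),
    )


def fit_out_of_core(chunks: Iterable[List[float]]) -> Stats:
    # Chunks are consumed one at a time, so only one needs to be in memory.
    return reduce(reduce_stats, map(map_chunk, chunks))


chunks = [[1.0, 4.0], [2.0, 8.0, 3.0], [5.0]]
fitted = fit_out_of_core(chunks)
print(fitted.count, fitted.minimum, fitted.maximum)
```

Because the reduce step is associative, the partial results can be merged in any grouping, which is what lets the map step run on separate partitions.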

To transform event data, this release adds a new type of Transformer, the Seq(uence) Transformer. This transformer is multi-table aware and can, for example, encode inter-row references (such as the date in row #3 for patient X depending on the date in row #2). A built-in implementation, SeqTransformerWrapper (accessed through the name seq), contains the joining logic needed to wrap existing reference transformers so that they support this format.
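The inter-row dependency described above can be pictured with a small sketch in which each event date is encoded relative to the previous event of the same patient. This is only an illustration of the idea and does not use or reproduce SeqTransformerWrapper; `encode_relative_days` and the sample data are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, date]                      # (patient_id, event date)
Encoded = Tuple[str, int, Optional[int]]      # (patient_id, row index, days since previous row)


def encode_relative_days(events: List[Event]) -> List[Encoded]:
    """Encode each event date as an offset from the patient's previous event."""
    last_seen: Dict[str, date] = {}
    next_index: Dict[str, int] = {}
    encoded: List[Encoded] = []
    # Sorting by (patient_id, date) puts each patient's rows in time order.
    for patient_id, day in sorted(events):
        index = next_index.get(patient_id, 0)
        previous = last_seen.get(patient_id)
        # The first row of a patient has no earlier row to reference.
        delta = None if previous is None else (day - previous).days
        encoded.append((patient_id, index, delta))
        last_seen[patient_id] = day
        next_index[patient_id] = index + 1
    return encoded


events = [
    ("X", date(2020, 1, 10)),
    ("X", date(2020, 1, 1)),
    ("Y", date(2020, 3, 5)),
    ("X", date(2020, 1, 15)),
]
encoded = encode_relative_days(events)
for row in encoded:
    print(row)
```

Here the third row of patient X is stored as an offset from the second row, so decoding it requires the second row first, which is the kind of dependency a sequence-aware transformer has to track.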

The new mimic_core view in extras, which contains the three core tables of mimic (patients, admissions, and transfers), is provided as a proof of concept for this new transformation format.