|
13 | 13 | [](https://github.com/psf/black) |
14 | 14 | <!-- [![]()]() --> |
15 | 15 |
|
16 | | -Pasteur is a library for performing privacy-aware end-to-end data synthesis. |
| 16 | +Pasteur is a library for managing the end-to-end process of data synthesis. |
17 | 17 | Gather your raw data and preprocess, synthesize, and evaluate it within a single |
18 | 18 | project. |
19 | 19 | Use the tools you're familiar with: numpy, pandas, scikit-learn, scipy or any other. |
20 | 20 | When your dataset grows, scale to out-of-core data by using Pasteur's parallelization |
21 | 21 | and partitioning primitives, without code changes or using different libraries. |
22 | 22 |
|
23 | | -## Reproducibility |
24 | | -You can find the experiment files that can be used to reproduce the paper |
25 | | -about Pasteur [here](https://github.com/pasteur-dev/pasteur/tree/paper/notebooks/paper). |
| 23 | +Pasteur focuses on providing a common platform for the processing, evaluation and |
| 24 | +sharing of synthetic data. |
| 25 | +In the current version, Pasteur can ingest and encode arbitrary multi-table |
| 26 | +hierarchical/sequential datasets with a mixture of numerical, categorical, and date values |
| 27 | +into a common format for synthesis, through a flexible metadata and encoding system. |
| 28 | +Post synthesis, Pasteur can evaluate the resulting data through a multi-table |
| 29 | +native, extensible evaluation architecture (with built-in support for basic metrics |
| 30 | +such as histograms) and allows for comparison to "ideal" synthetic data, through the |
| 31 | +use of a hold-out reference set, which it also creates and manages. |
| 32 | + |
| 33 | +Pasteur features built-in support for synthesizing data using PrivBayes, AIM, or MST |
| 34 | +(due to the lack of viable multi-table synthesis algorithms). |
| 35 | +If none of these fit, or if a custom algorithm should be used, it is trivial to add
| 36 | +support for it to Pasteur by implementing the `Synth` interface.
| 37 | + |
| 38 | +> |
| 39 | +> Pasteur is currently an early research alpha. It is architected to allow multi-modal |
| 40 | +> data synthesis (e.g., the combination of hierarchical data with sounds and images) |
| 41 | +> and will soon feature a novel synthesis method for hierarchical/event-based data. |
| 42 | +> |
26 | 43 |
|
27 | 44 | ## Usage |
28 | 45 | You can install Pasteur with pip. |
|