Commit d5c4fcf

update readme and version
1 parent 2883c77 commit d5c4fcf

2 files changed: 22 additions, 5 deletions

README.md

Lines changed: 21 additions & 4 deletions
@@ -13,16 +13,33 @@
 [![Code style is Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/psf/black)
 <!-- [![]()]() -->
 
-Pasteur is a library for performing privacy-aware end-to-end data synthesis.
+Pasteur is a library for managing the end-to-end process of data synthesis.
 Gather your raw data and preprocess, synthesize, and evaluate it within a single
 project.
 Use the tools you're familiar with: numpy, pandas, scikit-learn, scipy or any other.
 When your dataset grows, scale to out-of-core data by using Pasteur's parallelization
 and partitioning primitives, without code changes or using different libraries.
 
-## Reproducibility
-You can find the experiment files that can be used to reproduce the paper
-about Pasteur [here](https://github.com/pasteur-dev/pasteur/tree/paper/notebooks/paper).
+Pasteur focuses on providing a common platform for the processing, evaluation and
+sharing of synthetic data.
+In the current version, Pasteur can ingest and encode arbitrary multi-table
+hierarchical/sequential datasets with a mixture of numerical, categorical, and date values
+into a common format for synthesis, through a flexible metadata and encoding system.
+Post synthesis, Pasteur can evaluate the resulting data through a multi-table
+native, extensible evaluation architecture (with built-in support for basic metrics
+such as histograms) and allows for comparison to "ideal" synthetic data, through the
+use of a hold-out reference set, which it also creates and manages.
+
+Pasteur features built-in support for synthesizing data using PrivBayes, AIM, or MST
+(due to the lack of viable multi-table synthesis algorithms).
+If not, or if a custom algorithm should be used, it is trivial to add support for
+it to Pasteur, by implementing the `Synth` interface.
+
+>
+> Pasteur is currently an early research alpha. It is architected to allow multi-modal
+> data synthesis (e.g., the combination of hierarchical data with sounds and images)
+> and will soon feature a novel synthesis method for hierarchical/event-based data.
+>
 
 ## Usage
 You can install Pasteur with pip.
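
The new README text describes comparing synthesized data against "ideal" synthetic data by way of a hold-out reference set. A minimal sketch of that idea follows, using only the Python standard library. It does not use Pasteur's API: `histogram`, `total_variation`, and `split_holdout` are hypothetical helpers written for illustration, and the "synthetic" column is a deliberately skewed random sample standing in for a synthesizer's output.

```python
import random
from collections import Counter


def histogram(values, categories):
    """Normalized frequency of each category in `values`."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def split_holdout(rows, ref_fraction, seed=0):
    """Shuffle a copy of `rows` and split off a hold-out reference set.

    Returns (working, reference). Hypothetical helper, not part of Pasteur.
    """
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_ref = int(len(shuffled) * ref_fraction)
    return shuffled[n_ref:], shuffled[:n_ref]


categories = ["a", "b", "c"]
rng = random.Random(42)

# Toy "real" data: a single categorical column.
real = rng.choices(categories, weights=[0.6, 0.3, 0.1], k=10_000)
working, reference = split_holdout(real, ref_fraction=0.2)

# Stand-in for a synthesizer's output (assumption: deliberately skewed).
synthetic = rng.choices(categories, weights=[0.4, 0.4, 0.2], k=len(reference))

base = histogram(working, categories)
# The reference set is real data the synthesizer never saw, so its score
# approximates the best a synthesizer could be expected to achieve.
ref_score = total_variation(base, histogram(reference, categories))
syn_score = total_variation(base, histogram(synthetic, categories))

print(f"reference vs. working: {ref_score:.4f}")
print(f"synthetic vs. working: {syn_score:.4f}")
```

The reference score acts as a baseline: a synthetic dataset whose histogram distance approaches it is about as close to the working data as a fresh real sample would be.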

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "pasteur"
-version = "0.1.1"
+version = "0.2.0"
 authors = [
 { name="Kapenekakis Antheas", email="antheas@cs.aau.dk" },
 ]
