# RL anonymity

An experimental effort to use reinforcement learning techniques for data anonymization.

## Conceptual overview

The term data anonymization refers to techniques that can be applied to a given dataset, D, such that afterwards
it is difficult for a third party to identify or infer the existence of specific individuals in D.
Anonymization techniques typically result in some sort of distortion of the original dataset. This means that,
in order to maintain some utility of the transformed dataset, the applied transformations should be constrained in some sense.
In the end, it can be argued that data anonymization is an optimization problem, namely striking the right balance
between data utility and privacy.

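Purely as a toy illustration (the actual distortion metric used by this project may differ), for numeric data the overall
distortion could be quantified as a normalized distance between the original and the transformed dataset:

```python
# Toy illustration only: one possible way to quantify the distortion introduced
# by anonymization for purely numeric data; the project's metric may differ.
import numpy as np

def total_distortion(original: np.ndarray, anonymized: np.ndarray) -> float:
    """Average normalized L2 distance between original and anonymized rows."""
    row_diff = np.linalg.norm(original - anonymized, axis=1)
    row_scale = np.linalg.norm(original, axis=1) + 1e-12  # guard against division by zero
    return float(np.mean(row_diff / row_scale))
```
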
Reinforcement learning is a learning framework based on accumulated experience. In this paradigm, an agent learns by interacting with an environment
without (to a large extent) any supervision. The following image describes, schematically, the reinforcement learning framework.



The agent chooses an action, ```a_t```, to perform out of a predefined set of actions, ```A```. The chosen action is executed by the environment
instance, which returns to the agent a reward signal, ```r_t```, as well as the new state, ```s_t```, that the environment is in.
The framework has been used successfully in many recent advances in control, robotics, games and elsewhere.

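As a rough sketch of this interaction loop (the ```ToyEnvironment``` class below is a hypothetical stand-in, not the actual
environment interface of this repository), one episode could look as follows:

```python
# Hypothetical sketch of the agent-environment loop described above; the class
# below is a minimal stand-in, not the actual interface of this repository.
import random

class ToyEnvironment:
    """Stand-in environment with integer states and a fixed action set A."""

    actions = [0, 1, 2]                      # the predefined action set A

    def reset(self) -> int:
        self.t = 0
        return self.t                        # initial state s_0

    def step(self, action: int):
        self.t += 1
        reward = random.random()             # reward signal r_t
        next_state = self.t                  # new state s_t
        done = self.t >= 10                  # episode ends after 10 steps
        return next_state, reward, done

env = ToyEnvironment()
state, done = env.reset(), False
while not done:
    action = random.choice(env.actions)      # agent picks a_t from the set A
    state, reward, done = env.step(action)   # environment returns r_t and s_t
```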

Let's assume that we have at our disposal two numbers: a minimum distortion, ```MIN_DIST```, that should be applied to the dataset
in order to achieve privacy, and a maximum distortion, ```MAX_DIST```, that should not be exceeded in order to maintain some utility.
Let's also assume that any overall dataset distortion in ```[MIN_DIST, MAX_DIST]``` is acceptable in order to consider the dataset as
both privacy-preserving and utility-preserving. We can then train a reinforcement learning agent to distort the dataset
such that the aforementioned objective is achieved.

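Purely as a sketch of how such an objective could be expressed as a reward signal (the numeric bounds and the reward
shape below are assumptions, not necessarily the ones used by this project):

```python
# Sketch only: reward the agent when the overall dataset distortion falls inside
# [MIN_DIST, MAX_DIST]; the bounds and the reward shape are assumptions.
MIN_DIST = 0.2   # hypothetical minimum distortion required for privacy
MAX_DIST = 0.6   # hypothetical maximum distortion tolerated for utility

def reward(total_distortion: float) -> float:
    """Positive reward inside the acceptable band, penalty proportional to the violation outside it."""
    if MIN_DIST <= total_distortion <= MAX_DIST:
        return 1.0
    if total_distortion < MIN_DIST:
        return -(MIN_DIST - total_distortion)
    return -(total_distortion - MAX_DIST)
```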

Overall, this is shown in the image below.

19 | 32 |  |
20 | 33 |
|
| 34 | +The images below show the overall running distortion average and running reward average achieved by using the |
| 35 | +<a href="https://en.wikipedia.org/wiki/Q-learning">Q-learning</a> algorithm and various policies. |
| 36 | + |
**Q-learning with epsilon-greedy policy and constant epsilon**



**Q-learning with epsilon-greedy policy and decaying epsilon per episode**



**Q-learning with epsilon-greedy policy and decaying epsilon at a constant rate**



**Q-learning with softmax policy, running average distortion**



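The sketch below shows, under assumed interfaces and hyperparameters, what the tabular Q-learning update and the
action-selection policies listed above typically look like; it is illustrative only and not the code that produced the plots.

```python
# Illustrative sketch of tabular Q-learning with the policies shown above;
# interfaces and hyperparameters are assumptions, not this repository's code.
import numpy as np

rng = np.random.default_rng(42)
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))            # tabular Q-function
alpha, gamma = 0.1, 0.99                       # learning rate and discount factor

def epsilon_greedy(state: int, epsilon: float) -> int:
    """Random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def softmax_policy(state: int, tau: float = 1.0) -> int:
    """Sample an action with probability proportional to exp(Q / tau)."""
    prefs = Q[state] / tau
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()
    return int(rng.choice(n_actions, p=probs))

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    """One-step Q-learning update of the tabular Q-function."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def epsilon_per_episode(episode: int) -> float:
    """Epsilon decaying with the episode index."""
    return 1.0 / (episode + 1)

def epsilon_constant_rate(epsilon: float, rate: float = 0.99, floor: float = 0.01) -> float:
    """Epsilon decayed by a constant multiplicative rate after every episode."""
    return max(floor, epsilon * rate)
```
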
## Dependencies

- NumPy

## Documentation

## References