Semi-gradient SARSA algorithm
=============================

Overview
--------

In this example, we use the episodic semi-gradient SARSA algorithm to anonymize a data set with three columns.

Semi-gradient SARSA algorithm
-----------------------------

In this example, we continue using a three-column data set as in the `Q-learning on a three columns dataset <qlearning_three_columns.html>`_.
In that example, we used state aggregation to model the overall distortion of the data set in the range :math:`[0, 1]`.
Herein, we take an alternative approach. We will assume that the distortion of each column lies in the range :math:`[0, 1]`, where the two edge points correspond to no distortion and full distortion of the column, respectively. For each column, we will use the same methodology as in `Q-learning on a three columns dataset <qlearning_three_columns.html>`_ to discretize the continuous :math:`[0, 1]` range into a given number of disjoint bins.
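
To make the discretization concrete, here is a minimal sketch; the bin count and the helper name are illustrative and are not taken from the project code.

.. code-block::

   N_BINS = 10  # illustrative number of disjoint bins

   def distortion_to_bin(distortion: float, n_bins: int = N_BINS) -> int:
       """Map a column distortion in [0, 1] to the index of a disjoint bin."""
       # Equal-width bins over [0, 1]; the right edge 1.0 folds into the last bin.
       return min(int(distortion * n_bins), n_bins - 1)

   # Example: the three column distortions map to three bin indices.
   print([distortion_to_bin(d) for d in (0.0, 0.42, 1.0)])  # [0, 4, 9]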

In contrast to representing the state-action function :math:`q_{\pi}` with a table, as we did in `Q-learning on a three columns dataset <qlearning_three_columns.html>`_, we will assume a functional form for it. Specifically, we assume that the state-action function can be approximated by :math:`\hat{q} \approx q_{\pi}` given by

.. math::

   \hat{q}(s, \alpha, \mathbf{w}) = \mathbf{w}^T\mathbf{x}(s, \alpha)

where :math:`\mathbf{w}` is the weight vector and :math:`\mathbf{x}(s, \alpha)` is called the feature vector representing state :math:`s` when taking action :math:`\alpha` [1]. We will use `Tile coding`_ to construct :math:`\mathbf{x}(s, \alpha)`. Our goal now is to find the components of the weight vector.

We can use stochastic gradient descent (SGD) for this [1]. In this case, the update rule is [1]

.. math::

   \mathbf{w} \leftarrow \mathbf{w} + \eta\left[R + \gamma \hat{q}(S', A', \mathbf{w}) - \hat{q}(S, A, \mathbf{w})\right] \nabla \hat{q}(S, A, \mathbf{w})
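
For the linear form of :math:`\hat{q}` used here, the gradient :math:`\nabla \hat{q}(S, A, \mathbf{w})` is simply the feature vector :math:`\mathbf{x}(S, A)`, so a single update amounts to a few vector operations. The following is a minimal sketch of one such step, assuming ``numpy``; it illustrates the rule above and is not the project's implementation.

.. code-block::

   import numpy as np

   def sarsa_update(w, x_sa, x_next, reward, eta, gamma, done):
       """One semi-gradient SARSA step for a linear q-hat(s, a) = w . x(s, a)."""
       q_sa = np.dot(w, x_sa)
       # For a terminal transition the target is just the observed reward.
       target = reward if done else reward + gamma * np.dot(w, x_next)
       # For a linear q-hat, the gradient w.r.t. w is the feature vector itself.
       return w + eta * (target - q_sa) * x_sa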

The semi-gradient SARSA algorithm is shown below.

.. figure:: images/semi_gradient_sarsa.png

   Episodic semi-gradient SARSA algorithm. Image from [1].
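
In code, the main loop of the algorithm in the figure looks roughly as follows. This is a sketch only: the environment interface (``env.reset``, ``env.step``) and the :math:`\epsilon`-greedy helper are hypothetical stand-ins, not the project's actual API.

.. code-block::

   import numpy as np

   def epsilon_greedy(w, x, state, actions, eps):
       """Pick a random action with probability eps, otherwise the greedy one."""
       if np.random.rand() < eps:
           return int(np.random.choice(actions))
       return max(actions, key=lambda a: np.dot(w, x(state, a)))

   def semi_gradient_sarsa(env, x, n_features, actions, n_episodes, eta, gamma, eps):
       w = np.zeros(n_features)
       for _ in range(n_episodes):
           state = env.reset()
           action = epsilon_greedy(w, x, state, actions, eps)
           done = False
           while not done:
               next_state, reward, done = env.step(action)
               q_sa = np.dot(w, x(state, action))
               if done:
                   w += eta * (reward - q_sa) * x(state, action)
                   break
               next_action = epsilon_greedy(w, x, next_state, actions, eps)
               target = reward + gamma * np.dot(w, x(next_state, next_action))
               w += eta * (target - q_sa) * x(state, action)
               state, action = next_state, next_action
       return w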

Tile coding
-----------

Since we consider the distortions of all the columns in the data set, we are dealing with a multi-dimensional continuous space. In this case, we can use tile coding to construct :math:`\mathbf{x}(s, \alpha)` [1].

Tile coding is a form of coarse coding for multi-dimensional continuous spaces [1]. In this method, the features are grouped into partitions of the state space. Each partition is called a tiling, and each element of the partition is called a tile [1]. The following figure shows a 2D state space partitioned into a uniform grid (left). If we used only this single tiling, we would not have coarse coding but simply a case of state aggregation.

In order to apply coarse coding, we use overlapping tiling partitions. In this case, each tiling is offset by a fraction of a tile width [1]. A simple case with four tilings is shown on the right side of the following figure.

We will use a linear function approximation for :math:`\hat{q}`:

.. math::

   \hat{q}(s, \alpha, \mathbf{w}) = \mathbf{w}^T\mathbf{x}(s, \alpha)

.. the original figure, showing a uniform grid tiling (left) and four overlapping tilings (right), is not reproduced here

These tilings are offset from one another by a uniform amount in each dimension. Image from [1].

One practical advantage of tile coding is that the overall number of features that are active at any given time is the same for any state [1]. Exactly one feature is present in each tiling, so the total number of features present is always the same as the number of tilings [1]. This allows the learning parameter :math:`\eta` to be set according to

.. math::

   \eta = \frac{1}{n}

where :math:`n` is the number of tilings.
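
To make the construction concrete, here is a minimal sketch of tile coding for the three column distortions, assuming ``numpy``; the number of tilings, the tiles per dimension, and the offsetting scheme are illustrative choices rather than the ones used in the project.

.. code-block::

   import numpy as np

   def tile_features(state, n_tilings=4, tiles_per_dim=4):
       """Binary feature vector for a point in [0, 1]^d built with tile coding.

       Each tiling is a uniform grid over [0, 1]^d, offset from the previous
       one by a fraction of a tile width, so exactly one tile per tiling is
       active for any state.
       """
       state = np.asarray(state, dtype=float)
       d = state.size
       width = 1.0 / tiles_per_dim
       features = np.zeros(n_tilings * tiles_per_dim ** d)
       for t in range(n_tilings):
           # Offset the t-th tiling by t / n_tilings of a tile width per dimension.
           shifted = state + t * width / n_tilings
           coords = np.minimum((shifted / width).astype(int), tiles_per_dim - 1)
           tile = int(np.ravel_multi_index(coords, (tiles_per_dim,) * d))
           features[t * tiles_per_dim ** d + tile] = 1.0
       return features

   phi = tile_features([0.2, 0.5, 0.9])
   assert phi.sum() == 4  # one active feature per tiling, so here eta = 1/4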

Code
----

The necessary imports:

.. code-block::

   import random
   ...
   from src.utils.string_distance_calculator import StringDistanceType
   from src.utils.reward_manager import RewardManager