
Commit 8f55d91

Merge pull request #72 from pockerman/investigate_sarsa_semi_gradient
Investigate sarsa semi gradient
2 parents: 510fe41 + b047d3e

2 files changed: +34 −9 lines changed


docs/source/Examples/semi_gradient_sarsa_three_columns.rst

Lines changed: 34 additions & 9 deletions
@@ -2,19 +2,41 @@ Semi-gradient SARSA algorithm
 =============================

 In this example, we continue using a three-column data set as in the `Q-learning on a three columns dataset <qlearning_three_columns.html>`_.
-In that example, we used a state aggregation approach to model the overall distortion of the data set in the range :math:`\[0, 1]`.
-In this example, we take an alternative approach. We will use bins to discretize the deformation range for each column in the data set.
-The state vector will contain these deformations. Hence, for the three column data set, the state vector will have three entries,
-each indicating the distortion of the respective column.
+In that example, we used a state aggregation approach to model the overall distortion of the data set in the range :math:`[0, 1]`.
+Herein, we take an alternative approach. We will assume that the column distortion is in the range :math:`[0, 1]`, where the edge points mean no distortion
+and full distortion of the column respectively. For each column, we will use the same approach to discretize the continuous :math:`[0, 1]` range
+into a given number of disjoint bins.
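As a rough illustration of this binning step (a sketch only; the bin count ``n_bins`` and the helper name ``discretize_state`` are assumptions, not the example's actual code), each column's distortion can be mapped to a bin index with ``numpy.digitize``:

.. code-block:: python

    import numpy as np

    def discretize_state(distortions, n_bins=10):
        """Map each column distortion in [0, 1] to one of n_bins disjoint bins."""
        # interior bin edges that partition [0, 1] into n_bins equal intervals
        edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
        # np.digitize returns, for every value, the index of the bin it falls into
        return np.digitize(distortions, edges)

    # state vector for the three-column data set: one bin index per column
    state = discretize_state(np.array([0.05, 0.45, 0.97]))
    # state -> array([0, 4, 9])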
+Contrary to representing the state-action function :math:`q_{\pi}` using a table as we did in `Q-learning on a three columns dataset <qlearning_three_columns.html>`_, we will assume a functional form for
+it. Specifically, we assume that the state-action function can be approximated by :math:`\hat{q} \approx q_{\pi}` given by
+
+.. math::
+   \hat{q}(s, \alpha) = \mathbf{w}^T\mathbf{x}(s, \alpha) = \sum_{i=1}^{d} w_i x_i(s, \alpha)
+
+where :math:`\mathbf{w}` is the weights vector and :math:`\mathbf{x}(s, \alpha)` is called the feature vector representing state :math:`s` when taking action :math:`\alpha` [1]. For our case, the components of the feature vector will be the distortions of the three columns when applying action :math:`\alpha` on the data set. Our goal now is to find the components of the weight vector. We can use stochastic gradient descent (SGD)
+for this [1]. In this case, the update rule is [1]
+
+.. math::
+   \mathbf{w}_{t + 1} = \mathbf{w}_t + \eta\left[U_t - \hat{q}(s_t, \alpha_t, \mathbf{w}_t)\right] \nabla_{\mathbf{w}} \hat{q}(s_t, \alpha_t, \mathbf{w}_t)
+
+where :math:`U_t` for one-step SARSA is given by [1]:
+
+.. math::
+   U_t = R_{t+1} + \gamma \hat{q}(s_{t + 1}, \alpha_{t + 1}, \mathbf{w}_t)
+
+Since :math:`\hat{q}(s, \alpha)` is a linear function with respect to the weights, its gradient is given by
+
+.. math::
+   \nabla_{\mathbf{w}} \hat{q}(s, \alpha) = \mathbf{x}(s, \alpha)
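A minimal numpy sketch of the linear approximation, its gradient, and a single semi-gradient update (the names ``q_hat``, ``grad_q_hat`` and ``sarsa_update`` are illustrative assumptions, not the example's code):

.. code-block:: python

    import numpy as np

    def q_hat(w, x):
        # linear approximation: q_hat(s, a) = w^T x(s, a)
        return np.dot(w, x)

    def grad_q_hat(w, x):
        # for a linear approximator the gradient w.r.t. w is simply x(s, a)
        return x

    def sarsa_update(w, x_t, reward, x_next, eta=0.1, gamma=1.0):
        # one semi-gradient SARSA step with target U_t = R_{t+1} + gamma * q_hat(s_{t+1}, a_{t+1})
        target = reward + gamma * q_hat(w, x_next)
        return w + eta * (target - q_hat(w, x_t)) * grad_q_hat(w, x_t)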
+We will use bins to discretize the deformation range for each column in the data set.
+The state vector will contain these deformations. Hence, for the three column data set, the state vector will have three entries, each indicating the distortion of the respective column.

 The semi-gradient SARSA algorithm is shown below

 .. figure:: images/semi_gradient_sarsa.png

-   Episodic semi-gradient SARSA algorithm. Image from [1]
+   Episodic semi-gradient SARSA algorithm. Image from [1].
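A rough sketch of the episodic loop shown in the figure, assuming a gym-style environment with ``reset``/``step``, an :math:`\epsilon`-greedy helper ``select_action`` and a feature map ``features(state, action)``; these helpers are assumptions for illustration only:

.. code-block:: python

    import numpy as np

    def semi_gradient_sarsa(env, features, select_action, n_episodes,
                            eta=0.1, gamma=1.0, n_features=3):
        # w parameterises the linear approximation q_hat(s, a) = w^T x(s, a)
        w = np.zeros(n_features)
        for _ in range(n_episodes):
            state = env.reset()
            action = select_action(w, state)
            done = False
            while not done:
                next_state, reward, done, _ = env.step(action)
                x = features(state, action)
                if done:
                    # terminal transition: the target is just the observed reward
                    w += eta * (reward - np.dot(w, x)) * x
                    break
                next_action = select_action(w, next_state)
                x_next = features(next_state, next_action)
                target = reward + gamma * np.dot(w, x_next)   # U_t
                w += eta * (target - np.dot(w, x)) * x
                state, action = next_state, next_action
        return w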

@@ -24,8 +46,11 @@ Tiling
 We will use a linear function approximation for :math:`\hat{q}`:

-.. math::
-   \hat{q} = \mathbf{w}^T\mathbf{x}
+
+.. figure:: images/tiling_example.png
+
+   Multiple, overlapping grid-tilings on a limited two-dimensional space.
+   These tilings are offset from one another by a uniform amount in each dimension. Image from [1].
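To make the tiling idea above concrete, here is an assumed sketch of a tile coder over the :math:`[0, 1]^3` distortion space (the names and parameters below are illustrative, not the example's code); each of the ``n_tilings`` grids is offset by a uniform fraction of a tile width per dimension, and the returned active-tile indices can act as a sparse feature vector :math:`\mathbf{x}(s, \alpha)`:

.. code-block:: python

    import numpy as np

    def active_tiles(point, n_tilings=8, tiles_per_dim=10, dims=3):
        """Return one active tile index per tiling for a point in [0, 1]^dims."""
        point = np.asarray(point, dtype=float)
        tile_width = 1.0 / tiles_per_dim
        indices = []
        for t in range(n_tilings):
            # each tiling is shifted by a uniform fraction of a tile width
            offset = (t / n_tilings) * tile_width
            coords = np.floor((point + offset) / tile_width).astype(int)
            coords = np.clip(coords, 0, tiles_per_dim - 1)
            # flatten the per-dimension tile coordinates into a single index
            flat = int(np.ravel_multi_index(coords, (tiles_per_dim,) * dims))
            indices.append(t * tiles_per_dim ** dims + flat)
        return indices

    # example: active tiles for a three-column distortion state
    print(active_tiles([0.05, 0.45, 0.97]))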

Code
@@ -210,4 +235,4 @@ Code
 References
 ----------

-1. Sutton and Barto, Reinforcement Learning
+1. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, MIT Press.
