Skip to content

Commit d9b059a

update README.md
1 parent 89090ba commit d9b059a

File tree

2 files changed (+70 −3 lines)

README.md

Lines changed: 68 additions & 1 deletion
@@ -6,4 +6,71 @@
<p> "To think is to forget a difference, to generalize, to abstract."</p>
<p>― <a href="https://en.wikipedia.org/wiki/Jorge_Luis_Borges">Jorge Luis Borges</a>, <a href="https://en.wikipedia.org/wiki/Funes_the_Memorious">Funes the Memorious</a></p>
</blockquote>
</div>

This repo reproduces almost all of the figures in the book *Reinforcement Learning: An Introduction (2nd edition)*.

# Workflow

## Reproduce

Run the following command to install this package:

```bash
$ julia -e "using Pkg; Pkg.add(\"Plots\"); Pkg.add(PackageSpec(url=\"https://github.com/Ju-jl/Ju.jl.git\")); Pkg.add(PackageSpec(url=\"https://github.com/Ju-jl/ReinforcementLearningAnIntroduction.jl\"));"
```

Then enter the REPL:

```julia
julia> using RLIntro # it might take several minutes to pre-compile

julia> @show [f for f in names(RLIntro) if startswith(string(f), "fig")]; # list all the functions that reproduce the corresponding figures
[f for f = names(RLIntro) if startswith(string(f), "fig")] = Symbol[:fig_10_1, :fig_10_2, :fig_10_3, :fig_10_4, :fig_10_5, :fig_11_2, :fig_12_3, :fig_13_1, :fig_13_2, :fig_2_1, :fig_2_2, :fig_2_3, :fig_2_4, :fig_2_5, :fig_2_6, :fig_3_2, :fig_3_5, :fig_4_1, :fig_4_2, :fig_4_3, :fig_5_1, :fig_5_2, :fig_5_3, :fig_5_4, :fig_6_2_a, :fig_6_2_b, :fig_6_2_c, :fig_6_2_d, :fig_6_3_a, :fig_6_3_b, :fig_6_5, :fig_7_2, :fig_8_2, :fig_8_4, :fig_8_4_example, :fig_8_5, :fig_8_7, :fig_8_8, :fig_9_1, :fig_9_10, :fig_9_2_a, :fig_9_2_b, :fig_9_5]
```

**Note** that for some figures you may need to install *pdflatex*.
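
As a quick sanity check, a minimal session might look like this (a sketch, assuming the figure functions render a Plots plot; `fig_2_2` is one of the exported functions listed above, and `savefig` is the standard Plots function for writing the current plot to disk):

```julia
julia> using RLIntro, Plots

julia> fig_2_2()  # reproduce Figure 2.2 (the ten-armed testbed)

julia> savefig("figure_2_2.png")  # optionally save the rendered figure
```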

## Develop

If you would like to make some improvements, I'd suggest the following workflow (a REPL sketch of the inner loop follows the list):

1. Clone this repo and enter the project folder.
1. Enter the pkg mode and run `(RLIntro) pkg> add https://github.com/Ju-jl/Ju.jl.git` (because `Ju.jl` is not registered yet; this should no longer be necessary once Julia 1.1 is released).
1. Make changes to an existing *Environment*, or create a new one, and include it in the REPL (like `include("src/environments/MultiArmBandits.jl")`).
1. Make changes to the related source code and include it in the REPL (like `include("src/chapter02/ten_armed_testbed.jl")`).
1. Run the functions to draw figures (like `fig_2_2()`).
1. Repeat the above three steps.
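
Concretely, the inner loop of steps 3–5 might look like this in the REPL (a sketch; the exact paths depend on which chapter you are editing):

```julia
julia> include("src/environments/MultiArmBandits.jl")  # reload the environment you changed

julia> include("src/chapter02/ten_armed_testbed.jl")   # reload the related source code

julia> fig_2_2()  # redraw the figure to verify the change
```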

# Contents

| Chapters | Figures | Description |
|---|:-- | :-- |
| Chapter01 | | Running `play()` will prompt an interactive interface to play the [TicTacToe](https://en.wikipedia.org/wiki/Tic-tac-toe) game (see the sketch below this table). |
| Chapter02 | [fig_2_2](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_2_2.png), [fig_2_3](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_2_3.png), [fig_2_4](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_2_4.png), [fig_2_5](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_2_5.png), [fig_2_6](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_2_6.png) | |
| Chapter03 | [fig_3_2](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_3_2.png), [fig_3_5](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_3_5.png) | Here a heatmap is used to represent the values. |
| Chapter04 | [fig_4_1](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_4_1.png), [fig_4_2_policy](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_4_2_policy.png), [fig_4_2_value](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_4_2_value.png), [fig_4_3](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_4_3.png) | |
| Chapter05 | [fig_5_1_no_usable_ace_n_10000](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_5_1_no_usable_ace_n_10000.png), [fig_5_1_usable_ace_n_10000](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_5_1_usable_ace_n_10000.png) | Warning: the result differs from the figures in the book. Please help to correct it. |
| | [fig_5_2_no_usable_ace_n_500000](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_5_2_no_usable_ace_n_500000.png), [fig_5_2_usable_ace_n_500000](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_5_2_usable_ace_n_500000.png), [fig_5_2_no_usable_ace_policy_n_500000](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_5_2_no_usable_ace_policy_n_500000.png), [fig_5_2_usable_ace_policy_n_500000](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_5_2_usable_ace_policy_n_500000.png) | |
| | [fig_5_3](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_5_3.png), [fig_5_4](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_5_4.png) | |
| Chapter06 | [fig_6_2_a](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_6_2_a.png), [fig_6_2_b](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_6_2_b.png), [fig_6_2_c](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_6_2_c.png), [fig_6_2_d](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_6_2_d.png) | |
| | [fig_6_3_a](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_6_3_a.png), [fig_6_3_b](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_6_3_b.png) | |
| | [fig_6_5](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_6_5.png) | |
| Chapter07 | [fig_7_2](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_7_2.png) | |
| Chapter08 | [fig_8_2](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_8_2.png), [fig_8_4_example](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_8_4_example.png), [fig_8_4](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_8_4.png) | |
| | [fig_8_5](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_8_5.png), [fig_8_7](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_8_7.png), [fig_8_8_a](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_8_8_a.png), [fig_8_8_b](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_8_8_b.png) | |
| Chapter09 | [fig_9_1_a](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_9_1_a.png), [fig_9_1_b](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_9_1_b.png), [fig_9_2_a](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_9_2_a.png), [fig_9_2_b](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_9_2_b.png), [fig_9_5](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_9_5.png) | |
| Chapter10 | [fig_10_1_1](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_10_1_1.png), [fig_10_1_2](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_10_1_2.png), [fig_10_1_3](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_10_1_3.png), [fig_10_1_4](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_10_1_4.png), [fig_10_1_5](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_10_1_5.png) | |
| | [fig_10_3](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_10_3.png), [fig_10_4](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_10_4.png), [fig_10_5](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_10_5.png) | |
| Chapter11 | [fig_11_2](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_11_2.png) | |
| Chapter12 | [fig_12_3](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_12_3.png) | |
| Chapter13 | [fig_13_1](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_13_1.png), [fig_13_2](https://raw.githubusercontent.com/Ju-jl/ReinforcementLearningAnIntroduction.jl/master/docs/src/assets/figures/figure_13_2.png) | |
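
For Chapter01, an interactive game can be started directly from the REPL (a sketch; `play()` is the entry point mentioned in the table above, and the game prompts are interactive):

```julia
julia> using RLIntro

julia> play()  # prompts an interactive TicTacToe game in the terminal
```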

# Related Packages

- [Ju.jl](https://github.com/Ju-jl/Ju.jl)

  This repo mainly relies on [Ju.jl](https://github.com/Ju-jl/Ju.jl).

- [ShangtongZhang/reinforcement-learning-an-introduction](https://github.com/ShangtongZhang/reinforcement-learning-an-introduction)

  You may also take a look at the Python code there.

src/chapter01/tic_tac_toe.jl

Lines changed: 2 additions & 2 deletions
```diff
@@ -64,8 +64,8 @@ function pre_train()
         EpisodeSARDBuffer(),
         identity,
         :defensive)
-    # train!(env, (offensive_agent, defensive_agent); callbacks=(stop_at_episode(10^5), gen_records())) # debug
-    train!(env, (offensive_agent, defensive_agent); callbacks=(stop_at_episode(10^5),))
+    # train!(env, (offensive_agent, defensive_agent); callbacks=(stop_at_episode(10^6), gen_records())) # debug
+    train!(env, (offensive_agent, defensive_agent); callbacks=(stop_at_episode(10^6),))
     (offensive_agent, defensive_agent)
 end
```
