From 1ada2ce8092ca3caef4d93374dc542200087763b Mon Sep 17 00:00:00 2001
From: Michael Abbott <32575566+mcabbott@users.noreply.github.com>
Date: Sat, 29 Apr 2023 14:17:08 -0400
Subject: [PATCH 1/3] say opt_state in README

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index fe8fb9f8..38cdf1ab 100644
--- a/README.md
+++ b/README.md
@@ -38,15 +38,15 @@ It is initialised by `setup`, and then at each step, `update` returns both the n
 state, and the model with its trainable parameters adjusted:
 
 ```julia
-state = Optimisers.setup(Optimisers.Adam(), model) # just once
+opt_state = Optimisers.setup(Optimisers.Adam(), model) # just once
 
 grad = Zygote.gradient(m -> loss(m(x), y), model)[1]
 
-state, model = Optimisers.update(state, model, grad) # at every step
+opt_state, model = Optimisers.update(opt_state, model, grad) # at every step
 ```
 
 For models with deeply nested layers containing the parameters (like [Flux.jl](https://github.com/FluxML/Flux.jl) models),
-this state is a similarly nested tree. As is the gradient: if using Zygote, you must use the "explicit" style as shown,
+this `opt_state` is a similarly nested tree. As is the gradient: if using Zygote, you must use the "explicit" style as shown,
 not the "implicit" one with `Params`.
 
 The function `destructure` collects all the trainable parameters into one vector,

From 4bdc571809f0b292cc0f8721fd6e08cefacffe79 Mon Sep 17 00:00:00 2001
From: Michael Abbott <32575566+mcabbott@users.noreply.github.com>
Date: Sat, 29 Apr 2023 14:20:17 -0400
Subject: [PATCH 2/3] change to state_tree instead

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 38cdf1ab..4288b706 100644
--- a/README.md
+++ b/README.md
@@ -38,15 +38,15 @@ It is initialised by `setup`, and then at each step, `update` returns both the n
 state, and the model with its trainable parameters adjusted:
 
 ```julia
-opt_state = Optimisers.setup(Optimisers.Adam(), model) # just once
+state_tree = Optimisers.setup(Optimisers.Adam(), model) # just once
 
 grad = Zygote.gradient(m -> loss(m(x), y), model)[1]
 
-opt_state, model = Optimisers.update(opt_state, model, grad) # at every step
+state_tree, model = Optimisers.update(state_tree, model, grad) # at every step
 ```
 
 For models with deeply nested layers containing the parameters (like [Flux.jl](https://github.com/FluxML/Flux.jl) models),
-this `opt_state` is a similarly nested tree. As is the gradient: if using Zygote, you must use the "explicit" style as shown,
+this `state_tree` is a similarly nested object. As is the gradient: if using Zygote, you must use the "explicit" style as shown,
 not the "implicit" one with `Params`.
 
 The function `destructure` collects all the trainable parameters into one vector,

From 2cbe0a251886edc07a922e441f996ca8623475c8 Mon Sep 17 00:00:00 2001
From: Michael Abbott <32575566+mcabbott@users.noreply.github.com>
Date: Sat, 29 Apr 2023 14:23:40 -0400
Subject: [PATCH 3/3] also mention adjust

---
 README.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 4288b706..7a8645a7 100644
--- a/README.md
+++ b/README.md
@@ -21,8 +21,8 @@
 Optimisers.jl defines many standard gradient-based optimisation rules, and tools for
 applying them to deeply nested models.
 
-This is the future of training for [Flux.jl](https://github.com/FluxML/Flux.jl) neural networks,
-and the present for [Lux.jl](https://github.com/avik-pal/Lux.jl).
+This was written as a new training back-end for [Flux.jl](https://github.com/FluxML/Flux.jl) neural networks,
+and is also used by [Lux.jl](https://github.com/avik-pal/Lux.jl).
 But it can be used separately on any array, or anything else understood by [Functors.jl](https://github.com/FluxML/Functors.jl).
 
 ## Installation
@@ -49,6 +49,12 @@ For models with deeply nested layers containing the parameters (like [Flux.jl](h
 this `state_tree` is a similarly nested object. As is the gradient: if using Zygote, you must use the "explicit" style as shown,
 not the "implicit" one with `Params`.
 
+You can change the learning rate during training by mutating all the states:
+
+```julia
+Optimisers.adjust!(state_tree, 0.01)
+```
+
 The function `destructure` collects all the trainable parameters into one vector,
 and returns this along with a function to re-build a similar model:
 
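Taken together, the workflow these patches document looks roughly like the sketch below. It is illustrative only and not part of any patch: the toy `model`, `loss`, `x`, and `y` are made-up placeholders, while `setup`, `update`, and `adjust!` are the calls shown in the hunks above.

```julia
using Optimisers, Zygote

# Hypothetical stand-ins for a real model and data; any structure that
# Functors.jl understands (e.g. a Flux model) would work the same way.
model = (W = rand(2, 3), b = zeros(2))
loss(yhat, y) = sum(abs2, yhat .- y)
x, y = rand(3), rand(2)

state_tree = Optimisers.setup(Optimisers.Adam(), model)          # just once

# One training step: explicit-style gradient, then update returns both the
# new optimiser state and the adjusted model.
grad = Zygote.gradient(m -> loss(m.W * x .+ m.b, y), model)[1]
state_tree, model = Optimisers.update(state_tree, model, grad)   # at every step

# Later, change the learning rate throughout the state tree in place.
Optimisers.adjust!(state_tree, 0.01)
```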