
Commit 5309d00

Merge branch 'main' into feat/runname_display
2 parents: e9f0c67 + c656748

39 files changed: +1657 -1173 lines

.flake8

Lines changed: 0 additions & 22 deletions
This file was deleted.

.pre-commit-config.yaml

Lines changed: 5 additions & 24 deletions
@@ -1,40 +1,21 @@
 repos:
-  - repo: https://github.com/python/black
-    rev: 23.10.1
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.14.0
     hooks:
-      - id: black
-        args: ["--line-length", "120", "--preview"]
-  - repo: https://github.com/pycqa/flake8
-    rev: 6.1.0
-    hooks:
-      - id: flake8
-        additional_dependencies: [flake8-simplify, flake8-return]
+      - id: ruff-check
+      - id: ruff-format
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v4.5.0
     hooks:
-      - id: trailing-whitespace
       - id: check-symlinks
       - id: destroyed-symlinks
       - id: check-yaml
+      - id: check-toml
       - id: check-merge-conflict
       - id: check-case-conflict
       - id: check-executables-have-shebangs
-      - id: check-toml
-      - id: end-of-file-fixer
       - id: check-shebang-scripts-are-executable
       - id: detect-private-key
-      - id: debug-statements
-  - repo: https://github.com/pycqa/isort
-    rev: 5.12.0
-    hooks:
-      - id: isort
-        name: isort (python)
-        args: ["--profile", "black", "--filter-files"]
-  - repo: https://github.com/asottile/pyupgrade
-    rev: v3.15.0
-    hooks:
-      - id: pyupgrade
-        args: ["--py37-plus"]
   - repo: https://github.com/codespell-project/codespell
     rev: v2.2.6
     hooks:
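The four tools previously configured here (black, flake8 with its plugins, isort, and pyupgrade) are consolidated into a single ruff hook pair: `ruff-check` for linting and `ruff-format` for formatting. As the README change below reflects, the hooks are still applied across the whole repository with `pre-commit run --all-files`. The settings that were previously passed as hook arguments (line length, import-sorting profile, target Python version) presumably now live in ruff's own configuration, e.g. a `[tool.ruff]` table in `pyproject.toml`; that file is not shown in this excerpt.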

CITATION.cff

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+cff-version: 1.2.0
+title: "RSL-RL: A Learning Library for Robotics Research"
+message: "If you use this work, please cite the following paper."
+repository-code: "https://github.com/leggedrobotics/rsl_rl"
+license: BSD-3-Clause
+version: 3.1.3
+type: software
+authors:
+  - family-names: Schwarke
+    given-names: Clemens
+  - family-names: Mittal
+    given-names: Mayank
+  - family-names: Rudin
+    given-names: Nikita
+  - family-names: Hoeller
+    given-names: David
+keywords:
+  - reinforcement learning
+  - robotics
+  - control
+  - RSL-RL
+preferred-citation:
+  type: article
+  authors:
+    - family-names: Schwarke
+      given-names: Clemens
+    - family-names: Mittal
+      given-names: Mayank
+    - family-names: Rudin
+      given-names: Nikita
+    - family-names: Hoeller
+      given-names: David
+    - family-names: Hutter
+      given-names: Marco
+  title: "RSL-RL: A Learning Library for Robotics Research"
+  journal: "arXiv preprint"
+  doi: 10.48550/arXiv.2509.10771
+  url: "https://arxiv.org/abs/2509.10771"

CONTRIBUTORS.md

Lines changed: 3 additions & 1 deletion
@@ -17,12 +17,14 @@ Please keep the lists sorted alphabetically.
 
 ---
 
-* Mayank Mittal
 * Clemens Schwarke
+* Mayank Mittal
 
 ## Authors
 
+* Clemens Schwarke
 * David Hoeller
+* Mayank Mittal
 * Nikita Rudin
 
 ## Contributors
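This reordering restores the alphabetical sorting the file itself requests ("Please keep the lists sorted alphabetically") and adds Clemens Schwarke and Mayank Mittal to the Authors list.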

README.md

Lines changed: 13 additions & 21 deletions
@@ -1,14 +1,15 @@
-# RSL RL
+# RSL-RL
 
-A fast and simple implementation of RL algorithms, designed to run fully on GPU.
-This code is an evolution of `rl-pytorch` provided with NVIDIA's Isaac Gym.
+A fast and simple implementation of learning algorithms for robotics. For an overview of the library, please have a look at https://arxiv.org/pdf/2509.10771.
 
 Environment repositories using the framework:
 
 * **`Isaac Lab`** (built on top of NVIDIA Isaac Sim): https://github.com/isaac-sim/IsaacLab
-* **`Legged-Gym`** (built on top of NVIDIA Isaac Gym): https://leggedrobotics.github.io/legged_gym/
+* **`Legged Gym`** (built on top of NVIDIA Isaac Gym): https://leggedrobotics.github.io/legged_gym/
+* **`MuJoCo Playground`** (built on top of MuJoCo MJX and Warp): https://github.com/google-deepmind/mujoco_playground/
+* **`mjlab`** (built on top of MuJoCo Warp): https://github.com/mujocolab/mjlab
 
-The main branch supports **PPO** and **Student-Teacher Distillation** with additional features from our research. These include:
+The library currently supports **PPO** and **Student-Teacher Distillation** with additional features from our research. These include:
 
 * [Random Network Distillation (RND)](https://proceedings.mlr.press/v229/schwarke23a.html) - Encourages exploration by adding
   a curiosity-driven intrinsic reward.

@@ -21,8 +22,6 @@ information.
 **Affiliation**: Robotic Systems Lab, ETH Zurich & NVIDIA <br/>
 **Contact**: cschwarke@ethz.ch
 
-> **Note:** The `algorithms` branch supports additional algorithms (SAC, DDPG, DSAC, and more). However, it isn't currently actively maintained.
-
 
 ## Setup
 
@@ -56,8 +55,7 @@ For documentation, we adopt the [Google Style Guide](https://sphinxcontrib-napol
 We use the following tools for maintaining code quality:
 
 - [pre-commit](https://pre-commit.com/): Runs a list of formatters and linters over the codebase.
-- [black](https://black.readthedocs.io/en/stable/): The uncompromising code formatter.
-- [flake8](https://flake8.pycqa.org/en/latest/): A wrapper around PyFlakes, pycodestyle, and McCabe complexity checker.
+- [ruff](https://github.com/astral-sh/ruff): An extremely fast Python linter and code formatter, written in Rust.
 
 Please check [here](https://pre-commit.com/#install) for instructions to set these up. To run over the entire repository, please execute the following command in the terminal:
 
@@ -70,20 +68,14 @@ pre-commit run --all-files
 
 ## Citing
 
-**We are working on writing a white paper for this library.** Until then, please cite the following work
-if you use this library for your research:
+If you use this library for your research, please cite the following work:
 
 ```text
-@InProceedings{rudin2022learning,
-  title = {Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning},
-  author = {Rudin, Nikita and Hoeller, David and Reist, Philipp and Hutter, Marco},
-  booktitle = {Proceedings of the 5th Conference on Robot Learning},
-  pages = {91--100},
-  year = {2022},
-  volume = {164},
-  series = {Proceedings of Machine Learning Research},
-  publisher = {PMLR},
-  url = {https://proceedings.mlr.press/v164/rudin22a.html},
+@article{schwarke2025rslrl,
+  title={RSL-RL: A Learning Library for Robotics Research},
+  author={Schwarke, Clemens and Mittal, Mayank and Rudin, Nikita and Hoeller, David and Hutter, Marco},
+  journal={arXiv preprint arXiv:2509.10771},
+  year={2025}
 }
 ```
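To make the updated description concrete, here is a minimal sketch of how a training run is typically launched with the library. Treat it as illustrative rather than documented API: it assumes the `OnPolicyRunner` entry point from `rsl_rl.runners` with a constructor of the form `(env, train_cfg, log_dir, device)` and a `learn()` method taking an iteration count, as in recent releases; `MyVecEnv` is a hypothetical environment implementing the `VecEnv` interface referenced in the config below.

```python
import yaml

from rsl_rl.runners import OnPolicyRunner

from my_envs import MyVecEnv  # hypothetical VecEnv implementation

# Load the example configuration (shown further below in this commit);
# all settings live under the top-level "runner" key.
with open("config/example_config.yaml") as f:
    train_cfg = yaml.safe_load(f)["runner"]

env = MyVecEnv(num_envs=4096, device="cuda:0")  # assumed constructor
runner = OnPolicyRunner(env, train_cfg, log_dir="logs/walking_experiment", device="cuda:0")
runner.learn(num_learning_iterations=train_cfg["max_iterations"])
```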

config/example_config.yaml

Lines changed: 30 additions & 29 deletions
@@ -1,21 +1,21 @@
 runner:
   class_name: OnPolicyRunner
-  # -- general
-  num_steps_per_env: 24 # number of steps per environment per iteration
-  max_iterations: 1500 # number of policy updates
+  # General
+  num_steps_per_env: 24 # Number of steps per environment per iteration
+  max_iterations: 1500 # Number of policy updates
   seed: 1
-  # -- observations
-  obs_groups: {"policy": ["policy"], "critic": ["policy", "privileged"]} # maps observation groups to types. See `vec_env.py` for more information
-  # -- logging parameters
-  save_interval: 50 # check for potential saves every `save_interval` iterations
+  # Observations
+  obs_groups: {"policy": ["policy"], "critic": ["policy", "privileged"]} # Maps observation groups to sets. See `vec_env.py` for more information
+  # Logging parameters
+  save_interval: 50 # Check for potential saves every `save_interval` iterations
   experiment_name: walking_experiment
   run_name: ""
-  # -- logging writer
+  # Logging writer
   logger: tensorboard # tensorboard, neptune, wandb
   neptune_project: legged_gym
   wandb_project: legged_gym
 
-  # -- policy
+  # Policy
   policy:
     class_name: ActorCritic
     activation: elu
@@ -25,45 +25,46 @@ runner:
     critic_hidden_dims: [256, 256, 256]
     init_noise_std: 1.0
     noise_std_type: "scalar" # 'scalar' or 'log'
+    state_dependent_std: false
 
-  # -- algorithm
+  # Algorithm
   algorithm:
     class_name: PPO
-    # -- training
+    # Training
     learning_rate: 0.001
     num_learning_epochs: 5
     num_mini_batches: 4 # mini batch size = num_envs * num_steps / num_mini_batches
     schedule: adaptive # adaptive, fixed
-    # -- value function
+    # Value function
     value_loss_coef: 1.0
     clip_param: 0.2
     use_clipped_value_loss: true
-    # -- surrogate loss
+    # Surrogate loss
    desired_kl: 0.01
    entropy_coef: 0.01
    gamma: 0.99
    lam: 0.95
    max_grad_norm: 1.0
-    # -- miscellaneous
+    # Miscellaneous
    normalize_advantage_per_mini_batch: false
 
-    # -- random network distillation
+    # Random network distillation
    rnd_cfg:
-      weight: 0.0 # initial weight of the RND reward
-      weight_schedule: null # note: this is a dictionary with a required key called "mode". Please check the RND module for more information
-      reward_normalization: false # whether to normalize RND reward
-      # -- learning parameters
-      learning_rate: 0.001 # learning rate for RND
-      # -- network parameters
-      num_outputs: 1 # number of outputs of RND network. Note: if -1, then the network will use dimensions of the observation
-      predictor_hidden_dims: [-1] # hidden dimensions of predictor network
-      target_hidden_dims: [-1] # hidden dimensions of target network
+      weight: 0.0 # Initial weight of the RND reward
+      weight_schedule: null # This is a dictionary with a required key called "mode". Please check the RND module for more information
+      reward_normalization: false # Whether to normalize RND reward
+      # Learning parameters
+      learning_rate: 0.001 # Learning rate for RND
+      # Network parameters
+      num_outputs: 1 # Number of outputs of RND network. Note: if -1, then the network will use dimensions of the observation
+      predictor_hidden_dims: [-1] # Hidden dimensions of predictor network
+      target_hidden_dims: [-1] # Hidden dimensions of target network
 
-    # -- symmetry augmentation
+    # Symmetry augmentation
    symmetry_cfg:
-      use_data_augmentation: true # this adds symmetric trajectories to the batch
-      use_mirror_loss: false # this adds symmetry loss term to the loss function
-      data_augmentation_func: null # string containing the module and function name to import
+      use_data_augmentation: true # This adds symmetric trajectories to the batch
+      use_mirror_loss: false # This adds a symmetry loss term to the loss function
+      data_augmentation_func: null # String containing the module and function name to import
       # Example: "legged_gym.envs.locomotion.anymal_c.symmetry:get_symmetric_states"
       #
       # .. code-block:: python
@@ -73,4 +74,4 @@ runner:
       #     obs: Optional[torch.Tensor] = None, actions: Optional[torch.Tensor] = None, cfg: "BaseEnvCfg" = None, obs_type: str = "policy"
       #   ) -> Tuple[torch.Tensor, torch.Tensor]:
       #
-      mirror_loss_coeff: 0.0 #coefficient for symmetry loss term. If 0, no symmetry loss is used
+      mirror_loss_coeff: 0.0 # Coefficient for symmetry loss term. If 0, no symmetry loss is used
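The commented signature above fully specifies the contract for `data_augmentation_func`, so a concrete instance may help. Below is a minimal sketch under purely illustrative assumptions: the permutation and sign tables, and the 4-entry observation / 2-entry action layout, are hypothetical, since real environments derive them from their own joint ordering. The function appends mirrored copies to the batch, which is what `use_data_augmentation: true` consumes.

```python
from typing import Optional, Tuple

import torch

# Hypothetical mirroring tables: a left/right index permutation plus
# per-entry sign flips for an illustrative observation/action layout.
_OBS_PERM = torch.tensor([1, 0, 3, 2])
_OBS_SIGN = torch.tensor([1.0, 1.0, -1.0, -1.0])
_ACT_PERM = torch.tensor([1, 0])
_ACT_SIGN = torch.tensor([-1.0, -1.0])


def get_symmetric_states(
    obs: Optional[torch.Tensor] = None,
    actions: Optional[torch.Tensor] = None,
    cfg: "BaseEnvCfg" = None,
    obs_type: str = "policy",
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Append mirrored copies of the observations/actions to the batch."""
    if obs is not None:
        mirrored_obs = obs[..., _OBS_PERM.to(obs.device)] * _OBS_SIGN.to(obs.device)
        obs = torch.cat([obs, mirrored_obs], dim=0)
    if actions is not None:
        mirrored_act = actions[..., _ACT_PERM.to(actions.device)] * _ACT_SIGN.to(actions.device)
        actions = torch.cat([actions, mirrored_act], dim=0)
    return obs, actions
```

The config would then reference the function by module path, e.g. `data_augmentation_func: "my_envs.symmetry:get_symmetric_states"` (a hypothetical path, following the `legged_gym` example in the comment above).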

licenses/dependencies/flake8-license.txt

Lines changed: 0 additions & 22 deletions
This file was deleted.

licenses/dependencies/isort-license.txt

Lines changed: 0 additions & 21 deletions
This file was deleted.

licenses/dependencies/black-license.txt renamed to licenses/dependencies/onnxscript-license.txt

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
-The MIT License (MIT)
+MIT License
 
-Copyright (c) 2018 Łukasz Langa
+Copyright (c) Microsoft Corporation
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
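The rename tracks the dependency changes elsewhere in this commit: with black dropped in favor of ruff, its MIT license text is no longer needed, and the slot is taken by onnxscript (also MIT-licensed), presumably a new dependency, e.g. for exporting trained policies to ONNX.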
