Commit

feat: Updated docs and exampels
becktepe committed Jun 4, 2024
1 parent 2502e04 commit 8bfd15b
Showing 12 changed files with 102 additions and 446 deletions.
18 changes: 13 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ The ARLBench is a benchmark for HPO in RL - evaluate your HPO methods fast and o

- **Lightning-fast JAX-Based implementations of DQN, PPO, and SAC**
- **Compatible with many different environment domains via Gymnax, XLand and EnvPool**
- **Representative benchmark set of HPO settings**
- **Representative benchmark set of HPO settings**

<p align="center">
<a href="./docs/images/subsets.png">
Expand All @@ -46,15 +46,15 @@ The ARLBench is a benchmark for HPO in RL - evaluate your HPO methods fast and o

## Installation

There are currently two different ways to install ARLBench.
There are currently two different ways to install ARLBench.
Whichever you choose, we recommend creating a virtual environment for the installation:

```bash
conda create -n arlbench python=3.10
conda activate arlbench
```

The instructions below will help you install the default version of ARLBench with the CPU version of JAX.
The instructions below will help you install the default version of ARLBench with the CPU version of JAX.
If you want to run the ARLBench on GPU, we recommend you check out the [JAX installation guide](https://jax.readthedocs.io/en/latest/installation.html) to see how you can install the correct version for your GPU setup before proceeding.

<details>
Expand All @@ -66,6 +66,7 @@ pip install arlbench
```

If you want to use envpool environments (not currently supported for Mac!), instead choose:

```bash
pip install arlbench[envpool]
```
Expand All @@ -82,41 +83,48 @@ cd arlbench
```

Then you can install the benchmark. For the base version, use:

```bash
make install
```

For the envpool functionality (not available on Mac!), instead use:

```bash
make install-envpool
```

</details>

> [!CAUTION]
> Windows is currently neither supported nor tested. We recommend using the [Linux subsystem](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux) if you're on a Windows machine.
## Quickstart

Here are the two ways you can use ARLBench: via the command line or as an environment. To see them in action, take a look at our [examples](https://github.com/automl/arlbench/tree/main/examples).
Here are the two ways you can use ARLBench: via the command line or as an environment. To see them in action, take a look at our [examples](https://github.com/automl/arlbench/tree/main/examples).

### Use the CLI

We provide a command line script for black-box configuration in ARLBench which will also save the results in a 'results' directory. To execute one run of DQN on CartPole, simply run:

```bash
python run_arlbench.py
```

You can use the [hydra](https://hydra.cc/) command line syntax to override some of the configuration like this to change to PPO:

```bash
python run_arlbench.py algorithm=ppo
```

Or run multiple different seeds after one another:

```bash
python run_arlbench.py -m autorl.seed=0,1,2,3,4
```

All hyperparameters to adapt are in the 'hp_config' and architecture settings are in the 'nas_config', so to run a grid of different configurations for 5 seeds each, you can do this:

```bash
python run_arlbench.py -m autorl.seed=0,1,2,3,4 nas_config.hidden_size=8,16,32 hp_config.learning_rate=0.001,0.01
```
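
To get a feeling for how many runs such a multirun launches, the grid can be enumerated by hand. The following is a minimal sketch, assuming Hydra's default sweeper, which runs the Cartesian product of all comma-separated values. The `expand_grid` helper is hypothetical and not part of ARLBench or Hydra; the values are copied from the command above.

```python
from itertools import product

# Sweep values copied from the multirun command above.
sweep = {
    "autorl.seed": [0, 1, 2, 3, 4],
    "nas_config.hidden_size": [8, 16, 32],
    "hp_config.learning_rate": [0.001, 0.01],
}


def expand_grid(sweep):
    """Return one override dict per job (hypothetical helper, for illustration only)."""
    keys = list(sweep)
    return [dict(zip(keys, values)) for values in product(*sweep.values())]


jobs = expand_grid(sweep)

# 5 seeds x 3 hidden sizes x 2 learning rates
print(f"{len(jobs)} runs in total")

# Show the overrides of the first few jobs in Hydra's key=value syntax.
for job in jobs[:3]:
    print(" ".join(f"{key}={value}" for key, value in job.items()))
```

Each of the 6 configurations is run with all 5 seeds, so the sweep amounts to 30 training runs.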
Expand Down Expand Up @@ -148,7 +156,7 @@ If you use ARLBench in your work, please cite us:

```bibtex
@misc{beckdierkes24,
author = {J. Becktepe and J. Dierkes and C. Benjamins and D. Salinas and A. Mohan and R. Rajan and T. Eimer and F. Hutter and H. Hoos and M. Lindauer},
author = {J. Becktepe and J. Dierkes and C. Benjamins and D. Salinas and A. Mohan and R. Rajan and F. Hutter and H. Hoos and M. Lindauer and T. Eimer},
title = {ARLBench},
year = {2024},
url = {https://github.com/automl/arlbench},
Expand Down
5 changes: 4 additions & 1 deletion docs/advanced_usage/algorithm_states.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,7 @@
Using the ARLBench States
==========================

In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states.
In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states. This is done using so-called `StateFeatures`.
As of now, we implement the `GradInfo` state feature, which returns the norm of the gradients observed during training.

The used state features can be defined using the `state_features` key in the config passed to the AutoRL Environment. Please include `grad_info` in this list if you want to use this state feature for your approach.
20 changes: 19 additions & 1 deletion docs/advanced_usage/autorl_paradigms.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,22 @@
ARLBench and Different AutoRL Paradigms
=======================================

TODO: relationship to other AutoRL paradigms
In this chapter, we elaborate on the relationship between ARLBench and various AutoRL paradigms.

Hyperparameter Optimization (HPO)
---------------------------------
(Static) Hyperparameter optimization is one of the core use cases of ARLBench. As stated in our examples, ARLBench supports all kinds of black-box optimizers to perform hyperparameter optimization for RL.

Dynamic Algorithm Configuration (DAC)
-------------------------------------
When it comes to dynamic approaches, ARLBench supports different kinds of optimization techniques that adapt the current hyperparameter configuration during training. As stated in the examples,
this can be done using the CLI or the AutoRL Environment. Thanks to checkpointing, a training run can be resumed seamlessly, which allows for flexible dynamic approaches.

Neural Architecture Search (NAS)
--------------------------------
In addition to HPO, ARLBench supports NAS approaches that set the size of hidden layers and activation functions. However, as of now this is limited to these two architecture hyperparameters.
In the future, ARLBench could be extended by more powerful search space interfaces for NAS.

Meta-Gradients
--------------
As of now, ARLBench does not include meta-gradient-based approaches for AutoRL. However, we allow for reactive dynamic approaches that use the gradient information during training to select the next hyperparameter configuration, as stated in our examples.
9 changes: 8 additions & 1 deletion docs/advanced_usage/dynamic_configuration.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,11 @@
Dynamic Configuration in ARLBench
==================================

How to dynamic?
In addition to static approaches, which run the whole training given a fixed configuration, ARLBench supports dynamic configuration approaches.
These methods, in contrast, can adapt the current hyperparameter configuration during training.
To do this, you can use the CLI or the AutoRL Environment as shown in our examples.

When using the CLI, you have to pass a checkpoint path for the current training state. The training then continues using the given configuration.

For the AutoRL Environment, you can set `n_steps` to the number of configuration updates you want to perform during training.
Then adjust the number of training steps (`n_total_timesteps`) accordingly and call the `step()` function multiple times to perform dynamic configuration.
4 changes: 3 additions & 1 deletion docs/basic_usage/env_subsets.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,4 +9,6 @@ We analyzed the hyperparameter landscapes of PPO, DQN and SAC on 20 environments

We strongly recommend you focus your benchmarking on these exact environments to ensure you cover the total landscape of RL behaviors well.
The data generated for selecting these environments is available on `HuggingFace <https://huggingface.co/datasets/autorl-org/arlbench>`_ for you to use in your experiments.
For more information on how the subset selection was done, please refer to our paper.
For more information on how the subset selection was done, please refer to our paper.

For more information on how to evaluate your method on these subsets, please refer to the examples in our GitHub repository.
6 changes: 5 additions & 1 deletion docs/basic_usage/seeding.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,8 @@
Considerations for Seeding
============================

Seeding is important both on the level of RL algorithms as well as the AutoRL level.
Seeding is important both at the level of the RL algorithms and at the AutoRL level. In general, we propose using three different random seeds for training, validation, and testing.
For training and validation, ARLBench takes care of the seeding. When you pass a seed to the AutoRL Environment, it uses this seed for training but `seed + 1` for the validation during training.
We recommend using seeds `0` - `9` for training and validation, i.e., passing them to the AutoRL Environment for the tuning process.

When it comes to testing HPO methods, we provide an evaluation script in our examples. We propose using seeds `100, 101, ...` here to make sure the method is tested on a different set of random seeds.
47 changes: 45 additions & 2 deletions examples/Readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -186,9 +186,52 @@ Now we can build a schedule that takes the gradient information into account.

## 4. Evaluation

### PPO
### Evaluation of Static Approaches

You can use ARLBench to evaluate your HPO method. We recommend running your method on the proposed subset of environments for each algorithm. After that, you need to store the final hyperparameter configurations for the environments and algorithms. This is what the configuration for DQN on Acrobot-v1 might look like:

```yaml
# @package _global_
defaults:
  - override /environment: cc_acrobot
  - override /algorithm: dqn

hpo_method: my_optimizer

hp_config:
  buffer_batch_size: 64
  buffer_size: 100000
  buffer_prio_sampling: false
  initial_epsilon: 0.64
  target_epsilon: 0.112
  gamma: 0.99
  gradient_steps: 1
  learning_rate: 0.0023
  learning_starts: 1032
  use_target_network: true
  target_update_interval: 10
```
You should replace `my_optimizer` with the name of your method to make sure the results are stored in the right directory. You can then set your incumbent configuration for the algorithm/environment accordingly.

As soon as you have stored all your incumbents (in this example in the `incumbent` directory in `configs`), you can run the evaluation script:

```bash
python run_arlbench.py --config-name=evaluate -m "autorl.seed=100,101,102" "+incumbent=glob(*)"
python run_arlbench.py --config-name=evaluate -m "autorl.seed=100,101,102" "incumbent=glob(*)"
```

The command will evaluate all configurations on the three test seeds `100,101,102`. Make sure not to use these during the design or tuning of your methods as this will invalidate the evaluation results.
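
A quick sanity check can help guard against accidentally reusing test seeds. The following is a minimal sketch, assuming the seed conventions from the seeding documentation (tuning seeds `0` - `9`, validation on `seed + 1`, test seeds starting at `100`). All helper names are hypothetical and not part of ARLBench.

```python
# Seed conventions taken from the docs; the helpers below are hypothetical.
TUNING_SEEDS = list(range(10))  # seeds 0-9 passed to the AutoRL Environment
TEST_SEEDS = [100, 101, 102]    # reserved for the final evaluation


def validation_seed(training_seed):
    """Validation during training uses `seed + 1`."""
    return training_seed + 1


def seeds_used_during_tuning(tuning_seeds):
    """All seeds touched while tuning: training seeds plus their validation seeds."""
    used = set(tuning_seeds)
    used |= {validation_seed(seed) for seed in tuning_seeds}
    return used


def check_no_leakage(tuning_seeds, test_seeds):
    """Raise if a test seed was already used for training or validation."""
    overlap = seeds_used_during_tuning(tuning_seeds) & set(test_seeds)
    if overlap:
        raise ValueError(f"Test seeds used during tuning: {sorted(overlap)}")


check_no_leakage(TUNING_SEEDS, TEST_SEEDS)  # passes silently for the recommended seeds
```

Note that the `seed + 1` rule means a tuning seed of `99` would already touch the first test seed, which is why the check includes validation seeds.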

The final evaluation results are stored in the `evaluation` directory for each algorithm and environment.

To run the evaluation only for a single algorithm, e.g. PPO, you can adapt the `incumbent` argument:

```bash
python run_arlbench.py --config-name=evaluate -m "autorl.seed=100,101,102" "incumbent=glob(ppo*)"
```

The same can be done for single combinations of environments and algorithms.
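
To preview which incumbents a pattern would select before launching the evaluation, the matching can be approximated in plain Python. This is a rough sketch, assuming that `glob(...)` applies shell-style wildcard matching to the config names in the `incumbent` directory. Only `sac_pendulum_my_optimizer` appears in the repository; the other names are hypothetical and merely follow the same naming pattern.

```python
from fnmatch import fnmatchcase

# Config names (without the .yaml suffix). Only the SAC entry is taken from
# the repository; the remaining names are made up for illustration.
incumbents = [
    "dqn_acrobot_my_optimizer",
    "ppo_acrobot_my_optimizer",
    "ppo_pendulum_my_optimizer",
    "sac_pendulum_my_optimizer",
]


def select(pattern, names):
    """Return the names matching a shell-style wildcard pattern (hypothetical helper)."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(select("*", incumbents))              # every incumbent
print(select("ppo*", incumbents))           # all PPO incumbents
print(select("ppo_pendulum*", incumbents))  # a single algorithm/environment combination
```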

### Evaluation of Dynamic Approaches

When it comes to dynamic HPO methods, you cannot simply return the incumbent but have to evaluate the whole method. For this case, we recommend using the Hypersweeper or AutoRL Environment as shown in the examples above. Make sure to set the seed of the AutoRL Environment accordingly (`100, 101, 102, ...`).
2 changes: 1 addition & 1 deletion examples/configs/base.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ hydra:
job:
chdir: true

jax_enable_x64: true
jax_enable_x64: false
load_checkpoint: ""

autorl:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,3 +16,5 @@ hp_config:
target_update_interval: 10
tau: 0.52
reward_scale: 2.32

jax_enable_x64: true
2 changes: 2 additions & 0 deletions examples/configs/incumbent/sac_pendulum_my_optimizer.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -16,3 +16,5 @@ hp_config:
target_update_interval: 10
tau: 0.52
reward_scale: 2.32

jax_enable_x64: true