diff --git a/README.md b/README.md index 4cdf90f73..f68c4d3de 100644 --- a/README.md +++ b/README.md @@ -36,7 +36,7 @@ The ARLBench is a benchmark for HPO in RL - evaluate your HPO methods fast and o - **Lightning-fast JAX-Based implementations of DQN, PPO, and SAC** - **Compatible with many different environment domains via Gymnax, XLand and EnvPool** -- **Representative benchmark set of HPO settings** +- **Representative benchmark set of HPO settings**

@@ -46,7 +46,7 @@ The ARLBench is a benchmark for HPO in RL - evaluate your HPO methods fast and o ## Installation -There are currently two different ways to install ARLBench. +There are currently two different ways to install ARLBench. Whichever you choose, we recommend to create a virtual environment for the installation: ```bash @@ -54,7 +54,7 @@ conda create -n arlbench python=3.10 conda activate arlbench ``` -The instructions below will help you install the default version of ARLBench with the CPU version of JAX. +The instructions below will help you install the default version of ARLBench with the CPU version of JAX. If you want to run the ARLBench on GPU, we recommend you check out the [JAX installation guide](https://jax.readthedocs.io/en/latest/installation.html) to see how you can install the correct version for your GPU setup before proceeding.

@@ -66,6 +66,7 @@ pip install arlbench ``` If you want to use envpool environments (not currently supported for Mac!), instead choose: + ```bash pip install arlbench[envpool] ``` @@ -82,14 +83,17 @@ cd arlbench ``` Then you can install the benchmark. For the base version, use: + ```bash make install ``` + For the envpool functionality (not available on Mac!), instead use: ```bash make install-envpool ``` +
> [!CAUTION] @@ -97,26 +101,30 @@ make install-envpool ## Quickstart -Here are the two ways you can use ARLBench: via the command line or as an environment. To see them in action, take a look at our [examples](https://github.com/automl/arlbench/tree/main/examples). +Here are the two ways you can use ARLBench: via the command line or as an environment. To see them in action, take a look at our [examples](https://github.com/automl/arlbench/tree/main/examples). ### Use the CLI We provide a command line script for black-box configuration in ARLBench which will also save the results in a 'results' directory. To execute one run of DQN on CartPole, simply run: + ```bash python run_arlbench.py ``` You can use the [hydra](https://hydra.cc/) command line syntax to override some of the configuration like this to change to PPO: + ```bash python run_arlbench.py algorithm=ppo ``` Or run multiple different seeds after one another: + ```bash python run_arlbench.py -m autorl.seed=0,1,2,3,4 ``` All hyperparamters to adapt are in the 'hpo_config' and architecture settings in the 'nas_config', so to run a grid of different configurations for 5 seeds each , you can do this: + ```bash python run_arlbench.py -m autorl.seed=0,1,2,3,4 nas_config.hidden_size=8,16,32 hp_config.learning_rate=0.001,0.01 ``` @@ -148,7 +156,7 @@ If you use ARLBench in your work, please cite us: ```bibtex @misc{beckdierkes24, - author = {J. Becktepe and J. Dierkes and C. Benjamins and D. Salinas and A. Mohan and R. Rajan and T. Eimer and F. Hutter and H. Hoos and M. Lindauer}, + author = {J. Becktepe and J. Dierkes and C. Benjamins and D. Salinas and A. Mohan and R. Rajan and F. Hutter and H. Hoos and M. Lindauer and T. 
Eimer}, title = {ARLBench}, year = {2024}, url = {https://github.com/automl/arlbench}, diff --git a/docs/advanced_usage/algorithm_states.rst b/docs/advanced_usage/algorithm_states.rst index 92591589d..79e280389 100644 --- a/docs/advanced_usage/algorithm_states.rst +++ b/docs/advanced_usage/algorithm_states.rst @@ -1,4 +1,7 @@ Using the ARLBench States ========================== -In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states. \ No newline at end of file +In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states. This is done using so-called `StateFeatures`. +As of now, we implement the `GradInfo` state feature, which returns the norm of the gradients observed during training. + +The state features to use can be defined using the `state_features` key in the config passed to the AutoRL Environment. Please include `grad_info` in this list if you want to use this state feature for your approach. \ No newline at end of file diff --git a/docs/advanced_usage/autorl_paradigms.rst b/docs/advanced_usage/autorl_paradigms.rst index 9c8a29ef1..c5214b964 100644 --- a/docs/advanced_usage/autorl_paradigms.rst +++ b/docs/advanced_usage/autorl_paradigms.rst @@ -1,4 +1,22 @@ ARLBench and Different AutoRL Paradigms ======================================= -TODO: relationship to other AutoRL paradigms \ No newline at end of file +In this chapter, we elaborate on the relationship between ARLBench and various AutoRL paradigms. + +Hyperparameter Optimization (HPO) +--------------------------------- +(Static) Hyperparameter optimization is one of the core use cases of ARLBench. As stated in our examples, ARLBench supports all kinds of black-box optimizers to perform hyperparameter optimization for RL. 
+ +Dynamic Algorithm Configuration (DAC) +------------------------------------- +When it comes to dynamic approaches, ARLBench supports different kinds of optimization techniques that adapt the current hyperparameter configuration during training. As stated in the examples, +this can be done using the CLI or the AutoRL Environment. Using checkpointing, training runs can be resumed seamlessly, which allows for flexible dynamic approaches. + +Neural Architecture Search (NAS) +-------------------------------- +In addition to HPO, ARLBench supports NAS approaches that set the size of hidden layers and activation functions. However, as of now this is limited to these two architecture hyperparameters. +In the future, ARLBench could be extended by more powerful search space interfaces for NAS. + +Meta-Gradients +-------------- +As of now, ARLBench does not include meta-gradient based approaches for AutoRL. However, we allow for reactive dynamic approaches that use the gradient information during training to select the next hyperparameter configuration as stated in our examples. \ No newline at end of file diff --git a/docs/advanced_usage/dynamic_configuration.rst b/docs/advanced_usage/dynamic_configuration.rst index 5d6cde095..9e370975b 100644 --- a/docs/advanced_usage/dynamic_configuration.rst +++ b/docs/advanced_usage/dynamic_configuration.rst @@ -1,4 +1,11 @@ Dynamic Configuration in ARLBench ================================== -How to dynamic? \ No newline at end of file +In addition to static approaches, which run the whole training given a fixed configuration, ARLBench supports dynamic configuration approaches. +These methods, in contrast, can adapt the current hyperparameter configuration during training. +To do this, you can use the CLI or the AutoRL Environment as shown in our examples. + +When using the CLI, you have to pass a checkpoint path for the current training state. Training then proceeds with the given configuration. 
+ +For the AutoRL Environment, you can set `n_steps` to the number of configuration updates you want to perform during training. +Then adjust the number of training steps (`n_total_timesteps`) accordingly and call the `step()` function multiple times to perform dynamic configuration. diff --git a/docs/basic_usage/env_subsets.rst b/docs/basic_usage/env_subsets.rst index 9aaa4f5da..a22628d99 100644 --- a/docs/basic_usage/env_subsets.rst +++ b/docs/basic_usage/env_subsets.rst @@ -9,4 +9,6 @@ We analyzed the hyperparameter landscapes of PPO, DQN and SAC on 20 environments We strongly recommend you focus your benchmarking on these exact environments to ensure you cover the space total landscape of RL behaviors well. The data generated for selecting these environments is available on `HuggingFace `_ for you to use in your experiments. -For more information how the subset selection was done, please refer to our paper. \ No newline at end of file +For more information on how the subset selection was done, please refer to our paper. + +For more information on how to evaluate your method on these subsets, please refer to the examples in our GitHub repository. \ No newline at end of file diff --git a/docs/basic_usage/seeding.rst b/docs/basic_usage/seeding.rst index 983eafe0c..4f9a86ee6 100644 --- a/docs/basic_usage/seeding.rst +++ b/docs/basic_usage/seeding.rst @@ -1,4 +1,8 @@ Considerations for Seeding ============================ -Seeding is important both on the level of RL algorithms as well as the AutoRL level. \ No newline at end of file +Seeding is important both at the level of RL algorithms and at the AutoRL level. In general, we propose using three different random seeds for training, validation, and testing. +For training and validation, ARLBench takes care of the seeding. When you pass a seed to the AutoRL Environment, it uses this seed for training but `seed + 1` for the validation during training. 
+We recommend using seeds `0` to `9` for training and validation, i.e., by passing them to the AutoRL Environment for the tuning process. + +When it comes to testing HPO methods, we provide an evaluation script in our examples. We propose using seeds `100, 101, ...` here to make sure the method is tested on a different set of random seeds. \ No newline at end of file diff --git a/examples/Readme.md b/examples/Readme.md index 89932e155..7d81851b5 100644 --- a/examples/Readme.md +++ b/examples/Readme.md @@ -186,9 +186,52 @@ Now we can build a schedule that takes the gradient information into account. ## 4. Evaluation -### PPO +### Evaluation of Static Approaches + +You can use ARLBench to evaluate your HPO method. We recommend running your method on the proposed subset of environments for each algorithm. After that, you need to store the final hyperparameter configurations for the environments and algorithms. This is what the configuration for DQN on Acrobot-v1 might look like: + +```yaml +# @package _global_ +defaults: + - override /environment: cc_acrobot + - override /algorithm: dqn + +hpo_method: my_optimizer + +hp_config: + buffer_batch_size: 64 + buffer_size: 100000 + buffer_prio_sampling: false + initial_epsilon: 0.64 + target_epsilon: 0.112 + gamma: 0.99 + gradient_steps: 1 + learning_rate: 0.0023 + learning_starts: 1032 + use_target_network: true + target_update_interval: 10 +``` + +You should replace `my_optimizer` with the name of your method to make sure the results are stored in the right directory. You can then set your incumbent configuration for the algorithm/environment accordingly. 
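Writing incumbent files by hand becomes tedious across many algorithm/environment pairs. The following is a minimal sketch, not part of ARLBench, that renders a file like the one above from a Python dictionary. The helper names `render_incumbent` and `incumbent_filename` are hypothetical, and the `<algorithm>_<environment>_<method>.yaml` naming is an assumption inferred from the example files in `configs/incumbent`.

```python
def _yaml_scalar(value):
    """Format a Python scalar the way the YAML example writes it."""
    # Check bool first: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    # Note: very small floats print in exponent notation (e.g. 1e-05);
    # check how your YAML loader reads those before relying on this.
    return str(value)


def render_incumbent(environment, algorithm, hpo_method, hp_config):
    """Return the text of an incumbent config (hypothetical helper)."""
    lines = [
        "# @package _global_",
        "defaults:",
        f"  - override /environment: {environment}",
        f"  - override /algorithm: {algorithm}",
        "",
        f"hpo_method: {hpo_method}",
        "",
        "hp_config:",
    ]
    lines += [f"  {key}: {_yaml_scalar(val)}" for key, val in hp_config.items()]
    return "\n".join(lines) + "\n"


def incumbent_filename(algorithm, env_short_name, hpo_method):
    """Assumed naming scheme, e.g. dqn_acrobot_my_optimizer.yaml."""
    return f"{algorithm}_{env_short_name}_{hpo_method}.yaml"


text = render_incumbent(
    environment="cc_acrobot",
    algorithm="dqn",
    hpo_method="my_optimizer",
    hp_config={
        "buffer_batch_size": 64,
        "buffer_prio_sampling": False,
        "learning_rate": 0.0023,
    },
)
print(incumbent_filename("dqn", "acrobot", "my_optimizer"))
print(text)
```

Hyperparameters left out of `hp_config` are not written to the file, so this sketch only makes sense if your setup fills in defaults for missing keys.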
+ +As soon as you have stored all your incumbents (in this example in the `incumbent` directory in `configs`), you can run the evaluation script: ```bash -python run_arlbench.py --config-name=evaluate -m "autorl.seed=100,101,102" "+incumbent=glob(*)" +python run_arlbench.py --config-name=evaluate -m "autorl.seed=100,101,102" "incumbent=glob(*)" +``` + +The command will evaluate all configurations on the three test seeds `100,101,102`. Make sure not to use these during the design or tuning of your methods, as this will invalidate the evaluation results. + +The final evaluation results are stored in the `evaluation` directory for each algorithm and environment. + +To run the evaluation only for a single algorithm, e.g. PPO, you can adapt the `incumbent` argument: +```bash +python run_arlbench.py --config-name=evaluate -m "autorl.seed=100,101,102" "incumbent=glob(ppo*)" ``` + +The same can be done for single combinations of environments and algorithms. + +### Evaluation of Dynamic Approaches + +When it comes to dynamic HPO methods, you cannot simply return the incumbent but have to evaluate the whole method. For this case, we recommend using the Hypersweeper or AutoRL Environment as shown in the examples above. Make sure to set the seed of the AutoRL Environment accordingly (`100, 101, 102, ...`). 
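The seed conventions and evaluation commands in this section can be checked and assembled programmatically. The sketch below is an illustration under assumptions, not part of ARLBench: the helper names are hypothetical, the tuning seeds `0` to `9` follow the recommendation in the seeding documentation, and the `seed + 1` validation rule is taken from the same place.

```python
import shlex

TUNING_SEEDS = list(range(10))  # seeds 0-9, as recommended for tuning
TEST_SEEDS = [100, 101, 102]    # held-out seeds used in the commands above


def seeds_used_during_tuning(tuning_seeds):
    """Each tuning seed s is used for training and s + 1 for validation."""
    return {s for seed in tuning_seeds for s in (seed, seed + 1)}


def evaluation_command(test_seeds, incumbent_glob="*"):
    """Build the argument list for the evaluation run (hypothetical helper)."""
    seed_list = ",".join(str(s) for s in test_seeds)
    return [
        "python",
        "run_arlbench.py",
        "--config-name=evaluate",
        "-m",
        f"autorl.seed={seed_list}",
        f"incumbent=glob({incumbent_glob})",
    ]


# Refuse to evaluate on seeds that were already seen while tuning.
overlap = seeds_used_during_tuning(TUNING_SEEDS) & set(TEST_SEEDS)
if overlap:
    raise ValueError(f"test seeds overlap with tuning seeds: {sorted(overlap)}")

cmd = evaluation_command(TEST_SEEDS, incumbent_glob="ppo*")
# shlex.join quotes the glob so the shell does not expand it.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` instead of printing it avoids shell quoting altogether.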
diff --git a/examples/configs/base.yaml b/examples/configs/base.yaml index b32c87fe3..83d389392 100644 --- a/examples/configs/base.yaml +++ b/examples/configs/base.yaml @@ -12,7 +12,7 @@ hydra: job: chdir: true -jax_enable_x64: true +jax_enable_x64: false load_checkpoint: "" autorl: diff --git a/examples/configs/incumbent/sac_continuous_mountain_car_my_optimizer.yaml b/examples/configs/incumbent/sac_continuous_mountain_car_my_optimizer.yaml index 9e0bc8075..6613f7ebf 100644 --- a/examples/configs/incumbent/sac_continuous_mountain_car_my_optimizer.yaml +++ b/examples/configs/incumbent/sac_continuous_mountain_car_my_optimizer.yaml @@ -16,3 +16,5 @@ hp_config: target_update_interval: 10 tau: 0.52 reward_scale: 2.32 + +jax_enable_x64: true diff --git a/examples/configs/incumbent/sac_pendulum_my_optimizer.yaml b/examples/configs/incumbent/sac_pendulum_my_optimizer.yaml index 2a612feb1..0a9041836 100644 --- a/examples/configs/incumbent/sac_pendulum_my_optimizer.yaml +++ b/examples/configs/incumbent/sac_pendulum_my_optimizer.yaml @@ -16,3 +16,5 @@ hp_config: target_update_interval: 10 tau: 0.52 reward_scale: 2.32 + +jax_enable_x64: true diff --git a/examples/evaluation/default_method/dqn_CartPole-v1/100/multirun.yaml b/examples/evaluation/default_method/dqn_CartPole-v1/100/multirun.yaml deleted file mode 100644 index a1d876e54..000000000 --- a/examples/evaluation/default_method/dqn_CartPole-v1/100/multirun.yaml +++ /dev/null @@ -1,216 +0,0 @@ -hydra: - run: - dir: evaluation/${hpo_method}/${algorithm}_${environment.name}/${autorl.seed} - sweep: - dir: evaluation/${hpo_method}/${algorithm}_${environment.name}/${autorl.seed} - subdir: ${hydra.job.num} - launcher: - _target_: hydra._internal.core_plugins.basic_launcher.BasicLauncher - sweeper: - _target_: hydra._internal.core_plugins.basic_sweeper.BasicSweeper - max_batch_size: null - params: null - help: - app_name: ${hydra.job.name} - header: '${hydra.help.app_name} is powered by Hydra. 
- - ' - footer: 'Powered by Hydra (https://hydra.cc) - - Use --hydra-help to view Hydra specific help - - ' - template: '${hydra.help.header} - - == Configuration groups == - - Compose your configuration from those groups (group=option) - - - $APP_CONFIG_GROUPS - - - == Config == - - Override anything in the config (foo.bar=value) - - - $CONFIG - - - ${hydra.help.footer} - - ' - hydra_help: - template: 'Hydra (${hydra.runtime.version}) - - See https://hydra.cc for more info. - - - == Flags == - - $FLAGS_HELP - - - == Configuration groups == - - Compose your configuration from those groups (For example, append hydra/job_logging=disabled - to command line) - - - $HYDRA_CONFIG_GROUPS - - - Use ''--cfg hydra'' to Show the Hydra config. - - ' - hydra_help: ??? - hydra_logging: - version: 1 - formatters: - simple: - format: '[%(asctime)s][HYDRA] %(message)s' - handlers: - console: - class: logging.StreamHandler - formatter: simple - stream: ext://sys.stdout - root: - level: INFO - handlers: - - console - loggers: - logging_example: - level: DEBUG - disable_existing_loggers: false - job_logging: - version: 1 - formatters: - simple: - format: '[%(asctime)s][%(name)s][%(levelname)s] - %(message)s' - handlers: - console: - class: logging.StreamHandler - formatter: simple - stream: ext://sys.stdout - file: - class: logging.FileHandler - formatter: simple - filename: ${hydra.runtime.output_dir}/${hydra.job.name}.log - root: - level: INFO - handlers: - - console - - file - disable_existing_loggers: false - env: {} - mode: MULTIRUN - searchpath: [] - callbacks: {} - output_subdir: .hydra - overrides: - hydra: - - hydra.mode=MULTIRUN - task: - - autorl.seed=100 - - +incumbent=glob(*) - job: - name: run_arlbench - chdir: true - override_dirname: +incumbent=glob(*),autorl.seed=100 - id: ??? - num: ??? 
- config_name: evaluate - env_set: {} - env_copy: [] - config: - override_dirname: - kv_sep: '=' - item_sep: ',' - exclude_keys: [] - runtime: - version: 1.3.2 - version_base: '1.3' - cwd: /Users/jannisbecktepe/Developer/arlbench_main/examples - config_sources: - - path: hydra.conf - schema: pkg - provider: hydra - - path: /Users/jannisbecktepe/Developer/arlbench_main/examples/configs - schema: file - provider: main - - path: hydra_plugins.hydra_colorlog.conf - schema: pkg - provider: hydra-colorlog - - path: '' - schema: structured - provider: schema - output_dir: ??? - choices: - environment: cc_cartpole - algorithm: dqn - hydra/env: default - hydra/callbacks: null - hydra/job_logging: default - hydra/hydra_logging: default - hydra/hydra_help: default - hydra/help: default - hydra/sweeper: basic - hydra/launcher: basic - hydra/output: default - verbose: false -jax_enable_x64: true -load_checkpoint: '' -hpo_method: default_method -autorl: - seed: 100 - env_framework: ${environment.framework} - env_name: ${environment.name} - env_kwargs: ${environment.kwargs} - eval_env_kwargs: ${environment.eval_kwargs} - n_envs: ${environment.n_envs} - algorithm: ${algorithm} - cnn_policy: ${environment.cnn_policy} - nas_config: ${nas_config} - n_total_timesteps: ${environment.n_total_timesteps} - checkpoint: [] - checkpoint_name: default_checkpoint - checkpoint_dir: /tmp - state_features: [] - objectives: - - reward_mean - optimize_objectives: upper - n_steps: 1 - n_eval_steps: 100 - n_eval_episodes: 10 -algorithm: dqn -hp_config: - buffer_prio_sampling: false - buffer_alpha: 0.9 - buffer_beta: 0.9 - buffer_epsilon: 0.001 - buffer_batch_size: 16 - buffer_size: 1000000 - initial_epsilon: 1.0 - target_epsilon: 0.05 - gamma: 0.99 - gradient_steps: 1 - learning_rate: 0.0003 - learning_starts: 128 - normalize_observations: false - train_freq: 4 - use_target_network: true - target_update_interval: 1000 - tau: 1.0 -nas_config: - activation: tanh - hidden_size: 64 -environment: - name: 
CartPole-v1 - framework: gymnax - n_total_timesteps: 100000.0 - kwargs: {} - eval_kwargs: {} - cnn_policy: false - deterministic_eval: true - n_envs: 8 diff --git a/examples/evaluation/my_optimizer/dqn_Acrobot-v1/100/multirun.yaml b/examples/evaluation/my_optimizer/dqn_Acrobot-v1/100/multirun.yaml deleted file mode 100644 index ad9b68413..000000000 --- a/examples/evaluation/my_optimizer/dqn_Acrobot-v1/100/multirun.yaml +++ /dev/null @@ -1,217 +0,0 @@ -hydra: - run: - dir: evaluation/${hpo_method}/${algorithm}_${environment.name}/${autorl.seed} - sweep: - dir: evaluation/${hpo_method}/${algorithm}_${environment.name}/${autorl.seed} - subdir: ${hydra.job.num} - launcher: - _target_: hydra._internal.core_plugins.basic_launcher.BasicLauncher - sweeper: - _target_: hydra._internal.core_plugins.basic_sweeper.BasicSweeper - max_batch_size: null - params: null - help: - app_name: ${hydra.job.name} - header: '${hydra.help.app_name} is powered by Hydra. - - ' - footer: 'Powered by Hydra (https://hydra.cc) - - Use --hydra-help to view Hydra specific help - - ' - template: '${hydra.help.header} - - == Configuration groups == - - Compose your configuration from those groups (group=option) - - - $APP_CONFIG_GROUPS - - - == Config == - - Override anything in the config (foo.bar=value) - - - $CONFIG - - - ${hydra.help.footer} - - ' - hydra_help: - template: 'Hydra (${hydra.runtime.version}) - - See https://hydra.cc for more info. - - - == Flags == - - $FLAGS_HELP - - - == Configuration groups == - - Compose your configuration from those groups (For example, append hydra/job_logging=disabled - to command line) - - - $HYDRA_CONFIG_GROUPS - - - Use ''--cfg hydra'' to Show the Hydra config. - - ' - hydra_help: ??? 
- hydra_logging: - version: 1 - formatters: - simple: - format: '[%(asctime)s][HYDRA] %(message)s' - handlers: - console: - class: logging.StreamHandler - formatter: simple - stream: ext://sys.stdout - root: - level: INFO - handlers: - - console - loggers: - logging_example: - level: DEBUG - disable_existing_loggers: false - job_logging: - version: 1 - formatters: - simple: - format: '[%(asctime)s][%(name)s][%(levelname)s] - %(message)s' - handlers: - console: - class: logging.StreamHandler - formatter: simple - stream: ext://sys.stdout - file: - class: logging.FileHandler - formatter: simple - filename: ${hydra.runtime.output_dir}/${hydra.job.name}.log - root: - level: INFO - handlers: - - console - - file - disable_existing_loggers: false - env: {} - mode: MULTIRUN - searchpath: [] - callbacks: {} - output_subdir: .hydra - overrides: - hydra: - - hydra.mode=MULTIRUN - task: - - autorl.seed=100 - - incumbent=glob(*) - job: - name: run_arlbench - chdir: true - override_dirname: autorl.seed=100,incumbent=glob(*) - id: ??? - num: ??? - config_name: evaluate - env_set: {} - env_copy: [] - config: - override_dirname: - kv_sep: '=' - item_sep: ',' - exclude_keys: [] - runtime: - version: 1.3.2 - version_base: '1.3' - cwd: /Users/jannisbecktepe/Developer/arlbench_main/examples - config_sources: - - path: hydra.conf - schema: pkg - provider: hydra - - path: /Users/jannisbecktepe/Developer/arlbench_main/examples/configs - schema: file - provider: main - - path: hydra_plugins.hydra_colorlog.conf - schema: pkg - provider: hydra-colorlog - - path: '' - schema: structured - provider: schema - output_dir: ??? 
- choices: - incumbent: dqn_acrobot_my_optimizer - environment: cc_acrobot - algorithm: dqn - hydra/env: default - hydra/callbacks: null - hydra/job_logging: default - hydra/hydra_logging: default - hydra/hydra_help: default - hydra/help: default - hydra/sweeper: basic - hydra/launcher: basic - hydra/output: default - verbose: false -jax_enable_x64: true -load_checkpoint: '' -hpo_method: my_optimizer -autorl: - seed: 100 - env_framework: ${environment.framework} - env_name: ${environment.name} - env_kwargs: ${environment.kwargs} - eval_env_kwargs: ${environment.eval_kwargs} - n_envs: ${environment.n_envs} - algorithm: ${algorithm} - cnn_policy: ${environment.cnn_policy} - nas_config: ${nas_config} - n_total_timesteps: ${environment.n_total_timesteps} - checkpoint: [] - checkpoint_name: default_checkpoint - checkpoint_dir: /tmp - state_features: [] - objectives: - - reward_mean - optimize_objectives: upper - n_steps: 1 - n_eval_steps: 100 - n_eval_episodes: 10 -algorithm: dqn -hp_config: - buffer_prio_sampling: false - buffer_alpha: 0.9 - buffer_beta: 0.9 - buffer_epsilon: 0.001 - buffer_batch_size: 64 - buffer_size: 100000 - initial_epsilon: 0.64 - target_epsilon: 0.112 - gamma: 0.99 - gradient_steps: 1 - learning_rate: 0.0023 - learning_starts: 1032 - normalize_observations: false - train_freq: 4 - use_target_network: true - target_update_interval: 10 - tau: 1.0 -nas_config: - activation: tanh - hidden_size: 64 -environment: - name: Acrobot-v1 - framework: gymnax - n_total_timesteps: 100000.0 - kwargs: {} - eval_kwargs: {} - cnn_policy: false - deterministic_eval: true - n_envs: 8