
Hypernetwork augmented multi-objective Actor-Critic

Implementation for the MSc thesis "Generalizing Pareto optimal policies in multi-objective reinforcement learning"
Presentation slides | Thesis

About

This thesis explores the use of hypernetworks in multi-objective reinforcement learning (MORL) by augmenting the critic network with a hypernetwork. Two different input configurations for the target network were studied to assess how expressive the predicted parameters are.
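The general idea can be illustrated with a minimal NumPy sketch. It assumes a hypernetwork that maps a preference vector over the objectives to the weights of a small target critic. That critic then scores a concatenated state-action pair and returns one Q-value per objective. The layer sizes, the single linear hypernetwork layer and this particular split of inputs are illustrative assumptions. They are not necessarily either of the two input configurations studied in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ACTION_DIM, N_OBJ, HIDDEN = 4, 2, 2, 8

# Parameter shapes of the target (critic) network: one hidden layer and
# one Q-value output per objective (an assumption of this sketch).
TARGET_SHAPES = [
    (STATE_DIM + ACTION_DIM, HIDDEN),  # W1
    (HIDDEN,),                         # b1
    (HIDDEN, N_OBJ),                   # W2
    (N_OBJ,),                          # b2
]
N_TARGET_PARAMS = sum(int(np.prod(s)) for s in TARGET_SHAPES)

# Hypernetwork: a single linear layer from the preference vector to a flat
# vector holding every parameter of the target network.
H_W = rng.normal(scale=0.1, size=(N_OBJ, N_TARGET_PARAMS))
H_b = np.zeros(N_TARGET_PARAMS)


def hypernetwork(pref):
    """Predict the target critic's parameters from a preference vector."""
    flat = pref @ H_W + H_b
    params, start = [], 0
    for shape in TARGET_SHAPES:
        size = int(np.prod(shape))
        params.append(flat[start:start + size].reshape(shape))
        start += size
    return params


def target_critic(params, state, action):
    """Evaluate the target critic using parameters produced by the hypernetwork."""
    W1, b1, W2, b2 = params
    x = np.concatenate([state, action])
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2                # vector of Q-values, one per objective


pref = np.array([0.3, 0.7])  # weights over the two objectives
q = target_critic(hypernetwork(pref),
                  rng.normal(size=STATE_DIM), rng.normal(size=ACTION_DIM))
print(q.shape)  # (2,)
```

Because the critic's weights are a function of the preference, a single hypernetwork can, in principle, represent a whole family of critics, one per preference vector.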

Examples of the learned policies

Halfcheetah

Objectives: Energy efficiency and forward speed.

Hopper

Objectives: Jump height and forward speed.

Swimmer

Objectives: Energy efficiency and forward speed.
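Each task above has two competing objectives, so a policy's performance is a return vector rather than a single number. The minimal sketch below shows two standard ways of relating such vectors: Pareto dominance, and linear scalarization with a preference vector. The return values are made up for illustration and are not results from the thesis. Whether the training code scalarizes exactly this way is an assumption of the sketch.

```python
def scalarize(returns, weights):
    """Linear scalarization: preference-weighted sum of per-objective returns."""
    if len(returns) != len(weights):
        raise ValueError("returns and weights must have the same length")
    return sum(r * w for r, w in zip(returns, weights))


def dominates(a, b):
    """True if return vector `a` Pareto-dominates `b` (every objective is maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative (made-up) returns as (energy efficiency, forward speed) pairs.
returns = [(1.0, 3.0), (2.0, 2.0), (1.5, 1.5), (0.5, 3.5)]
front = pareto_front(returns)  # (1.5, 1.5) is dominated by (2.0, 2.0)

# A preference that favours forward speed selects a different front point
# than one that favours energy efficiency.
speed_heavy = max(front, key=lambda r: scalarize(r, (0.25, 0.75)))
```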

Getting started

Start by cloning the repository:

git clone git@github.com:SanteriHei/hypernet_morl.git && cd hypernet_morl

This project uses Pipenv for dependency management, which can be installed by running

pip install --user pipenv

Then, create a new virtual environment and install the required dependencies via

pipenv install --dev

Note

The --dev flag installs additional dependencies that are not always necessary, so it can be omitted.

Finally, activate the Pipenv shell with pipenv shell.

The project also uses Hydra for configuration management and Wandb for logging run information. It is therefore recommended to create a (free) wandb account before running any experiments. After that, update the following configuration options, either via the CLI (recommended) or by editing configs/session.yaml:

  • session_cfg.entity_name
  • session_cfg.project_name
  • session_cfg.experiment_group
  • session_cfg.run_name
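CLI overrides such as session_cfg.entity_name=my-entity use dotted keys that address nested entries of the composed configuration. The toy function below sketches that mapping in plain Python. It is not Hydra's parser. Like Hydra's default behaviour, it rejects keys that do not already exist; in Hydra, adding a new key requires a + prefix. Unlike Hydra, it keeps every value as a string. The cfg dictionary is a hypothetical stand-in for the session part of configs/session.yaml.

```python
from __future__ import annotations


def apply_overrides(cfg: dict, overrides: list[str]) -> dict:
    """Apply `group.key=value` overrides to a nested dict, in place.

    Unknown keys raise KeyError so that a typo does not silently create a new
    option. Values are stored as plain strings (no int/float/bool parsing).
    """
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # KeyError if the group does not exist
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        node[leaf] = value
    return cfg


# Hypothetical stand-in for the session options listed above.
cfg = {"session_cfg": {"entity_name": None, "project_name": None,
                       "experiment_group": None, "run_name": None}}

# The shell strips the double quotes, so main.py receives e.g.
# session_cfg.entity_name=my-entity rather than session_cfg.entity_name="my-entity".
apply_overrides(cfg, ["session_cfg.entity_name=my-entity",
                      "session_cfg.project_name=my-project"])
```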

Caution

Wandb logging can be turned off with the CLI option training_cfg.log_to_wandb=False. However, this is strongly discouraged, since the progress is then only printed to the console and NOT stored anywhere.

Example configurations

Here are a few examples of how to run some of the experiments presented in the thesis:

Halfcheetah with no warmup and ResNet hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=halfcheetah\
  training_cfg.n_warmup_steps=0\
  training_cfg.save_individual_losses=False\
  training_cfg.save_path="path/to/my-run"\
  session_cfg.entity_name="my-entity"\
  session_cfg.project_name="my-project"\
  session_cfg.run_name="my-run"\
  session_cfg.experiment_group="my-group"
Halfcheetah with no warmup and MLP hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=halfcheetah\
  critic_cfg=mlp_hypercritic\
  training_cfg.n_warmup_steps=0\
  training_cfg.save_individual_losses=False\
  training_cfg.save_path="path/to/my-run"\
  session_cfg.entity_name="my-entity"\
  session_cfg.project_name="my-project"\
  session_cfg.run_name="my-run"\
  session_cfg.experiment_group="my-group"
Halfcheetah with skewed warmup distribution and ResNet hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=halfcheetah\
  training_cfg.n_warmup_steps=2.4e5\
  training_cfg.warmup_use_uneven_sampling=True\
  training_cfg.save_individual_losses=False\
  training_cfg.save_path="path/to/my-run"\
  session_cfg.entity_name="my-entity"\
  session_cfg.project_name="my-project"\
  session_cfg.run_name="my-run"\
  session_cfg.experiment_group="my-group"
Hopper with no warmup and ResNet hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=hopper\
  training_cfg.n_warmup_steps=0\
  training_cfg.save_individual_losses=False\
  training_cfg.save_path="path/to/my-run"\
  session_cfg.entity_name="my-entity"\
  session_cfg.project_name="my-project"\
  session_cfg.run_name="my-run"\
  session_cfg.experiment_group="my-group"
Swimmer with no warmup and ResNet hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=swimmer\
  training_cfg.n_warmup_steps=0\
  training_cfg.save_individual_losses=False\
  training_cfg.save_path="path/to/my-run"\
  session_cfg.entity_name="my-entity"\
  session_cfg.project_name="my-project"\
  session_cfg.run_name="my-run"\
  session_cfg.experiment_group="my-group"
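When repeating these experiments over several environments and seeds, it can help to generate the commands programmatically instead of editing them by hand. The helper below is a hypothetical sketch that mirrors the options used in the examples above. The build_command name and the {env}-seed{seed} run-naming scheme are assumptions, not part of the repository. It builds an argument list, which avoids shell-quoting issues, and prints the equivalent shell command.

```python
import shlex
from typing import List, Optional


def build_command(env: str, seed: int, *, entity: str, project: str,
                  group: str, save_root: str,
                  critic_cfg: Optional[str] = None,
                  n_warmup_steps: int = 0,
                  device: str = "cuda:0") -> List[str]:
    """Build the argv for one run, mirroring the example commands above."""
    run_name = f"{env}-seed{seed}"  # hypothetical naming scheme
    cmd = ["python", "main.py", f"device={device}", f"seed={seed}",
           f"training_cfg={env}"]
    if critic_cfg is not None:  # e.g. "mlp_hypercritic"; omit to use the default
        cmd.append(f"critic_cfg={critic_cfg}")
    cmd += [
        f"training_cfg.n_warmup_steps={n_warmup_steps}",
        "training_cfg.save_individual_losses=False",
        f"training_cfg.save_path={save_root}/{run_name}",
        f"session_cfg.entity_name={entity}",
        f"session_cfg.project_name={project}",
        f"session_cfg.run_name={run_name}",
        f"session_cfg.experiment_group={group}",
    ]
    return cmd


if __name__ == "__main__":
    for env in ("halfcheetah", "hopper", "swimmer"):
        for seed in range(3):
            cmd = build_command(env, seed, entity="my-entity", project="my-project",
                                group="my-group", save_root="path/to/runs")
            print(shlex.join(cmd))
            # To launch the run instead: subprocess.run(cmd, check=True)
```

Giving each seed its own run_name and save_path keeps runs from overwriting each other's outputs, which the shared "my-run" placeholder in the examples would not.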

Acknowledgements

This work builds heavily on previous research by Sarafian et al. and Lu et al.: the former for applying hypernetworks in the reinforcement learning context, and the latter for the CAPQL algorithm used to train the MORL agents.

The proposed methods were evaluated on three robot control tasks designed by Xu et al. [1]. Moreover, the original implementation by Xu et al. was used for PGMORL, while the morl-baselines implementations by Felten et al. were used for CAPQL and GPI-LS.

Footnotes

  1. The tasks were ported to the v4 implementations of the environments.