Implementation for the MSc thesis "Generalizing Pareto optimal policies in multi-objective reinforcement learning"
Presentation slides | Thesis
In this thesis, the use of hypernetworks in multi-objective reinforcement learning is explored by augmenting the critic network with a hypernetwork. Two different input configurations for the target network were examined to assess the expressiveness of the predicted parameters.
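As a rough illustration of the general idea (not the architecture used in the thesis), the sketch below shows a critic whose target-network parameters are predicted by a hypernetwork conditioned on the objective preference vector. The class name, layer sizes, and the choice of feeding the state-action pair to the target network are assumptions made only for this example.

import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperCritic(nn.Module):
    """Critic whose target Q-network parameters are predicted from the preference vector."""

    def __init__(self, obs_dim: int, act_dim: int, reward_dim: int, hidden: int = 64):
        super().__init__()
        self.in_dim = obs_dim + act_dim
        self.hidden = hidden
        self.reward_dim = reward_dim
        # Total parameter count of a two-layer target network:
        # (in_dim -> hidden) weights + biases and (hidden -> reward_dim) weights + biases.
        n_params = (self.in_dim * hidden + hidden) + (hidden * reward_dim + reward_dim)
        # Hypernetwork: maps a preference vector to the flat parameter vector.
        self.hypernet = nn.Sequential(
            nn.Linear(reward_dim, 256), nn.ReLU(), nn.Linear(256, n_params)
        )

    def forward(self, obs, act, pref):
        params = self.hypernet(pref)                    # (batch, n_params)
        x = torch.cat([obs, act], dim=-1).unsqueeze(1)  # (batch, 1, in_dim)

        # Slice the flat vector into the target network's weights and biases.
        i = self.in_dim * self.hidden
        w1 = params[:, :i].view(-1, self.in_dim, self.hidden)
        b1 = params[:, i:i + self.hidden].unsqueeze(1)
        i += self.hidden
        w2 = params[:, i:i + self.hidden * self.reward_dim].view(-1, self.hidden, self.reward_dim)
        i += self.hidden * self.reward_dim
        b2 = params[:, i:].unsqueeze(1)

        # Target network: a per-sample two-layer MLP producing a vector-valued Q-estimate.
        h = F.relu(torch.bmm(x, w1) + b1)
        return (torch.bmm(h, w2) + b2).squeeze(1)       # (batch, reward_dim)

For example, HyperCritic(obs_dim=17, act_dim=6, reward_dim=2) would match a two-objective Halfcheetah-like setting and return one Q-estimate per objective.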
The methods are evaluated in three multi-objective locomotion tasks:
- Halfcheetah. Objectives: energy efficiency and forward speed.
- Hopper. Objectives: jump height and forward speed.
- Swimmer. Objectives: energy efficiency and forward speed.
Start by cloning the repository:
git clone [email protected]:SanteriHei/hypernet_morl.git && cd hypernet_morl
This project uses Pipenv for dependency management; it can be installed by running
pip install --user pipenv
Then, create a new virtual environment and install the required dependencies via
pipenv install --dev
Note
The --dev flag includes some dependencies that are not always necessary, so it can be omitted.
Lastly, activate the virtual environment by running pipenv shell.
The project also uses Hydra for configuration management and Wandb for logging the run information. Thus, it is recommended to create a (free) Wandb account before running any experiments. After this, remember to update the following configuration options, either via the CLI (recommended) or by editing configs/session.yaml (see the sketch after the list):
- session_cfg.entity_name
- session_cfg.project_name
- session_cfg.experiment_group
- session_cfg.run_name
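For reference, here is a minimal sketch of what configs/session.yaml could look like. Only the four option names above come from this README; the exact file layout and the placeholder values are assumptions:

# Hypothetical sketch of configs/session.yaml (layout is an assumption)
entity_name: my-entity        # wandb entity (user or team) to log under
project_name: my-project      # wandb project that collects the runs
experiment_group: my-group    # group label shown in the wandb UI
run_name: my-run              # display name of the individual run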
Caution
Wandb logging can be turned off via the CLI option training_cfg.log_to_wandb=False. However, this is highly discouraged, since the progress is then only printed to the console and NOT stored anywhere.
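If you nonetheless want a quick local test without Wandb, the override can be combined with any of the example commands below, for instance (option names as in the examples; the configuration itself is only illustrative):

python main.py device="cuda:0" seed=0 training_cfg=halfcheetah training_cfg.log_to_wandb=False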
Here are a few examples of running some of the experiments presented in the thesis:
Halfcheetah with no warmup and ResNet hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=halfcheetah \
    training_cfg.n_warmup_steps=0 \
    training_cfg.save_individual_losses=False \
    training_cfg.save_path="path/to/my-run" \
    session_cfg.entity_name="my-entity" \
    session_cfg.project_name="my-project" \
    session_cfg.run_name="my-run" \
    session_cfg.experiment_group="my-group"
Halfcheetah with no warmup and MLP hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=halfcheetah \
    critic_cfg=mlp_hypercritic \
    training_cfg.n_warmup_steps=0 \
    training_cfg.save_individual_losses=False \
    training_cfg.save_path="path/to/my-run" \
    session_cfg.entity_name="my-entity" \
    session_cfg.project_name="my-project" \
    session_cfg.run_name="my-run" \
    session_cfg.experiment_group="my-group"
Halfcheetah with skewed warmup distribution and ResNet hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=halfcheetah \
    training_cfg.n_warmup_steps=2.4e5 \
    training_cfg.warmup_use_uneven_sampling=True \
    training_cfg.save_individual_losses=False \
    training_cfg.save_path="path/to/my-run" \
    session_cfg.entity_name="my-entity" \
    session_cfg.project_name="my-project" \
    session_cfg.run_name="my-run" \
    session_cfg.experiment_group="my-group"
Hopper with no warmup and ResNet hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=hopper \
    training_cfg.n_warmup_steps=0 \
    training_cfg.save_individual_losses=False \
    training_cfg.save_path="path/to/my-run" \
    session_cfg.entity_name="my-entity" \
    session_cfg.project_name="my-project" \
    session_cfg.run_name="my-run" \
    session_cfg.experiment_group="my-group"
Swimmer with no warmup and ResNet hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=swimmer \
    training_cfg.n_warmup_steps=0 \
    training_cfg.save_individual_losses=False \
    training_cfg.save_path="path/to/my-run" \
    session_cfg.entity_name="my-entity" \
    session_cfg.project_name="my-project" \
    session_cfg.run_name="my-run" \
    session_cfg.experiment_group="my-group"
This work builds heavily on the previous research of Sarafian et al., for applying hypernetworks in a reinforcement learning context, and of Lu et al., for the CAPQL algorithm used to train the MORL agents.
The proposed methods were evaluated in three robot control tasks designed by Xu et al. [1] Moreover, the original implementation by Xu et al. was used for PGMORL, while the implementations from morl-baselines by Felten et al. were used for CAPQL and GPI-LS.
Footnotes
[1] The tasks were ported to the v4 implementations of the environments.