Implementation for the MSc thesis "Generalizing Pareto optimal policies in multi-objective reinforcement learning"
Presentation slides | Thesis
In this thesis, the use of hypernetworks in multi-objective reinforcement learning is explored by augmenting the critic network with a hypernetwork. Two different input configurations for the target network were examined to assess the expressiveness of the predicted parameters.
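As a rough illustration of the general idea (not the architecture used in the thesis), the sketch below shows a critic whose target-network parameters are predicted by a hypernetwork conditioned on the objective preference vector. The class name, layer sizes, and the choice of feeding the state-action pair to the target network are assumptions made only for this example.

import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperCritic(nn.Module):
    """Critic whose target Q-network parameters are predicted from the preference vector."""

    def __init__(self, obs_dim: int, act_dim: int, reward_dim: int, hidden: int = 64):
        super().__init__()
        self.in_dim = obs_dim + act_dim
        self.hidden = hidden
        self.reward_dim = reward_dim
        # Total parameter count of a two-layer target network:
        # (in_dim -> hidden) weights + biases and (hidden -> reward_dim) weights + biases.
        n_params = (self.in_dim * hidden + hidden) + (hidden * reward_dim + reward_dim)
        # Hypernetwork: maps a preference vector to the flat parameter vector.
        self.hypernet = nn.Sequential(
            nn.Linear(reward_dim, 256), nn.ReLU(), nn.Linear(256, n_params)
        )

    def forward(self, obs, act, pref):
        params = self.hypernet(pref)                    # (batch, n_params)
        x = torch.cat([obs, act], dim=-1).unsqueeze(1)  # (batch, 1, in_dim)

        # Slice the flat vector into the target network's weights and biases.
        i = self.in_dim * self.hidden
        w1 = params[:, :i].view(-1, self.in_dim, self.hidden)
        b1 = params[:, i:i + self.hidden].unsqueeze(1)
        i += self.hidden
        w2 = params[:, i:i + self.hidden * self.reward_dim].view(-1, self.hidden, self.reward_dim)
        i += self.hidden * self.reward_dim
        b2 = params[:, i:].unsqueeze(1)

        # Target network: a per-sample two-layer MLP producing a vector-valued Q-estimate.
        h = F.relu(torch.bmm(x, w1) + b1)
        return (torch.bmm(h, w2) + b2).squeeze(1)       # (batch, reward_dim)

For example, HyperCritic(obs_dim=17, act_dim=6, reward_dim=2) would match a two-objective Halfcheetah-like setting and return one Q-estimate per objective.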
The methods are evaluated in three multi-objective locomotion tasks:
- Halfcheetah. Objectives: energy efficiency and forward speed.
- Hopper. Objectives: jump height and forward speed.
- Swimmer. Objectives: energy efficiency and forward speed.
Start by cloning the repository:
git clone [email protected]:SanteriHei/hypernet_morl.git && cd hypernet_morl
This project uses Pipenv for dependency management; it can be installed by running
pip install --user pipenv
Then, create a new virtual environment and install the required dependencies via
pipenv install --dev
Note
The --dev flag includes some dependencies that are not always necessary, so it can be omitted.
Lastly, activate the virtual environment by running pipenv shell.
The project also uses Hydra for configuration management and Wandb for logging the run information. Thus, it is recommended to create a (free) Wandb account before running any experiments. After this, remember to update the following configuration options, either via the CLI (recommended) or by editing configs/session.yaml (see the sketch after the list):
- session_cfg.entity_name
- session_cfg.project_name
- session_cfg.experiment_group
- session_cfg.run_name
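For reference, here is a minimal sketch of what configs/session.yaml could look like. Only the four option names above come from this README; the exact file layout and the placeholder values are assumptions:

# Hypothetical sketch of configs/session.yaml (layout is an assumption)
entity_name: my-entity        # wandb entity (user or team) to log under
project_name: my-project      # wandb project that collects the runs
experiment_group: my-group    # group label shown in the wandb UI
run_name: my-run              # display name of the individual run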
Caution
Wandb logging can be turned off via the CLI option training_cfg.log_to_wandb=False. However, this is highly discouraged, since the progress is then only printed to the console and NOT stored anywhere.
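If you nonetheless want a quick local test without Wandb, the override can be combined with any of the example commands below, for instance (option names as in the examples; the configuration itself is only illustrative):

python main.py device="cuda:0" seed=0 training_cfg=halfcheetah training_cfg.log_to_wandb=False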
Here are a few examples of running some of the experiments presented in the thesis:
Halfcheetah with no warmup and ResNet hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=halfcheetah \
    training_cfg.n_warmup_steps=0 \
    training_cfg.save_individual_losses=False \
    training_cfg.save_path="path/to/my-run" \
    session_cfg.entity_name="my-entity" \
    session_cfg.project_name="my-project" \
    session_cfg.run_name="my-run" \
    session_cfg.experiment_group="my-group"
Halfcheetah with no warmup and MLP hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=halfcheetah \
    critic_cfg=mlp_hypercritic \
    training_cfg.n_warmup_steps=0 \
    training_cfg.save_individual_losses=False \
    training_cfg.save_path="path/to/my-run" \
    session_cfg.entity_name="my-entity" \
    session_cfg.project_name="my-project" \
    session_cfg.run_name="my-run" \
    session_cfg.experiment_group="my-group"
Halfcheetah with skewed warmup distribution and ResNet hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=halfcheetah \
    training_cfg.n_warmup_steps=2.4e5 \
    training_cfg.warmup_use_uneven_sampling=True \
    training_cfg.save_individual_losses=False \
    training_cfg.save_path="path/to/my-run" \
    session_cfg.entity_name="my-entity" \
    session_cfg.project_name="my-project" \
    session_cfg.run_name="my-run" \
    session_cfg.experiment_group="my-group"
Hopper with no warmup and ResNet hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=hopper \
    training_cfg.n_warmup_steps=0 \
    training_cfg.save_individual_losses=False \
    training_cfg.save_path="path/to/my-run" \
    session_cfg.entity_name="my-entity" \
    session_cfg.project_name="my-project" \
    session_cfg.run_name="my-run" \
    session_cfg.experiment_group="my-group"
Swimmer with no warmup and ResNet hypernetwork
python main.py device="cuda:0" seed=0 training_cfg=swimmer \
    training_cfg.n_warmup_steps=0 \
    training_cfg.save_individual_losses=False \
    training_cfg.save_path="path/to/my-run" \
    session_cfg.entity_name="my-entity" \
    session_cfg.project_name="my-project" \
    session_cfg.run_name="my-run" \
    session_cfg.experiment_group="my-group"
This work builds heavily on the previous research of Sarafian et al., for applying hypernetworks in a reinforcement learning context, and of Lu et al., for the CAPQL algorithm used to train the MORL agents.
The proposed methods were evaluated in three robot control tasks designed by Xu et al. [1] Moreover, the original implementation by Xu et al. was used for PGMORL, while the implementations from morl-baselines by Felten et al. were used for CAPQL and GPI-LS.
Footnotes
[1] The tasks were ported to the v4 implementations of the environments.