This repository is the implementation of the paper:
Deep Adaptive Multi-Intention Inverse Reinforcement Learning
Ariyan Bighashdel,
Panagiotis Meletis,
Pavol Jancura,
Gijs Dubbelman
Accepted for presentation at ECML PKDD 2021
The paper develops two algorithms, "SEM-MIIRL" and "MCEM-MIIRL", which can learn an a priori unknown number of nonlinear reward functions from unlabeled expert demonstrations. The algorithms are evaluated on two proposed environments, "M-ObjectWorld" and "M-BinaryWorld". Both the algorithms and the environments are implemented in this repository.
If you find this code useful in your research, please cite:
@inproceedings{bighashdel2021deep,
  title={Deep Adaptive Multi-Intention Inverse Reinforcement Learning},
  author={Bighashdel, Ariyan and Meletis, Panagiotis and Jancura, Pavol and Dubbelman, Gijs},
  booktitle={Lecture Notes in Computer Science (LNCS), Springer},
  year={2021}
}
The code was developed and tested on Ubuntu 18.04 with Python 3.6 and PyTorch 1.9.
You can install the dependencies by running:
pip install -r requirements.txt # Install dependencies
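After installation, an optional sanity check (not part of the repository) can confirm that the tested versions are available:

```python
# Optional sanity check: verify the tested versions (Python 3.6, PyTorch 1.9).
import sys
import torch

print(f"Python:  {sys.version.split()[0]}")     # tested with 3.6
print(f"PyTorch: {torch.__version__}")          # tested with 1.9
print(f"CUDA available: {torch.cuda.is_available()}")
```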
A simple experiment with the default set of parameters can be run with:
python3 main.py
The following parameters are defined in main.py and can be set for various experiments:
- miirl_type: the main algorithm, either 'SEM' (SEM-MIIRL) or 'MCEM' (MCEM-MIIRL)
- game_type: the environment, either 'ow' (M-ObjectWorld) or 'bw' (M-BinaryWorld)
- sample_length: the length of each demonstration sample
- alpha: the concentration parameter, which controls how readily a new intention is instantiated (see the sketch after this list)
- sample_size: the number of demonstrations for each reward/intention
- rewards_types: the intention/reward types; there are six in total: ['A','B','C','D','E','F']
- mirl_maxiter: the maximum number of iterations
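The role of the concentration parameter is easiest to see through a Chinese-restaurant-process style prior over intentions, where alpha trades off reusing an existing intention against opening a new one. The sketch below is a generic illustration of such a prior, not code from this repository; the actual algorithms additionally weigh how well each reward function explains a demonstration.

```python
import numpy as np

def crp_assignment_probs(cluster_counts, alpha):
    """Prior probability of assigning a new demonstration to each existing
    intention cluster, or to a brand-new one (last entry). Generic
    illustration only; not the repository's implementation."""
    counts = np.asarray(cluster_counts, dtype=float)
    weights = np.append(counts, alpha)   # existing clusters + new-cluster mass
    return weights / weights.sum()

# Example: 3 intentions currently holding 10, 5 and 1 demonstrations.
print(crp_assignment_probs([10, 5, 1], alpha=1.0))
# -> [0.588..., 0.294..., 0.058..., 0.058...]
# A larger alpha shifts probability mass toward opening a new intention.
```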
We conduct an experiment by setting the parameters as follows (a hypothetical transcription of this setup is shown after the list):
- miirl_type = 'SEM'
- game_type = 'ow'
- sample_length = 8
- alpha = 1
- sample_size = 16
- rewards_types = ['A','B']
- mirl_maxiter = 200
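Assuming main.py exposes these settings as plain Python variables (the exact names and mechanism should be checked against the file), the setup above corresponds to:

```python
# Hypothetical transcription of the experiment settings listed above;
# adjust to match how main.py actually defines its parameters.
miirl_type = 'SEM'              # SEM-MIIRL
game_type = 'ow'                # M-ObjectWorld
sample_length = 8               # length of each demonstration sample
alpha = 1                       # concentration parameter
sample_size = 16                # demonstrations per reward/intention
rewards_types = ['A', 'B']      # two of the six available intention types
mirl_maxiter = 200              # maximum number of iterations
```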
The following figure shows the true and predicted rewards: