
This is the repository for the CSCE-642 project. We investigated how human preference can affect the performance of state-of-the-art transformer-based offline learning models.


Enhancing Offline Learning Models with Human Preferences

Link to the paper: here

Link to the video: here

Overview

Here is an overview of the architecture of our Decision Transformer with Human Preference (DTHP) model.

(Figure: Model Overview)

Our model builds on the Decision Transformer and integrates human preference embeddings to address the biases associated with determining the return-to-go, a notable issue for the Decision Transformer.

We use the Preference Transformer to analyze the past trajectory and produce a human preference score, which is then fed into our Human Preference Integration Layer and combined with the return-to-go. The structure of our Human Preference Integration Layer is shown below.

(Figure: Human Preference Integration Layer)
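
For intuition, here is a minimal, hypothetical PyTorch sketch of how such a layer could fuse a scalar preference score with the return-to-go embedding. The layer names, dimensions, and gating scheme below are illustrative assumptions, not our exact implementation.

import torch
import torch.nn as nn

class PreferenceIntegrationSketch(nn.Module):
    """Illustrative only: fuse a scalar human-preference score with the
    return-to-go embedding before it enters the Decision Transformer."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.pref_embed = nn.Linear(1, hidden_dim)  # embed the preference score
        self.gate = nn.Sequential(                  # learned mixing weight in [0, 1]
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Sigmoid(),
        )

    def forward(self, rtg_emb: torch.Tensor, pref_score: torch.Tensor) -> torch.Tensor:
        # rtg_emb:    (batch, seq_len, hidden_dim) return-to-go embedding
        # pref_score: (batch, seq_len, 1) score, e.g. from a Preference Transformer
        pref_emb = self.pref_embed(pref_score)
        g = self.gate(torch.cat([rtg_emb, pref_emb], dim=-1))
        return g * pref_emb + (1.0 - g) * rtg_emb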

We show that when trained on D4RL benchmarks with suboptimal returns-to-go, our model outperforms the vanilla Decision Transformer on both the hopper-medium-expert and the walker2d-medium-expert datasets.

Get Started

Colab Notebook Demo

For a quick, self-contained way to run our code, simply upload the Jupyter notebook (train_colab.ipynb) to Google Colab and run every cell. To use a different dataset, uncomment the appropriate code in the block directly above the main training block, titled "Download dataset (subset) from HF Datasets". Weights & Biases will prompt you for your authorization key; simply follow the instructions provided and hit 'enter'.

Download MuJoCo

  1. Download the MuJoCo version 2.1 binaries for Linux or OSX.
  2. Extract the downloaded mujoco210 directory into ~/.mujoco/mujoco210.
  3. Add the following variables to ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
  4. source ~/.bashrc

Side Note

  • If you want to render on-screen using env.render(), set the following (adjust the conda environment path to your own installation):
export LD_PRELOAD=/home/morris88826/anaconda3/envs/trajectory/lib/libGLEW.so:/usr/lib/x86_64-linux-gnu/libstdc++.so.6
  • Unset it if you are using MjRenderContextOffscreen from mujoco_py for off-screen rendering (see the sketch after this note):
unset LD_PRELOAD
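
As an illustration of the off-screen case, the snippet below is a generic Gym/mujoco-py example (not one of this repository's scripts). Off-screen rendering via mode="rgb_array" uses MjRenderContextOffscreen internally and needs LD_PRELOAD unset, while an on-screen env.render() window needs the libGLEW preload shown above.

# Generic Gym/mujoco-py example; run with LD_PRELOAD unset.
import gym

env = gym.make("Hopper-v2")
env.reset()

# Off-screen frame capture (MjRenderContextOffscreen under the hood);
# this fails with a GLEW initialization error if LD_PRELOAD is still set.
frame = env.render(mode="rgb_array")
print(frame.shape)  # e.g. (500, 500, 3)
env.close()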

Set up the environment

conda env create -f environment.yml
conda activate dthp
pip install "cython<3"

cd ./decision-transformer-hp/trajectory-transformer
pip install -e .

cd ..
pip install -r requirements.txt
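
After these installs, a minimal sanity check that the MuJoCo bindings were built correctly (assuming the pre-0.26 Gym API, where reset() returns the observation directly):

# The first import of mujoco_py compiles its Cython bindings, which can take a
# minute; afterwards a MuJoCo Gym task should load without errors.
import mujoco_py
import gym

env = gym.make("Hopper-v2")
obs = env.reset()
print("mujoco_py", mujoco_py.__version__, "| observation shape:", obs.shape)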

Download

Dataset

  • D4RL
cd ./decision-transformer-hp/data
python download_d4rl_datasets.py
  • D4RL with human preferences: here
  • Move the dataset into ./decision-transformer-hp/data (a minimal loading example follows this list)
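
For reference, D4RL datasets such as these can also be loaded directly through the d4rl package; this is roughly what the download script builds on, though the exact behavior of download_d4rl_datasets.py may differ.

# Load a D4RL dataset directly; get_dataset() downloads and caches the HDF5
# file on first use and returns numpy arrays keyed by field name.
import gym
import d4rl  # noqa: F401  (importing d4rl registers its environments with gym)

env = gym.make("hopper-medium-expert-v2")
dataset = env.get_dataset()
print(dataset["observations"].shape,
      dataset["actions"].shape,
      dataset["rewards"].shape)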

Model Weights

Usage

Train

cd decision-transformer-hp
python experiment.py --env {env_name} --embed_hf --hf_model_path {preference transformer weight path} --from_d4rl

Inference

Set the --replay flag to generate trajectory records for visualization.

cd decision-transformer-hp
python experiment.py --env {env_name} --embed_hf --hf_model_path {preference transformer weight path} --from_d4rl --inference_only --replay 

Visualization

Here we also provide code for visualizing the trajectories sampled at inference time.

cd visualize
  1. Replay in the renderer
  • Make sure that LD_PRELOAD is set as follows:
echo $LD_PRELOAD
/home/morris88826/anaconda3/envs/trajectory/lib/libGLEW.so:/usr/lib/x86_64-linux-gnu/libstdc++.so.6

ex:
python replay.py --dataset hopper-medium-expert-v2 --trajectory ./replays/DT+PT/hopper-medium-expert-v2/10/best_traj_36000.pkl
  2. Save the videos
  • Make sure you unset LD_PRELOAD
unset LD_PRELOAD

ex:
python save_replay.py --dataset hopper-medium-expert-v2 --trajectory ./replays/DT+PT/hopper-medium-expert-v2/10/best_traj_36000.pkl

The result will be saved in the demo folder.

Acknowledgements

Our backbone implementation is from
