
This is the repository for the CSCE-642 project. We investigated how human preference can affect the performance of state-of-the-art transformer-based offline learning models.


Enhancing Offline Learning Models with Human Preferences

Link to the paper: here

Link to the video: here

Overview

Here is an overview of the architecture of our Decision Transformer with Human Preference (DTHP) model.

(Figure: Model Overview)

Our model builds on the Decision Transformer and integrates human preference embeddings to address the biases associated with determining the return-to-go, a notable issue for the Decision Transformer.

We use the Preference Transformer to analyze the past trajectory and produce a human preference score, which is then fed into our Human Preference Integration Layer and combined with the return-to-go. The structure of our Human Preference Integration Layer is shown below.

(Figure: Human Preference Integration Layer)
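
For intuition, here is a minimal, hypothetical PyTorch sketch of how such a layer could fuse a scalar preference score with the return-to-go embedding. The layer names, dimensions, and gating scheme below are illustrative assumptions, not our exact implementation.

import torch
import torch.nn as nn

class PreferenceIntegrationSketch(nn.Module):
    """Illustrative only: fuse a scalar human-preference score with the
    return-to-go embedding before it enters the Decision Transformer."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.pref_embed = nn.Linear(1, hidden_dim)  # embed the preference score
        self.gate = nn.Sequential(                  # learned mixing weight in [0, 1]
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Sigmoid(),
        )

    def forward(self, rtg_emb: torch.Tensor, pref_score: torch.Tensor) -> torch.Tensor:
        # rtg_emb:    (batch, seq_len, hidden_dim) return-to-go embedding
        # pref_score: (batch, seq_len, 1) score, e.g. from a Preference Transformer
        pref_emb = self.pref_embed(pref_score)
        g = self.gate(torch.cat([rtg_emb, pref_emb], dim=-1))
        return g * pref_emb + (1.0 - g) * rtg_emb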

We show that when trained on D4RL benchmarks with suboptimal returns-to-go, our model outperforms the vanilla Decision Transformer on both the hopper-medium-expert and the walker2d-medium-expert datasets.

Get Started

Colab Notebook Demo

For a quick, self-contained way to run our code, simply upload the Jupyter notebook (train_colab.ipynb) to Google Colab and run every cell. To use a different dataset, uncomment the appropriate code in the block directly above the main training block, titled "Download dataset (subset) from HF Datasets". Weights & Biases will prompt you for your authorization key; simply follow the instructions provided and hit 'enter'.

Download MuJoCo

  1. Download the MuJoCo version 2.1 binaries for Linux or OSX.
  2. Extract the downloaded mujoco210 directory into ~/.mujoco/mujoco210.
  3. Add the following variables to ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
  4. source ~/.bashrc

Side Note

  • If you want to render on-screen using env.render(), set the following (adjust the conda environment path to your own installation):
export LD_PRELOAD=/home/morris88826/anaconda3/envs/trajectory/lib/libGLEW.so:/usr/lib/x86_64-linux-gnu/libstdc++.so.6
  • Unset it if you are using MjRenderContextOffscreen from mujoco_py for off-screen rendering (see the sketch after this note):
unset LD_PRELOAD
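
As an illustration of the off-screen case, the snippet below is a generic Gym/mujoco-py example (not one of this repository's scripts). Off-screen rendering via mode="rgb_array" uses MjRenderContextOffscreen internally and needs LD_PRELOAD unset, while an on-screen env.render() window needs the libGLEW preload shown above.

# Generic Gym/mujoco-py example; run with LD_PRELOAD unset.
import gym

env = gym.make("Hopper-v2")
env.reset()

# Off-screen frame capture (MjRenderContextOffscreen under the hood);
# this fails with a GLEW initialization error if LD_PRELOAD is still set.
frame = env.render(mode="rgb_array")
print(frame.shape)  # e.g. (500, 500, 3)
env.close()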

Set up the environment

conda env create -f environment.yml
conda activate dthp
pip install "cython<3"

cd ./decision-transformer-hp/trajectory-transformer
pip install -e .

cd ..
pip install -r requirements.txt
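
After these installs, a minimal sanity check that the MuJoCo bindings were built correctly (assuming the pre-0.26 Gym API, where reset() returns the observation directly):

# The first import of mujoco_py compiles its Cython bindings, which can take a
# minute; afterwards a MuJoCo Gym task should load without errors.
import mujoco_py
import gym

env = gym.make("Hopper-v2")
obs = env.reset()
print("mujoco_py", mujoco_py.__version__, "| observation shape:", obs.shape)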

Download

Dataset

  • D4RL
cd ./decision-transformer-hp/data
python download_d4rl_datasets.py
  • D4RL with human preferences: here
  • Move the dataset into ./decision-transformer-hp/data (a minimal loading example follows this list)
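
For reference, D4RL datasets such as these can also be loaded directly through the d4rl package; this is roughly what the download script builds on, though the exact behavior of download_d4rl_datasets.py may differ.

# Load a D4RL dataset directly; get_dataset() downloads and caches the HDF5
# file on first use and returns numpy arrays keyed by field name.
import gym
import d4rl  # noqa: F401  (importing d4rl registers its environments with gym)

env = gym.make("hopper-medium-expert-v2")
dataset = env.get_dataset()
print(dataset["observations"].shape,
      dataset["actions"].shape,
      dataset["rewards"].shape)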

Model Weights

Usage

Train

cd decision-transformer-hp
python experiment.py --env {env_name} --embed_hf --hf_model_path {preference transformer weight path} --from_d4rl

Inference

Set the --replay flag to generate trajectory records for visualization.

cd decision-transformer-hp
python experiment.py --env {env_name} --embed_hf --hf_model_path {preference transformer weight path} --from_d4rl --inference_only --replay 

Visualization

Here we also provide code for visualizing the trajectories sampled at inference time.

cd visualize
  1. Replay in the renderer
  • Make sure that LD_PRELOAD is set as follows:
echo $LD_PRELOAD
/home/morris88826/anaconda3/envs/trajectory/lib/libGLEW.so:/usr/lib/x86_64-linux-gnu/libstdc++.so.6

ex:
python replay.py --dataset hopper-medium-expert-v2 --trajectory ./replays/DT+PT/hopper-medium-expert-v2/10/best_traj_36000.pkl
  2. Save the videos
  • Make sure you unset LD_PRELOAD
unset LD_PRELOAD

ex:
python save_replay.py --dataset hopper-medium-expert-v2 --trajectory ./replays/DT+PT/hopper-medium-expert-v2/10/best_traj_36000.pkl

The result will be saved in the demo folder.

Acknowledgements

Our backbone implementation is from
