[Project page] [Paper] [ArXiv]
Yunhao Luo1,2, Yilun Du3
1Georgia Tech, 2Brown, 3Harvard
This is the official implementation for "Grounding Video Models to Actions through Goal Conditioned Exploration".
This codebase contains code for the agent exploration and policy inference in the environments. Please see v2a-video-model-release for code to train the video models.
Grounding Video Models to Actions. Our approach learns to ground a large pretrained video model into continuous actions through goal-directed exploration in the environment. Given a synthesized video, a goal-conditioned policy attempts to reach each visual goal in the video, with data from the resulting real-world execution saved in a replay buffer to train the goal-conditioned policy.
The following procedure should work well on a GPU machine with CUDA 11.8. Our machine runs Red Hat Enterprise Linux v9.2 and loads the following modules:
1) zlib; 2) cuda/11.8.0; 3) git-lfs; 4) expat; 5) mesa/22.1.6; 6) libiconv; 7) ffmpeg; 8) glew;
On other Linux distributions, equivalent packages or modules may be required.
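If your cluster uses an environment-modules system (e.g., Lmod), the modules above can be loaded roughly as follows; the module names and versions are specific to our cluster and may need adjustment on yours:
module load zlib cuda/11.8.0 git-lfs expat mesa/22.1.6 libiconv ffmpeg glew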
Please follow the steps below to create a conda environment and reproduce our simulation benchmark results in the Libero environment.
- Create a python env.
conda create -n v2a_libero_release python=3.9.19
Please keep the conda env name as above, since it is used as an identifier to register the environment.
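After creating the environment, activate it before installing the remaining dependencies:
conda activate v2a_libero_release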
- Install other packages in requirements.txt (this might take several minutes).
pip install -r requirements.txt
- Download the pre-trained video model.
Please visit this OneDrive folder for video model checkpoints.
For the Libero environment, download libero-video-model.zip, put it in the home directory of this repo, and unzip it with the following command:
unzip libero-video-model.zip -d .
This command should unzip the checkpoint to a directory ckpts.
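To confirm the extraction, you can list the directory; the exact checkpoint file names inside ckpts are not specified here and may vary by release:
ls ckpts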
The training of our video model follows AVDC.
- Download random action data.
For efficiency and reproducibility, we pre-sampled a dataset of random actions for policy training, which is hosted in this OneDrive folder.
For the Libero environment, download lb_randsam_8tk_perTk500.hdf5 and move it to the default location relative to the root of this repo:
export lb_dir=data_dir/scratch/libero/env_rand_samples
mkdir -p $lb_dir && mv lb_randsam_8tk_perTk500.hdf5 $lb_dir
This data file is bulky; rclone can be used if downloading to a remote machine (see the sketch below). We also provide the code to collect your own random action data in environment/libero/lb_data.
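As a rough sketch, downloading the file directly to a remote machine with rclone could look like the following; the remote name (onedrive) and the shared-folder path are placeholders for your own rclone configuration:
# copy the dataset from a configured OneDrive remote into the expected local directory
rclone copy onedrive:<path-to-shared-folder>/lb_randsam_8tk_perTk500.hdf5 data_dir/scratch/libero/env_rand_samples/ -P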
With the video model checkpoint and random action dataset prepared, you can now start training the model with video-guided exploration.
To launch the training for Libero, run
sh scripts/train_libero_dp.sh
You can change the $config variable inside the script above to launch different experiments; refer to the default config in the script as a template.
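For example, to point the script at the Libero 8-task experiment, you could set $config inside scripts/train_libero_dp.sh as sketched below; the surrounding contents of the script may differ:
# inside scripts/train_libero_dp.sh (sketch)
config=config/libero/lb_tk8_65to72.py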
We provide links to pre-trained goal-conditioned policy models below, along with their corresponding config files in this repo. All models are hosted under this OneDrive folder.
| Model | Link | Config |
| --- | --- | --- |
| Libero 8 Tasks | Link | config/libero/lb_tk8_65to72.py |
You need to put the downloaded zip files in the root directory of this repo and unzip them there, so that the extracted files land in the proper relative locations (see the example below).
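For example, assuming the downloaded archive is named libero-8-tasks-policy.zip (the actual file name on OneDrive may differ), you would run the following from the repo root:
unzip libero-8-tasks-policy.zip -d .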
The files will be automatically placed under the logs folder, which is organized roughly according to the following structure (some additional prefixes and postfixes might be added):
└── logs
├── ${environment_1}
│ ├── diffusion
│ │ └── ${experiment_name}
│ │ ├── model-${iter}.pt
│ │ └── {dataset, trainer}_config.pkl
│ └── plans
│ └── ${experiment_name}
│ ├── 0
│ ├── {experiment_time:%y%m%d-%H%M%S}
│ ├── ...
│
├── ${environment_2}
│ └── ...
The model-${iter}.pt files contain the network weights, and the {}_config.pkl files contain the instantiation arguments for the relevant classes. A dummy random action dataset will also be created as a placeholder for loading.
After downloading and unzipping the model weights, you can launch policy rollout using the script provided below. The results will be saved in .mp4 and .json files.
To evaluate different models, you can change the $config variable inside the script, which should be the relative path to a config file.
For Libero environment, run:
./diffuser/libero/plan_lb_list.sh $1 $2
Please replace $1 with the number of episodes to evaluate on (e.g., 25) and $2 with a GPU index.
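For example, to evaluate on 25 episodes using GPU 0:
./diffuser/libero/plan_lb_list.sh 25 0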
To run a new experiment, create a new config file under the config folder; each experiment will create a new folder under logs. You can refer to existing config files as examples (see the sketch below).
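As a minimal starting point, you could copy an existing config and edit it; the new file name below is just an example:
cp config/libero/lb_tk8_65to72.py config/libero/my_new_experiment.py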
This repository is released under the MIT license. See LICENSE for additional details.
- The implementation of Diffusion Policy is adapted from diffusion_policy.
- The implementation of our video generative models is based on AVDC.
Contact Yunhao Luo if you have any questions or suggestions.
If you find our work useful, please consider citing:
@misc{luo2024groundingvideomodelsactions,
title={Grounding Video Models to Actions through Goal Conditioned Exploration},
author={Yunhao Luo and Yilun Du},
year={2024},
eprint={2411.07223},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2411.07223},
}