[Project page] [Paper] [ArXiv]
Yunhao Luo1,2, Yilun Du3
1Georgia Tech, 2Brown, 3Harvard
This is the official implementation for "Grounding Video Models to Actions through Goal Conditioned Exploration".
This codebase contains code for the agent exploration and policy inference in the environments. Please see v2a-video-model-release for code to train the video models.
Grounding Video Models to Actions. Our approach learns to ground a large pretrained video model into continuous actions through goal-directed exploration in the environment. Given a synthesized video, a goal-conditioned policy attempts to reach each visual goal in the video, with data from the resulting real-world execution saved in a replay buffer to train the goal-conditioned policy.
The following procedure should work well on a GPU machine with CUDA 11.8. Our machine runs Red Hat Enterprise Linux v9.2 and loads the following modules:
1) zlib; 2) cuda/11.8.0; 3) git-lfs; 4) expat; 5) mesa/22.1.6; 6) libiconv; 7) ffmpeg; 8) glew;
On other Linux distributions, equivalent packages or modules may be required.
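If your cluster uses an environment-modules system (e.g., Lmod), the modules above can be loaded roughly as follows; the module names and versions are specific to our cluster and may need adjustment on yours:
module load zlib cuda/11.8.0 git-lfs expat mesa/22.1.6 libiconv ffmpeg glew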
Please follow the steps below to create a conda environment and reproduce our simulation benchmark results in the Libero environment.
- Create a python env.
conda create -n v2a_libero_release python=3.9.19
Please keep the conda env name as above, since it is used as an identifier to register the environment.
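After creating the environment, activate it before installing the remaining dependencies:
conda activate v2a_libero_release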
- Install other packages in requirements.txt (this might take several minutes).
pip install -r requirements.txt
- Download the pre-trained video model.
Please visit this OneDrive folder for video model checkpoints.
For the Libero environment, download libero-video-model.zip, put it in the home directory of this repo, and unzip it with the following command:
unzip libero-video-model.zip -d .
This command should unzip the checkpoint to a directory ckpts.
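To confirm the extraction, you can list the directory; the exact checkpoint file names inside ckpts are not specified here and may vary by release:
ls ckpts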
The training of our video model follows AVDC.
- Download random action data.
For efficiency and reproducibility, we pre-sampled a dataset of random actions for policy training, which is hosted in this OneDrive folder.
For the Libero environment, download lb_randsam_8tk_perTk500.hdf5 and move it to the default location relative to the root of this repo:
export lb_dir=data_dir/scratch/libero/env_rand_samples
mkdir -p $lb_dir && mv lb_randsam_8tk_perTk500.hdf5 $lb_dir
This data file is bulky; rclone can be used if downloading to a remote machine (see the sketch below). We also provide the code to collect your own random action data in environment/libero/lb_data.
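As a rough sketch, downloading the file directly to a remote machine with rclone could look like the following; the remote name (onedrive) and the shared-folder path are placeholders for your own rclone configuration:
# copy the dataset from a configured OneDrive remote into the expected local directory
rclone copy onedrive:<path-to-shared-folder>/lb_randsam_8tk_perTk500.hdf5 data_dir/scratch/libero/env_rand_samples/ -P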
With the video model checkpoint and random action dataset prepared, you can now start training the model with video-guided exploration.
To launch the training for Libero, run
sh scripts/train_libero_dp.sh
You can change the $config variable inside the script above to launch different experiments; refer to the default config in the script as a template.
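For example, to point the script at the Libero 8-task experiment, you could set $config inside scripts/train_libero_dp.sh as sketched below; the surrounding contents of the script may differ:
# inside scripts/train_libero_dp.sh (sketch)
config=config/libero/lb_tk8_65to72.py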
We provide links to pre-trained goal-conditioned policy models below, along with their corresponding config files in this repo. All models are hosted under this OneDrive folder.
| Model | Link | Config |
| --- | --- | --- |
| Libero 8 Tasks | Link | config/libero/lb_tk8_65to72.py |
You need to put the downloaded zip files in the root directory of this repo and unzip them there, so that the extracted files land in the proper relative locations (see the example below).
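For example, assuming the downloaded archive is named libero-8-tasks-policy.zip (the actual file name on OneDrive may differ), you would run the following from the repo root:
unzip libero-8-tasks-policy.zip -d .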
The files will be automatically placed under the logs folder, which is organized roughly according to the following structure (some additional prefixes and postfixes might be added):
└── logs
├── ${environment_1}
│ ├── diffusion
│ │ └── ${experiment_name}
│ │ ├── model-${iter}.pt
│ │ └── {dataset, trainer}_config.pkl
│ └── plans
│ └── ${experiment_name}
│ ├── 0
│ ├── {experiment_time:%y%m%d-%H%M%S}
│ ├── ...
│
├── ${environment_2}
│ └── ...
The model-${iter}.pt files contain the network weights, and the {}_config.pkl files contain the instantiation arguments for the relevant classes. A dummy random action dataset will also be created as a placeholder for loading.
After downloading and unzipping the model weights, you can launch policy rollout using the script provided below. The results will be saved in .mp4 and .json files.
To evaluate different models, you can change the $config variable inside the script, which should be the relative path to a config file.
For Libero environment, run:
./diffuser/libero/plan_lb_list.sh $1 $2
Please replace $1 with the number of episodes to evaluate on (e.g., 25) and $2 with a GPU index.
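For example, to evaluate on 25 episodes using GPU 0:
./diffuser/libero/plan_lb_list.sh 25 0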
To run a new experiment, create a new config file under the config folder; each experiment will create a new folder under logs. You can refer to existing config files as examples (see the sketch below).
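As a minimal starting point, you could copy an existing config and edit it; the new file name below is just an example:
cp config/libero/lb_tk8_65to72.py config/libero/my_new_experiment.py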
This repository is released under the MIT license. See LICENSE for additional details.
- The implementation of Diffusion Policy is adapted from diffusion_policy.
- The implementation of our video generative models is based on AVDC.
Contact Yunhao Luo if you have any questions or suggestions.
If you find our work useful, please consider citing:
@misc{luo2024groundingvideomodelsactions,
title={Grounding Video Models to Actions through Goal Conditioned Exploration},
author={Yunhao Luo and Yilun Du},
year={2024},
eprint={2411.07223},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2411.07223},
}