This is the official implementation of our ICRA'24 paper Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning.
The code is adapted from Diffusion Policy.
- 7/1/2024, The 7 min presentation for ICRA'24 is online! [Youtube].
- 1/29/2024, our paper and the attached video have been accepted to ICRA'24 🎉.
- 1/12/2024, a new version of our paper has been released.
Click the GIF below to watch the full video!
We propose Crossway Diffusion, a simple yet effective method to enhance diffusion-based visuomotor policy learning.
By introducing a carefully designed state decoder and a simple reconstruction objective, we explicitly regularize the intermediate representation of the diffusion model to capture the information of the input states, leading to enhanced performance across all datasets.
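For intuition, the core idea can be sketched in a few lines of PyTorch: the usual denoising loss is combined with a reconstruction loss computed from the intermediate representation of the diffusion model. The names below (`denoise_with_features`, `add_noise`, `state_decoder`, `recon_weight`) are illustrative placeholders and do not match the actual API of this repository.

```python
import torch
import torch.nn.functional as F

def crossway_training_step(policy, state_decoder, batch, recon_weight=1.0):
    """Illustrative training step: denoising loss + state reconstruction loss.

    `policy` is assumed to expose the intermediate representation of its
    diffusion U-Net; all attribute and method names here are placeholders.
    """
    obs, actions = batch["obs"], batch["actions"]

    # Standard diffusion objective: predict the noise added to the action sequence.
    noise = torch.randn_like(actions)
    timesteps = torch.randint(0, policy.num_train_timesteps, (actions.shape[0],),
                              device=actions.device)
    noisy_actions = policy.add_noise(actions, noise, timesteps)
    noise_pred, intermediate = policy.denoise_with_features(noisy_actions, timesteps, obs)
    diffusion_loss = F.mse_loss(noise_pred, noise)

    # Auxiliary objective: reconstruct the input states (e.g. images) from the
    # intermediate representation via the state decoder.
    obs_recon = state_decoder(intermediate)
    recon_loss = F.mse_loss(obs_recon, obs["image"])

    return diffusion_loss + recon_weight * recon_loss
```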
Our major contributions are included in the following files:
- diffusion_policy/workspace/train_crossway_diffusion_unet_hybrid_workspace.py (newly added)
- diffusion_policy/policy/crossway_diffusion_unet_hybrid_image_policy.py (newly added)
- diffusion_policy/model/diffusion/conv2d_components.py (newly added)
- diffusion_policy/model/diffusion/conditional_unet1d.py (modified)
The Python environment used in this project is identical to that of Diffusion Policy. Please refer to this link for detailed installation instructions.
(Optional) To manually control the image rendering device through the environment variable EGL_DEVICE_ID, replace the original robomimic/envs/env_robosuite.py in robomimic with this modified file.
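To find where the installed file lives before replacing it, a quick check like the following works (a sketch; it only prints the destination path and makes no assumption about where the modified file is stored):

```python
# Locate robomimic's installed env_robosuite.py so it can be replaced
# with the modified file linked above.
import robomimic.envs.env_robosuite as env_robosuite

print("Replace this file with the modified version:", env_robosuite.__file__)
```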
Please follow the guide at this link to download the simulated datasets.
Our real-world datasets are available at Hugging Face Dataset. The dataset files have a structure similar to robomimic's. Please check dataset_readme.md for instructions on training with our datasets or your own.
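To get a feel for the data layout before training, you can inspect an HDF5 file with h5py. This is a sketch: the filename is a placeholder, and the group names follow the robomimic convention (e.g. a top-level `data` group of demonstrations), which may differ slightly for your file.

```python
import h5py

# Inspect a robomimic-style dataset file ("dataset.hdf5" is a placeholder path).
with h5py.File("dataset.hdf5", "r") as f:
    demos = list(f["data"].keys())
    print(f"{len(demos)} demonstrations")

    # Walk the first demonstration and print each array's shape and dtype.
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)

    f["data"][demos[0]].visititems(show)
```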
To train a model on simulated datasets with a specific random seed:
EGL_DEVICE_ID=0 python train.py --config-dir=config/${task}/ --config-name=type[a-d].yaml training.seed=4[2-4]
where ${EGL_DEVICE_ID} defines which GPU is used for rendering simulated images, and ${task} can be can_ph, can_mh, lift_ph, lift_mh, square_ph, square_mh, transport_ph, transport_mh, tool_hang_ph, or pusht.
The results will be stored at outputs/ and wandb/. In our experiments, we use 42, 43, and 44 as the random seeds.
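For example, to train the type-A variant on the can_ph task with seed 42 (assuming the configuration file is named typea.yaml, following the type[a-d].yaml pattern above), the command would look like:

EGL_DEVICE_ID=0 python train.py --config-dir=config/can_ph/ --config-name=typea.yaml training.seed=42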
To evaluate a checkpoint:
EGL_DEVICE_ID=0 python eval.py --checkpoint <path to checkpoint.ckpt> --output_dir <path for output> --device cuda:0
By default, the code will evaluate the model for 50 episodes, and the results will be available at <path for output>/eval_log.json.
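If you want to post-process the results programmatically, the log is ordinary JSON. The sketch below simply prints every top-level entry, since the exact key names inside eval_log.json are not documented here; the path is a placeholder.

```python
import json

# Load the evaluation log written by eval.py (path is a placeholder).
with open("outputs/eval/eval_log.json") as f:
    log = json.load(f)

# Print scalar metrics directly; summarize nested structures by type.
for key, value in log.items():
    if isinstance(value, (int, float, str)):
        print(f"{key}: {value}")
    else:
        print(f"{key}: <{type(value).__name__}>")
```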
Our pretrained models and evaluation results are now available at Hugging Face.
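One way to fetch the checkpoints is via huggingface_hub, as sketched below; "<org>/<repo>" is a placeholder, so substitute the repository ID shown on the Hugging Face page.

```python
from huggingface_hub import snapshot_download

# Download the pretrained checkpoints to a local cache directory.
# "<org>/<repo>" is a placeholder for the actual Hugging Face repository ID.
local_dir = snapshot_download(repo_id="<org>/<repo>")
print("Checkpoints downloaded to:", local_dir)
```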
This repository is released under the MIT license. See LICENSE for additional details.