This repository provides a re-implementation of Coarse-to-fine Q-Network (CQN) and Coarse-to-fine Q-Network with Action Sequence (CQN-AS), introduced in:
Continuous Control with Coarse-to-fine Reinforcement Learning
Younggyo Seo, Jafar Uruç, Stephen James
CoRL 2024
Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning
Younggyo Seo, Pieter Abbeel
Preprint
In summary, CQN is a sample-efficient, value-based RL algorithm that uses discrete actions to solve continuous control problems. The key idea is to apply multiple levels of discretization to the continuous action space and to train RL agents to zoom into the action space in a coarse-to-fine manner.
CQN-AS extends CQN by training a critic network that outputs Q-values over a sequence of actions, which allows it to learn useful value functions from noisy training data such as human-collected demonstrations.
We also provide an implementation of DrQ-v2+, a variant of DrQ-v2 that is heavily optimized for the demo-driven RL setup.
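To make the coarse-to-fine idea concrete, here is a minimal, illustrative sketch of action selection for a single action dimension. This is not the actual implementation; `q_fn`, the number of levels, and the number of bins are placeholders.

```python
import torch

def coarse_to_fine_action(q_fn, obs, levels=3, bins=5, low=-1.0, high=1.0):
    """Select a continuous action by repeatedly zooming into the best bin."""
    lo, hi = torch.tensor(low), torch.tensor(high)
    for level in range(levels):
        # Split the current interval into `bins` bins and score their centers.
        centers = lo + (hi - lo) * (torch.arange(bins) + 0.5) / bins
        q_values = q_fn(obs, centers, level)  # hypothetical critic call: one Q-value per bin
        best = q_values.argmax()
        # Zoom into the selected bin for the next, finer level.
        width = (hi - lo) / bins
        lo = lo + best * width
        hi = lo + width
    return (lo + hi) / 2  # center of the finest interval
```

With 3 levels of 5 bins each, this reaches a resolution of 5^3 = 125 effective bins per action dimension while only scoring 3 × 5 = 15 bins.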
- We provide the logs used for drawing learning curves in the `logs` directory. Note that these can differ from the curves in the paper (as of Jan 27th, 2025); we will update the paper soon. The logs can be loaded as follows:
import pickle
# BENCHMARK = {humanoidbench, rlbench, bigym}
with open("BENCHMARK_results.pkl", "rb") as f:
    logs = pickle.load(f)
- We adopt a domain-based code maintenance design to avoid an overly complex codebase. In other words, instead of maintaining shared code used across all domains, we maintain a separate set of files for each domain.
- This codebase might not fully reproduce the experiments in the paper due to potential human errors in porting the code. Please let us know if you encounter any discrepancy between the reported results and the results from running this code.
- Refactor to use `ReplayBuffer` and `LazyTensorStorage` from `torchrl` (see the `torchrl` sketch after this list).
- Refactor to replace the numpy-based TemporalEnsemble implementation with a torch-based implementation (see the temporal-ensemble sketch after this list).
- Check if the code fully reproduces the results in the paper.
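For the `torchrl` refactor mentioned above, here is a minimal sketch of what a `ReplayBuffer` backed by `LazyTensorStorage` could look like; the keys, shapes, and sizes below are placeholders, not the repository's current code.

```python
import torch
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, ReplayBuffer

# Replay buffer backed by a lazily allocated tensor storage.
buffer = ReplayBuffer(storage=LazyTensorStorage(max_size=1_000_000), batch_size=256)

# Pack a batch of dummy transitions as a TensorDict and add them to the buffer.
transitions = TensorDict(
    {
        "obs": torch.randn(512, 84, 84, 3),
        "action": torch.randn(512, 8),
        "reward": torch.randn(512, 1),
    },
    batch_size=[512],
)
buffer.extend(transitions)

batch = buffer.sample()  # TensorDict of 256 uniformly sampled transitions
```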
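Similarly, for the torch-based TemporalEnsemble refactor, here is a rough sketch of ACT-style temporal ensembling in pure torch; the class name, buffer layout, and weighting coefficient are placeholders, not the repository's current implementation.

```python
import torch

class TorchTemporalEnsemble:
    """Illustrative torch-based temporal ensemble with exponential weighting."""

    def __init__(self, seq_len, action_dim, max_steps, m=0.01):
        self.seq_len = seq_len
        self.m = m
        # preds[i, t] holds the action that was predicted at step i for step t.
        self.preds = torch.zeros(max_steps, max_steps + seq_len, action_dim)
        self.valid = torch.zeros(max_steps, max_steps + seq_len, dtype=torch.bool)

    def add(self, step, action_seq):
        # action_seq: (seq_len, action_dim) actions predicted for steps step..step+seq_len-1.
        self.preds[step, step : step + self.seq_len] = action_seq
        self.valid[step, step : step + self.seq_len] = True

    def act(self, step):
        # Average all predictions made for `step`, weighting older ones more: w_i = exp(-m * i).
        preds = self.preds[self.valid[:, step], step]  # (num_preds, action_dim)
        weights = torch.exp(-self.m * torch.arange(len(preds), dtype=preds.dtype))
        weights = weights / weights.sum()
        return (weights.unsqueeze(-1) * preds).sum(dim=0)  # (action_dim,)
```

At step `t`, the agent would call `add(t, action_seq)` with the newly predicted action sequence and then execute `act(t)`.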
Our codebase supports running CQN and CQN-AS on various domains popular among researchers. We plan to support more domains, and any PR or suggestion for supporting additional domains (including yours) is welcome!
Domain | State-based | Pixel-based | Action Sequence | Demo-driven | Stable |
---|---|---|---|---|---|
BiGym | ❌ | ✅ | ✅ | ✅ | ✅ |
RLBench | ❌ | ✅ | ✅ | ✅ | ✅ |
HumanoidBench | ✅ | ❌ | ✅ | ❌ | ✅ |
DMC | ✅ | ✅ | ❌ | ❌ | ❌ |
Install conda environment:
conda env create -f conda_env.yml
conda activate cqn
For faster experiments, install the nightly version of PyTorch and set `use_compile` to `True`.
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
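The `use_compile` option presumably wraps the agent's networks with `torch.compile`; here is a minimal illustration of the mechanism (a sketch, not the repository's exact code):

```python
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 1))  # placeholder network
critic = torch.compile(critic)  # subsequent forward/backward passes run through the compiled graph
```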
Install BiGym (the latest version as of Oct 29th, 2024 should be used):
git clone https://github.com/chernyadev/bigym.git
cd bigym
git checkout 72d305437d5a13800ea633479a1060619fc14e54
pip install -e .
BiGym downloads demonstrations and saves a pre-processed cache during the first run of each task, so users do not need to handle demonstrations themselves. Our code also automatically handles demonstrations that cannot be replayed by saving them as training data but not as demonstration data.
Run experiments (CQN, CQN-AS, and DrQ-v2+):
# Run CQN
CUDA_VISIBLE_DEVICES=0 MUJOCO_EGL_DEVICE_ID=0 python train_cqn_bigym.py bigym_task=move_plate seed=1
# Run CQN-AS (Ours)
CUDA_VISIBLE_DEVICES=0 MUJOCO_EGL_DEVICE_ID=0 python train_cqn_as_bigym.py bigym_task=move_plate seed=1
# Run DrQ-v2+
CUDA_VISIBLE_DEVICES=0 MUJOCO_EGL_DEVICE_ID=0 python train_drqv2plus_bigym.py bigym_task=move_plate seed=1
Install RLBench and PyRep (the latest versions as of Oct 29th, 2024 should be used). Follow the guides in the original repositories for (1) installing RLBench and PyRep and (2) enabling headless mode. (See the README in RLBench & Robobase for information on installing RLBench.)
git clone https://github.com/stepjam/RLBench
git clone https://github.com/stepjam/PyRep
# Install PyRep
cd PyRep
git checkout 8f420be8064b1970aae18a9cfbc978dfb15747ef
pip install .
# Install RLBench
cd RLBench
git checkout b80e51feb3694d9959cb8c0408cd385001b01382
pip install .
Pre-collect demonstrations:
cd RLBench/rlbench
CUDA_VISIBLE_DEVICES=0 DISPLAY=:0.0 python dataset_generator.py --save_path=/your/own/directory --image_size 84 84 --renderer opengl3 --episodes_per_task 100 --variations 1 --processes 1 --tasks take_lid_off_saucepan --arm_max_velocity 2.0 --arm_max_acceleration 8.0
Run experiments:
# Run CQN
CUDA_VISIBLE_DEVICES=0 DISPLAY=:0.0 python train_cqn_rlbench.py rlbench_task=take_lid_off_saucepan num_demos=100 dataset_root=/your/own/directory
# Run CQN-AS (Ours)
CUDA_VISIBLE_DEVICES=0 DISPLAY=:0.0 python train_cqn_as_rlbench.py rlbench_task=take_lid_off_saucepan num_demos=100 dataset_root=/your/own/directory
# Run DrQ-v2+
CUDA_VISIBLE_DEVICES=0 DISPLAY=:0.0 python train_drqv2plus_rlbench.py rlbench_task=take_lid_off_saucepan num_demos=100 dataset_root=/your/own/directory
Install HumanoidBench:
git clone https://github.com/carlosferrazza/humanoid-bench.git
cd humanoid-bench
pip install -e .
Note: HumanoidBench requires `opencv-python==4.10.0.84` to be installed, but this can cause RLBench experiments to fail. In that case, uninstall `opencv-python==4.10.0.84` and install `opencv-python-headless==4.10.0.84` instead. You can run both sets of experiments in a single virtual environment with headless opencv.
Run experiments:
# Run CQN
CUDA_VISIBLE_DEVICES=0 MUJOCO_EGL_DEVICE_ID=0 python train_cqn_humanoid.py task_name=h1hand-stand-v0
# Run CQN-AS
CUDA_VISIBLE_DEVICES=0 MUJOCO_EGL_DEVICE_ID=0 python train_cqn_as_humanoid.py task_name=h1hand-stand-v0
Note that `h1hand` is different from `h1`, and `h1hand` is the default setting used in the original benchmark.
Run experiments:
# For pixel-based experiments
CUDA_VISIBLE_DEVICES=0 python train_cqn_dmc.py dmc_task=cartpole_swingup
# For state-based experiments
CUDA_VISIBLE_DEVICES=0 python train_cqn_dmc_state.py dmc_task=cartpole_swingup
Warning: CQN has not been extensively tested on DMC.
This repository is based on the public implementation of DrQ-v2.
# CQN
@inproceedings{seo2024continuous,
title={Continuous Control with Coarse-to-fine Reinforcement Learning},
author={Seo, Younggyo and Uru{\c{c}}, Jafar and James, Stephen},
booktitle={Conference on Robot Learning},
year={2024}
}
# CQN-AS
@article{seo2024reinforcement,
title={Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning},
author={Seo, Younggyo and Abbeel, Pieter},
journal={arXiv preprint arXiv:2411.12155},
year={2024}
}