This repository contains the official implementation of Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size.
To set up a Python environment (with any tool you like; in our research we used conda and Python 3.8), install the requirements:
pip install -r requirements.txt
With this setup, however, you also need to install the mujoco210 binaries by hand. This is not always straightforward, but the following recipe worked for us:
mkdir -p /root/.mujoco \
&& wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz \
&& tar -xf mujoco.tar.gz -C /root/.mujoco \
&& rm mujoco.tar.gz
export LD_LIBRARY_PATH=/root/.mujoco/mujoco210/bin:${LD_LIBRARY_PATH}
You may also need to install additional dependencies for mujoco_py. We recommend following the official guide from mujoco_py.
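As a quick sanity check after the manual install, a small script along these lines can confirm that the binaries landed where the recipe above puts them. This is only a sketch: `check_mujoco` is a hypothetical helper that is not part of this repository, and it assumes the `/root/.mujoco/mujoco210/bin` layout from the commands above.

```python
import os
from pathlib import Path
from typing import List


def check_mujoco(root: Path = Path("/root/.mujoco")) -> List[str]:
    """Return a list of detected problems; an empty list means the layout looks fine.

    Hypothetical helper: it only inspects the filesystem and the environment,
    it does not import mujoco_py.
    """
    problems = []
    bin_dir = root / "mujoco210" / "bin"
    # The tarball from the recipe unpacks into <root>/mujoco210.
    if not bin_dir.is_dir():
        problems.append(f"missing directory: {bin_dir}")
    # The export line in the recipe puts this directory on LD_LIBRARY_PATH.
    ld_paths = os.environ.get("LD_LIBRARY_PATH", "").split(":")
    if str(bin_dir) not in ld_paths:
        problems.append(f"{bin_dir} is not on LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    for problem in check_mujoco():
        print(problem)
```

If the script prints nothing, the directory and the environment variable match the recipe; mujoco_py may still need the additional system dependencies mentioned above.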
We also provide a simpler way: a Dockerfile that is already set up to work. All you have to do is build and run it :)
docker build -t lb_sac .
To run the container, mount the current directory:
docker run -it \
--gpus=all \
--rm \
--volume "<PATH_TO_THE_REPO>/lb-sac:/workspace/lb-sac" \
--name lb_sac \
lb_sac bash
Configs are stored in configs/<algo_name>/<task_type>. All available hyperparameters are listed in train.py.
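To see which configs are available, the directory layout can be walked with a few lines of Python. This is a rough sketch, not part of the repository: `list_configs` is a hypothetical helper, and it assumes every config is a `.yaml` file exactly two directories below the config root, as in `configs/edac/halfcheetah/halfcheetah_medium.yaml`.

```python
from pathlib import Path
from typing import Dict, List


def list_configs(root: str = "configs") -> Dict[str, List[str]]:
    """Group YAML config paths by algorithm name (the first directory level).

    Hypothetical helper; assumes the <algo_name>/<task_type>/<file>.yaml layout.
    """
    grouped: Dict[str, List[str]] = {}
    for path in sorted(Path(root).glob("*/*/*.yaml")):
        # First path component below the root is the algorithm name.
        algo = path.relative_to(root).parts[0]
        grouped.setdefault(algo, []).append(path.as_posix())
    return grouped


if __name__ == "__main__":
    for algo, paths in list_configs().items():
        print(f"{algo}: {len(paths)} configs")
```

Any path returned this way can be passed to train.py via `--config_path`.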
For example, to start EDAC training on halfcheetah-medium-v2:
python train.py \
--config_path="configs/edac/halfcheetah/halfcheetah_medium.yaml" \
--device=cuda
You can also use our simple tool for sweeps and multi-seed training (the same can be done with a wandb sweep):
python sweep.py \
--command "python train.py" \
--configs "configs/edac/*" \
--num_seeds 4 \
--num_gpus 8
This will run each config with 4 seeds, queuing the jobs across all available GPUs (8 at most).
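To make the resulting workload concrete, the sketch below expands a list of configs into (config, seed, GPU) jobs. It is an illustration only and does not reproduce sweep.py: `plan_jobs` is a hypothetical function, and the round-robin GPU assignment is an assumed simplification of the queue described above.

```python
from itertools import product
from typing import List, Tuple


def plan_jobs(configs: List[str], num_seeds: int, num_gpus: int) -> List[Tuple[str, int, int]]:
    """Pair every config with every seed and assign GPU indices round-robin.

    Hypothetical sketch: a real queue would hand the next job to whichever
    GPU frees up first instead of fixing the assignment in advance.
    """
    jobs = []
    for i, (config, seed) in enumerate(product(configs, range(num_seeds))):
        jobs.append((config, seed, i % num_gpus))
    return jobs


if __name__ == "__main__":
    for config, seed, gpu in plan_jobs(["a.yaml", "b.yaml"], num_seeds=4, num_gpus=8):
        print(f"gpu {gpu}: {config} (seed {seed})")
```

With 2 configs and 4 seeds this yields 8 jobs, one per GPU; with more jobs than GPUs the indices wrap around, which is where a queue becomes necessary.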
If you use this code for your research, please consider citing the paper:
@article{nikulin2022q,
title={Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size},
author={Nikulin, Alexander and Kurenkov, Vladislav and Tarasov, Denis and Akimov, Dmitry and Kolesnikov, Sergey},
journal={arXiv preprint arXiv:2211.11092},
year={2022}
}