Q-ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size

This repository contains the official implementation of Q-ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size.

Dependencies & Docker setup

To set up a Python environment (with the tool of your choice; in our research we used conda and Python 3.8), install all requirements:

pip install -r requirements.txt

With this setup, you also need to install the mujoco210 binaries by hand. This is not always straightforward; we used this recipe:

mkdir -p /root/.mujoco \
    && wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz \
    && tar -xf mujoco.tar.gz -C /root/.mujoco \
    && rm mujoco.tar.gz
export LD_LIBRARY_PATH=/root/.mujoco/mujoco210/bin:${LD_LIBRARY_PATH}

You may also need to install additional dependencies for mujoco_py. We recommend following the official guide from mujoco_py.
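
As a quick sanity check after the manual install, the sketch below verifies that the MuJoCo binary directory from the recipe above exists and is listed in LD_LIBRARY_PATH. The helper name mujoco_lib_dir_on_path is hypothetical and not part of this repository; the path is the one used in the recipe and should be adjusted if you installed elsewhere.

```python
import os
from pathlib import Path


def mujoco_lib_dir_on_path(ld_library_path: str, mujoco_bin: str) -> bool:
    """Return True if mujoco_bin is one of the entries of an
    LD_LIBRARY_PATH-style, colon-separated string.

    Hypothetical helper for illustration; not part of this repository.
    """
    entries = [os.path.normpath(p) for p in ld_library_path.split(":") if p]
    return os.path.normpath(mujoco_bin) in entries


if __name__ == "__main__":
    # Path taken from the install recipe above; change it for other locations.
    mujoco_bin = "/root/.mujoco/mujoco210/bin"
    print("binaries present:", Path(mujoco_bin).is_dir())
    print(
        "on LD_LIBRARY_PATH:",
        mujoco_lib_dir_on_path(os.environ.get("LD_LIBRARY_PATH", ""), mujoco_bin),
    )
```

If either line prints False, re-run the recipe or re-export LD_LIBRARY_PATH in the current shell before installing mujoco_py.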

Docker

We also provide a simpler way: a Dockerfile that is already set up to work. All you have to do is build and run it :)

docker build -t lb_sac .

To run, mount the repository directory:

docker run -it \
    --gpus=all \
    --rm \
    --volume "<PATH_TO_THE_REPO>/lb-sac:/workspace/lb-sac" \
    --name lb_sac \
    lb_sac bash

How to reproduce experiments

Configs are stored in configs/<algo_name>/<task_type>. All available hyperparameters are listed in train.py. For example, to start EDAC training on halfcheetah-medium-v2:

python train.py \
    --config_path="configs/edac/halfcheetah/halfcheetah_medium.yaml" \
    --device=cuda

You can also use our simple tool for sweeps and multi-seed training (the same can be done with wandb sweeps):

python sweep.py \
    --command "python train.py" \
    --configs "configs/edac/*" \
    --num_seeds 4 \
    --num_gpus 8

This will run each config with 4 seeds, queueing the runs across the available GPUs (8 at most).
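
To make the size of such a sweep concrete, here is a minimal sketch of how (config, seed) pairs could be spread over GPUs. It is an illustration under stated assumptions, not the implementation in sweep.py: the function plan_runs is hypothetical, and it uses a static round-robin assignment, whereas the actual tool queues runs onto GPUs.

```python
from itertools import product


def plan_runs(configs, num_seeds, num_gpus):
    """Pair every config with every seed and assign each pair a GPU index
    in round-robin order.

    Hypothetical helper for illustration; the real sweep.py may schedule
    runs differently (for example, by handing out GPUs as they free up).
    """
    jobs = []
    for i, (config, seed) in enumerate(product(configs, range(num_seeds))):
        jobs.append({"config": config, "seed": seed, "gpu": i % num_gpus})
    return jobs


if __name__ == "__main__":
    # Placeholder config names, not actual files from this repository.
    for job in plan_runs(["a.yaml", "b.yaml", "c.yaml"], num_seeds=4, num_gpus=8):
        print(job)
```

With 3 configs and 4 seeds this yields 12 runs, so on 8 GPUs some runs necessarily wait for (or share) a device; the total number of runs is always the number of matched configs times --num_seeds.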

Citing

If you use this code for your research, please consider citing the paper:

@article{nikulin2022q,
  title={Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size},
  author={Nikulin, Alexander and Kurenkov, Vladislav and Tarasov, Denis and Akimov, Dmitry and Kolesnikov, Sergey},
  journal={arXiv preprint arXiv:2211.11092},
  year={2022}
}
