PBRL

Introduction

This is a PyTorch implementation of our paper:

Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning, ICLR 2022.

Installation and Usage

Install the rlkit package (bundled under d4rl) with:

cd d4rl
pip install -e .
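
To verify the installation, you can load one of the offline datasets directly. This is standard d4rl usage, not a script from this repository:

import gym
import d4rl  # registers the offline environments with gym

env = gym.make("walker2d-medium-v2")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', ...
print(dataset["observations"].shape)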

To run PBRL on the MuJoCo environments:

python examples/pevi_mujoco.py --env walker2d-medium-v2 --gpu 0

To run PBRL-Prior on the MuJoCo environments:

python examples/pevi_mujoco.py --env walker2d-medium-v2 --prior --gpu 0

To run PBRL on the Adroit environments:

python examples/pevi_adroit.py --env pen-cloned-v0 --gpu 0

To run PBRL-Prior on the Adroit environments:

python examples/pevi_adroit.py --env pen-cloned-v0 --prior --gpu 0

The core implementation is given in d4rl/rlkit/torch/sac/pevi.py.
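
At a high level, PBRL trains a bootstrapped ensemble of Q-networks and penalizes the Bellman target by the ensemble's disagreement, so uncertain state-action pairs receive pessimistic value estimates. The sketch below illustrates this idea only; the names (QEnsemble, pessimistic_target, beta) are illustrative assumptions, and the actual implementation is in pevi.py as noted above.

import torch
import torch.nn as nn

class QEnsemble(nn.Module):
    """An ensemble of independent Q-networks over (state, action) pairs."""
    def __init__(self, state_dim, action_dim, num_heads=10, hidden=256):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(num_heads)
        ])

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        # Returns shape (num_heads, batch, 1).
        return torch.stack([head(x) for head in self.heads], dim=0)

def pessimistic_target(q_target, next_state, next_action, reward, done,
                       gamma=0.99, beta=1.0):
    """Penalize the bootstrapped target by the ensemble's standard
    deviation, so uncertain state-actions get lower target values."""
    with torch.no_grad():
        q_next = q_target(next_state, next_action)        # (heads, batch, 1)
        uncertainty = q_next.std(dim=0)                   # (batch, 1)
        q_pess = q_next.mean(dim=0) - beta * uncertainty  # (batch, 1)
        return reward + gamma * (1.0 - done) * q_pess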

Execution

Data for each run is stored on disk under the result directory, in a subdirectory named <env-id>-<timestamp>/<seed>/. Each run directory contains:

  • debug.log: records the epoch, Q-values, uncertainty values, and evaluation scores.
  • progress.csv: the same data as debug.log, in CSV format.
  • variant.json: the hyperparameters used in training.
  • models: the final actor-critic networks.

The evaluation/d4rl score logged in debug.log and progress.csv is the normalized score reported in our paper.
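
For example, you could collect the final normalized score of each run from progress.csv like this (a minimal sketch; the result/ layout and the 'evaluation/d4rl' column name follow the description above and may differ in your setup):

import glob
import pandas as pd

# Print the final normalized score from every seed of one environment.
for path in sorted(glob.glob("result/walker2d-medium-v2-*/*/progress.csv")):
    scores = pd.read_csv(path)
    print(path, scores["evaluation/d4rl"].iloc[-1])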

If you have any questions, bug reports, suggestions, or improvements, please feel free to open an issue.