Agent Learning Framework (ALF) is a reinforcement learning framework that emphasizes flexibility and ease of implementing complex algorithms involving many different components. ALF is built on TensorFlow 2.1. Algorithms implemented so far include:
- A2C: OpenAI Baselines: ACKTR & A2C
- DDPG: Lillicrap et al. "Continuous control with deep reinforcement learning" arXiv:1509.02971
- PPO: Schulman et al. "Proximal Policy Optimization Algorithms" arXiv:1707.06347
- SAC: Haarnoja et al. "Soft Actor-Critic Algorithms and Applications" arXiv:1812.05905
- ICM: Pathak et al. "Curiosity-driven Exploration by Self-supervised Prediction" arXiv:1705.05363
- MERLIN: Wayne et al. "Unsupervised Predictive Memory in a Goal-Directed Agent" arXiv:1803.10760
- Amortized SVGD: Feng et al. "Learning to Draw Samples with Amortized Stein Variational Gradient Descent" arXiv:1707.06626
- RND: Burda et al. "Exploration by Random Network Distillation" arXiv:1810.12894
- MINE: Belghazi et al. "Mutual Information Neural Estimation" arXiv:1801.04062
- DIAYN: Eysenbach et al. "Diversity is All You Need: Learning Diverse Skills without a Reward Function" arXiv:1802.06070
- MISC: Zhao et al. "Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning" arXiv:2002.01963
You can run the following commands to install ALF:

```bash
git clone https://github.com/HorizonRobotics/alf
cd alf
pip install -e .
```
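A quick sanity check that the installation worked is to import the package (a minimal sketch; it assumes only that `pip install -e .` put the alf package on your path):

```bash
# Should print the message without raising ImportError
python -c "import alf; print('ALF imported successfully')"
```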
All the examples below were trained on a single machine with an Intel(R) Core(TM) i9-7960X CPU @ 2.80GHz (32 logical CPUs) and one RTX 2080Ti GPU.
You can train a model for any of the examples using the following command:

```bash
python -m alf.bin.train --gin_file=GIN_FILE --root_dir=LOG_DIR
```
GIN_FILE is the gin configuration file; you can find sample gin configuration files for different tasks under the directory alf/examples. LOG_DIR is the directory where you want to store the training results.
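For example, to train on the cart pole task (a sketch only: the gin file name ac_cart_pole.gin and the log directory here are illustrative assumptions, so check alf/examples for the actual file names):

```bash
# Train using a sample gin configuration; checkpoints and TensorBoard
# summaries are written under --root_dir
python -m alf.bin.train \
    --gin_file=alf/examples/ac_cart_pole.gin \
    --root_dir=~/tmp/cart_pole
```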
During training, you can use TensorBoard to monitor the progress of training:

```bash
tensorboard --logdir=LOG_DIR
```
After training, you can visualize the trained model using the following command:

```bash
python -m alf.bin.play --root_dir=LOG_DIR
```
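Continuing the hypothetical cart pole run from above (the paths are illustrative, not part of ALF):

```bash
# Monitor the run in a browser (TensorBoard serves on port 6006 by default)
tensorboard --logdir=~/tmp/cart_pole

# After (or during) training, watch the trained agent act in the environment
python -m alf.bin.play --root_dir=~/tmp/cart_pole
```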
- Cart pole. The training score took only 30 seconds to reach 200, using 8 environments.
- Atari games. The Python package atari-py is needed for the Atari game environments (see the install sketch after this list). The evaluation score (by taking argmax of the policy) took 1.5 hours to reach 800 on Breakout, using 64 environments.
- Simple navigation with visual input. Follow the instructions at SocialRobot to install the environment.
- PR2 grasping (state input only). Follow the instructions at SocialRobot to install the environment.
- Humanoid. Learning to walk using the pybullet Humanoid environment. The Python package pybullet>=2.5.0 is needed for the environment. The training score took 1 hour 40 minutes to reach 2k, using asynchronous training with 2 actors (192 environments).
- Super Mario. Playing Super Mario using only intrinsic reward. The Python package gym-retro>=0.7.0 is required for this experiment, and a suitable SuperMarioBros-Nes rom must be obtained and imported (roms are not included in gym-retro). See this doc on how to import roms, and the install sketch after this list.
- Montezuma's Revenge. Training the hard-exploration game Montezuma's Revenge with intrinsic rewards generated by RND. A lucky agent can get an episodic score of 6600 in 160M frames (40M steps with `frame_skip=4`). A normal agent would get an episodic score of 4000~6000 in the same number of frames. The training took about 6.5 hours with 128 parallel environments on a single GPU.
- Pendulum. Learning diverse skills without external reward.
- Collect Good Objects. Learn to collect good objects and avoid bad objects. DeepmindLab is required; follow the instructions at DeepmindLab to install the environment.
- Playground with a red ball, and with two balls (a red ball and a blue ball). The agent learns to interact with the objects via the MI-based internal drive.
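For convenience, the optional Python dependencies mentioned in the examples above can be installed as below (package names and version constraints come from the respective examples; the rom import command is gym-retro's standard one):

```bash
# Optional dependencies for the examples above
pip install atari-py              # Atari games
pip install "pybullet>=2.5.0"     # Humanoid
pip install "gym-retro>=0.7.0"    # Super Mario

# After obtaining a SuperMarioBros-Nes rom (not bundled with gym-retro),
# import it so gym-retro can locate it:
python -m retro.import /path/to/your/roms
```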