Solve some Unity ML-Agent environments using deep reinforcement learning.
Run `run_sac.py` to reproduce the results.
Hyperparameters:
- network hidden layers = (400, 400)
- discount factor = 0.99
- learning rate = 0.001
- 1 - target network smoothing coefficient = 0.99 (i.e., tau = 0.01)
- temperature parameter (alpha) = 0.05
- batch size = 64
- replay memory size = 20000
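
For reference, these values could be gathered into a single configuration dict as sketched below; the key names are illustrative and may not match the argument names actually used in `run_sac.py`.

```python
# Illustrative SAC hyperparameter configuration; key names are assumptions,
# not necessarily those used in run_sac.py.
sac_config = {
    "hidden_sizes": (400, 400),   # two hidden layers of 400 units each
    "gamma": 0.99,                # discount factor
    "lr": 1e-3,                   # learning rate
    "polyak": 0.99,               # 1 - target network smoothing coefficient
    "alpha": 0.05,                # entropy temperature
    "batch_size": 64,
    "replay_size": 20000,
}
```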
SAC algorithm references:
[1] Haarnoja, Tuomas, et al. "Soft Actor-Critic Algorithms and Applications." arXiv preprint arXiv:1812.05905 (2018).
[2] Achiam, Joshua. "Soft Actor-Critic." OpenAI Spinning Up documentation, spinningup.openai.com/en/latest/algorithms/sac.html.
- Download the Unity ML-Agent examples: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md
- Build the Unity ML-Agent scene: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Executable.md
- Wrap the scene as a gym-like environment using gym_unity: https://github.com/Unity-Technologies/ml-agents/tree/master/gym-unity
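
A minimal sketch of the wrapping step, assuming the older `UnityEnv` interface exposed by gym_unity at the linked repository (newer ml-agents releases replaced it with `UnityToGymWrapper`); the executable path is a placeholder:

```python
from gym_unity.envs import UnityEnv  # older gym_unity interface (assumption)

# "./envs/3DBall" is a placeholder path to the executable built in the previous step.
env = UnityEnv("./envs/3DBall", worker_id=0, use_visual=False)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```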
The current gym_unity wrapper unfortunately doesn't support training multiple agents in parallel. To run multiple agents side by side and improve sample-collection efficiency, a workaround is to:
1. comment out the agent-count checks (`self._check_agents`) in `gym_unity/envs/__init__.py`;
2. let the `_single_step()` function return the observations and rewards of all agents instead of only the default agent's.
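
As a rough sketch of why this helps, the loop below assumes the patched wrapper returns one observation and one reward per agent from `reset()` / `step()` (the exact return shapes depend on how `_single_step()` is modified) and uses a plain deque as a stand-in replay memory; the executable path and loop length are placeholders.

```python
from collections import deque

from gym_unity.envs import UnityEnv  # older gym_unity interface (assumption)

env = UnityEnv("./envs/3DBall", worker_id=0, use_visual=False)  # placeholder path
replay_memory = deque(maxlen=20000)  # matches the replay memory size listed above

obs_list = env.reset()  # after the patch: assumed to return one observation per agent
for _ in range(1000):
    # One action per agent; a trained SAC policy would be used here instead.
    actions = [env.action_space.sample() for _ in obs_list]
    next_obs_list, rewards, done, _ = env.step(actions)

    # Every agent contributes its own transition, so N agents yield roughly
    # N times more samples per environment step.
    for o, a, r, o2 in zip(obs_list, actions, rewards, next_obs_list):
        replay_memory.append((o, a, r, o2, done))

    obs_list = env.reset() if done else next_obs_list
```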
For example, running 3 agents collects roughly three times more samples per environment step and speeds up learning: