Implementation of (D)-DQN (1) (2) (3) by DeepMind.
Applied to the gym Breakout, Pong and SpaceInvaders environments (the *NoFrameskip-v4 variants, e.g. BreakoutNoFrameskip-v4).
Due to limited computational resources, I trained Breakout and SpaceInvaders for only about 11-14 million steps.
The agents would likely improve further with more training.
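
For readers unfamiliar with the variants, here is a minimal sketch of how the bootstrap target differs between plain DQN and Double DQN. It assumes the "(D)" above stands for Double DQN, which this README does not state explicitly. It is not taken from this repository's code: the function names, the discount factor of 0.99 and the toy Q-values are all hypothetical and chosen only for illustration.

```python
def dqn_target(reward, next_q_target, done, gamma=0.99):
    """Plain DQN: the target network both selects and evaluates the next action."""
    if done:
        # No bootstrapping past the end of an episode.
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Toy Q-values for one next state with three actions (made-up numbers).
next_q_online = [1.0, 2.0, 0.5]
next_q_target = [3.0, 1.5, 0.2]

y_dqn = dqn_target(1.0, next_q_target, done=False)
y_ddqn = double_dqn_target(1.0, next_q_online, next_q_target, done=False)
print(y_dqn, y_ddqn)
```

In this toy case the two networks disagree about the best action, so the Double DQN target is lower than the plain DQN target. This is the over-estimation effect that Double DQN is meant to reduce.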
Training: SpaceInvaders
Rewards become sparse at a return of roughly 600-800. Often only one or two fast-moving targets are left, which makes further optimization hard!