This code trains a reinforcement learning agent to play PacMan by using only the pixels on the screen.
This repository contains two models:
- A vanilla Deep Q-Network with experience replay
- An enhanced Deep Q-Network with experience replay, Double DQN updates and a Dueling architecture (see the sketch below)
The deep neural network is built in TensorFlow, and OpenAI Gym provides the game simulation. We also use OpenAI Baselines, a wrapper over TensorFlow, to write the model.
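For reference, the dueling architecture splits the network head into a state-value stream and an advantage stream that are recombined into Q-values as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). The following is only a minimal TensorFlow 1.x sketch of such a head; the layer sizes and names are illustrative and not taken from pacman_agent.py:

```python
import tensorflow as tf

def dueling_head(conv_features, num_actions):
    """Combine a value stream and an advantage stream into Q-values."""
    # State-value stream: V(s)
    value_hidden = tf.layers.dense(conv_features, 256, activation=tf.nn.relu)
    value = tf.layers.dense(value_hidden, 1)

    # Advantage stream: A(s, a)
    adv_hidden = tf.layers.dense(conv_features, 256, activation=tf.nn.relu)
    advantage = tf.layers.dense(adv_hidden, num_actions)

    # Subtract the mean advantage so the two streams are identifiable.
    q_values = value + advantage - tf.reduce_mean(advantage, axis=1, keep_dims=True)
    return q_values
```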
- Python 3.5
- numpy
- scikit-learn
- tensorflow==1.4.0
- OpenAI Gym (See this link)
- OpenAI Gym[atari] (See this link)
- baselines==0.1.4 (See this link. In case of installation issues, read the section below)
- OpenAI Baselines needs mujoco-py installed, which in turn needs Mujoco 1.5, which is proprietary software. You may be able to install baselines without running into any issues with mujoco-py, but if the baselines install fails due to mujoco-py, you need to install Mujoco 1.5 first. Installation instructions are provided on OpenAI's mujoco-py GitHub repo. You need to install Mujoco and enter the activation key found on Mujoco's website; this enables a 30-day trial.
- If you are facing problems installing Mujoco 1.5, installing mujoco-py==0.5.7 can also work, but it needs an older version of baselines (0.1.3). This is a breaking change, and the code in this repository needs some small modifications to work with it.
The code has three directories:
- python_full_DQN
- python_vanilla_DQN
- training_logs
Both python_*_DQN directories contain a pacman_agent.py file which implements the RL agent, a save_model directory which contains a pre-trained model, and a logs directory which contains execution logs in CSV format. The training_logs directory contains renamed log files from earlier training sessions that record the step count, episode count and episode reward for different hyperparameter values.
To train, go into either of the python_*_DQN directories and open the file pacman_agent.py. The hyperparameters of the model can be set in the top section of the file. If you want to continue training from the last saved checkpoint, set the flag is_load_model=True; otherwise set it to False. Then, run:
$ python pacman_agent.py
This will commence training of the agent. If a saved model checkpoint is loaded, training continues from where it left off; otherwise it begins from scratch. Make sure to set the NUM_STEPS variable to a large enough number so that training doesn't stop before you want it to, especially if you're loading an old checkpoint.
(Note: it takes up to 15 hours of training for the full DQN and up to 6 hours for the vanilla DQN to complete 2,000,000 time steps on a machine with an Nvidia 860m graphics card, 8 GB of RAM and a Core i7 CPU. Training can be continued far beyond 2 million time steps for better results.)
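For orientation, the flags mentioned above sit near the top of pacman_agent.py and look roughly like this; the values shown are placeholders, not the repository's defaults:

```python
# Hypothetical excerpt of the settings block at the top of pacman_agent.py.
NUM_STEPS = 2000000       # total number of time steps to train for
is_load_model = True      # True: resume from the checkpoint in save_model/; False: train from scratch
```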
In either of the python_*_DQN directories, open the pacman_agent.py file and set the flags watch_train=True and is_load_model=True. This will load the pretrained model into memory and play the game based on the Q-values predicted by the model. It will also keep training the model as more games are played, so the model can continue to improve in the background.
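Conceptually, playing from predicted Q-values just means acting greedily with respect to the network's output. The toy sketch below illustrates that loop; the environment id and the q_network stub are assumptions, and the real agent adds frame preprocessing and continued training on top of this:

```python
import gym
import numpy as np

env = gym.make("MsPacman-v0")                                # assumed Atari environment id
q_network = lambda obs: np.random.rand(env.action_space.n)   # stand-in for the trained model

obs, done = env.reset(), False
while not done:
    q_values = q_network(obs)          # forward pass through the (trained) network
    action = int(np.argmax(q_values))  # act greedily on the predicted Q-values
    obs, reward, done, info = env.step(action)
    env.render()
```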
Both python_*_DQN directories contain a CSV file called play_500_eps.csv, which holds reward data from playing PacMan for 500 episodes. This data is then used to calculate metrics to compare the two models.
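A simple way to turn these files into comparison metrics is to aggregate the per-episode rewards, for example with pandas. The snippet below assumes the episode reward is the last column of play_500_eps.csv; adjust it to the actual header:

```python
import pandas as pd

for model_dir in ("python_vanilla_DQN", "python_full_DQN"):
    df = pd.read_csv("%s/play_500_eps.csv" % model_dir)
    rewards = df.iloc[:, -1]           # assumption: last column holds the episode reward
    print("%s  mean reward: %.1f  std: %.1f  max: %.0f"
          % (model_dir, rewards.mean(), rewards.std(), rewards.max()))
```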
Progress during training or play can be visualized using the CSV log files. These files store three key data points: the number of time steps, the number of episodes completed and the reward per episode. The simplest visualization is a line plot of the number of steps versus the episode reward. This graph is usually very jittery, so taking a moving average over a fixed window helps in visualizing long-term trends.
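A rough sketch of that plot with pandas and matplotlib is shown below; the column order (steps, episodes, reward) is an assumption about the log layout, so adjust the indices to match the actual files:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("python_full_DQN/logs/<log_file>.csv")   # pick any log file
steps, rewards = df.iloc[:, 0], df.iloc[:, 2]             # assumed order: steps, episodes, reward

plt.plot(steps, rewards, alpha=0.3, label="episode reward")
plt.plot(steps, rewards.rolling(window=50).mean(), label="moving average (50 episodes)")
plt.xlabel("time steps")
plt.ylabel("episode reward")
plt.legend()
plt.show()
```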