Implementation of MuZero ("Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model", DeepMind)
for the CartPole-v0 environment.
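
Below is a minimal sketch of the three MuZero model components (representation h, dynamics g, prediction f) sized for CartPole-v0. The class names, layer sizes, and hidden-state dimension are illustrative assumptions, not the actual networks used in this repo.

```python
# Illustrative sketch only: small fully connected networks for CartPole-v0.
# Layer sizes, names, and HIDDEN_DIM are assumptions, not this repo's code.
import torch
import torch.nn as nn

OBS_DIM, ACTION_DIM, HIDDEN_DIM = 4, 2, 32  # CartPole-v0 sizes; HIDDEN_DIM is an assumption


class Representation(nn.Module):
    """h: observation -> initial hidden state s_0."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN_DIM), nn.ReLU(),
                                 nn.Linear(HIDDEN_DIM, HIDDEN_DIM))

    def forward(self, obs):
        return self.net(obs)


class Dynamics(nn.Module):
    """g: (hidden state, action) -> (next hidden state, reward)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(HIDDEN_DIM + ACTION_DIM, HIDDEN_DIM), nn.ReLU())
        self.state_head = nn.Linear(HIDDEN_DIM, HIDDEN_DIM)
        self.reward_head = nn.Linear(HIDDEN_DIM, 1)

    def forward(self, state, action_onehot):
        x = self.body(torch.cat([state, action_onehot], dim=-1))
        return self.state_head(x), self.reward_head(x)


class Prediction(nn.Module):
    """f: hidden state -> (policy logits, value)."""
    def __init__(self):
        super().__init__()
        self.policy_head = nn.Linear(HIDDEN_DIM, ACTION_DIM)
        self.value_head = nn.Linear(HIDDEN_DIM, 1)

    def forward(self, state):
        return self.policy_head(state), self.value_head(state)
```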
MuZero + naive tree search is working.
MuZero + Monte Carlo Tree Search (MCTS) is now working.
Planned improvements: additional tricks/hacks to make MCTS training more stable.
Naive tree search: fully expand the tree to depth n and find the node with the maximum discounted value estimate plus accumulated discounted rewards.
Take the action that is the first step on the path from the root to that node (see the sketch below).
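
A minimal sketch of this naive tree search, assuming the hypothetical Representation/Dynamics/Prediction networks sketched above and one-hot encoded actions; the depth and discount defaults are illustrative, not the settings used in this repo.

```python
# Illustrative sketch: exhaustively expand the learned model to a fixed depth,
# score each leaf by discounted rewards plus discounted predicted value, and
# return the first action on the path from the root to the best leaf.
import torch


def naive_tree_search(obs, repr_net, dyn_net, pred_net,
                      depth=4, discount=0.997, action_dim=2):
    root = repr_net(torch.as_tensor(obs, dtype=torch.float32))
    # Each entry: (hidden state, accumulated discounted reward, first action from the root)
    frontier = [(root, 0.0, None)]
    for d in range(depth):  # fully expand the tree to the given depth
        next_frontier = []
        for state, acc_reward, first_action in frontier:
            for a in range(action_dim):
                onehot = torch.nn.functional.one_hot(torch.tensor(a), action_dim).float()
                next_state, reward = dyn_net(state, onehot)
                next_frontier.append((
                    next_state,
                    acc_reward + (discount ** d) * reward.item(),
                    a if first_action is None else first_action,
                ))
        frontier = next_frontier

    # Score each leaf: accumulated discounted rewards + discounted value estimate
    def leaf_score(entry):
        state, acc_reward, _ = entry
        _, value = pred_net(state)
        return acc_reward + (discount ** depth) * value.item()

    best = max(frontier, key=leaf_score)
    return best[2]  # first action on the path from the root to the best leaf
```

Note that this search grows exponentially with depth (action_dim ** depth leaves), which is why MCTS replaces it for anything beyond very small depths.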