Policy-gradient-based reinforcement learning (RL) algorithms
- REINFORCE
- REINFORCE with baseline
- One-step Actor-Critic
- Advantage Actor-Critic (A2C)
- Proximal Policy Optimization (PPO)
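The algorithms listed above differ mainly in the signal that weights the log-probability gradient of the chosen action. Below is a minimal, framework-free sketch in plain Python with no autograd. It shows those per-step weights and the corresponding surrogate losses for one collected trajectory. It assumes lists of rewards, critic value estimates `values`, a discount `gamma`, and per-step log-probabilities. All function names here are hypothetical and are not taken from the notebooks. In an actual PyTorch or TensorFlow implementation, the log-probabilities would be tensors, and the loss would be backpropagated.

```python
import math
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Monte Carlo return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def reinforce_weights(rewards: List[float], gamma: float) -> List[float]:
    # REINFORCE: each log pi(a_t|s_t) is weighted by the full return G_t.
    return discounted_returns(rewards, gamma)


def baseline_weights(rewards: List[float], values: List[float], gamma: float) -> List[float]:
    # REINFORCE with baseline: subtract a state-value estimate V(s_t) from G_t
    # to reduce variance (a state-dependent baseline does not bias the gradient).
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]


def td_errors(rewards, values, next_values, dones, gamma):
    # One-step TD error, a common advantage estimate in actor-critic / A2C:
    # delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    return [r + gamma * nv * (1.0 - d) - v
            for r, v, nv, d in zip(rewards, values, next_values, dones)]


def policy_gradient_loss(log_probs: List[float], weights: List[float]) -> float:
    # Surrogate loss; its gradient w.r.t. the policy parameters is
    # -sum_t weight_t * grad log pi(a_t|s_t) when weights are treated as constants.
    return -sum(lp * w for lp, w in zip(log_probs, weights))


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # PPO clipped surrogate objective, negated for minimization and averaged over steps.
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic (lower) bound
    return -total / len(advantages)


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(reinforce_weights(rewards, gamma=0.5))                 # [1.75, 1.5, 1.0]
    print(baseline_weights(rewards, [1.0, 1.0, 1.0], gamma=0.5)) # [0.75, 0.5, 0.0]
```

The clipping in `ppo_clip_loss` keeps the probability ratio within `[1 - clip_eps, 1 + clip_eps]` whenever moving outside that range would improve the objective. The outer `min` ensures that clipping never makes the surrogate look better than the unclipped value.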
CartPole-v1 environment documentation: https://www.gymlibrary.dev/environments/classic_control/cart_pole/
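For orientation, here is a small sketch of how CartPole-v1 scores and ends an episode, based on the linked documentation. The observation is `[cart position, cart velocity, pole angle, pole angular velocity]`. The reward is +1 for every step. An episode terminates when the pole angle exceeds ±12° or the cart position exceeds ±2.4. It is truncated after 500 steps. The function `step_outcome` and the constant names are hypothetical. The real environment performs these checks internally inside `step()`.

```python
import math

# Limits as described in the CartPole-v1 documentation (mirrored here for illustration).
X_THRESHOLD = 2.4                          # cart position limit
THETA_THRESHOLD = 12 * 2 * math.pi / 360   # 12 degrees, in radians
MAX_EPISODE_STEPS = 500                    # truncation limit for CartPole-v1


def step_outcome(obs, step_count):
    """Return (reward, terminated, truncated) for an observation after step_count steps."""
    x, _x_dot, theta, _theta_dot = obs
    terminated = abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD
    truncated = step_count >= MAX_EPISODE_STEPS
    reward = 1.0  # +1 for every step taken, including the terminating one
    return reward, terminated, truncated


if __name__ == "__main__":
    print(step_outcome([0.0, 0.0, 0.0, 0.0], 10))    # (1.0, False, False)
    print(step_outcome([2.5, 0.0, 0.0, 0.0], 10))    # (1.0, True, False)
    print(step_outcome([0.0, 0.0, 0.0, 0.0], 500))   # (1.0, False, True)
```

Because every surviving step earns +1, the undiscounted return of an episode equals its length, so a perfect CartPole-v1 episode scores 500.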
The code is provided as Jupyter notebooks (.ipynb) that can be run in Google Colab.