Pendulum environment
The environment simulates the behavior of an inverted pendulum. The theoretical system and its equations are as described in Barto et al. (1983):
- A cart of mass M that can move horizontally;
- A pole of mass m and length l attached to the cart, with θ in [-π, 0] for the left half-plane and [0, π] for the right half-plane. We assume that the cart moves on a rail and that the pole can swing below it.
The goal of the agent is to balance the pole above its supporting cart (θ = 0) by moving the cart left or right; thus, 2 actions are possible. To make this possible, the environment communicates to the agent:
- A vector (position, speed, angle, angular speed);
- The reward associated with the action chosen by the agent.
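As a rough illustration of the interface described above, the two actions and the feedback sent to the agent could be modelled as follows. The names Action, Observation, and Feedback are hypothetical and are not taken from pendulum_env.py; only the four state components and the reward come from the description.

```python
from enum import IntEnum
from typing import NamedTuple


class Action(IntEnum):
    """The two possible actions (this 0/1 encoding is an assumption)."""
    LEFT = 0
    RIGHT = 1


class Observation(NamedTuple):
    """State vector sent to the agent: (position, speed, angle, angular speed)."""
    position: float
    speed: float
    angle: float
    angular_speed: float


class Feedback(NamedTuple):
    """What the environment returns after an action: new state and reward."""
    observation: Observation
    reward: float


# Example: the pole is upright and at rest, so the reward is 0.
upright = Feedback(Observation(0.0, 0.0, 0.0, 0.0), reward=0.0)
```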
The work is done in pendulum_env.py. The main focus is to implement act(self, action), which specifies how the cart-pole system behaves in response to an input action. First, we transcribe the physical laws that govern the motion of the pole and the cart. The simulation timestep of the agent is DELTA_T = 0.02 seconds, but we subdivide this step even further inside act(self, action) to obtain dynamics that are closer to the exact differential equations.
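The sub-stepping idea can be sketched as follows. This is not the code from pendulum_env.py: it uses the commonly quoted frictionless form of the cart-pole equations (the friction terms of Barto et al. are omitted), explicit Euler integration, and illustrative constants. Every name except DELTA_T (GRAVITY, MASS_CART, MASS_POLE, HALF_LENGTH, FORCE_MAG, N_SUBSTEPS, step) is an assumption, as is the action encoding (0 = push left, 1 = push right).

```python
import math

# Illustrative constants (assumptions, not read from pendulum_env.py).
GRAVITY = 9.8       # m/s^2
MASS_CART = 1.0     # M, kg
MASS_POLE = 0.1     # m, kg
HALF_LENGTH = 0.5   # l, taken here as half the pole length, m
FORCE_MAG = 10.0    # magnitude of the push on the cart, N
DELTA_T = 0.02      # agent timestep given in the text, s
N_SUBSTEPS = 10     # hypothetical number of integration sub-steps per DELTA_T


def derivatives(state, force):
    """Return (x_acc, theta_acc) from the frictionless cart-pole equations."""
    _, _, theta, theta_dot = state
    total_mass = MASS_CART + MASS_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + MASS_POLE * HALF_LENGTH * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / total_mass)
    )
    x_acc = temp - MASS_POLE * HALF_LENGTH * theta_acc * cos_t / total_mass
    return x_acc, theta_acc


def wrap_angle(theta):
    """Map an angle to [-pi, pi), since the pole may swing below the rail."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def step(state, action):
    """Advance (x, x_dot, theta, theta_dot) by DELTA_T using small Euler steps."""
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    h = DELTA_T / N_SUBSTEPS
    x, x_dot, theta, theta_dot = state
    for _ in range(N_SUBSTEPS):
        x_acc, theta_acc = derivatives((x, x_dot, theta, theta_dot), force)
        # Explicit Euler: positions use the old velocities, then velocities update.
        x += h * x_dot
        theta = wrap_angle(theta + h * theta_dot)
        x_dot += h * x_acc
        theta_dot += h * theta_acc
    return (x, x_dot, theta, theta_dot)


next_state = step((0.0, 0.0, 0.0, 0.0), action=1)
```

Starting from the upright rest state, a push to the right moves the cart right and tips the pole to the left (negative θ). Increasing N_SUBSTEPS shrinks the Euler step and brings the trajectory closer to the continuous-time solution, at the cost of more computation per agent step.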
Second, we chose the reward function -abs(theta): the agent receives 0 when the pole is upright, and a negative reward proportional to the magnitude of the angle otherwise.
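A minimal sketch of this reward is given below. The helper wrap_angle and the function name reward are hypothetical; wrapping the angle to [-π, π) first is an assumption made here so that the reward stays bounded between -π and 0 even if the stored angle has accumulated full turns.

```python
import math


def wrap_angle(theta):
    """Map an angle to [-pi, pi) (assumed convention for theta)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def reward(theta):
    """Reward -abs(theta): 0 when upright, most negative when hanging down."""
    return -abs(wrap_angle(theta))


upright_reward = reward(0.0)       # pole balanced above the cart
hanging_reward = reward(math.pi)   # pole pointing straight down
```

With this choice the reward depends only on the angle, so the agent is not directly penalized for the cart's position or speed.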
Here are the outputs of the agent after 20 and 70 learning epochs respectively, with 1000 steps in each. They clearly show the agent's eventual success in controlling the inverted pendulum.
Note: an MP4 is generated every PERIOD_BTW_SUMMARY_PERFS epochs, which requires the FFmpeg library. If you do not want to install this library or generate the videos, just set PERIOD_BTW_SUMMARY_PERFS = -1.
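The behaviour of this setting could be sketched as below. The functions video_due and ffmpeg_available are hypothetical and not part of the repository. The sketch assumes that epochs are counted from 0, that any non-positive period disables video generation, and that FFmpeg is reachable as an ffmpeg executable on the PATH.

```python
import shutil

PERIOD_BTW_SUMMARY_PERFS = -1  # -1 disables video generation, as in the note


def video_due(epoch, period=PERIOD_BTW_SUMMARY_PERFS):
    """Return True when an MP4 should be generated at this epoch (assumed rule)."""
    return period > 0 and epoch % period == 0


def ffmpeg_available():
    """Check whether an ffmpeg executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


# With a period of 10, epochs 0, 10, 20, ... would produce a video.
due_epochs = [e for e in range(30) if video_due(e, period=10)]
```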