PPO balancer


The PPO balancer is a feedforward neural network policy trained by reinforcement learning with a sim-to-real pipeline. Like the MPC balancer and PID balancer, it balances Upkie with straight legs. Training uses the UpkieGroundVelocity gym environment and the PPO implementation from Stable Baselines3.
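As a rough illustration of what that setup looks like in code, here is a minimal training sketch with Stable Baselines3; the upkie.envs.register() call, the environment id and the need for a running simulation spine are assumptions about the upkie Python package, and the Makefile targets below remain the supported way to train:

import gymnasium as gym
import upkie.envs
from stable_baselines3 import PPO

upkie.envs.register()  # assumed call that registers Upkie environments with Gymnasium

# Assumes a simulation spine is already running on this machine.
with gym.make("UpkieGroundVelocity-v3") as env:  # environment id may differ between versions
    policy = PPO("MlpPolicy", env, verbose=1)  # feedforward (MLP) policy trained by PPO
    policy.learn(total_timesteps=10_000)       # short run; real training uses many more steps
    policy.save("mini_policy")                 # writes mini_policy.zip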

An overview of the training pipeline is given in this video: Sim-to-real RL pipeline for Upkie wheeled bipeds.

Installation

On your machine

conda env create -f environment.yaml
conda activate ppo_balancer

On your Upkie

The PPO balancer uses pixi and pixi-pack to build a standalone Python environment for running policies on your Upkie. First, create environment.tar and upload it to the robot:

make pack_pixi_env
make upload

Then, unpack the remote environment:

$ ssh user@your-upkie
user@your-upkie:~$ cd ppo_balancer
user@your-upkie:ppo_balancer$ make unpack_pixi_env

Running a policy

On your machine

To run the default policy:

make run_agent

Here we assume a spine is already up and running, for instance started by ./start_simulation.sh on your machine, or a pi3hat spine started on the robot.
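For example, a typical local session uses two terminals, assuming the start_simulation.sh script mentioned above is available on your machine:

# Terminal 1: start a simulation spine
./start_simulation.sh

# Terminal 2: run the default policy against it
make run_agent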

To run a policy saved to a custom path, use for instance:

python ppo_balancer/run.py --policy ppo_balancer/training/2023-11-15/final.zip

On your Upkie

Once the agent and Python environment have been uploaded following the instructions above, you can SSH into the robot and run the same target:

$ ssh user@your-upkie
user@your-upkie:~$ make run_agent

This will run the policy saved at the default path. To run a custom policy, save its ZIP file to the robot (also save its operative configuration for future reference) and pass its path as an argument to run.py.
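For instance, assuming you copied a policy to ~/my_policy.zip on the robot (hypothetical path) and that the unpacked pixi environment provides the Python interpreter:

user@your-upkie:ppo_balancer$ python ppo_balancer/run.py --policy ~/my_policy.zip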

Training a new policy

First, check that training progresses one rollout at a time:

make train_and_show

Once this works you can train for real, with more environments and no GUI:

make train

Check the time/fps plots, either in the command-line output or in TensorBoard, to adjust the number of parallel environments:

make tensorboard

You should increase the number of environments from the default value (NB_TRAINING_ENVS in the Makefile) to as many as your machine can handle, as long as FPS keeps going up.
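For instance, assuming NB_TRAINING_ENVS is a regular Makefile variable, you can override it from the command line without editing the Makefile:

make train NB_TRAINING_ENVS=8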

See also