The PPO balancer is a feedforward neural network policy trained by reinforcement learning in a sim-to-real pipeline. Like the MPC balancer and PID balancer, it balances Upkie with straight legs. Training uses the `UpkieGroundVelocity` gym environment and the PPO implementation from Stable Baselines3.
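As a point of reference, here is a minimal sketch of that setup. It assumes the `upkie` Python package registers the environment under the `UpkieGroundVelocity-v3` identifier (the version suffix may differ in your install), and it leaves out the reward shaping, domain randomization and vectorization of the full pipeline:

```python
# Minimal sketch, not the agent's actual training script: the environment id
# and hyperparameters below are illustrative assumptions.
import gymnasium as gym
import upkie.envs
from stable_baselines3 import PPO

upkie.envs.register()  # make Upkie environments available to gym.make

# Assumes a spine is running, e.g. a simulation started by ./start_simulation.sh
env = gym.make("UpkieGroundVelocity-v3")
policy = PPO("MlpPolicy", env, verbose=1)
policy.learn(total_timesteps=100_000)
policy.save("final.zip")  # ZIP checkpoint in the format loaded by run.py
```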
The training pipeline is summarized in the overview video Sim-to-real RL pipeline for Upkie wheeled bipeds.
To set up the agent's conda environment:

```
conda env create -f environment.yaml
conda activate ppo_balancer
```
To run the default policy:

```
make test_policy
```
This assumes a spine is already up and running, for instance a simulation spine started by `./start_simulation.sh` on your machine, or a pi3hat spine started on the robot.
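Under the hood, running a policy boils down to loading the Stable Baselines3 checkpoint and stepping the gym environment. The following is a sketch under the same environment-id assumption as above, with a hypothetical checkpoint path, not the exact code of `run.py`:

```python
# Rollout sketch; checkpoint path and environment id are assumptions.
import gymnasium as gym
import upkie.envs
from stable_baselines3 import PPO

upkie.envs.register()
policy = PPO.load("ppo_balancer/policy/params.zip")  # hypothetical default path

with gym.make("UpkieGroundVelocity-v3") as env:  # connects to the running spine
    observation, _ = env.reset()
    while True:
        action, _ = policy.predict(observation, deterministic=True)
        observation, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            observation, _ = env.reset()
```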
To run a policy saved to a custom path, use for instance:

```
python ppo_balancer/run.py --policy ppo_balancer/training/2023-11-15/final.zip
```
Upload the agent repository to the robot:

```
make upload
```
Then SSH into the robot and run the agent:

```console
$ ssh your-upkie
user@your-upkie:~$ python ppo_balancer/run.py
```
This runs the policy saved at the default path. To run a custom policy, save its ZIP file to the robot (along with its operative config, for future reference) and pass its path as an argument to `run.py`.
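For instance, reusing the path from the earlier example:

```console
user@your-upkie:~$ python ppo_balancer/run.py --policy ppo_balancer/training/2023-11-15/final.zip
```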
First, check that training progresses one rollout at a time:

```
make train_and_show
```
Once this works, you can train for real, with more environments and no GUI:

```
make train
```
Check the `time/fps` plots in the command-line logs or in TensorBoard to adjust the number of parallel environments:

```
make tensorboard
```
You should increase the number of environments from the default value (`NB_TRAINING_ENVS` in the Makefile) to "as much as you can as long as FPS keeps going up".
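As a rough sketch of what this knob controls, assuming the Makefile forwards `NB_TRAINING_ENVS` to a Stable Baselines3 vectorized environment (the actual training script may organize this differently):

```python
# Parallel-environment sketch; NB_TRAINING_ENVS mirrors the Makefile variable
# and the environment id is the same assumption as in the earlier sketches.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

import upkie.envs

upkie.envs.register()
NB_TRAINING_ENVS = 4  # raise this until the time/fps plot stops improving

vec_env = make_vec_env(
    "UpkieGroundVelocity-v3",
    n_envs=NB_TRAINING_ENVS,
    vec_env_cls=SubprocVecEnv,  # one subprocess per environment
)
policy = PPO("MlpPolicy", vec_env, tensorboard_log="./tb_logs")
policy.learn(total_timesteps=1_000_000)
```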
The PPO balancer uses pixi-pack to export a pixi environment to your Upkie. If you don't have pixi yet, install it by following the pixi installation instructions.
First, create an `environment.tar` file with the following command:

```
pixi run pack-to-upkie
```
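One way to get the archive onto the robot, assuming SSH access as in the section above:

```console
$ scp environment.tar your-upkie:
```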
Once the archive is on your Upkie, unpack it there:

```
pixi-pack unpack environment.tar
```
If pixi-pack is not installed on your Upkie, you can get a `pixi-pack-aarch64-unknown-linux-gnu` binary from the pixi-pack release page. Finally, activate the environment and run the agent:
```
source ./activate.sh
python ppo_balancer/run.py
```
Symptom: you are getting errors related to PyTorch not finding shared object files, with a call to `_preload_cuda_deps()` somewhere in the traceback:
File ".../torch/__init__.py", line 178, in _load_global_deps
_preload_cuda_deps()
File ".../torch/__init__.py", line 158, in _preload_cuda_deps
ctypes.CDLL(cublas_path)
File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: .../nvidia/cublas/lib/libcublas.so.11: cannot open shared object file: No such file or directory
Workaround: `pip install torch` in your local pip environment. This overrides the PyTorch version pinned by Bazel and lets you train and run normally.
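If the machine you train on has no GPU, a CPU-only build avoids pulling in CUDA libraries altogether. The index URL below is PyTorch's official CPU wheel index; using it here is a suggestion, not part of the documented workaround:

```
pip install torch --index-url https://download.pytorch.org/whl/cpu
```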