# HotWheelsRL (wip)

Zack Beucler

- Use RL to train an agent to competitively complete a race on the first level of the GBA game *Hot Wheels Stunt Track Challenge*.
- The agent should be able to complete a lap.

## Resources

Action space types: discrete, multi-discrete, multi-binary.

Candidate algorithms:

- PPO
- A2C
- DQN
- HER
- QR-DQN
- RecurrentPPO
- TRPO
- Maskable PPO
- ARS
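The three action-space encodings listed above differ in how a single agent decision maps to controller input. A minimal illustration for GBA-style buttons (the button names and groupings here are assumptions, not taken from the actual game integration):

```python
# Illustrative comparison of the three action-space encodings.
# Button names and action groupings are assumptions for illustration.

BUTTONS = ["A", "B", "LEFT", "RIGHT", "UP", "DOWN", "L", "R"]

# Discrete: one integer selects a single predefined action.
discrete_actions = ["noop", "accelerate", "steer_left", "steer_right"]
action = discrete_actions[1]  # "accelerate"

# MultiDiscrete: one integer per independent control group,
# e.g. [steering, acceleration] with steering in {0: left, 1: none, 2: right}.
multidiscrete_action = [2, 1]  # steer right while accelerating

# MultiBinary: one 0/1 flag per button, all pressed simultaneously.
multibinary_action = [1, 0, 0, 1, 0, 0, 0, 0]  # hold A + RIGHT

pressed = [b for b, on in zip(BUTTONS, multibinary_action) if on]
```

MultiBinary most closely matches raw emulator input (any button combination), while Discrete keeps the policy's output head smallest.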

## Reward function

- Speed reward: ±0.1 when the agent's mean speed increases/decreases.
- Progress reward, summed over the episode:

$$R = \sum_{i=1}^{n} \Delta\,\mathrm{progress}_i$$

- $n$: total time steps in the episode
- This should encourage the agent to make forward progress and score points.
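A sketch of this reward computed over one episode. The `speeds` and `progress` arrays are assumed to be per-step telemetry read from the game; the ±0.1 speed bonus and the summed progress deltas come from the description above, everything else is illustrative:

```python
# Hypothetical per-episode reward: sum of progress deltas plus a
# +/-0.1 bonus whenever the running mean speed rises/falls.

def episode_reward(speeds, progress):
    total = 0.0
    mean_speed = 0.0
    for i in range(1, len(progress)):
        # Progress term: delta progress at each time step.
        total += progress[i] - progress[i - 1]
        # Speed term: compare the new running mean speed to the old one.
        new_mean = sum(speeds[: i + 1]) / (i + 1)
        if new_mean > mean_speed:
            total += 0.1
        elif new_mean < mean_speed:
            total -= 0.1
        mean_speed = new_mean
    return total
```

In practice this would be emitted incrementally (one term per `step()` call) rather than summed after the fact, but the total is the same.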

## Experimental reward function

- Train over 3 laps.
- +10 for completing a lap.
- +0.1 or +0.01 for increasing speed.
- A larger reward for increasing the in-game score.
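A per-step sketch of this experimental shaping. The +10 lap bonus and the small speed bonus come from the list above; the signal names and the score weight are assumed placeholders:

```python
# Hypothetical per-step experimental reward. The score_weight default is
# an assumption standing in for the "bigger score reward" above.

def step_reward(lap_completed, speed_delta, score_delta,
                speed_bonus=0.01, score_weight=1.0):
    reward = 0.0
    if lap_completed:
        reward += 10.0          # lap completion bonus
    if speed_delta > 0:
        reward += speed_bonus   # small bonus for gaining speed
    reward += score_weight * score_delta  # weighted in-game score change
    return reward
```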

## Hyperparameters

```python
learning_rate=2.5e-4,
n_steps=128,
n_epochs=3,
batch_size=32,
ent_coef=0.01,
vf_coef=1.0,
num_envs=8
```
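These keyword names match Stable-Baselines3's `PPO` constructor (`num_envs` is the vectorized-environment count rather than a PPO kwarg). A sketch of how they could be wired up; the policy choice and environment id are assumptions:

```python
# PPO hyperparameters as a config dict; kwarg names follow
# Stable-Baselines3's PPO signature.

ppo_config = dict(
    learning_rate=2.5e-4,  # optimizer step size
    n_steps=128,           # rollout length per environment
    n_epochs=3,            # gradient passes over each rollout
    batch_size=32,         # minibatch size per gradient step
    ent_coef=0.01,         # entropy bonus coefficient
    vf_coef=1.0,           # value-loss coefficient
)
num_envs = 8               # parallel environments for rollout collection

# Example usage (requires stable-baselines3 and a wrapped retro env):
# from stable_baselines3 import PPO
# model = PPO("CnnPolicy", vec_env, **ppo_config)
# model.learn(total_timesteps=1_000_000)
```

One sanity check worth keeping: the rollout buffer size (`n_steps * num_envs = 1024`) divides evenly by `batch_size`, so no minibatch is truncated.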