MPOPIS Details
Welcome to the MPOPIS wiki!
This wiki page contains simulation details and parameters used for the algorithms.
The goal of the MountainCar problem is to drive an under-powered car up a hill to a goal location. The action space consisted of a single continuous action at each time step. The MountainCar environment from ReinforcementLearning.jl, which is based on OpenAI Gym's MountainCar scenario, was used.
The reward function for this problem was modified to add an incentive to go faster (the absolute velocity) and an indicator bonus of 100000 for reaching the goal location with a positive velocity. Each step before termination also incurs a penalty of -1. The environment terminated when the car reached the goal location or after 200 steps.
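```julia
using ReinforcementLearning   # provides RLBase and MountainCarEnv

# Modified MountainCar reward function
function RLBase.reward(env::MountainCarEnv{A,T}) where {A,T}
    rew = 0.0
    # Large bonus for reaching the goal position with at least the goal velocity
    if env.state[1] >= env.params.goal_pos &&
       env.state[2] >= env.params.goal_velocity
        rew += 100000
    end
    # Incentive to move faster, regardless of direction
    rew += abs(env.state[2])
    # Penalty of -1 for every step before the episode ends
    rew += env.done ? 0.0 : -1.0
    return rew
end
```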
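For quick inspection of how the three shaping terms combine, the same logic can be written as a plain function of position, velocity, and the termination flag. This is a hypothetical helper (`shaped_reward` is not part of MPOPIS or ReinforcementLearning.jl), and the keyword defaults `goal_pos = 0.5` and `goal_velocity = 0.0` are assumed values, not ones confirmed from the environment used here.

```julia
# Hypothetical standalone version of the modified MountainCar reward above.
# goal_pos = 0.5 and goal_velocity = 0.0 are assumed defaults, not taken from MPOPIS.
function shaped_reward(pos::Real, vel::Real, done::Bool;
                       goal_pos::Real = 0.5, goal_velocity::Real = 0.0)
    rew = 0.0
    if pos >= goal_pos && vel >= goal_velocity
        rew += 100_000            # bonus for reaching the goal with at least the goal velocity
    end
    rew += abs(vel)               # incentive to move faster in either direction
    rew += done ? 0.0 : -1.0      # per-step penalty until the episode terminates
    return rew
end

shaped_reward(-0.5, 0.02, false)  # mid-valley, moving right: 0.02 - 1.0 ≈ -0.98
shaped_reward(0.52, 0.01, true)   # at the goal with positive velocity: ≈ 100000.01
```

Because the goal bonus is several orders of magnitude larger than the per-step terms, any sampled rollout that reaches the goal should dominate the others when trajectories are compared.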
The car racing environment can be run on multiple tracks. The default track, which was used in all simulations, is 1.18 km long with a lane width of 15 m. The track is shown below. All scenarios started with car 1 at the origin and the other cars offset 5 m to the left and right. Each car was oriented along the positive y-axis with a longitudinal velocity of 10 m/s at t = 0. The parameters used for the car model and the dynamics can be seen in the code here. The model parameters and dynamics were implemented following Brown and Gerdes (2020) and Subosits and Gerdes (2021).
Parameters | :mppi | :PMCMPPI | :μaismppi | :μΣaismppi | :cemppi | :cmamppi |
---|---|---|---|---|---|---|
Samples | 20-180 (multiples of 20) | 20 | 20 | 20 | 20 | 20 |
Horizon | 15 | 15 | 15 | 15 | 15 | 15 |
Inverse Temp (λ) | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
Control Cost Param (α) | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
Init Control Sequence | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 |
Control Covariance | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 |
AIS Iterations | --- | 1-8 | 1-8 | 1-8 | 1-8 | 1-8 |
AIS Inv Temp (λ_ais) | --- | 0.1 | 0.1 | 0.1 | --- | --- |
CE Elite Threshold | --- | --- | --- | --- | 0.8 | --- |
CE Σ Estimation Method | --- | --- | --- | --- | :mle | --- |
CMA Step Factor (σ) | --- | --- | --- | --- | --- | 0.5 |
CMA Elite Threshold | --- | --- | --- | --- | --- | 0.8 |
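The λ and sample-count rows feed the path-integral weighting step. The sketch below assumes the standard MPPI form, in which each sampled perturbation sequence with cost S_k receives a weight proportional to exp(-S_k/λ) and the nominal controls move by the weighted average perturbation. `mppi_update` and the quadratic placeholder cost are hypothetical, and the control cost parameter α is not modeled.

```julia
using Random

# Hypothetical sketch of the MPPI weighting step (not the MPOPIS implementation).
# u: nominal controls (length T), ε: perturbations (T × K), S: costs (length K), λ: temperature.
function mppi_update(u::Vector{Float64}, ε::Matrix{Float64}, S::Vector{Float64}, λ::Float64)
    β = minimum(S)                    # shift by the best cost for numerical stability
    w = exp.(-(S .- β) ./ λ)          # unnormalized weights; lower cost gives a larger weight
    w ./= sum(w)                      # normalize so the weights sum to 1
    return u .+ ε * w                 # nominal controls plus the weighted mean perturbation
end

# Values from the table: horizon 15, 20 samples, λ = 0.1, scalar control variance 1.5.
Random.seed!(1)
T, K, λ, σ² = 15, 20, 0.1, 1.5
u = zeros(T)                                  # initial control sequence 0.0, ..., 0.0
ε = sqrt(σ²) .* randn(T, K)                   # perturbations drawn from N(0, 1.5)
S = vec(sum(abs2, u .+ ε; dims = 1))          # placeholder cost: squared control magnitude
u_new = mppi_update(u, ε, S, λ)
```

A smaller λ concentrates the weight on the lowest-cost samples, while a larger λ averages more evenly across them.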
Parameters | :mppi | :PMCMPPI | :μaismppi | :μΣaismppi | :cemppi | :cmamppi |
---|---|---|---|---|---|---|
Samples | 375-2250 (multiples of 375) | 375 | 375 | 375 | 375 | 375 |
Horizon | 50 | 50 | 50 | 50 | 50 | 50 |
Inverse Temp (λ) | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |
Control Cost Param (α) | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
Init Control Sequence | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 |
Control Covariance | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] |
AIS Iterations | --- | 1-6 | 1-6 | 1-6 | 1-6 | 1-6 |
AIS Inv Temp (λ_ais) | --- | 20.0 | 20.0 | 20.0 | --- | --- |
CE Elite Threshold | --- | --- | --- | --- | 0.8 | --- |
CE Σ Estimation Method | --- | --- | --- | --- | :mle | --- |
CMA Step Factor (σ) | --- | --- | --- | --- | --- | 0.5 |
CMA Elite Threshold | --- | --- | --- | --- | --- | 0.8 |
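The 2 × 2 control covariance in these tables is written in Julia matrix syntax. A minimal sketch of drawing perturbations from it, assuming Σ applies independently at each of the 50 horizon steps (an assumption about how the sampler uses it, not confirmed from the MPOPIS source):

```julia
using LinearAlgebra, Random, Statistics

# Sketch: draw control perturbations for all samples and time steps from N(0, Σ).
# Assumes Σ is the per-time-step covariance of the 2 controls, independent across steps.
Σ = [0.0625 0.0; 0.0 0.1]          # control covariance from the table
L = cholesky(Σ).L                  # lower-triangular factor with L * L' ≈ Σ
T, K = 50, 375                     # horizon and samples per iteration from the table

Random.seed!(0)
z = randn(2, T * K)                # independent standard-normal draws
ε = reshape(L * z, 2, T, K)        # perturbations indexed as control × time step × sample

Σ_est = cov(reshape(ε, 2, :); dims = 2)   # empirical covariance over all 18,750 draws
```

Using the Cholesky factor keeps the sampling correct if off-diagonal terms are ever added to Σ; for a diagonal Σ it reduces to scaling each control by its standard deviation.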
Parameters | :mppi | :PMCMPPI | :μaismppi | :μΣaismppi | :cemppi | :cmamppi |
---|---|---|---|---|---|---|
Samples | 375-2250 (multiples of 375) | 375 | 375 | 375 | 375 | 375 |
Horizon | 50 | 50 | 50 | 50 | 50 | 50 |
Inverse Temp (λ) | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |
Control Cost Param (α) | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
Init Control Sequence | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 |
Control Covariance | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] |
AIS Iterations | --- | 1-6 | 1-6 | 1-6 | 1-6 | 1-6 |
AIS Inv Temp (λ_ais) | --- | 70.0 | 70.0 | 70.0 | --- | --- |
CE Elite Threshold | --- | --- | --- | --- | 0.8 | --- |
CE Σ Estimation Method | --- | --- | --- | --- | :mle | --- |
CMA Step Factor (σ) | --- | --- | --- | --- | --- | 0.5 |
CMA Elite Threshold | --- | --- | --- | --- | --- | 0.8 |
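The AIS rows control how many times the sampling distribution is refined before the final update, and λ_ais sets the temperature used for those refinements. The following is a simplified sketch, assuming a mean-only scheme that re-centers a Gaussian proposal on the exp(-S/λ_ais)-weighted mean of its samples at each iteration. `ais_mean`, the toy cost, and the proposal covariance are hypothetical, and the MPOPIS update may include terms omitted here.

```julia
using LinearAlgebra, Random

# Hypothetical mean-only adaptive sampling loop (a sketch, not MPOPIS code).
# cost maps a vector to a scalar; μ is the initial proposal mean; L * L' is the proposal covariance.
function ais_mean(cost, μ::Vector{Float64}, L::AbstractMatrix, K::Int, iters::Int, λ_ais::Float64)
    for _ in 1:iters
        X = μ .+ L * randn(length(μ), K)         # K candidates drawn from N(μ, L * L')
        S = [cost(view(X, :, k)) for k in 1:K]   # cost of each candidate
        w = exp.(-(S .- minimum(S)) ./ λ_ais)    # weights at temperature λ_ais
        μ = X * (w ./ sum(w))                    # move the proposal mean to the weighted mean
    end
    return μ
end

target = fill(1.0, 4)                            # hypothetical optimum of the toy cost
toy_cost(x) = 100 * sum(abs2, x .- target)

# 375 samples, 6 iterations, and λ_ais = 70.0 from the table; the proposal covariance is hypothetical.
Random.seed!(2)
μ = ais_mean(toy_cost, zeros(4), Matrix(0.5I, 4, 4), 375, 6, 70.0)
```

Each iteration spends another batch of samples, so the AIS iteration count trades extra computation for a proposal that is better centered before the final update.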
Parameters | :mppi | :PMCMPPI | :μaismppi | :μΣaismppi | :cemppi | :cmamppi |
---|---|---|---|---|---|---|
Samples | 375-2250 (multiples of 375) | 375 | 375 | 375 | 150 | 150 |
Horizon | 50 | 50 | 50 | 50 | 50 | 50 |
Inverse Temp (λ) | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |
Control Cost Param (α) | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
Init Control Sequence | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 | 0.0, ..., 0.0 |
Control Covariance | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] | [0.0625 0; 0 0.1] |
AIS Iterations | --- | 1-6 | 1-6 | 1-6 | 1, 5, 7, ..., 15 | 1, 5, 7, ..., 15 |
AIS Inv Temp (λ_ais) | --- | 70.0 | 70.0 | 70.0 | --- | --- |
CE Elite Threshold | --- | --- | --- | --- | 0.8 | --- |
CE Σ Estimation Method | --- | --- | --- | --- | :ss | --- |
CMA Step Factor (σ) | --- | --- | --- | --- | --- | 0.5 |
CMA Elite Threshold | --- | --- | --- | --- | --- | 0.8 |
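The CE rows (elite threshold 0.8 and the `:mle` or `:ss` covariance estimator) describe a cross-entropy refit of the sampling distribution. A minimal sketch of such iterations follows, assuming that an elite threshold of 0.8 means keeping the best 20% of samples by cost (an interpretation, not confirmed from the source) and fitting the MLE covariance of the elites. `ce_step`, `run_ce`, and the 2-D toy cost are hypothetical.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical sketch of one cross-entropy (CE) iteration; not the MPOPIS implementation.
# Assumption: elite_threshold = 0.8 keeps the samples at or below the 20th cost percentile.
function ce_step(cost, μ::Vector{Float64}, Σ::Matrix{Float64}, K::Int; elite_threshold = 0.8)
    L = cholesky(Symmetric(Σ)).L
    X = μ .+ L * randn(length(μ), K)            # K samples from N(μ, Σ), one per column
    S = [cost(view(X, :, k)) for k in 1:K]      # cost of each sample
    cutoff = quantile(S, 1 - elite_threshold)   # 20th percentile of the costs
    E = X[:, S .<= cutoff]                      # elite samples (lowest costs)
    μ_new = vec(mean(E; dims = 2))              # refit the mean to the elites
    Σ_new = cov(E; dims = 2, corrected = false) # maximum likelihood (:mle) covariance
    return μ_new, Σ_new
end

# Run several CE iterations from N(0, I).
function run_ce(cost, μ, Σ, K, iters)
    for _ in 1:iters
        μ, Σ = ce_step(cost, μ, Σ, K)
    end
    return μ, Σ
end

Random.seed!(3)
toy_cost(x) = sum(abs2, x .- [1.0, -1.0])       # toy quadratic cost with its minimum at (1, -1)
μ_ce, Σ_ce = run_ce(toy_cost, zeros(2), Matrix(1.0I, 2, 2), 150, 5)
```

With 150 samples this sketch keeps 30 elites, and an MLE covariance fitted from 30 points has rank at most 29. Whenever the fitted dimension exceeds that, the MLE estimate is singular, which is one reason a shrinkage estimator such as `:ss` can matter at small sample sizes.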
The numbers presented in the paper used `terminate_when_unhealthy=False` for Ant-v4.
Parameters | :mppi | :cemppi |
---|---|---|
Samples | 250, 500, 1000, 1500, 3000 | 50, 100, 125, 150, 200 |
Horizon | 50 | 50 |
Inverse Temp (λ) | 1.0 | 1.0 |
Control Cost Param (α) | 1.0 | 1.0 |
Init Control Sequence | 0.0, ..., 0.0 | 0.0, ..., 0.0 |
Control Covariance | I(6) * 0.25 | I(6) * 0.25 |
AIS Iterations | --- | 5, 5, 8, 10, 15 |
CE Elite Threshold | --- | 0.8 |
CE Σ Estimation Method | --- | :ss |
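The `:cemppi` columns appear to be chosen so that both methods use the same number of rollouts per control step. Assuming the i-th entries of the three lists correspond to one another (the table does not state this explicitly), the arithmetic can be checked directly:

```julia
# Values copied from the table above; pairing the i-th entries is an assumption.
mppi_samples = [250, 500, 1000, 1500, 3000]   # :mppi samples per control step
ce_samples   = [50, 100, 125, 150, 200]       # :cemppi samples per AIS iteration
ce_iters     = [5, 5, 8, 10, 15]              # :cemppi AIS iterations

ce_budget = ce_samples .* ce_iters            # total :cemppi rollouts per control step
ce_budget == mppi_samples                     # true for every pair
```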
For `:PMCMPPI`, `:μaismppi`, and `:μΣaismppi`, the covariance was estimated with the maximum likelihood estimator (`:mle`). The CE method used `:mle` in some scenarios and `:ss`, the Schäfer & Strimmer shrinkage estimator, in others. Alternative covariance estimators were made available by integrating CovarianceEstimation.jl. The methods tested were:
- `:mle` = maximum likelihood estimation
- `:lw` = Ledoit & Wolf (http://www.ledoit.net/honey.pdf)
- `:ss` = Schäfer & Strimmer (https://strimmerlab.github.io/)
- `:rblw` = Rao-Blackwellized Ledoit-Wolf estimator (https://arxiv.org/pdf/0907.4698.pdf)
- `:oas` = Oracle-Approximating Shrinkage (https://arxiv.org/pdf/0907.4698.pdf)
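To illustrate why shrinkage helps when samples are scarce, here is a minimal hand-written Ledoit-Wolf (2004) estimator that shrinks the sample covariance toward a scaled identity. It is a sketch of the idea behind `:lw`, not CovarianceEstimation.jl's code; the library's targets and defaults may differ, and `lw_shrink` is a hypothetical name.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical sketch of Ledoit-Wolf (2004) shrinkage toward a scaled identity.
# X holds one observation per row (n × p). Not CovarianceEstimation.jl's implementation.
function lw_shrink(X::AbstractMatrix{<:Real})
    n, p = size(X)
    Xc = X .- mean(X; dims = 1)                 # center each column
    S = (Xc' * Xc) ./ n                         # sample covariance with 1/n normalization
    m = tr(S) / p                               # average variance sets the target scale
    F = m * I(p)                                # shrinkage target m * I
    dsq = sum(abs2, S - F)                      # squared distance between S and the target
    bbar_sq = sum(sum(abs2, Xc[k, :] * Xc[k, :]' - S) for k in 1:n) / n^2
    b_sq = min(bbar_sq, dsq)                    # keeps the intensity within [0, 1]
    δ = dsq == 0 ? 1.0 : b_sq / dsq             # shrinkage intensity
    return δ .* F .+ (1 - δ) .* S               # convex combination of target and S
end

Random.seed!(4)
X = randn(10, 20)                    # fewer observations (10) than dimensions (20)
S_mle = cov(X; corrected = false)    # rank at most 9, hence singular
Σ_lw = lw_shrink(X)                  # shrunk estimate
```

With 10 observations in 20 dimensions the sample covariance is singular, while the shrunk estimate is positive definite because it mixes in a positive multiple of the identity.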
For a given effective sample size, the CE and CMA methods benefited from using fewer samples with more iterations as the number of cars increased. However, as the number of samples per iteration decreased, the choice of covariance estimator in the CE version of MPOPI became more important. Below are two gifs of the CMA version of MPOPI: one uses 4 iterations of 375 samples with `:mle`, and the other uses 10 iterations of 150 samples with `:ss` to estimate the covariance matrix.