MPOPIS Details

Welcome to the MPOPIS wiki!

MPOPIS Simulation Details and Algorithm Parameter Settings

This wiki page contains simulation details and parameters used for the algorithms.

Simulation Details

MountainCar Environment

The goal of the MountainCar problem is to get an under-powered car up a hill to a goal location. The action space consisted of a single continuous action at each time step. The environment used was the MountainCar implementation from ReinforcementLearning.jl, which is based on OpenAI Gym's MountainCar scenario.

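The reward function for this problem was modified in two ways: it adds a speed incentive equal to the absolute velocity, and a bonus of 100000 for reaching the goal location with a velocity of at least the goal velocity. These terms are added to a penalty of 1 for every step taken before the episode is done. The environment terminated when the car reached the goal location or after 200 steps.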

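# Modified MountainCar reward function
function RLBase.reward(env::MountainCarEnv{A,T}) where {A,T}
    rew = 0.0
    # Goal bonus: position and velocity both meet their goal thresholds
    if env.state[1] >= env.params.goal_pos &&
        env.state[2] >= env.params.goal_velocity
        rew += 100000
    end
    # Speed incentive: reward the magnitude of the velocity
    rew += abs(env.state[2])
    # Step penalty: -1 for every step until the episode is done
    rew += env.done ? 0.0 : -1.0
    return rew
end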
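
Written as a single expression, the function above computes the following. Here $x_t$ and $v_t$ stand for `env.state[1]` (position) and `env.state[2]` (velocity), and $x_{\text{goal}}$ and $v_{\text{goal}}$ for `env.params.goal_pos` and `env.params.goal_velocity`; these symbols are notation introduced only for this illustration and do not appear in the code.

$$
r_t = 100000 \cdot \mathbb{1}\left[x_t \ge x_{\text{goal}} \,\wedge\, v_t \ge v_{\text{goal}}\right] + |v_t| - \mathbb{1}\left[\text{episode not done}\right]
$$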

Car Racing Environment

The car racing environment can be run on multiple tracks. The default track, which is also the one used in the simulations, is a 1.18 km track with a lane width of 15 m. The track is shown below. All scenarios started with car 1 at the origin and the other cars offset left and right by 5 m. Each car was oriented toward the positive y-axis and had a longitudinal velocity of 10 m/s at t = 0. The parameters used for the car model and the dynamics can be seen in the code here. The model parameters and dynamics were implemented from Brown and Gerdes (2020) and Subosits and Gerdes (2021).

Algorithm Details

MountainCar Environment

Parameters	`:mppi`	`:PMCMPPI`	`:μaismppi`	`:μΣaismppi`	`:cemppi`	`:cmamppi`
Samples	20-180 (multiples of 20)	20	20	20	20	20
Horizon	15	15	15	15	15	15
Inverse Temp (λ)	0.1	0.1	0.1	0.1	0.1	0.1
Control Cost Param (α)	1.0	1.0	1.0	1.0	1.0	1.0
Init Control Sequence	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0
Control Covariance	1.5	1.5	1.5	1.5	1.5	1.5
AIS Iterations	---	1-8	1-8	1-8	1-8	1-8
AIS Inv Temp (λ_ais)	---	0.1	0.1	0.1	---	---
CE Elite Threshold	---	---	---	---	0.8	---
CE Σ Estimation Method	---	---	---	---	`:mle`	---
CMA Step Factor (σ)	---	---	---	---	---	0.5
CMA Elite Threshold	---	---	---	---	---	0.8

Car Racing Environment (1 Car)

Parameters	`:mppi`	`:PMCMPPI`	`:μaismppi`	`:μΣaismppi`	`:cemppi`	`:cmamppi`
Samples	375-2250 (multiples of 375)	375	375	375	375	375
Horizon	50	50	50	50	50	50
Inverse Temp (λ)	10.0	10.0	10.0	10.0	10.0	10.0
Control Cost Param (α)	1.0	1.0	1.0	1.0	1.0	1.0
Init Control Sequence	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0
Control Covariance	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]
AIS Iterations	---	1-6	1-6	1-6	1-6	1-6
AIS Inv Temp (λ_ais)	---	20.0	20.0	20.0	---	---
CE Elite Threshold	---	---	---	---	0.8	---
CE Σ Estimation Method	---	---	---	---	`:mle`	---
CMA Step Factor (σ)	---	---	---	---	---	0.5
CMA Elite Threshold	---	---	---	---	---	0.8

Car Racing Environment (2 Cars)

Parameters	`:mppi`	`:PMCMPPI`	`:μaismppi`	`:μΣaismppi`	`:cemppi`	`:cmamppi`
Samples	375-2250 (multiples of 375)	375	375	375	375	375
Horizon	50	50	50	50	50	50
Inverse Temp (λ)	10.0	10.0	10.0	10.0	10.0	10.0
Control Cost Param (α)	1.0	1.0	1.0	1.0	1.0	1.0
Init Control Sequence	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0
Control Covariance	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]
AIS Iterations	---	1-6	1-6	1-6	1-6	1-6
AIS Inv Temp (λ_ais)	---	70.0	70.0	70.0	---	---
CE Elite Threshold	---	---	---	---	0.8	---
CE Σ Estimation Method	---	---	---	---	`:mle`	---
CMA Step Factor (σ)	---	---	---	---	---	0.5
CMA Elite Threshold	---	---	---	---	---	0.8

Car Racing Environment (3+ Cars)

Parameters	`:mppi`	`:PMCMPPI`	`:μaismppi`	`:μΣaismppi`	`:cemppi`	`:cmamppi`
Samples	375-2250 (multiples of 375)	375	375	375	150	150
Horizon	50	50	50	50	50	50
Inverse Temp (λ)	10.0	10.0	10.0	10.0	10.0	10.0
Control Cost Param (α)	1.0	1.0	1.0	1.0	1.0	1.0
Init Control Sequence	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0	0.0, ..., 0.0
Control Covariance	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]	[0.0625 0; 0 0.1]
AIS Iterations	---	1-6	1-6	1-6	1, 5, 7, ..., 15	1, 5, 7, ..., 15
AIS Inv Temp (λ_ais)	---	70.0	70.0	70.0	---	---
CE Elite Threshold	---	---	---	---	0.8	---
CE Σ Estimation Method	---	---	---	---	`:ss`	---
CMA Step Factor (σ)	---	---	---	---	---	0.5
CMA Elite Threshold	---	---	---	---	---	0.8

HalfCheetah-v4 and Ant-v4

The numbers presented in the paper used terminate_when_unhealthy=False for Ant-v4.

Parameters	`:mppi`	`:cemppi`
Samples	250, 500, 1000, 1500, 3000	50, 100, 125, 150, 200
Horizon	50	50
Inverse Temp (λ)	1.0	1.0
Control Cost Param (α)	1.0	1.0
Init Control Sequence	0.0, ..., 0.0	0.0, ..., 0.0
Control Covariance	I(6) * 0.25	I(6) * 0.25
AIS Iterations	---	5, 5, 8, 10, 15
CE Elite Threshold	---	0.8
CE Σ Estimation Method	---	`:ss`
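
One way to read the sample and iteration rows together is as a matched sampling budget. The sketch below assumes that the entries pair up position by position (first `:mppi` sample count with first `:cemppi` sample count and iteration count, and so on); that pairing is an interpretation of the table, not something stated on this page. The variable names are made up for the example.

```julia
# Values copied from the HalfCheetah-v4 / Ant-v4 table above.
mppi_samples      = [250, 500, 1000, 1500, 3000]
cemppi_samples    = [50, 100, 125, 150, 200]
cemppi_iterations = [5, 5, 8, 10, 15]

# Total trajectories drawn by :cemppi per control step,
# assuming every iteration draws a fresh batch of samples.
cemppi_total = cemppi_samples .* cemppi_iterations

for (n, k, total, baseline) in zip(cemppi_samples, cemppi_iterations,
                                   cemppi_total, mppi_samples)
    println("$n samples × $k iterations = $total (MPPI: $baseline)")
end
```

Under that pairing, each `:cemppi` configuration draws the same total number of trajectories as the corresponding `:mppi` configuration.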

Covariance Estimation

For :PMCMPPI, :μaismppi, and :μΣaismppi, the covariance was estimated with the maximum likelihood estimator (:mle). The CE method used either :mle or :ss, the Schäfer & Strimmer shrinkage estimator, depending on the scenario (see the tables above). Different covariance estimation techniques were implemented through the integration of CovarianceEstimation.jl. The methods tested were:

:mle = maximum likelihood estimation
:lw = Ledoit & Wolf (http://www.ledoit.net/honey.pdf)
:ss = Schäfer & Strimmer (https://strimmerlab.github.io/)
:rblw = Rao-Blackwell Ledoit-Wolf estimator (https://arxiv.org/pdf/0907.4698.pdf)
:oas = Oracle-Approximating Shrinkage (https://arxiv.org/pdf/0907.4698.pdf)
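
As a rough illustration of what switching between :mle and :ss means, the sketch below calls CovarianceEstimation.jl directly on random data. The sample count, the dimension, the random data, and the choice of `DiagonalUnequalVariance()` as the shrinkage target are all assumptions made for this example; the target used inside MPOPIS is not stated on this page.

```julia
using CovarianceEstimation  # third-party package; provides SimpleCovariance and LinearShrinkage
using LinearAlgebra
using Random

Random.seed!(1)

# Hypothetical sizes: fewer samples than dimensions, one sample per row.
n_samples, n_dims = 20, 30
X = randn(n_samples, n_dims)

# :mle analogue, the uncorrected sample covariance.
Σ_mle = cov(SimpleCovariance(), X)

# :ss analogue, linear shrinkage with the Schäfer & Strimmer intensity.
# The diagonal target is an assumption made for this sketch.
Σ_ss = cov(LinearShrinkage(DiagonalUnequalVariance(), :ss), X)

println("rank of MLE estimate:        ", rank(Σ_mle), " of ", n_dims)
println("shrinkage estimate pos. def: ", isposdef(Symmetric(Σ_ss)))
```

With fewer samples than dimensions the maximum likelihood estimate is rank deficient, whereas the shrinkage estimate blends it with a diagonal target. This is one reason the choice of estimator can matter more as the per-iteration sample count drops.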

Sample Size and Covariance Estimation Comparison

For a given effective sample size, the CE and CMA methods benefited from using fewer samples with more iterations as the number of cars increased. However, as the sample size decreased, the method used to approximate the covariance matrix in the CE version of MPOPI became more important. Below are two GIFs of the CMA version of MPOPI: one with 4 iterations of 375 samples using :mle, and the other with 10 iterations of 150 samples using :ss to estimate the covariance matrix. Both configurations draw 1500 samples in total.

MPOPI CMA - 375 Samples, 4 Iterations

MPOPI CMA - 150 Samples, 10 Iterations
