mdplearning

Install

devtools::install_github("boettiger-lab/mdplearning")

Basic Use

Use transition matrices for two different models in an example fisheries system:

library("mdplearning")
library("ggplot2")
library("dplyr")
library("tidyr")

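# Load the example fisheries models (low K and high K); this script is also
# expected to define `discount`, which is used below
source(system.file("examples/K_models.R", package="mdplearning"))
# Extract the transition array from each model
transition <- lapply(models, `[[`, "transition")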
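The README does not spell out how these transition objects are laid out. As a rough illustration only, the base-R sketch below builds two toy transition arrays for a discretized logistic fishery that differ only in carrying capacity K. The `make_transition` helper, the Poisson noise, the growth rate `r`, and the [state, next state, action] array layout are all assumptions for illustration; they are not the models defined in K_models.R.

```r
# Hypothetical sketch: a transition array P[s, s', a] for a discretized
# logistic fishery with carrying capacity K. The array layout, the Poisson
# noise and the parameter values are assumptions, not mdplearning's models.
make_transition <- function(K, r = 0.5, n_states = 21) {
  states <- seq_len(n_states) - 1            # stock sizes 0, 1, ..., n_states - 1
  actions <- states                          # harvest quotas on the same grid
  P <- array(0, dim = c(n_states, n_states, length(actions)))
  for (a in seq_along(actions)) {
    for (i in seq_along(states)) {
      escaped <- max(states[i] - actions[a], 0)          # stock left after harvest
      mu <- escaped + r * escaped * (1 - escaped / K)    # logistic growth of the remainder
      mu <- min(max(mu, 0), max(states))                 # keep the mean on the grid
      p <- dpois(states, lambda = mu)                    # Poisson noise around the mean
      P[i, , a] <- p / sum(p)                            # renormalize on the truncated grid
    }
  }
  P
}

# Two candidate models that differ only in K
toy_transition <- list(lowK = make_transition(K = 10),
                       highK = make_transition(K = 15))
```

Each row `P[s, , a]` is a probability distribution over next states, so the two arrays can be compared model by model in the same way the README compares `transition[[1]]` and `transition[[2]]`.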

Use the reward matrix from the first model (the reward function is assumed to be known):

reward <- models[[1]][["reward"]]

Planning

Compute the optimal policy when planning over model uncertainty, without any adaptive learning. The default solution method is policy iteration, and the default prior is a uniform belief over the models.

unif <- mdp_compute_policy(transition, reward, discount)
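To make the idea of planning under a fixed belief concrete, here is a minimal base-R sketch of policy iteration on the belief-weighted average of the candidate transition arrays. The averaging step, the `policy_iteration` helper, the [state, next state, action] layout and the two-state toy numbers are assumptions for illustration; this is not presented as how `mdp_compute_policy` is implemented.

```r
# Hypothetical sketch, not mdplearning's implementation: policy iteration on
# the belief-weighted average of candidate transition arrays P[s, s', a].
policy_iteration <- function(transition, reward, discount, prior = NULL,
                             max_iter = 100) {
  n_models <- length(transition)
  if (is.null(prior)) prior <- rep(1 / n_models, n_models)  # uniform belief
  # Expected transition array under the (fixed) prior belief
  P <- Reduce(`+`, Map(function(Pm, w) w * Pm, transition, prior))
  n_states <- dim(P)[1]
  n_actions <- dim(P)[3]
  policy <- rep(1L, n_states)                               # start with action 1 everywhere
  for (iter in seq_len(max_iter)) {
    # Policy evaluation: solve (I - discount * P_pi) V = R_pi
    P_pi <- t(sapply(seq_len(n_states), function(s) P[s, , policy[s]]))
    R_pi <- reward[cbind(seq_len(n_states), policy)]
    V <- solve(diag(n_states) - discount * P_pi, R_pi)
    # Policy improvement: act greedily with respect to V
    Q <- sapply(seq_len(n_actions),
                function(a) reward[, a] + discount * drop(P[, , a] %*% V))
    new_policy <- max.col(Q, ties.method = "first")
    if (all(new_policy == policy)) break
    policy <- new_policy
  }
  data.frame(state = seq_len(n_states), policy = policy, value = V)
}

# Toy example (hypothetical numbers): states 1 = depleted, 2 = healthy;
# actions 1 = rest, 2 = fish. Fishing a healthy stock pays 1.
reward_toy <- matrix(c(0, 0,    # rest
                       0, 1),   # fish
                     nrow = 2)
P_fragile <- array(0, dim = c(2, 2, 2))
P_fragile[, , 1] <- matrix(c(0, 1,    # rest: depleted -> healthy
                             0, 1),   # rest: healthy stays healthy
                           nrow = 2, byrow = TRUE)
P_fragile[, , 2] <- matrix(c(1, 0,    # fish: depleted stays depleted
                             1, 0),   # fish: healthy stock collapses
                           nrow = 2, byrow = TRUE)
P_robust <- P_fragile
P_robust[, , 2] <- matrix(c(1, 0, 0, 1), nrow = 2, byrow = TRUE)  # fishing never collapses it

unif_toy <- policy_iteration(list(P_fragile, P_robust), reward_toy, discount = 0.9)
robust_toy <- policy_iteration(list(P_fragile, P_robust), reward_toy, 0.9, prior = c(0, 1))
```

In this toy case every prior leads to the same policy (rest when depleted, fish when healthy); only the value of the healthy state changes, from about 6.9 under the uniform belief to 10 when the robust model is certain.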

We can compare this policy to those obtained when we are certain of either the low-K model (model 1) or the high-K model (model 2):

lowK  <- mdp_compute_policy(transition, reward, discount, c(1,0))
highK <- mdp_compute_policy(transition, reward, discount, c(0,1))

We can plot the resulting policies. Note that the policy under uniform uncertainty is a compromise, intermediate between the low-K and high-K policies.

# Stack the three policies into one data frame, labelled by model, and plot them
dplyr::bind_rows(unif = unif, lowK = lowK, highK = highK, .id = "model") %>%
  ggplot(aes(state, state - policy, col = model)) + geom_line()

We can use mdp_planning to simulate the system (without learning) under a policy fixed in advance. mdp_planning can also include observation error in the simulation, although the MDP optimization itself does not account for it.

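# Simulate 20 steps starting from x0 = 10, with model 1 as the true dynamics,
# following the fixed uniform-belief policy and adding model 1's observation error
df <- mdp_planning(transition[[1]], reward, discount, x0 = 10, Tmax = 20,
                   policy = unif$policy, observation = models[[1]]$observation)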



# Drop the value column, reshape the remaining series to long format, and plot them over time
df %>% 
  select(-value) %>% 
  gather(series, stock, -time) %>% 
  ggplot(aes(time, stock, color = series)) + geom_line()
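The sketch below illustrates what simulating a fixed policy with observation error involves: at each step the true state is observed with noise, the policy acts on the observed state, and the true state then evolves. The `simulate_fixed_policy` helper, the observation matrix layout `O[true state, observed state]`, the [state, next state, action] transition layout, and the assumption that the policy acts on the observed state are all hypothetical; this is not presented as the internals of mdp_planning.

```r
# Hypothetical sketch of simulating a fixed policy with observation error.
# Assumes transitions P[s, s', a] and an observation matrix O[s, o] giving
# P(observe o | true state s); neither layout is confirmed for mdplearning.
simulate_fixed_policy <- function(P, O, policy, x0, Tmax) {
  n_states <- dim(P)[1]
  n_obs <- ncol(O)
  state <- integer(Tmax)
  obs <- integer(Tmax)
  action <- integer(Tmax)
  state[1] <- x0
  for (t in seq_len(Tmax)) {
    obs[t] <- sample.int(n_obs, 1, prob = O[state[t], ])  # noisy observation of the true state
    action[t] <- policy[obs[t]]                           # the policy only sees the observation
    if (t < Tmax) {
      state[t + 1] <- sample.int(n_states, 1, prob = P[state[t], , action[t]])
    }
  }
  data.frame(time = seq_len(Tmax), state = state, obs = obs, action = action)
}

# Toy three-state system (hypothetical numbers)
P <- array(0, dim = c(3, 3, 2))
P[, , 1] <- matrix(c(0, 1, 0,
                     0, 0, 1,
                     0, 0, 1), nrow = 3, byrow = TRUE)   # action 1: stock grows one step
P[, , 2] <- matrix(c(1, 0, 0,
                     1, 0, 0,
                     0, 1, 0), nrow = 3, byrow = TRUE)   # action 2: harvest knocks it down
O <- matrix(c(0.8, 0.2, 0,
              0.1, 0.8, 0.1,
              0,   0.2, 0.8), nrow = 3, byrow = TRUE)    # mostly correct observations
policy <- c(1L, 1L, 2L)   # grow until the top state is observed, then harvest

set.seed(1)
sim <- simulate_fixed_policy(P, O, policy, x0 = 1, Tmax = 20)
```

Passing `diag(3)` as the observation matrix recovers the error-free case, which is a quick way to see how much the observation noise changes the trajectory.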

Learning

Given a transition matrix from which the true transitions will be drawn, we can use Bayesian learning to update our belief about which model is the true one. Note that we must now specify a list of transition matrices representing the models under consideration, and separately specify the true transition. The function now returns a list containing two data frames: one for the time series, as before, and another showing the evolution of the posterior belief over models.

# Simulate with Bayesian updating over both models; the true dynamics come from model 1
out <- mdp_learning(transition, reward, discount, x0 = 10,
                    Tmax = 20, true_transition = transition[[1]])

By the final time step, the belief has converged strongly to model 1, which was used as the true model.

out$posterior[20,]
#>    V1           V2
#> 20  1 1.587274e-14
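The convergence above comes from repeated application of Bayes' rule. As a minimal sketch, assuming each model m assigns probability `P_m[s, s_next, a]` to an observed transition (the [state, next state, action] layout and the `update_belief` helper are assumptions, not mdplearning's API), one update step and a short toy run look like this:

```r
# Hypothetical sketch of a Bayesian belief update over candidate models.
# Observing s -> s_next under action a multiplies each model's weight by its
# likelihood P_m[s, s_next, a]; the result is renormalized to sum to one.
update_belief <- function(belief, transition, s, s_next, a) {
  likelihood <- vapply(transition, function(Pm) Pm[s, s_next, a], numeric(1))
  posterior <- belief * likelihood
  if (sum(posterior) == 0) stop("observed transition is impossible under every model")
  posterior / sum(posterior)
}

# Toy check (hypothetical numbers): a "sticky" chain (model 1, the truth)
# versus a coin-flip chain (model 2), each with a single action.
P1 <- array(c(0.9, 0.1, 0.1, 0.9), dim = c(2, 2, 1))
P2 <- array(0.5, dim = c(2, 2, 1))

set.seed(42)
belief <- c(0.5, 0.5)   # uniform prior over the two models
s <- 1
for (t in 1:20) {
  s_next <- sample.int(2, 1, prob = P1[s, , 1])   # data generated by model 1
  belief <- update_belief(belief, list(P1, P2), s, s_next, a = 1)
  s <- s_next
}
belief
```

Because each observed transition multiplies the odds by a likelihood ratio, the weight on the true model typically approaches 1 after a few dozen steps, qualitatively like the posterior shown above.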
