From 3c1bb45235451c0478f57b50917fc1bbaa7094ed Mon Sep 17 00:00:00 2001 From: Jason Pekos <78766845+JasonPekos@users.noreply.github.com> Date: Wed, 14 Aug 2024 18:31:30 -0400 Subject: [PATCH 1/4] add new section in HMM tutorial --- Manifest.toml | 12 ++- Project.toml | 1 + tutorials/04-hidden-markov-model/index.qmd | 100 +++++++++++++++++++-- 3 files changed, 105 insertions(+), 8 deletions(-) diff --git a/Manifest.toml b/Manifest.toml index 6f8f1feae..ce4aaf2a7 100644 --- a/Manifest.toml +++ b/Manifest.toml @@ -2,7 +2,7 @@ julia_version = "1.10.4" manifest_format = "2.0" -project_hash = "4dbb593d041b187abd261fc4e92a2df5baf55b80" +project_hash = "8d5f9e89ceaa6c8c6e8fc7a356f951cdb7eacffe" [[deps.ADTypes]] git-tree-sha1 = "016833eb52ba2d6bea9fcb50ca295980e728ee24" @@ -1136,6 +1136,16 @@ git-tree-sha1 = "129acf094d168394e80ee1dc4bc06ec835e510a3" uuid = "2e76f6c2-a576-52d4-95c1-20adfe4de566" version = "2.8.1+1" +[[deps.HiddenMarkovModels]] +deps = ["ArgCheck", "ChainRulesCore", "DensityInterface", "DocStringExtensions", "FillArrays", "LinearAlgebra", "Random", "SparseArrays", "StatsAPI", "StatsFuns"] +git-tree-sha1 = "f5f0f6e33b21487d39bcdfb6d67aa4c5e54faba3" +uuid = "84ca31d5-effc-45e0-bfda-5a68cd981f47" +version = "0.5.3" +weakdeps = ["Distributions"] + + [deps.HiddenMarkovModels.extensions] + HiddenMarkovModelsDistributionsExt = "Distributions" + [[deps.HostCPUFeatures]] deps = ["BitTwiddlingConvenienceFunctions", "IfElse", "Libdl", "Static"] git-tree-sha1 = "8e070b599339d622e9a081d17230d74a5c473293" diff --git a/Project.toml b/Project.toml index cdfa84dd5..fbac0fa92 100644 --- a/Project.toml +++ b/Project.toml @@ -18,6 +18,7 @@ Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c" ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210" Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196" GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a" +HiddenMarkovModels = "84ca31d5-effc-45e0-bfda-5a68cd981f47" LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f" LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" LogDensityProblems = "6fdf6af0-433a-55f7-b3ed-c6c6e0b8df7c" diff --git a/tutorials/04-hidden-markov-model/index.qmd b/tutorials/04-hidden-markov-model/index.qmd index 38004e044..bae24949b 100755 --- a/tutorials/04-hidden-markov-model/index.qmd +++ b/tutorials/04-hidden-markov-model/index.qmd @@ -10,17 +10,17 @@ using Pkg; Pkg.instantiate(); ``` -This tutorial illustrates training Bayesian [Hidden Markov Models](https://en.wikipedia.org/wiki/Hidden_Markov_model) (HMM) using Turing. The main goals are learning the transition matrix, emission parameter, and hidden states. For a more rigorous academic overview on Hidden Markov Models, see [An introduction to Hidden Markov Models and Bayesian Networks](http://mlg.eng.cam.ac.uk/zoubin/papers/ijprai.pdf) (Ghahramani, 2001). +This tutorial illustrates training Bayesian [Hidden Markov Models](https://en.wikipedia.org/wiki/Hidden_Markov_model) (HMM) using Turing. The main goals are learning the transition matrix, emission parameter, and hidden states. For a more rigorous academic overview of Hidden Markov Models, see [An introduction to Hidden Markov Models and Bayesian Networks](http://mlg.eng.cam.ac.uk/zoubin/papers/ijprai.pdf) (Ghahramani, 2001). In this tutorial, we assume there are $k$ discrete hidden states; the observations are continuous and normally distributed - centered around the hidden states. This assumption reduces the number of parameters to be estimated in the emission matrix. -Let's load the libraries we'll need. 
We also set a random seed (for reproducibility) and the automatic differentiation backend to forward mode (more [here](https://turinglang.org/dev/docs/using-turing/autodiff) on why this is useful).
+Let's load the libraries we'll need, and set a random seed for reproducibility.
 
 ```{julia}
 # Load libraries.
 using Turing, StatsPlots, Random
 
-# Set a random seed and use the forward_diff AD mode.
+# Set a random seed
 Random.seed!(12345678);
 ```
 
@@ -29,6 +29,9 @@ Random.seed!(12345678);
 In this example, we'll use something where the states and emission parameters are straightforward.
 
 ```{julia}
+#| code-fold: true
+#| code-summary: "Load and plot data for this tutorial."
+
 # Define the emission parameter.
 y = [
     1.0,
@@ -66,7 +69,8 @@ N = length(y);
 K = 3;
 
 # Plot the data we just made.
-plot(y; xlim=(0, 30), ylim=(-1, 5), size=(500, 250))
+plot(y; xlim=(0, 30), ylim=(-1, 5), size=(500, 250), legend = false)
+scatter!(y, color = :blue; xlim=(0, 30), ylim=(-1, 5), size=(500, 250), legend = false)
 ```
 
 We can see that we have three states, one for each height of the plot (1, 2, 3). This height is also our emission parameter, so state one produces a value of one, state two produces a value of two, and so on.
@@ -74,8 +78,8 @@ We can see that we have three states, one for each height of the plot (1, 2, 3).
 Ultimately, we would like to understand three major parameters:
 
  1. The transition matrix. This is a matrix that assigns a probability of switching from one state to any other state, including the state that we are already in.
- 2. The emission matrix, which describes a typical value emitted by some state. In the plot above, the emission parameter for state one is simply one.
- 3. The state sequence is our understanding of what state we were actually in when we observed some data. This is very important in more sophisticated HMM models, where the emission value does not equal our state.
+ 2. The emission parameters, which describe a typical value emitted by some state. In the plot above, the emission parameter for state one is simply one.
+ 3. The state sequence is our understanding of what state we were actually in when we observed some data. This is very important in more sophisticated HMMs, where the emission value does not equal our state.
 
 With this in mind, let's set up our model. We are going to use some of our knowledge as modelers to provide additional information about our system. This takes the form of the prior on our emission parameter.
@@ -131,6 +135,7 @@ Time to run our sampler.
 
 ```{julia}
 #| output: false
+#| echo: false
 setprogress!(false)
 ```
@@ -190,4 +195,85 @@ stationary. We can use the diagnostic functions provided by [MCMCChains](https:/
 heideldiag(MCMCChains.group(chn, :T))[1]
 ```
 
-The p-values on the test suggest that we cannot reject the hypothesis that the observed sequence comes from a stationary distribution, so we can be reasonably confident that our transition matrix has converged to something reasonable.
\ No newline at end of file
+The p-values on the test suggest that we cannot reject the hypothesis that the observed sequence comes from a stationary distribution, so we can be reasonably confident that our transition matrix has converged to something reasonable.
+
+## Efficient Inference With The Forward Algorithm
+
+While the above method works well for the simple example in this tutorial, some users may desire a more efficient method, especially when their model is more complicated.
One simple way to improve inference is to marginalize out the hidden states of the model with an appropriate algorithm, calculating only the posterior over the continuous random variables. Not only does this allow more efficient inference via Rao-Blackwellization, but now we can sample our model with `NUTS()` alone, which is usually a much more performant MCMC kernel. + +Thankfully, [HiddenMarkovModels.jl](https://github.com/gdalle/HiddenMarkovModels.jl) provides an extremely efficient implementation of many algorithms related to Hidden Markov Models. This allows us to re-write our model as: + +```{julia} +#| output: false +using HiddenMarkovModels +using FillArrays +using LinearAlgebra +using LogExpFunctions + + +@model function BayesHmm2(y, K) + m ~ Bijectors.ordered(MvNormal([1.0, 2.0, 3.0], 0.5I)) + T ~ filldist(Dirichlet(Fill(1/K, K)), K) + + hmm = HMM(softmax(ones(K)), copy(T'), [Normal(m[i], 0.1) for i in 1:K]) + Turing.@addlogprob! logdensityof(hmm, y) +end + +chn2 = sample(BayesHmm2(y, 3), NUTS(), 1000) +``` + + +We can compare the chains of these two models, confirming the posterior estimate is similar (modulo label switching concerns with the Gibbs model): +```{julia} +#| code-fold: true +#| code-summary: "Plotting Chains" + +plot(chn["m[1]"], label = "m[1], Model 1, Gibbs", color = :lightblue) +plot!(chn2["m[1]"], label = "m[1], Model 2, NUTS", color = :blue) +plot!(chn["m[2]"], label = "m[2], Model 1, Gibbs", color = :pink) +plot!(chn2["m[2]"], label = "m[2], Model 2, NUTS", color = :red) +plot!(chn["m[3]"], label = "m[3], Model 1, Gibbs", color = :yellow) +plot!(chn2["m[3]"], label = "m[3], Model 2, NUTS", color = :orange) +``` + + +### Recovering Marginalized Trajectories + +We can use the `viterbi()` algorithm, also from the `HiddenMarkovModels` package, to recover the most probable state for each parameter set in our posterior sample: +```{julia} +#| output: false +@model function BayesHmmRecover(y, K, IncludeGenerated = false) + + m ~ Bijectors.ordered(MvNormal([1.0, 2.0, 3.0], 0.5I)) + T ~ filldist(Dirichlet(Fill(1/K, K)), K) + + hmm = HMM(softmax(ones(K)), copy(T'), [Normal(m[i], 0.1) for i in 1:K]) + Turing.@addlogprob! logdensityof(hmm, y) + + # Conditional generation of the hidden states. 
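+    # `viterbi(hmm, y)` (from HiddenMarkovModels.jl) returns the most likely hidden-state
+    # sequence under the sampled parameters; the discarded second output is that path's
+    # log-likelihood. The `:=` operator records the derived quantity in the chain without
+    # adding anything to the log density, so it is available later for plotting.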
+ if IncludeGenerated + seq, _ = viterbi(hmm, y) + s := [m[s] for s in seq] + else + return nothing + end +end + +chn_recover = sample(BayesHmmRecover(y, 3, true), NUTS(), 1000) +``` + +Plotting the estimated states, we can see that the results align well with our expectations: + +```{julia} +#| code-fold: true +#| code-summary: "HMM Plotting Functions" + +p = plot(xlim=(0, 30), ylim=(-1, 5), size=(500, 250)) +for i in 1:100 + ind = rand(DiscreteUniform(1, 1000)) + plot!(MCMCChains.group(chn_recover, :s).value[ind,:], color = :grey, opacity = 0.1, legend = :false) +end +scatter!(y, color = :blue) + +p +``` From ed78ebdaab8cabb8df9426e9ff6c6e2c61fe8e2c Mon Sep 17 00:00:00 2001 From: Jason Pekos <78766845+JasonPekos@users.noreply.github.com> Date: Sat, 28 Sep 2024 13:41:00 -0700 Subject: [PATCH 2/4] Update tutorials/04-hidden-markov-model/index.qmd Co-authored-by: Penelope Yong --- tutorials/04-hidden-markov-model/index.qmd | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tutorials/04-hidden-markov-model/index.qmd b/tutorials/04-hidden-markov-model/index.qmd index bae24949b..5fb1c4774 100755 --- a/tutorials/04-hidden-markov-model/index.qmd +++ b/tutorials/04-hidden-markov-model/index.qmd @@ -10,7 +10,9 @@ using Pkg; Pkg.instantiate(); ``` -This tutorial illustrates training Bayesian [Hidden Markov Models](https://en.wikipedia.org/wiki/Hidden_Markov_model) (HMM) using Turing. The main goals are learning the transition matrix, emission parameter, and hidden states. For a more rigorous academic overview of Hidden Markov Models, see [An introduction to Hidden Markov Models and Bayesian Networks](http://mlg.eng.cam.ac.uk/zoubin/papers/ijprai.pdf) (Ghahramani, 2001). +This tutorial illustrates training Bayesian [hidden Markov models](https://en.wikipedia.org/wiki/Hidden_Markov_model) (HMMs) using Turing. +The main goals are learning the transition matrix, emission parameter, and hidden states. +For a more rigorous academic overview of hidden Markov models, see [An Introduction to Hidden Markov Models and Bayesian Networks](https://mlg.eng.cam.ac.uk/zoubin/papers/ijprai.pdf) (Ghahramani, 2001). In this tutorial, we assume there are $k$ discrete hidden states; the observations are continuous and normally distributed - centered around the hidden states. This assumption reduces the number of parameters to be estimated in the emission matrix. From 2608ddc056d574a1bf6a40bbb8ce78db2a039da5 Mon Sep 17 00:00:00 2001 From: Jason Pekos <78766845+JasonPekos@users.noreply.github.com> Date: Sat, 28 Sep 2024 13:42:21 -0700 Subject: [PATCH 3/4] Update tutorials/04-hidden-markov-model/index.qmd Co-authored-by: Penelope Yong --- tutorials/04-hidden-markov-model/index.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/04-hidden-markov-model/index.qmd b/tutorials/04-hidden-markov-model/index.qmd index 5fb1c4774..63ffed96b 100755 --- a/tutorials/04-hidden-markov-model/index.qmd +++ b/tutorials/04-hidden-markov-model/index.qmd @@ -203,7 +203,7 @@ The p-values on the test suggest that we cannot reject the hypothesis that the o While the above method works well for the simple example in this tutorial, some users may desire a more efficient method, especially when their model is more complicated. One simple way to improve inference is to marginalize out the hidden states of the model with an appropriate algorithm, calculating only the posterior over the continuous random variables. 
Not only does this allow more efficient inference via Rao-Blackwellization, but now we can sample our model with `NUTS()` alone, which is usually a much more performant MCMC kernel. -Thankfully, [HiddenMarkovModels.jl](https://github.com/gdalle/HiddenMarkovModels.jl) provides an extremely efficient implementation of many algorithms related to Hidden Markov Models. This allows us to re-write our model as: +Thankfully, [HiddenMarkovModels.jl](https://github.com/gdalle/HiddenMarkovModels.jl) provides an extremely efficient implementation of many algorithms related to hidden Markov models. This allows us to rewrite our model as: ```{julia} #| output: false From b4143061ef80d494ca67e2842873bc82043ede71 Mon Sep 17 00:00:00 2001 From: Jason Pekos <78766845+JasonPekos@users.noreply.github.com> Date: Sat, 28 Sep 2024 13:42:30 -0700 Subject: [PATCH 4/4] Update tutorials/04-hidden-markov-model/index.qmd Co-authored-by: Penelope Yong --- tutorials/04-hidden-markov-model/index.qmd | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tutorials/04-hidden-markov-model/index.qmd b/tutorials/04-hidden-markov-model/index.qmd index 63ffed96b..171b36c57 100755 --- a/tutorials/04-hidden-markov-model/index.qmd +++ b/tutorials/04-hidden-markov-model/index.qmd @@ -201,7 +201,9 @@ The p-values on the test suggest that we cannot reject the hypothesis that the o ## Efficient Inference With The Forward Algorithm -While the above method works well for the simple example in this tutorial, some users may desire a more efficient method, especially when their model is more complicated. One simple way to improve inference is to marginalize out the hidden states of the model with an appropriate algorithm, calculating only the posterior over the continuous random variables. Not only does this allow more efficient inference via Rao-Blackwellization, but now we can sample our model with `NUTS()` alone, which is usually a much more performant MCMC kernel. +While the above method works well for the simple example in this tutorial, some users may desire a more efficient method, especially when their model is more complicated. +One simple way to improve inference is to marginalize out the hidden states of the model with an appropriate algorithm, calculating only the posterior over the continuous random variables. +Not only does this allow more efficient inference via Rao-Blackwellization, but now we can sample our model with `NUTS()` alone, which is usually a much more performant MCMC kernel. Thankfully, [HiddenMarkovModels.jl](https://github.com/gdalle/HiddenMarkovModels.jl) provides an extremely efficient implementation of many algorithms related to hidden Markov models. This allows us to rewrite our model as:
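To see concretely what `logdensityof(hmm, y)` contributes through `Turing.@addlogprob!`, here is a minimal sketch of the scaled forward recursion that computes the marginal log-likelihood of `y` given `T` and `m`, with the hidden states summed out. It assumes the same setup as `BayesHmm2` above (a uniform initial state distribution and `Normal(m[k], 0.1)` emissions) and the column-wise convention implied by the `copy(T')` passed to `HMM`, namely that `T[i, j]` is the probability of moving from state `j` to state `i`. The helper name `forward_logpdf` is ours, and this is an illustration rather than the implementation used by HiddenMarkovModels.jl.

```julia
# A minimal scaled forward recursion computing log p(y | T, m) with the hidden
# states marginalized out. Illustrative sketch only; HiddenMarkovModels.jl does
# this (and more) far more efficiently.
using Distributions

function forward_logpdf(y, T, m; sigma = 0.1)
    K = length(m)
    alpha = fill(1.0 / K, K)   # uniform initial state distribution, as in the HMM above
    logp = 0.0
    for (t, yt) in enumerate(y)
        # Emission likelihoods of observation t under each state's Normal(m[k], sigma).
        b = [pdf(Normal(m[k], sigma), yt) for k in 1:K]
        # Propagate the state probabilities one step (columns of T sum to one),
        # then weight by the emission likelihoods.
        alpha = (t == 1 ? alpha : T * alpha) .* b
        c = sum(alpha)         # equals p(y_t | y_1, ..., y_{t-1})
        logp += log(c)
        alpha ./= c            # rescale to avoid numerical underflow
    end
    return logp
end
```

For a single draw of `m` and `T` from the chain, `forward_logpdf(y, T, m)` should agree, up to floating-point error, with `logdensityof(HMM(softmax(ones(K)), copy(T'), Normal.(m, 0.1)), y)`. Summing the per-step normalizing constants in log space is exactly the marginalization discussed above; delegating it to HiddenMarkovModels.jl keeps it fast and compatible with the automatic differentiation that `NUTS()` relies on.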