\item$\mathcal{L}$ is a finite set of labels.
\item$\mathscr{l}: S \rightarrow D(\mathcal{L})$ is the emission function.
\item$\tau: S \rightarrow D(S)$ is the transition function.
\item$\pi\in D(S)$ is the initial distribution.
\end{itemize}
\end{definition}
Here, $D$ denotes the set of discrete probability distributions over a finite set.
The model emits a label $l$ in state $s$ with probability $\mathscr{l}(s)(l)$, transitions from state $s$ to state $s'$ with probability $\tau(s)(s')$, and starts in state $s$ with probability $\pi(s)$.
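For example, the probability that the model starts in state $s_1$, emits label $l_1$ there, transitions to state $s_2$, and emits label $l_2$ there is the product of the corresponding factors:
\[\pi(s_1)\,\mathscr{l}(s_1)(l_1)\,\tau(s_1)(s_2)\,\mathscr{l}(s_2)(l_2).\]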
%HMM as matrices
\subsection{Matrix Representation of HMMs}\label{subsec:matrix-representation}
\glspl{hmm} can be represented using various matrices.
The emission function $\mathscr{l}$ can be represented as a matrix $\omega$ where $\omega_{s, l} = \mathscr{l}(s)(l)$.
The matrix $\omega$ has size $|S| \times |\mathcal{L}|$.
The sum of each row in the matrix $\omega$ is equal to one, reflecting the total probability of emitting all labels from a given state.
\[\omega = \begin{bmatrix}\mathscr{l}(s_1)(l_1) & \cdots & \mathscr{l}(s_1)(l_{|\mathcal{L}|}) \\\vdots & \ddots & \vdots\\\mathscr{l}(s_{|S|})(l_1) & \cdots & \mathscr{l}(s_{|S|})(l_{|\mathcal{L}|})\end{bmatrix}\]
The transition function $\tau$ can be represented as a stochastic matrix $P$ where $P_{s, s'} = \tau(s)(s')$.
The matrix $P$ has size $|S| \times |S|$.
The sum of each row in $P$ is equal to one, reflecting the total probability of transitioning from a given state to all other states.
\[ P = \begin{bmatrix}\tau(s_1)(s_1) & \cdots & \tau(s_1)(s_{|S|}) \\\vdots & \ddots & \vdots\\\tau(s_{|S|})(s_1) & \cdots & \tau(s_{|S|})(s_{|S|})\end{bmatrix}\]
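As a purely illustrative example, a model with two states and two labels could have
\[\omega = \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}, \qquad P = \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix},\]
where $\omega_{s_1, l_1} = 0.9$ says that state $s_1$ emits label $l_1$ with probability $0.9$, $P_{s_1, s_2} = 0.3$ says that the model moves from $s_1$ to $s_2$ with probability $0.3$, and every row of both matrices sums to one.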
The initial distribution $\pi$ can be represented as a vector, also written $\pi$, where $\pi_s = \pi(s)$.
The vector $\pi$ has size $|S|$.
The sum of all elements in $\pi$ is equal to one, reflecting the fact that $\pi \in D(S)$.
\[\pi = \begin{bmatrix}\pi(s_1) \\\vdots\\\pi(s_{|S|})\end{bmatrix}\]

\subsection{Observations and Hidden States}\label{subsec:observations-hidden-states}
%% TODO: Add description of paths, observations, sequences of labels, and Giovannis comment on The Baum Welch Algorithm.
An \gls{hmm} operates on a multiset of observations, denoted $\mathcal{O} = \{O_1, O_2, \ldots, O_N\}$, where each $O_i$ is an observation sequence of labels $O_i = o_1, o_2, \ldots, o_{|O_i|}$.
The hidden states that generated an observation sequence are not directly visible; only the emitted labels are, and the task is to estimate the \gls{hmm} parameters under which the observed sequences are most likely.
This problem is commonly approached using the Baum-Welch algorithm, a widely used method for estimating the probabilities of an \gls{hmm} from observations alone.
%Baum-Welch for HMM mathematically
\subsection{The Baum-Welch Algorithm}\label{subsec:baum-welch}
The Baum-Welch algorithm is a fundamental method for estimating the parameters of an \gls{hmm} given a sequence of observations.
These parameters include the emission matrix $\omega$, the transition matrix $P$, and the initial state distribution $\pi$.
The algorithm is widely recognized as the standard approach for training \glspl{hmm} and was chosen for this project due to its ability to estimate these parameters without prior knowledge of the hidden states that generated the observations.
The Baum-Welch algorithm applies the Expectation-Maximization (EM) framework to iteratively improve the likelihood of the observed data under the current model parameters.
It consists of the following steps:
\begin{enumerate}
\item\textbf{Initialization:} Begin with initial estimates for the \gls{hmm} parameters $(\pi, P, \omega)$.
\item\textbf{Expectation Step (E-step):}
Compute the forward probabilities $\alpha_s(t)$ and backward probabilities $\beta_s(t)$ (their defining recurrences are sketched just after this list), which represent:
\begin{itemize}
\item The joint probability of observing the sequence up to time $t$ and the \gls{hmm} being in state $s$ at time $t$ ($\alpha_s(t)$).
\item The probability of observing the sequence from time $t+1$ to the end, given that the \gls{hmm} is in state $s$ at time $t$ ($\beta_s(t)$).
\end{itemize}
\item\textbf{Maximization Step (M-step):} Update the \gls{hmm} parameters $(\pi, P, \omega)$ to maximize the likelihood of the observed data based on the expected counts computed in the E-step.
\item\textbf{Iteration:} Repeat the E-step and M-step until convergence, i.e., when the change in likelihood between iterations falls below a predefined threshold.
\end{enumerate}
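Concretely, for an observation sequence $o_1, o_2, \ldots, o_T$, the forward and backward probabilities of the E-step satisfy the standard recurrences
\[\alpha_s(1) = \pi(s)\,\omega_{s, o_1}, \qquad \alpha_{s'}(t+1) = \omega_{s', o_{t+1}} \sum_{s \in S} \alpha_s(t)\,P_{s, s'},\]
\[\beta_s(T) = 1, \qquad \beta_s(t) = \sum_{s' \in S} P_{s, s'}\,\omega_{s', o_{t+1}}\,\beta_{s'}(t+1),\]
and the likelihood of the whole sequence is $\sum_{s \in S} \alpha_s(T)$; the full derivation is deferred to \autoref{subsec:forward-backwards_algorithm}.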
The Baum-Welch algorithm refines the \gls{hmm} parameters by iteratively maximizing the likelihood of the observations.
Starting with an initial hypothesis $\textbf{x}_0 = (\pi, P, \omega)$, the algorithm produces a sequence of parameter estimates $\textbf{x}_1, \textbf{x}_2, \ldots$, where each new estimate improves upon the previous one.
The process terminates when the improvement in likelihood is sufficiently small, satisfying the convergence criterion:
\[ |l(o, \textbf{x}_{n+1}) - l(o, \textbf{x}_n)| < \epsilon\]
where $l(o, \textbf{x}_n)$ is the likelihood of the observations $o$ under the parameters $\textbf{x}_n$, and $\epsilon > 0$ is a small threshold.
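To make the iteration scheme concrete, the following is a minimal NumPy sketch of the loop just described, written for illustration only: it handles a single observation sequence, works with unscaled probabilities (practical implementations rescale $\alpha$ and $\beta$ to avoid numerical underflow on long sequences), and is independent of the implementation used in this project.
\begin{verbatim}
import numpy as np

def baum_welch(obs, pi, P, omega, eps=1e-8, max_iter=200):
    """One-sequence Baum-Welch: iterate E- and M-steps until the
    likelihood improves by less than eps.  obs is a list of label
    indices; pi, P, omega are as in the matrix representation above."""
    obs = np.asarray(obs)
    T = len(obs)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: forward (alpha) and backward (beta) probabilities.
        alpha = np.zeros((T, len(pi)))
        beta = np.ones((T, len(pi)))
        alpha[0] = pi * omega[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ P) * omega[:, obs[t]]
        for t in range(T - 2, -1, -1):
            beta[t] = P @ (omega[:, obs[t + 1]] * beta[t + 1])
        ll = alpha[-1].sum()  # likelihood l(o, x_n)
        # Expected counts: gamma[t, s] is the posterior probability of
        # being in state s at time t; xi[t, s, s'] that of taking the
        # transition s -> s' at time t.
        gamma = alpha * beta / ll
        xi = (alpha[:-1, :, None] * P[None, :, :]
              * (omega[:, obs[1:]].T * beta[1:])[:, None, :]) / ll
        # M-step: re-estimate the parameters from the expected counts.
        pi = gamma[0]
        P = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        omega = np.stack([gamma[obs == l].sum(axis=0)
                          for l in range(omega.shape[1])], axis=1)
        omega /= gamma.sum(axis=0)[:, None]
        # Convergence criterion from the text.
        if abs(ll - prev_ll) < eps:
            break
        prev_ll = ll
    return pi, P, omega
\end{verbatim}
The E-step of the sketch corresponds to the forward-backward pass above, and the M-step normalizes the expected counts $\gamma$ and $\xi$ into row-stochastic matrices.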
The steps of the Baum-Welch algorithm are detailed in the following sections, including the initialization of the \gls{hmm} parameters, the forward-backward algorithm, and the update algorithm; see \autoref{subsec:initialization},~\ref{subsec:forward-backwards_algorithm}, and~\ref{subsec:update-algorithm}, respectively.
\subsection{Initialization of HMM Parameters}\label{subsec:initialization}
Before starting the Baum-Welch algorithm, we need to initialize the model parameters: the emission matrix $\omega$, the transition matrix $P$, and the initial state distribution $\pi$.
Since the algorithm is designed to converge iteratively toward a locally optimal parameter set, the initial estimates do not need to be exact.
However, reasonable initialization can accelerate convergence and improve numerical stability~\cite{benyacoub2015initial}.
As we are working with \glspl{hmm}, we know neither which labels are emitted by which states nor the transition probabilities between states; therefore, we cannot initialize the parameters based on prior knowledge.
A common approach to initialize these parameters is as follows:
\subsubsection{Initial State Probability Distribution}
The initial state probabilities $\pi$ represent the likelihood of starting in each hidden state $s \in S$.
To estimate initial state probabilities, we can use one of the following strategies:
\begin{itemize}
\item Uniform initialization: assign equal probability to every state, $\pi_s = \frac{1}{|S|}$ for all $s \in S$, reflecting the absence of prior knowledge about the starting state.
\item Random initialization: draw the entries of $\pi$ at random and normalize them so they sum to one, allowing different restarts of the algorithm to explore different local optima.
% \item Examine the observed data: Estimate the initial probabilities based on the observed data. For example, if certain states are more likely to occur at the beginning of the sequence, assign higher probabilities to these states. % $\pi_s = \frac{R_s}{\sum_{s' \in S} R_{s'}}$ where $R_s$ is the number of times state $s$ occurs at the beginning of the sequence.
\end{itemize}