-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmarkov_networks.tex
121 lines (79 loc) · 5.66 KB
/
markov_networks.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
\section{Undirected Graphical Models}\label{sec:undirected_graphical_models}
\subsection{Factors}\label{sec:factors}
The characteristics of factors are defined as follows:
\begin{itemize}
\item Factors can not be directed in an undirected graphical model.
\item Factors subsumes both \emph{joint distribution} and \emph{conditional probability distribution}.
\item A joint distribution over $D$ is a factor over $D$. It specifies a real number for every assignment of values of $D$.
\item A conditional distribution $P(X|U)$ is a factor over $\{X\} \bigcup U$.
\item There are no constraints on a parameters in a factor.\marginnote{for example, in a joint distribution, the numbers must add up to $1$.}
\item A factor represents \emph{compatibility}/\emph{affinity} between the joining nodes.
\item A factor is only one contribution to the overall joint distribution.
\end{itemize}
\paragraph{Example: } Consider a fully connected graph over $\mathcal{X}$\marginnote{In a fully connected graph, there are no conditional independence.}. Let all the variables be binary. The we have:
\begin{itemize}
\item Each factor over an edge will have four parameters.
\item total number of parameters in the graph would be $4 \binom{n}{2}$.
\end{itemize}
\paragraph{Note} The number of parameters required to specify a joint distribution over $n$ binary variables is $2^n-1$. The number of parameters available in an undirected graph is $4 \binom{n}{2}$. \[4 \binom{n}{2} <<< 2^n-1\]Therefore, pairwise factors do not have enough parameters to completely cover the joint distribution space. A more general representation can be obtained by allowing factors over an arbitrary subsets of variables.
\subsection {Factor Product}
Let $X$, $Y$, and $Z$ be three disjoint variable sets and let $\phi_1(X,Y)$ and $\phi_2(Y,Z)$ be two factors. Then we define \emph{factor product} $\psi = \phi_1 \times \phi_2$, where $\psi:Val(X,Y,Z) \mapsto \mathbb{R}$. $\psi(X,Y,Z) = \phi_1(X,Y) \cdot \phi_2(Y,Z)$. Note that the two factors are multiplied in a way that matches up the common part $Y$.
\paragraph{Example} Let $\phi_1(A,B)$ and $\phi_2(B,C)$ be defined as follows:
\[
\begin{bmatrix}
a^1 & b^1 & 0.5 \\
a^1 & b^2 & 0.8 \\
a^2 & b^1 & 0.1 \\
a^2 & b^2 & 0 \\
a^3 & b^1 & 0.3 \\
a^3 & b^2 & 0.9
\end{bmatrix}
\cdot
\begin{bmatrix}
b^1 & c^1 & 0.5\\
b^1 & c^2 & 0.7\\
b^2 & c^1 & 0.1\\
b^2 & c^2 & 0.2
\end{bmatrix}
=
\begin{bmatrix}
a^1 & b^1 & c^1 & 0.5\cdot0.5=0.25 \\
a^1 & b^1 & c^2 & 0.5\cdot0.7=0.35 \\
a^1 & b^2 & c^1 & 0.8\cdot0.1=0.08\\
a^1 & b^2 & c^2 & 0.8\cdot0.2=0.16\\
a^2 & b^1 & c^1 & 0.05\\
a^2 & b^1 & c^2 & 0.07\\
a^2 & b^2 & c^1 & 0\\
a^2 & b^2 & c^2 & 0\\
a^3 & b^1 & c^1 & 0.15\\
a^3 & b^1 & c^2 & 0.21\\
a^3 & b^2 & c^1 & 0.09\\
a^3 & b^2 & c^2 & 0.18\\
\end{bmatrix}
\]
\subsection{Gibbs Distribution}
General notion of \emph{factors product} to define an undirected parametrization of a distribution.
\paragraph{Definition} A distribution $P_\phi$ is a Gibbs distribution, parametrized by a set of factors $\phi = \{ \phi_1(D_1),\ldots,\phi_k(D_k)\}$, if it is defined as follows:
\[P_\phi(X_1,\ldots,X_n) = \frac{1}{Z} \tilde{P}(X_1,\ldots,X_n)\] where
\[\tilde{P}(X_1,\ldots,X_n) = \phi_1(D_1) \times \phi_2(D_2) \times \ldots \times \phi_k(D_k) \] and
\[ Z = \sum_{X_1,\ldots, X_k} \tilde{P}(X_1,\ldots,X_n) = \text{partition function}\]
\newthought{To map Gibbs distribution to a graph} we inspect the scope of the factors contained in the parametrization. For example if the scope contain both $X$ and $Y$, then we will introduce an edge between $X$ and $Y$ nodes.
\paragraph {Markov network factorization} We say that a distribution $P_\phi$ with $\phi = \{\phi_1(D_1),\ldots,\phi_k(D_K)\}$ \emph{factorizes} over a \emph{markov network} $\mathcal{H}$ if each $D_k (k=1,\ldots, K)$ is a complete sub-graph of $\mathcal{H}$.
\paragraph {Clique potentials} The factors that parametrize a Markov network are often called \emph{clique potentials}. Note that
\begin{itemize}
\item Every complete sub-graph is a subset of some (maximal) clique. Therefore, we can reduce the number of factors in our parametrization by allowing factors only for maximal cliques. Let $C_1,\ldots,C_k$ be the cliques in $\mathcal{H}$, then we can parametrize $P$ using a set of factors $\phi_1(C_1),\ldots,\phi_k(C_l)$.
\item Any factorization (in terms of complete sub-graph) can be converted into this form simply by assigning each factor to a clique that encompasses its scope.
\item \emph{Clique potential} is then calculated by multiplying all the factors assigned to each clique.
\end{itemize}
\subsection{Pairwise Markov Networks}
\begin{marginfigure}
\markovPairwise
\caption{A pairwise Markov network (MRF) structured as a grid.}
\end{marginfigure}
A pairwise Markov network over a graph $\mathcal{H}$ is associated with a set of node potentials $\{\phi(X_i): i=1,\ldots,n\}$ and a set of edge potentials $\{\phi(X_i, X_j): (X_i, X_j) \in \mathcal{H}\}$. The overall distribution is the normalized product of all of the potentials (both node and edge).
\marginnote{A distribution where all of the factors are over single variables or pair of variables.}
\subsection{Reduced Markov Networks}
Conditioning a distribution on some assignment $u$ to some subset of variables$U$.
\marginnote{Conditioning a distribution corresponds to eliminating a ll entries in the joint distribution that are inconsistent with the event $U = u$, and renormalizing the remaining entries to sum to one.}
Let $\phi(Y)$ be a factor, and $U = u$ an assignment for $U \subseteq Y$. We define the reduction of the factor $\phi$ to the context $U=u$, denoted by $\phi[U=u] = \phi[u]$, to be a factor over scope $Y' = Y - U$ such that:
\[ \phi[u](y', u) = \phi(y', u)\]