# Gibbs Sampling
## Computational Bayes
Once we move away from conjugate priors, the posterior distribution
generally has no closed-form solution. We can, however, approximate it
with computational algorithms. The best-known class of such algorithms
is **Markov chain Monte Carlo (MCMC)**: one constructs a Markov chain
whose equilibrium distribution is the desired posterior.
One strategy for approximating the shape of a distribution is **random
sampling**. If we can draw random samples from a distribution, the
empirical frequency of the samples approaches the probability density
as the sample size grows. Properties of the distribution can also be
estimated from finite samples, e.g.
$\frac{1}{n}\sum_{i=1}^{n} x_i\to\mathbb{E}(x)$.
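As a quick illustration of this idea, here is a minimal Python sketch;
the distribution and its parameters are arbitrary choices for
demonstration. The sample mean approaches the true expectation as the
sample size grows.

```python
import numpy as np

rng = np.random.default_rng(42)
for n in (100, 10_000, 1_000_000):
    # Gamma(shape=2, scale=3) has true mean 2 * 3 = 6
    x = rng.gamma(shape=2.0, scale=3.0, size=n)
    print(f"n = {n:>9,d}   sample mean = {x.mean():.4f}")
```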
The problem is how to draw samples from a distribution whose PDF we do
not know, which seems like an impossible task. But what if the
conditional PDFs are easier to compute?
## Gibbs Sampler for Linear Regression Models
Consider the example in @sec-blm with unknown variance. The joint
posterior $p(\beta,\sigma^2 \mid Y)$ is hard to come by, but the
conditional distributions are easier to compute:
$$
\begin{aligned}
\beta \mid \boldsymbol Y, \sigma^2 &\sim N(\hat\beta, D_\beta) \\
\sigma^2 \mid \boldsymbol Y, \beta &\sim IG(\bar\nu, \bar S)
\end{aligned}
$$
Suppose we know how to draw random samples from the normal distribution
and the inverse-Gamma distribution (in fact, if the CDF is known, a
random sample can be generated by $y=\text{CDF}^{-1}(x)$, where
$x \sim U(0,1)$). Then we can generate samples $\{(\beta,\sigma^2)\}$
with the following algorithm.
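To make the parenthetical remark concrete, here is a minimal sketch of
inverse-CDF (inverse-transform) sampling for an exponential
distribution, whose inverse CDF has the closed form $-\log(1-x)/\lambda$;
the rate $\lambda = 2$ is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                          # rate of the Exponential distribution (arbitrary)
x = rng.uniform(size=100_000)      # x ~ U(0, 1)
y = -np.log(1.0 - x) / lam         # y = CDF^{-1}(x) for Exponential(lam)
print(y.mean())                    # close to the true mean 1/lam = 0.5
```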
::: {.callout-note icon="false"}
## Algorithm (Gibbs Sampling)
Pick initial values $\beta^{(0)}=a_0$ and $\sigma^{2(0)} = b_0$.
Repeat the following steps for $i=1,\dots,N$:
1. Draw $\sigma^{2(i)}\sim p(\sigma^2 | Y, \beta^{(i-1)})$
(Inverse-Gamma);
2. Draw $\beta^{(i)}\sim p(\beta| Y, \sigma^{2(i)})$ (Normal).
:::
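A minimal NumPy sketch of this two-block sampler is given below. The
exact forms of $\hat\beta$, $D_\beta$, $\bar\nu$ and $\bar S$ depend on
the prior specification in @sec-blm; here I assume a common textbook
setup with priors $\beta \sim N(0, \tau^2 I)$ and
$\sigma^2 \sim IG(\nu_0/2, S_0/2)$, so the hyperparameter names and
formulas below are illustrative assumptions rather than the exact
expressions from that section.

```python
import numpy as np

def gibbs_lm(y, X, n_iter=5000, tau2=100.0, nu0=4.0, S0=1.0, seed=1):
    """Gibbs sampler for Y = X beta + e, e ~ N(0, sigma^2 I), under the
    illustrative priors beta ~ N(0, tau2 * I) and sigma^2 ~ IG(nu0/2, S0/2)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    beta, sigma2 = np.zeros(k), 1.0               # initial values beta^(0), sigma^2(0)
    beta_draws = np.empty((n_iter, k))
    sigma2_draws = np.empty(n_iter)
    for i in range(n_iter):
        # Step 1: sigma^2 | Y, beta ~ IG((nu0 + n)/2, (S0 + SSR)/2)
        resid = y - X @ beta
        shape = 0.5 * (nu0 + n)
        scale = 0.5 * (S0 + resid @ resid)
        sigma2 = scale / rng.gamma(shape)         # inverse-Gamma draw via 1/Gamma
        # Step 2: beta | Y, sigma^2 ~ N(beta_hat, D_beta)
        D_beta = np.linalg.inv(np.eye(k) / tau2 + XtX / sigma2)
        beta_hat = D_beta @ (Xty / sigma2)
        beta = rng.multivariate_normal(beta_hat, D_beta)
        beta_draws[i], sigma2_draws[i] = beta, sigma2
    return beta_draws, sigma2_draws

# Quick check on simulated data: posterior means should be near (1, 2) and 0.25.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + 0.5 * rng.normal(size=200)
b, s2 = gibbs_lm(y, X)
print(b[1000:].mean(axis=0), s2[1000:].mean())    # discard burn-in draws
```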
The most efficient approach is to sample $(\beta,\sigma^2)$ directly and
independently from the joint posterior distribution. If this is not
feasible, the Gibbs sampler is a reasonable alternative: once the chain
reaches its stationary distribution, sampling from the conditional
distributions is statistically equivalent to sampling from the joint
distribution. However, Gibbs sampling can be highly inefficient if the
parameters are strongly correlated in the posterior.
## Gibbs Sampler for General Models
Here is the more general Gibbs algorithm. Suppose we have a model with
$k$ parameters $\boldsymbol\theta=(\theta_1,\theta_2,\dots,\theta_k)$.
Assume we can derive the exact expressions for each of the conditional
distributions:
$p(\theta_i|\theta_1,\dots,\theta_{i-1},\theta_{i+1},\dots,\theta_k, Y)$.
Further assume we can generate independent samples from each of them.
Then the Gibbs sampler runs as follows.
::: {.callout-note icon="false"}
## Algorithm (Gibbs Sampling)
Initialize the algorithm with
$(\theta_1^{(0)},\theta_2^{(0)},\dots,\theta_k^{(0)})$. Then repeat the
following steps for $i=1,\dots,N$:
1. Choose a random parameter ordering, e.g.
   $(\theta_3, \theta_1,\theta_2, \dots)$, and relabel the parameters
   according to this new ordering as $(\theta_1,\theta_2,\theta_3,\dots)$.
2. Sample each parameter from its conditional distribution, always using
   the most up-to-date values of the other parameters. That is, draw
   samples from the following distributions in sequence:
$$
\begin{aligned}
& p(\theta_1^{(i)} | \theta_2^{(i-1)}, \theta_3^{(i-1)},\dots,\theta_k^{(i-1)}, Y) \\
& p(\theta_2^{(i)} | \theta_1^{(i)}, \theta_3^{(i-1)},\dots,\theta_k^{(i-1)}, Y) \\
& p(\theta_3^{(i)} | \theta_1^{(i)}, \theta_2^{(i)},\dots,\theta_k^{(i-1)}, Y) \\
& \vdots \\
& p(\theta_k^{(i)} | \theta_1^{(i)}, \theta_2^{(i)},\dots,\theta_{k-1}^{(i)}, Y)
\end{aligned}
$$
Repeat the process until the chain has reasonably converged.
:::
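As a generic skeleton, the sweep above can be written as follows. The
conditional samplers themselves are problem-specific and must be
supplied by the user, so this is only an illustrative outline under
that assumption, not a complete implementation.

```python
import numpy as np

def gibbs_sweep(conditional_samplers, theta0, n_iter=5000, seed=0, shuffle=True):
    """Generic Gibbs sampler.

    conditional_samplers[j](theta, rng) must return a draw of theta_j from
    p(theta_j | theta_{-j}, Y), given the current parameter vector theta.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    k = len(theta)
    draws = np.empty((n_iter, k))
    for i in range(n_iter):
        order = rng.permutation(k) if shuffle else range(k)   # step 1: random ordering
        for j in order:
            theta[j] = conditional_samplers[j](theta, rng)    # step 2: use latest values
        draws[i] = theta
    return draws
```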