<!DOCTYPE html>
<html lang="en">
<link href='https://fonts.googleapis.com/css?family=Raleway' rel='stylesheet'>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Fun with GMMs</title>
<meta name="description" content="">
<link href="https://fonts.googleapis.com/css?family=EB+Garamond" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Alegreya" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Libre+Baskerville" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Raleway:300" rel="stylesheet" type='text/css'>
<link href="https://fonts.googleapis.com/css?family=PT+Serif|Work+Sans:300" rel="stylesheet">
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=EB+Garamond:ital,wght@0,400;0,500;0,800;1,800&family=Raleway:wght@100&display=swap" rel="stylesheet">
<link href="https://fonts.googleapis.com/css2?family=Libre+Baskerville&display=swap" rel="stylesheet">
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Zilla+Slab:wght@400&display=swap" rel="stylesheet">
<link rel="stylesheet" href="//maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css">
<link rel="stylesheet" href="/assets/main.css">
<link rel="canonical" href="https://blog.quipu-strands.com/fun_with_GMMs">
<link rel="alternate" type="application/rss+xml" title="A Not-So Primordial Soup" href="/feed.xml">
<script data-goatcounter="https://abhgh.goatcounter.com/count"
async src="//gc.zgo.at/count.js"></script>
</head>
<body>
<header class="site-header" role="banner">
<div class="wrapper">
<a class="site-title" href="/">A Not-So Primordial Soup</a>
<nav class="site-nav">
<input type="checkbox" id="nav-trigger" class="nav-trigger" />
<label for="nav-trigger">
<span class="menu-icon">
<svg viewBox="0 0 18 15" width="18px" height="15px">
<path fill="#424242" d="M18,1.484c0,0.82-0.665,1.484-1.484,1.484H1.484C0.665,2.969,0,2.304,0,1.484l0,0C0,0.665,0.665,0,1.484,0 h15.031C17.335,0,18,0.665,18,1.484L18,1.484z"/>
<path fill="#424242" d="M18,7.516C18,8.335,17.335,9,16.516,9H1.484C0.665,9,0,8.335,0,7.516l0,0c0-0.82,0.665-1.484,1.484-1.484 h15.031C17.335,6.031,18,6.696,18,7.516L18,7.516z"/>
<path fill="#424242" d="M18,13.516C18,14.335,17.335,15,16.516,15H1.484C0.665,15,0,14.335,0,13.516l0,0 c0-0.82,0.665-1.484,1.484-1.484h15.031C17.335,12.031,18,12.696,18,13.516L18,13.516z"/>
</svg>
</span>
</label>
<div class="trigger">
<a class="page-link" href="/about/">About</a>
<a class="page-link" href="/Terms/">Terms</a>
</div>
</nav>
</div>
</header>
<main class="page-content" aria-label="Content">
<div class="wrapper">
<article class="post" itemscope itemtype="http://schema.org/BlogPosting">
<header class="post-header">
<h1 class="post-title" itemprop="name headline">Fun with GMMs</h1>
<p class="post-meta">
<time datetime="2022-04-22T20:00:00-07:00" itemprop="datePublished">
Created: Apr 22, 2022. Last major update:
Nov 16, 2023.
</time>
</p>
</header>
<div class="post-content" itemprop="articleBody">
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
"HTML-CSS": { scale: 100, linebreaks: { automatic: true } },
SVG: { linebreaks: { automatic:true } },
displayAlign: "center" });
</script>
<script type="text/javascript" async="" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<p><em>Generative Models</em> have been all the rage in AI lately, be it image generators like <a href="https://stability.ai/blog/stable-diffusion-public-release">Stable Diffusion</a> or text generators like <a href="https://openai.com/blog/chatgpt/">ChatGPT</a>. These are examples of fairly sophisticated generative systems. But whittled down to basics, they are a means to:</p>
<ul>
<li>(a) concisely represent patterns in data, in a way that …</li>
<li>(b) they can <em>generate</em> later what they have “seen”.</li>
</ul>
<p>A bit like an artist who witnesses a scene and later recreates it on canvas from memory; her memory acts as the generative model here.</p>
<p>In this post, I will try to illustrate this mechanism using a specific generative model: the <strong>Gaussian Mixture Model (GMM)</strong>. We will use it to capture patterns <em>in images</em>. Pixels will be our data, and patterns are how they are “lumped” together. Of course, this lumping is what humans perceive as the image itself. Effectively then, much like our artist, we will use a generative model to “see” an image and then have it reproduce it later. <strong>Think of this as a rudimentary, mostly visual, tutorial on GMMs</strong>, where we focus on their representational capability. Or as an article where I mostly ramble but touch upon GMMs and the use of probabilities, all the while creating fancy art like the examples below!</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/waves_examples.png" alt="test" />
<p class="image-caption">Renderings obtained using a GMM.</p>
</div>
<p>My reason for picking GMMs is that they are an intuitive gateway to the world of generative modeling. They are also well-studied: the first paper that talks about GMMs was <a href="https://royalsocietypublishing.org/doi/10.1098/rsta.1894.0003">published in 1894</a>! GMMs also make some of the math convenient; we’ll see examples of this soon. Library-wise, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture">scikit</a> has a nice implementation of GMMs, which I have used for this post. <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html">scipy</a> has convenient functions to work with the Gaussian distribution, which I also use here.</p>
<p>I know I said this is going to be “mostly” visual. I won’t be abnegating <em>all</em> math here: we will discuss just enough of the necessary math to understand how to model images. Here’s the layout of the rest of this post.</p>
<ul id="markdown-toc">
<li><a href="#crash-course-in-g-and-mm" id="markdown-toc-crash-course-in-g-and-mm">Crash Course in G and MM</a> <ul>
<li><a href="#what-are-gmms" id="markdown-toc-what-are-gmms">What are GMMs?</a></li>
<li><a href="#sampling-from-a-gmm" id="markdown-toc-sampling-from-a-gmm">Sampling from a GMM</a></li>
<li><a href="#multivariate-gaussians" id="markdown-toc-multivariate-gaussians">Multivariate Gaussians</a></li>
<li><a href="#contour-plots" id="markdown-toc-contour-plots">Contour Plots</a></li>
<li><a href="#some-properties" id="markdown-toc-some-properties">Some Properties</a></li>
</ul>
</li>
<li><a href="#actual_content" id="markdown-toc-actual_content">Why Images?</a></li>
<li><a href="#black-and-white-images" id="markdown-toc-black-and-white-images">Black and White Images</a> <ul>
<li><a href="#sampling" id="markdown-toc-sampling">Sampling</a></li>
<li><a href="#grid-plotting" id="markdown-toc-grid-plotting">Grid-plotting</a></li>
<li><a href="#custom-mapping-for-grid-plots" id="markdown-toc-custom-mapping-for-grid-plots">Custom Mapping for Grid Plots</a></li>
</ul>
</li>
<li><a href="#color-images" id="markdown-toc-color-images">Color Images</a> <ul>
<li><a href="#sampling-1" id="markdown-toc-sampling-1">Sampling</a></li>
<li><a href="#grid-plotting-1" id="markdown-toc-grid-plotting-1">Grid-plotting</a></li>
</ul>
</li>
<li><a href="#notes" id="markdown-toc-notes">Notes</a></li>
</ul>
<h2 id="crash-course-in-g-and-mm">Crash Course in G and MM</h2>
<p>Some basics first, so that we can comfortably wield GMMs as tools. If you are familiar with the <a href="https://en.wikipedia.org/wiki/Normal_distribution">Gaussian distribution</a> (also known as the <em>Normal</em> distribution), GMMs, conditional and marginal probabilities, <strong>feel free to skip ahead to the <a href="#actual_content">next section</a></strong>.</p>
<h3 id="what-are-gmms">What are GMMs?</h3>
<p>Think of GMMs as a way to approximate functions of a special kind: <em>probability density functions (pdf)</em>. <em>pdf</em>s are a way to express probabilities over variables that take continuous values. For example, you might want to use a <em>pdf</em> to describe the percentage \(x\) of humidity in the air tomorrow, where \(x\) can be any real number between 0 and 100, i.e., \(x \in [0, 100]\). The <em>pdf</em> value at \(x\), denoted by \(p(x)\) (often called the “density”), in some loose sense represents how likely the value is, but is <strong>not</strong> a probability value in itself. However, if you sum these values over a contiguous range or an “interval” of \(x\)s, you end up with a valid probability value. In the case of continuous values, such sums are <em>integrals</em>; so, the probability of humidity being within 40%-60% is \(\int_{40}^{60} p(x)dx\). Some well known <em>pdf</em>s are the <a href="https://en.wikipedia.org/wiki/Normal_distribution">Gaussian</a> (the star of our show, also known as the <em>Normal</em> distribution), <a href="https://en.wikipedia.org/wiki/Exponential_distribution">Exponential</a> and <a href="https://en.wikipedia.org/wiki/Beta_distribution">Beta</a>.</p>
<p>Let’s talk about the Gaussian for a bit. For a variable \(x\), the density \(p(x)\) can be calculated using this formula:</p>
\[p(x)=\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2}\]
<p>We won’t be using this formula. But note that it depends on the following two parameters:</p>
<ul>
<li>\(\mu\): the <em>mean</em> or average value of the distribution.</li>
<li>\(\sigma\): the <em>standard deviation</em>, i.e., how spread out the distribution is. Often we talk in terms of the square of this quantity, the <em>variance</em> \(\sigma^2\).</li>
</ul>
<p>Some common shorthands:</p>
<ul>
<li>A Gaussian distribution with parameters \(\mu\) and \(\sigma\) is \(\mathcal{N}(\mu, \sigma^2)\).</li>
<li>The <strong>standard Normal</strong> is one with \(\mu=0\) and \(\sigma=1\), and is denoted by \(\mathcal{N}(0, 1)\).</li>
<li>Instead of saying \(p(x)\) is given by \(\mathcal{N}(\mu, \sigma^2)\), we may concisely write \(\mathcal{N}(x;\mu, \sigma^2)\) for the density of \(x\).</li>
<li>To say \(x\) was <em>sampled</em> from such a distribution, we write \(x \sim \mathcal{N}(\mu, \sigma^2)\). This is also equivalent to saying that the density of such \(x\)s is given by \(\mathcal{N}(x;\mu, \sigma^2)\) (the expression in the previous point). Which form is used depends on the context: do we want to highlight sampling or density?</li>
</ul>
<p>Now, consider a practical scenario: you have some data that doesn’t look anything like any of the commonly known <em>pdfs</em>. How would you mathematically express its density? In a time-honoured tradition, we will use something that we know to <em>approximate</em> what we don’t know. We’ll use a bunch - a <strong>mixture</strong> - of superposed Gaussians (hence the name). To specify such a mixture, we need:</p>
<ul>
<li><strong>Number</strong> of components, \(K\).</li>
<li><strong>Parameters</strong> or <em>shape</em> of the components. We will denote component \(i\) by \(\mathcal{N}(\mu_i, \sigma_i^2)\), where we need to figure out the values for \(\mu_i\) and \(\sigma_i\).</li>
<li><strong>Weights</strong> \(w_i\) of the components, or how much a component contributes to the overall density. We require two properties here (we’ll see why soon): (a) \(0 \leq w_i \leq 1\), and (b) \(\sum_i^K w_i = 1\).</li>
</ul>
<p>The resultant density \(p(x)\) of a point due to this mixture is given by the sum of individual Gaussian densities multiplied by their weights:</p>
\[\begin{aligned}
p(x) = \sum_{i=1}^{K} w_i \mathcal{N}(x;\mu_i, \sigma_i^2)
\end{aligned}\]
<p>The physical interpretation is that \(x\) may have been generated by the \(i^{th}\) component with probability \(w_i\) (which explains the properties needed of \(w_i\) - they’re effectively probabilities), and then for this component, its density is given by \(\mathcal{N}(x;\mu_i, \sigma_i^2)\). The net <em>pdf</em> value for a specific value of \(x\) due to component \(i\) then is \(w_i \mathcal{N}(x;\mu_i, \sigma_i^2)\). Since \(x\) may be generated by any component, the overall \(p(x)\) is given by the sum above.</p>
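<p>To make this concrete, here is a minimal sketch in Python (using scipy; the weights, means and standard deviations below are made-up numbers, purely for illustration) that evaluates the density of a <code class="language-plaintext highlighter-rouge">2</code>-component mixture:</p>
<pre><code class="language-python">import numpy as np
from scipy.stats import norm

# a made-up 2-component mixture: weights, means and standard deviations
weights = np.array([0.3, 0.7])
means   = np.array([-1.0, 4.0])
sigmas  = np.array([0.8, 2.0])

def gmm_pdf(x):
    """p(x) = sum_i w_i * N(x; mu_i, sigma_i^2)"""
    return sum(w * norm.pdf(x, loc=mu, scale=s)
               for w, mu, s in zip(weights, means, sigmas))

print(gmm_pdf(np.linspace(-5, 10, 5)))  # densities at 5 query points
</code></pre>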
<!---
If you were to plot a *pdf*, with the y-axis showing the value $$p(x)$$, you might get something like the blue curve (labelled "original") in the figure below - note that here $$x \in [0, 50]$$.
Without knowing anything else about $$p(x)$$, what can we say about its properties? Well, the probability of *any* event happening should be `1`, so here, $$\int_0^{50} p(x) = 1$$. This is generally true - the integral over the entire space of $$x$$ is `1`. And you can't also have negative values for $$p(x)$$ - because if you did, you can narrowly define a contiguous space around such an $$x$$ such that the integral computes to a negative number, and that doesn't make sense since this is a probability!
-->
<p>This turns out to be a powerful device, since with the right number of components, shapes and weights, one may approximate arbitrary distributions. The plot below shows different <code class="language-plaintext highlighter-rouge">2</code>-component GMMs, constructed out of the same two components (red dashed lines), but with different weights (mentioned in the legend). You can see that this alone produces various different <em>pdf</em>s (solid lines).</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/demo1D_mixture_examples_annot.png" alt="test" />
<p class="image-caption">Different pdfs using a GMM</p>
</div>
<p>Let’s go back to the original question. In the figure below, we consider some data points on the x-axis (shown with small vertical lines, known as a <a href="https://seaborn.pydata.org/generated/seaborn.rugplot.html">rugplot</a>). The corresponding histogram is also shown. We try to approximate this <em>target</em> distribution with a GMM with <code class="language-plaintext highlighter-rouge">3</code> components, shown with the red dashed lines. The black solid line shows the final <em>pdf</em> obtained using appropriate weights (not shown), which, you’d note, is quite close to the histogram. This is a GMM in action.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/gmm_approx_demo_annot.png" alt="test" />
<p class="image-caption">Approximating a data dsitribution with a GMM</p>
</div>
<p>Here we won’t discuss how we find the number of components, or their parameters, or their weights - the first one is a hyperparameter (so you either determine it based on some task specific metric or use a heuristic such as <a href="https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture.aic">AIC</a>), and the latter are typically determined using a popular algorithm known as <a href="https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm">Expectation Maximization (EM)</a>, for which we use <a href="https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture.fit">scikit’s implementation</a>. Here’s an interesting bit of trivia: the EM algorithm was <a href="https://www.jstor.org/stable/2984875">proposed in 1977</a>, which was much after the first use of GMMs (which, as mentioned above, was in 1894)!</p>
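<p>For reference, here is a rough sketch of what fitting looks like with scikit (the toy data and the candidate values of \(K\) are placeholders; <code class="language-plaintext highlighter-rouge">fit()</code> runs EM under the hood, and <code class="language-plaintext highlighter-rouge">aic()</code> scores each candidate):</p>
<pre><code class="language-python">import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# toy 1-D data with two "lumps", reshaped to the (n_samples, n_features) layout scikit expects
data = np.concatenate([rng.normal(0, 1, 300), rng.normal(6, 2, 700)]).reshape(-1, 1)

# try a few values of K and keep the one with the lowest AIC
candidates = [GaussianMixture(n_components=k, random_state=0).fit(data) for k in range(1, 6)]
best = min(candidates, key=lambda g: g.aic(data))
print(best.n_components, best.weights_, best.means_.ravel())
</code></pre>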
<!--Of course, these approximations aren't always perfect. But we try to reduce the gap between the target density and a GMM as much as we can (typically using [Expectation Maximization](https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm#Gaussian_mixture)).-->
<p>Interestingly, it is also possible to use an <a href="https://groups.seas.harvard.edu/courses/cs281/papers/rasmussen-1999a.pdf"><em>infinite</em></a> number of components, but that’s again something we won’t cover here.</p>
<h3 id="sampling-from-a-gmm">Sampling from a GMM</h3>
<p>As a quick stopover, let’s ask ourselves the “inverse” question: how do we generate data from a GMM \(\sum_{i=1}^{K} w_i \mathcal{N}(x;\mu_i, \sigma_i^2)\)? Think about the physical interpretation we discussed earlier; that’s going to be helpful now. To generate one data instance \(x\), we follow this two-step process:</p>
<ul>
<li>Use \(w_i\) as probabilities to pick a particular component \(i\).</li>
<li>And then generate \(x \sim \mathcal{N}(\mu_i, \sigma_i^2)\).</li>
</ul>
<p>Repeat the above steps till you have the required number of samples.</p>
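<p>A minimal sketch of this two-step recipe (the mixture parameters are again made up; in practice, scikit’s <code class="language-plaintext highlighter-rouge">sample()</code> method does this for you):</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(42)
weights = np.array([0.2, 0.8])
means   = np.array([-2.0, 3.0])
sigmas  = np.array([0.5, 1.5])

def sample_gmm(n):
    # step 1: pick a component for each sample, using the weights as probabilities
    comps = rng.choice(len(weights), size=n, p=weights)
    # step 2: draw each sample from its chosen Gaussian
    return rng.normal(loc=means[comps], scale=sigmas[comps])

samples = sample_gmm(10_000)
print(samples.mean())  # should be close to 0.2*(-2) + 0.8*3 = 2.0
</code></pre>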
<p>This ends up producing instances from the components in proportion to their weights. The following video shows this for a <code class="language-plaintext highlighter-rouge">2</code>-component GMM. For sampling a point, first a component is chosen - this choice is temporarily highlighted in red. And then an instance is sampled - shown in blue. The black line shows the <em>pdf</em> of the GMM, but if you sampled enough instances and sketched its empirical distribution (using, say, a <a href="https://seaborn.pydata.org/generated/seaborn.kdeplot.html">kdeplot</a>), you’d get something very close to this <em>pdf</em>.</p>
<video id="movie" preload="" controls="" width="80%" height="80%">
<source id="srcMp4" src="/assets/fun_with_GMMs/sampling_animation.mp4#t=0.2" />
</video>
<h3 id="multivariate-gaussians">Multivariate Gaussians</h3>
<p>Of course, GMMs can be extended to an arbitrary number of dimensions \(d\), i.e., when \(x \in \mathbb{R}^d\). We augment the notation to denote such a <em>pdf</em> as \(p(x_1, x_2, .., x_d)\), where \(x_i\) denotes the \(i^{th}\) dimension of a data instance \(x\). The GMM now is composed of <em>multivariate</em> Gaussians \(\mathcal{N}(\mu, \Sigma)\). The symbol for the mean is still \(\mu\), but now \(\mu \in \mathbb{R}^d\). Instead of the variance, we have a <em>covariance matrix</em>, where \(\Sigma_{i,j}\) - the entry at the \((i,j)\) index - denotes the covariance between values of \(x_i\) and \(x_j\) across the dataset. Since covariance doesn’t depend on the order of variables, \(\Sigma_{i,j}=\Sigma_{j, i}\) and thus \(\Sigma \in \mathbb{R}^{d \times d}\) is a <em>symmetric</em> matrix.</p>
<p>And yes, unfortunately the symbol for <em>summation</em> \(\sum\), which is <code class="language-plaintext highlighter-rouge">\sum</code> in LaTeX, and the <em>covariance matrix</em> \(\Sigma\) - <code class="language-plaintext highlighter-rouge">\Sigma</code> - look very similar. I was tempted to use a different symbol for the latter, but then I realized it might just be better to bite the bullet and get used to the common notation.</p>
<p>The GMM operates just the same, i.e., for \(x \in \mathbb{R}^d\), we have:</p>
\[\begin{aligned}
p(x) = \sum_{i=1}^{K} w_i \mathcal{N}(x;\mu_i, \Sigma_i)
\end{aligned}\]
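<p>As a quick illustration of the notation, here is how one can evaluate the density of a 2-dimensional Gaussian with a given mean vector and covariance matrix using scipy (the numbers are arbitrary):</p>
<pre><code class="language-python">import numpy as np
from scipy.stats import multivariate_normal

mu    = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])  # symmetric, as a covariance matrix must be

# density of the point (0, 0) under N(mu, Sigma)
print(multivariate_normal.pdf([0.0, 0.0], mean=mu, cov=Sigma))
</code></pre>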
<h3 id="contour-plots">Contour Plots</h3>
<p>We want to be able to visualize these high-dimensional GMMs.
In general, faithful visualization of high-dimensional structures is hard, but, fortunately, there is a well known technique applicable to two dimensions: <em>contour plots</em>. In the figure below, (a) shows some data points in 2D. I have used a <code class="language-plaintext highlighter-rouge">2</code>-component two-dimensional GMM to model this data. Instead of showing the <em>pdf</em> on a \(z\)-axis in a 3D plot, we calculate the values \(z=p(x_1, x_2)\) at a whole lot of points in the input space, and then <em>connect</em> the points with identical \(z\) values. We might also color the lines formed by these connections, to indicate how high or low \(z\) is. These lines are known as <em>contours</em> and (b) shows such a <em>contour</em> plot.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/two_circles_annot.png" alt="test" />
<p class="image-caption">(a) shows a 2-D dataset. (b) shows the contour plot of 2-component GMM fit to the data.</p>
</div>
<p>Note the legend in (b) - it has negative numbers. This is because (b) actually shows contours for \(\log p(x_1, x_2)\) - which is a common practice (partly because it leads to better visualization, and partly because the quantity \(\log p\) itself shows up in the math quite a bit).</p>
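<p>Here is a rough sketch of producing such a plot from a fitted scikit GMM - <code class="language-plaintext highlighter-rouge">score_samples()</code> returns exactly the \(\log p\) values being contoured (the two-blob data below is a stand-in for the dataset in (a)):</p>
<pre><code class="language-python">import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# stand-in 2-D data: two blobs
data = np.vstack([rng.normal([-2, 0], 0.7, size=(500, 2)),
                  rng.normal([2, 0], 0.7, size=(500, 2))])
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)

# evaluate log p(x1, x2) on a grid, then connect points of equal value
x1, x2 = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-3, 3, 200))
grid = np.column_stack([x1.ravel(), x2.ravel()])
log_p = gmm.score_samples(grid).reshape(x1.shape)

plt.scatter(data[:, 0], data[:, 1], s=2, alpha=0.3)
cs = plt.contour(x1, x2, log_p, levels=15)
plt.colorbar(cs, label="log p(x1, x2)")
plt.show()
</code></pre>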
<h3 id="some-properties">Some Properties</h3>
<p>So far the only thing we have used the GMM for is to sample data. This is useful, but sometimes we want to use the model to answer specific questions about the data, e.g., what is the mean value of the data? We’ll briefly look at this aspect now.</p>
<p><strong>Expectation</strong></p>
<p>The <strong>expectation</strong> or the <strong>expected value</strong> of a variable is its average value, weighted by the probabilities of taking those values:</p>
\[\begin{aligned}
E_p[x] = \int_{-\infty}^{+\infty} x p(x) dx
\end{aligned}\]
<p>The subscript \(p\) in \(E_p[x]\) denotes the distribution wrt which we assume \(x\) varies; we’ll drop this since it’s often clear from context. While the integral ranges from \({-\infty}\) to \({+\infty}\), it just indicates that we should account for all possible values \(x\) can take.</p>
<p>For a Normal distribution, \(E[x]\) is just its mean \(\mu\). We can use this fact to conveniently compute the expectation for a GMM:</p>
\[\begin{aligned}
E[x] &= \int_{-\infty}^{+\infty} x \Big( \sum_{i=1}^{K} w_i \mathcal{N}(x;\mu_i, \Sigma_i) \Big) dx \\
&= \int_{-\infty}^{+\infty} \Big( \sum_{i=1}^{K} w_i x \mathcal{N}(x;\mu_i, \Sigma_i) \Big) dx \\
&=\sum_{i=1}^{K} \Big( \int_{-\infty}^{+\infty} w_i x \mathcal{N}(x;\mu_i, \Sigma_i) dx \Big) \\
&=\sum_{i=1}^{K} \Big( w_i \int_{-\infty}^{+\infty} x \mathcal{N}(x;\mu_i, \Sigma_i) dx \Big) \\
&=\sum_{i=1}^{K} w_i \mu_i \\
\end{aligned}\]
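<p>This is easy to sanity-check numerically with a fitted scikit GMM - the weighted sum of component means should match the empirical mean of a large sample (a quick sketch, with toy data):</p>
<pre><code class="language-python">import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-3, 1, 400), rng.normal(5, 2, 600)]).reshape(-1, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)

analytic_mean = gmm.weights_ @ gmm.means_          # sum_i w_i * mu_i
empirical_mean = gmm.sample(200_000)[0].mean(axis=0)
print(analytic_mean, empirical_mean)               # the two should be very close
</code></pre>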
<p><strong>Conditional Distributions</strong></p>
<p>It’s often useful to talk about the behavior of certain variables while holding other variables constant. For example, in a dataset comprising the <code class="language-plaintext highlighter-rouge">Age</code> and <code class="language-plaintext highlighter-rouge">Income</code> of people, you might want to know how <code class="language-plaintext highlighter-rouge">Income</code> varies for <code class="language-plaintext highlighter-rouge">25</code> year olds. You’re technically looking for a <em>conditional</em> distribution: the distribution of certain variables like <code class="language-plaintext highlighter-rouge">Income</code>, given a <em>condition</em> on the other variables, such as <code class="language-plaintext highlighter-rouge">Age=25</code>.</p>
<p>The density obtained after conditioning on a certain subset of dimensions, e.g., \(x_{d-2}=a,x_{d-1}=b,x_{d}=c\), is written as (note the <code class="language-plaintext highlighter-rouge">|</code> character):</p>
\[\begin{aligned}
p(x_1, x_2,..., x_{d-3} \vert x_{d-2}=a,x_{d-1}=b,x_{d}=c)
\end{aligned}\]
<!--Often, the specific values $$a,b,c$$ are dropped to indicate we're interested in the *form* of the conditional distribution for *some* values assigned to the variables conditioned on - what these values are, hopefully, is made clear by context.-->
<p>What do such conditional distributions look like for the Gaussian? Consider a simple case of two variables \(x_1, x_2\), and conditioning on \(x_1\):</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/gmm_conditionals.gif" alt="test" />
<p class="image-caption">Conditional distribution for a Gaussian.</p>
</div>
<p>Look at the various conditional distributions \(p(x_2 \vert x_1)\), for different \(x_1\). In the figure, this is equivalent to “slicing” through the Gaussian at specific values of \(x_1\), as shown by the three solid lines. Do these distributions/solid lines suspiciously look like Gaussians? They are! The conditional distributions of a Gaussian are also Gaussian. We won’t go into proving this interesting property, but it’s important to know it is true.</p>
<p><br /></p>
<hr style="height:1px;border-width:0;background-color:#EE2967" />
<p><span style="color:#EE2967"><strong>NOTE</strong></span>:
the above animation only serves to provide a good mental model and doesn’t present the whole picture. You can delve deeper into nuances this visualization misses <a href="https://stats.stackexchange.com/questions/385823/variance-of-conditional-multivariate-gaussian">here</a>, but if this is your first encounter with the notion, you can skip it for now.</p>
<hr style="height:1px;border-width:0;background-color:#EE2967" />
<p><br /></p>
<!--
**Marginal distributions**
These are distributions over a subset of variables, while averaging (or "marginalizing") over the values of other values. For example, $$p(x_2)$$ averaged over the variations in $$x_1$$. This *also* happens to be a Gaussian!
The following image visualizes this - at a specific $$x_2$$, the distribution of the $$p(x_1\vert x_2)$$ is shown with a solid orange-ish line. A dashed horizontal line through it depicts the *average* *pdf* value of this $$p(x_1\vert x_2)$$ distribution. The height of this line denotes the average probability of this specific $$x_2$$ - where the "averaging" happens over the probability of various $$x_1$$ that this $$x_2$$ co-occurs with.
If I then draw a curve through the heights of these dashed lines - shown in black - we again see something approximating a Gaussian.
-->
<p>But what does this say about GMMs? Because of the above property, the conditional distribution for a GMM is also a GMM. This new GMM has different weights and components. You can find a derivation <a href="https://stats.stackexchange.com/questions/348941/general-conditional-distributions-for-multivariate-gaussian-mixtures">here</a>. This insight will be helpful when we deal with color images.</p>
<p>OK, our short stop is over … on to actual problem-solving!</p>
<h2 id="actual_content">Why Images?</h2>
<p>If GMMs are about modeling data, how do images fit in? Let’s take the example of black-and-white images: you can think of every single black pixel as a data point existing at its location. Look at the image below - the pixels that make up the shape of the dog are the data we will use to fit our GMM.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/dog_projected_small.jpg" alt="test" />
<p class="image-caption">GMMs. Original image source: pixabay.com</p>
</div>
<p>For color images things get a bit complicated - but not that much, and we’ll find a way to deal with it later.</p>
<h2 id="black-and-white-images">Black and White Images</h2>
<p>Let’s start with black-and-white images since they are easier to model. Our training data for the GMM is essentially the collection of all coordinates where we have black pixels, as shown below:</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/bnw_create_train.png" alt="test" />
<p class="image-caption"></p>
</div>
<p>The white regions in the above image DO NOT contribute to the dataset. The learned GMM only knows where data is present. In this case the components are 2-dimensional Gaussians since we are only modeling coordinates in a 2D plane.</p>
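<p>Concretely, the training set is just an array of the \((x, y)\) coordinates of the dark pixels. A minimal sketch (the file name is a placeholder, and the darkness threshold of <code class="language-plaintext highlighter-rouge">128</code> is an arbitrary choice):</p>
<pre><code class="language-python">import numpy as np
from PIL import Image

img = np.array(Image.open("dog_sketch.png").convert("L"))  # grayscale: 0 = black, 255 = white
rows, cols = np.nonzero(img &lt; 128)                         # locations of dark pixels only
# x = column index; y = row index flipped so the plot isn't upside down
train = np.column_stack([cols, img.shape[0] - rows]).astype(float)
print(train.shape)  # (number of dark pixels, 2)
</code></pre>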
<h3 id="sampling">Sampling</h3>
<p>We now generate our first images &lt;drum roll&gt;! The recipe is simple:</p>
<ul>
<li>We create our training data as shown above, and fit a 2-dimensional GMM. We set the number of components \(K=500\). No particular reason, I just picked an arbitrarily high number.</li>
<li>Create a contour plot for the GMM - which gives us our first fancy image. Art!</li>
</ul>
<p>Going back to our previous example, we have:</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/pixabay_dog_cartoon.jpg" alt="test" />
<p class="image-caption">Contour plot.</p>
</div>
<p>We can also generate the image by <em>sampling</em> from the GMM. We have seen how to sample earlier - it’s pretty much the same now, except we have more components and each point we sample has <code class="language-plaintext highlighter-rouge">2</code> dimensions, and thus can be placed on a plane.</p>
<p>We’ll sample multiple points \((x_1, x_2)\), and at these locations, we place a blue pixel. We show what these samples look like for different sample sizes in (a) and (b), for ~20k and ~40k points respectively.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/multiple_panels_pixabay_dog_cartoon.jpg" alt="test" />
<p class="image-caption">Images generated by sampling.</p>
</div>
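<p>For reference, the fit-then-sample step looks roughly like this (assuming the coordinate array <code class="language-plaintext highlighter-rouge">train</code> from the earlier sketch; \(K\) and the sample size match the values mentioned above):</p>
<pre><code class="language-python">import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=500, random_state=0).fit(train)

points, _ = gmm.sample(20_000)  # each sampled point is an (x, y) location
plt.figure(figsize=(6, 6))
plt.scatter(points[:, 0], points[:, 1], s=1, color="blue", alpha=0.5)
plt.axis("equal")
plt.axis("off")
plt.show()
</code></pre>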
<p>OK, let’s dial up the complexity a bit. We’ll use an image of a butterfly this time, which has more detail. Plots (a) and (b) are the same as before, but we increase the sample size in (c) and (d) to ~50k and ~100k respectively to handle the increased detail.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/pixabay_butterfly.jpg" alt="test" />
<p class="image-caption">Contour plot of a very psychedelic butterfly. Original image source: pixabay.com.</p>
</div>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/multiple_panels_pixabay_butterfly.jpg" alt="test" />
<p class="image-caption">Images generated by sampling.</p>
</div>
<p>Next, we will pick an image that is even harder to learn - one with a lot of detail, which makes it computationally challenging for the model. We’ll use an image of the famous print <a href="https://en.wikipedia.org/wiki/The_Great_Wave_off_Kanagawa">The Great Wave off Kanagawa</a>. Here, you can see the outputs have lost quite a bit of detail.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/waves_off_kangawa.jpg" alt="test" />
<p class="image-caption">Contour plot. Original image source: wikipedia.com.</p>
</div>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/multiple_panels_waves_off_kangawa.jpg" alt="test" />
<p class="image-caption">Images generated by sampling.</p>
</div>
<p>These look cool - but there is another way to use our GMMs to generate these images. Let’s look at it next.</p>
<h3 id="grid-plotting">Grid-plotting</h3>
<p>Instead of sampling from the GMM, we can go over a grid of coordinates \((x, y)\) in the input space, and color the pixel at each location based on the value of \(p(x, y)\). The image below shows the process. The blank circles show locations yet to be colored.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/bnw_raster_scheme.png" alt="test" />
<p class="image-caption">Color positions based on p(x,y) values.</p>
</div>
<p>The above image simplifies one detail: \(p(x, y)\) values are <em>pdf</em> values, so they do not have an upper bound (but our grayscale ranges within <code class="language-plaintext highlighter-rouge">[0,1]</code>). We collect all these values for our grid of points and scale them to lie in the range <code class="language-plaintext highlighter-rouge">[0,1]</code> before we plot our image.</p>
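<p>A rough sketch of this grid-plotting step (assuming the fitted <code class="language-plaintext highlighter-rouge">gmm</code> from before; <code class="language-plaintext highlighter-rouge">width</code>, <code class="language-plaintext highlighter-rouge">height</code> and the grid resolution are placeholders for the image dimensions):</p>
<pre><code class="language-python">import numpy as np
import matplotlib.pyplot as plt

# evaluate p(x, y) over a regular grid covering the image area
xs, ys = np.meshgrid(np.linspace(0, width, 400), np.linspace(0, height, 400))
grid = np.column_stack([xs.ravel(), ys.ravel()])
p = np.exp(gmm.score_samples(grid)).reshape(xs.shape)  # score_samples returns log p

# scale densities to [0, 1]; higher density should mean a darker pixel, hence the "1 - ..."
p_scaled = (p - p.min()) / (p.max() - p.min())
plt.imshow(1 - p_scaled, cmap="gray", origin="lower")
plt.axis("off")
plt.show()
</code></pre>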
<p>Let’s start with the image of the dog. We see it’s almost perfectly recovered! We probably expect this by now given it’s a simple sketch.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/probs_pixabay_dog_cartoon.jpg" alt="test" />
<p class="image-caption">Grid plot.</p>
</div>
<p>However, notice the width of the lines. Do they seem thicker? That’s because pixels in the close neighborhood of the actual sketch lines also end up getting a black-ish color - after all, we are using a probabilistic model. As the detail in an image grows, you will see more of this effect. Below, we have the butterfly image - you can actually see smudging near the borders, making it look like a charcoal sketch.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/probs_pixabay_butterfly.jpg" alt="test" />
<p class="image-caption">Grid plot.</p>
</div>
<p>The waves image is tricky because of the high amount of detail - and it’s not surprising that we obtain a heavily smudged image, to the point it looks like a blob!</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/probs_waves_off_kangawa.jpg" alt="test" />
<p class="image-caption">Grid plot.</p>
</div>
<p>But fear not - we have another trick up our collective sleeves!</p>
<h3 id="custom-mapping-for-grid-plots">Custom Mapping for Grid Plots</h3>
<p>How do we do better? A simple hack is to change the mapping of probabilities to color values, so that the numerous points with low probabilities are visibly <em>not</em> dark. We can replace the implicit linear mapping we’ve been using with a bespoke non-linear mapping that achieves this outcome - in the figure below, (a) shows this with a piecewise linear function, where the dashed orange line is the default linear mapping (shown for reference). The x-axis shows the scaled <em>pdf</em> values of pixels, and the y-axis shows the score they are mapped to on the gray colormap, which ranges between <code class="language-plaintext highlighter-rouge">[0, 255]</code>. Our version allows for very dark colors only for relatively high scaled <em>pdf</em> values.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/non_linear_scaling_annot.png" alt="test" />
<p class="image-caption">Custom mapping between probabilities and intensity of black color.</p>
</div>
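<p>One simple way to implement such a remapping is with <code class="language-plaintext highlighter-rouge">np.interp</code>; the knot positions below are made up, but mimic the shape in (a) - only fairly high scaled densities get mapped to dark grays:</p>
<pre><code class="language-python">import numpy as np

def remap(p_scaled):
    """Map scaled pdf values in [0, 1] to gray levels in [0, 255] (255 = white, 0 = black)."""
    # piecewise-linear knots: stay bright until ~0.6, then darken quickly
    knots_in  = np.array([0.0, 0.6, 0.8, 1.0])
    knots_out = np.array([255.0, 230.0, 120.0, 0.0])
    return np.interp(p_scaled, knots_in, knots_out)
</code></pre>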
<p>Below we compare the linear mapping in (a) to our custom mapping in (b). In (b) we do end up recovering some details; you can actually see Mt. Fuji now! We can also explore playing with the number of components, point sizes, alpha values etc.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/interpolations_compared_annot.png" alt="test" />
<p class="image-caption">With custom mapping.</p>
</div>
<p>There are probably numerous ways to tweak the steps in this section to make the images more aesthetic (and it can get quite addictive!), but lets stop for now and instead direct our attention to modeling color images.</p>
<h2 id="color-images">Color Images</h2>
<p>Finally, the pièce de résistance!</p>
<p>The challenge with color images is we can’t just fit a model to the locations of certain pixels: ALL locations are important, because they all have colors. Additionally, we need to model the actual colors - in the previous case, this is not something we explicitly modeled.</p>
<p>Let’s proceed by breaking down the problem. We can think of every location in the image as having 3 colors: red (R), green (G), blue (B). These are also known as <em>channels</em>. In the image below, the RGB channels for a color image are shown.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/color/channels_example.svg" alt="test" />
<p class="image-caption">The original image is at the top left; the remaining images show the red, green and blue channels. Original image source: pixabay.com.</p>
</div>
<p>We need to model these RGB values at each location. For this we use a <em>5-dimensional</em> GMM, where the variables are the \(x, y\) coordinates followed by the \(r, g, b\) values present at that coordinate. The following image shows how the training data is created - a notable difference from the black-and-white case is that <em>all</em> locations are used.</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/color_create_train.png" alt="test" />
<p class="image-caption">In constrast to the black-and-white case, all pixels contribute to the data.</p>
</div>
<p>These GMMs are going to be larger because of both (a) a higher number of dimensions, and (b) a lot more points to model. I am also going to use a larger number of components since there is more information to be learned.</p>
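<p>A sketch of assembling this 5-dimensional training set (the file name is a placeholder):</p>
<pre><code class="language-python">import numpy as np
from PIL import Image

rgb = np.array(Image.open("kingfisher.jpg").convert("RGB"), dtype=float)  # shape: (height, width, 3)
h, w, _ = rgb.shape
ys, xs = np.mgrid[0:h, 0:w]  # every location contributes, unlike the black-and-white case

# each row of the training data is (x, y, r, g, b)
train_color = np.column_stack([xs.ravel(), (h - ys).ravel(), rgb.reshape(-1, 3)])
print(train_color.shape)  # (h * w, 5)
</code></pre>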
<h3 id="sampling-1">Sampling</h3>
<p>So we have our 5-dimensional GMM now (trained with \(K=1000\) components this time). We can generate our first set of images by sampling as before: sample a point \((x, y, r, g, b)\), and color the pixel at location \((x,y)\) with the colors \((r,g,b)\). A minor practical detail here is that you would need to clip the \(r,g,b\) sampled values to lie in the right range - because the GMM, not knowing these are colors, might generate values beyond the valid range of <code class="language-plaintext highlighter-rouge">[0, 255]</code>.</p>
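<p>Roughly, the sampling step then looks like this (continuing with the hypothetical <code class="language-plaintext highlighter-rouge">train_color</code> array above; note the clipping of the color values):</p>
<pre><code class="language-python">import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

gmm5 = GaussianMixture(n_components=1000, random_state=0).fit(train_color)

samples, _ = gmm5.sample(200_000)
xy = samples[:, :2]
colors = np.clip(samples[:, 2:], 0, 255) / 255.0  # clip r, g, b to the valid range
plt.figure(figsize=(6, 6))
plt.scatter(xy[:, 0], xy[:, 1], c=colors, s=1)
plt.axis("equal")
plt.axis("off")
plt.show()
</code></pre>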
<p>But it works. Behold!</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/color/kingfisher_generated_staggered.png" alt="test" />
<p class="image-caption">Original image source: pixabay.com</p>
</div>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/color/springbird_generated_staggered.png" alt="test" />
<p class="image-caption">Original image source: pixabay.com</p>
</div>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/color/boat_generated_staggered.png" alt="test" />
<p class="image-caption">Original image source: pixabay.com</p>
</div>
<h3 id="grid-plotting-1">Grid-plotting</h3>
<p>Remember we were able to generate images earlier by plotting points on a grid? How do we do that here? Well, the answer is a little complicated but doable.</p>
<p>Here’s the challenge: at a particular coordinate \((x_1, y_1)\), you can ask your GMM for a conditional <em>distribution</em>: \(p(r,g,b \vert x=x_1, y=y_1)\). Yes, you get a full-blown different GMM over \(r,g,b\) values at every coordinate (because, like we discussed, the conditional of a GMM is also a GMM). How do we color a location with a distribution? - well, here I just use its mean value. Effectively, given a specific coordinate \((x_1, y_1)\), I calculate the mean values for the \((r,g,b)\) tuple:</p>
\[\begin{aligned}
E[r, g, b \vert x=x_1, y=y_1]
\end{aligned}\]
<p>The schematic below shows what that looks like:</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/color_raster_scheme.png" alt="test" />
<p class="image-caption">Grid plot strategy.</p>
</div>
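<p>For the curious, here is a sketch of computing that conditional mean from a fitted scikit GMM (assuming the 5-dimensional <code class="language-plaintext highlighter-rouge">gmm5</code> from before, with scikit’s default “full” covariance matrices). For each component we condition its Gaussian on the observed \((x, y)\) using the standard Gaussian conditioning formulas, reweight the components by how well they explain that location, and take the weighted average of the conditional means:</p>
<pre><code class="language-python">import numpy as np
from scipy.stats import multivariate_normal

def conditional_mean_rgb(gmm, x, y):
    """E[r, g, b | x, y] for a 5-D GMM over (x, y, r, g, b)."""
    loc = np.array([x, y])
    new_weights, cond_means = [], []
    for w, mu, Sigma in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        mu_a, mu_b = mu[:2], mu[2:]              # a = (x, y), b = (r, g, b)
        S_aa, S_ab = Sigma[:2, :2], Sigma[:2, 2:]
        # weight of this component, given that we observed (x, y)
        new_weights.append(w * multivariate_normal.pdf(loc, mean=mu_a, cov=S_aa))
        # conditional mean of (r, g, b) under this component: mu_b + S_ba S_aa^{-1} (a - mu_a)
        cond_means.append(mu_b + S_ab.T @ np.linalg.solve(S_aa, loc - mu_a))
    new_weights = np.array(new_weights)
    new_weights /= new_weights.sum()
    return new_weights @ np.array(cond_means)    # weighted average over all components

# e.g. the color for the pixel at location (10, 20):
# color = np.clip(conditional_mean_rgb(gmm5, 10, 20), 0, 255)
</code></pre>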
<p>This is computationally more expensive than the black-and-white case. When you use this strategy on actual images, you end up with “smudgy brush strokes” - something you may have noticed with other AI-generated art. But the results are not bad at all!</p>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/color/kingfisher_raster_staggered.png" alt="test" />
<p class="image-caption">Grid plot.</p>
</div>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/color/springbird_raster_staggered.png" alt="test" />
<p class="image-caption">Grid plot.</p>
</div>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/color/boat_raster_staggered.png" alt="test" />
<p class="image-caption">Grid plot.</p>
</div>
<h2 id="notes">Notes</h2>
<p>A list of miscellaneous points that I kept out of the main article to avoid clutter.</p>
<ul>
<li>I fixed a lot of the GMM parameters to a sweet spot between keeping training not too memory-intensive and keeping the outputs aesthetically pleasing. Of course, there is a lot of room for tweaking here.</li>
<li>Can we use discriminative classifiers? Yes - as an example, for the black-and-white case, given a location \((x, y)\) you could predict a label in \(\{0, 1\}\) for white and black colored pixels respectively. Here’s an example where I used <a href="https://lightgbm.readthedocs.io">LightGBM</a> as the classifier (a code sketch follows the image below):</li>
</ul>
<!-- _includes/image.html -->
<div class="image-wrapper">
<img src="/assets/fun_with_GMMs/discr_pixabay_butterfly.jpg" alt="test" />
<p class="image-caption">Grid plot using a discriminative classifier.</p>
</div>
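<p>For completeness, a rough sketch of that discriminative setup (assuming <code class="language-plaintext highlighter-rouge">lightgbm</code> is installed; the grayscale array <code class="language-plaintext highlighter-rouge">img</code> and the darkness threshold come from the earlier black-and-white sketch, and the number of trees is arbitrary):</p>
<pre><code class="language-python">import numpy as np
from lightgbm import LGBMClassifier

h, w = img.shape
ys, xs = np.mgrid[0:h, 0:w]
X = np.column_stack([xs.ravel(), ys.ravel()])       # every pixel location is an example
y = (img.ravel() &lt; 128).astype(int)                 # 1 = black pixel, 0 = white pixel

clf = LGBMClassifier(n_estimators=500).fit(X, y)
p_black = clf.predict_proba(X)[:, 1].reshape(h, w)  # probability of "black" at each location
</code></pre>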
<ul>
<li>Images were created using: <a href="https://matplotlib.org/">matplotlib</a>, <a href="https://seaborn.pydata.org/">seaborn</a>, <a href="https://inkscape.org/">Inkscape</a>, <a href="https://ezgif.com/">ezgif</a>, <a href="https://www.lcdf.org/gifsicle/">gifsicle</a>, <a href="https://shutter-project.org/">Shutter</a>.</li>
<li>Thanks to <a href="https://pixabay.com">pixabay</a> for making their images available royalty-free. Specifically, I used these images:
<ul>
<li>I extracted black-and-white images out of the original color images of the <a href="https://pixabay.com/illustrations/animal-dog-heart-love-cartoon-pet-6987017/">Dog</a> and <a href="https://pixabay.com/illustrations/butterfly-insect-wings-bug-2373175/">Butterfly</a>.</li>
<li>The images in the color section are from here: <a href="https://pixabay.com/photos/kingfisher-bird-close-up-perched-2046453/">Kingfisher</a>, <a href="https://pixabay.com/photos/spring-bird-bird-tit-spring-blue-2295434/">Springbird</a>, <a href="https://pixabay.com/illustrations/boat-sea-ocean-rowing-wood-man-5404195/">Boat in ocean</a>.</li>
</ul>
</li>
<li>As mentioned earlier, the image of “The Great Wave off Kanagawa” is from <a href="https://en.wikipedia.org/wiki/The_Great_Wave_off_Kanagawa">wikipedia</a>.</li>
<li>For various properties of the Gaussian, Bishop’s book “Pattern Recognition and Machine Learning” is a commonly cited reference (<a href="https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf">PDF</a>). Properties of Gaussians can be found in Section 2.3 and GMMs are discussed in Section 9.2.</li>
</ul>
<p>This was a fun article to write - so much fun that I kept playing around with various parameter settings and broader creative ideas (sadly, mostly unfinished) to the point that this has been languishing as a draft for more than a year now! GMMs themselves are not used as much anymore, but I think of them as a good place to start if you want to understand how to use distributions for practical problems. GMMs aside, it is good to be familiar with the properties of Gaussians since they keep cropping up in various places such as <em>Probabilistic Circuits</em>, <em>Gaussian Processes</em>, <em>Bayesian Optimization</em> and <em>Variational Autoencoders</em>.</p>
</div>
</article>
</div>
</main>
<!--
<script src="https://giscus.app/client.js"
data-repo="[ENTER REPO HERE]"
data-repo-id="[ENTER REPO ID HERE]"
data-category="[ENTER CATEGORY NAME HERE]"
data-category-id="[ENTER CATEGORY ID HERE]"
data-mapping="pathname"
data-reactions-enabled="1"
data-emit-metadata="0"
data-theme="light"
data-lang="en"
crossorigin="anonymous"
async>
</script>
-->
<footer class="site-footer">
<div class="wrapper">
<h2 class="footer-heading">A Not-So Primordial Soup</h2>
<div class="footer-col-wrapper">
<div class="footer-col footer-col-1">
<ul class="contact-list">
<li>
A Not-So Primordial Soup
</li>
</ul>
</div>
<div class="footer-col footer-col-2">
<ul class="social-media-list">
<li>
<a href="https://linkedin.com/in/abhishek-ghose-36197624">
<i class="fa fa-linkedin"></i> LinkedIn
</a>
</li>
<li>
<a href="https://www.quora.com/profile/Abhishek-Ghose">
<i class="fa fa-quora" aria-hidden="true"></i> Quora
</a>
</li>
</ul>
</div>
<div class="footer-col footer-col-3">
<p>My thought-recorder.</p>
</div>
</div>
</div>
</footer>
<script>
var elements = document.querySelectorAll('p');
Array.prototype.forEach.call(elements, function(el, i){
if(el.innerHTML=='[expand]') {
var parentcontent = el.parentNode.innerHTML.replace('<p>[expand]</p>','<div class="expand" style="display: none; height: 0; overflow: hidden;">').replace('<p>[/expand]</p>','</div>');
el.parentNode.innerHTML = parentcontent;
}
});
var elements = document.querySelectorAll('div.expand');
Array.prototype.forEach.call(elements, function(el, i){
el.previousElementSibling.innerHTML = el.previousElementSibling.innerHTML + '<span>.. <a href="#" style="cursor: pointer;" onclick="this.parentNode.parentNode.nextElementSibling.style.display = \'block\'; this.parentNode.parentNode.nextElementSibling.style.height = \'auto\'; this.parentNode.style.display = \'none\';">read more →</a></span>';
});
</script>
</body>
</html>