Values jumping to infinity during sampling. #67

Open
12jerek34jeremi opened this issue Dec 7, 2024 · 0 comments

Short description:
When you create a Glow model, train it on HSV images, and then sample from the model, the values jump to infinity during sampling.

--------------------------------

Long description:
I am exploring Normalizing Flows. I wanted to create a model based on the Glow architecture and train it to generate images.

Almost everywhere in Computer Vision, whether it is image classification, object detection, or segmentation, you use the RGB colorspace. The model's input is an RGB array of shape (3,H,W) or (H,W,3) with values ranging from 0.0 to 1.0. I don't know a lot about display technology and computer graphics, but, from what I know, RGB is so popular mainly because it has a direct representation on the screen: the hardware already operates in RGB. To display an HSV image on the screen, it first needs to be converted to RGB. However, the HSV colorspace seems much easier and more intuitive. It separates color from intensity and brightness, making it easy to independently adjust hue, saturation, and value. I thought I would try to train a model in the HSV colorspace; maybe it would get better results. It would not have to learn the RGB color space: it would generate pixel brightness and pixel color separately, without having to compose colors from complicated RGB components. It seemed worth trying.

I found out that if you create a Glow model, train it on images in the HSV colorspace, and then generate an image with this model, the values jump to infinity during generation.

I started digging in, debugging, and so on... I found that the problem is the `forward` method of the `AffineCoupling` class; see this file, lines 117 to 147.

    def forward(self, z):
        """
        z is a list of z1 and z2; ```z = [z1, z2]```
        z1 is left constant and affine map is applied to z2 with parameters depending
        on z1

        Args:
          z
        """
        z1, z2 = z
        param = self.param_map(z1)
        if self.scale:
            shift = param[:, 0::2, ...]
            scale_ = param[:, 1::2, ...]
            if self.scale_map == "exp":
                z2 = z2 * torch.exp(scale_) + shift
                log_det = torch.sum(scale_, dim=list(range(1, shift.dim())))
            elif self.scale_map == "sigmoid":
                scale = torch.sigmoid(scale_ + 2)
                z2 = z2 / scale + shift
                log_det = -torch.sum(torch.log(scale), dim=list(range(1, shift.dim())))
            elif self.scale_map == "sigmoid_inv":
                scale = torch.sigmoid(scale_ + 2)
                z2 = z2 * scale + shift
                log_det = torch.sum(torch.log(scale), dim=list(range(1, shift.dim())))
            else:
                raise NotImplementedError("This scale map is not implemented.")
        else:
            z2 = z2 + param
            log_det = zero_log_det_like_z(z2)
        return [z1, z2], log_det

param_map is a feed-forward network. z2 is divided by sigmoid(param_map(z1)[:, 1::2, ...] + 2). If param_map generates, in one given dimension, a negative number of fairly large magnitude, like -20.0, then z2 is divided by $\sigma(-20.0 + 2.0) \approx 0.0000000152$ in this dimension, i.e. it is multiplied by about $65\,659\,964$ ($1 \over 0.0000000152$). It is enough for param_map to generate a low value like this in 5 consecutive layers for one given dimension, and z jumps to infinity in that dimension.
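To illustrate the scale of the blow-up, here is a minimal numeric sketch (not library code) of what a few consecutive divisions by such a sigmoid do to one latent dimension in float32:

    import torch

    # Minimal illustration of the blow-up described above (not library code).
    # sigmoid(-20 + 2) is ~1.5e-8, so dividing by it multiplies z2 by ~6.6e7.
    scale = torch.sigmoid(torch.tensor(-20.0) + 2)
    print(scale.item())        # ~1.523e-08
    print(1.0 / scale.item())  # ~6.566e+07

    # A few consecutive layers with such a scale overflow float32:
    z = torch.tensor(1.0)      # one latent dimension, float32
    for layer in range(5):
        z = z / scale          # the division done by the "sigmoid" scale map
        print(layer, z.item())
    # after the 5th division z is ~1.2e39 > 3.4e38 (float32 max), i.e. inf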

This is shown in this notebook (link). In the notebook I show that:

  1. When you
    a. create a new Glow model,
    b. don't train it,
    c. sample from the model,
    then everything is fine. Without any training, sampling works as it should. The values of z don't jump to infinity.

  2. When you
    a. create a new Glow model,
    b. train it on RGB data with values ranging from 0.0 to 1.0,
    c. sample from the model,
    then everything is fine. After training on RGB 0-1 data with noise, sampling works as it should. The values of z don't jump to infinity.

  3. When you
    a. create a new Glow model,
    b. train it on HSV data with values ranging from 0.0 to 1.0,
    c. sample from the model,
    then the values jump to infinity during sampling. After training on HSV data, sampling doesn't work anymore. The values jump to infinity!

This raises two questions.

  1. Why does it happen after training on HSV data?
  2. Why doesn't it happen after training on RGB data?

I looked at the histograms of each channel of both colorspaces for example images (histograms of the R (red), G (green), B (blue), H (hue), S (saturation), and V (value) channels). I found that the distributions of the R, G, B, S, and V channels usually look something like either a Gaussian or a uniform distribution, but the distribution of the H channel very often resembles a bimodal distribution, with peaks somewhere around 0.1 and 0.9 and the lowest value at 0.5.
[Histograms of the R, G, B, H, S, and V channels of example images.]

You can see other histograms in this notebook (link).
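For reference, here is a minimal sketch (not the notebook's code; the image path is a placeholder) of how such per-channel RGB and HSV histograms can be produced with PIL and matplotlib:

    import numpy as np
    import matplotlib.pyplot as plt
    from PIL import Image

    # Sketch: per-channel histograms of one image in RGB and in HSV.
    # "example.jpg" is a placeholder path.
    img = Image.open("example.jpg")
    rgb = np.asarray(img.convert("RGB")) / 255.0  # (H, W, 3), values in [0, 1]
    hsv = np.asarray(img.convert("HSV")) / 255.0  # PIL stores HSV as uint8 as well

    fig, axes = plt.subplots(2, 3, figsize=(12, 6))
    for i, name in enumerate("RGB"):
        axes[0, i].hist(rgb[..., i].ravel(), bins=50)
        axes[0, i].set_title(f"{name} channel")
    for i, name in enumerate("HSV"):
        axes[1, i].hist(hsv[..., i].ravel(), bins=50)
        axes[1, i].set_title(f"{name} channel")
    plt.tight_layout()
    plt.show()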

In the Glow multiscale model, when using a diagonal Gaussian distribution, each dimension (each channel of each pixel) in latent space has its own Gaussian distribution with its own mean and standard deviation. The simplest way to "transform" a bimodal distribution into a normal distribution would be to just multiply by a very small positive number, like 0.00001, and set the standard deviation close to zero and the mean to 0.0. It seems like exactly that is happening: the model learns to generate very low numbers for some dimensions in the param map, so the sigmoid of these numbers is close to zero. Unfortunately, when going in the opposite direction, i.e. the inverse used for sampling, instead of multiplying by a positive number close to zero, we divide by it. This way, after training, the value of z in some dimension jumps to infinity during sampling. Since param_map is a convolutional neural network, in which the output at a given pixel depends on the value of this pixel and its neighbours, the "infinity spreads to all dimensions". Of course that's just a theory, one possible explanation. I don't know how to prove or disprove it, but one way to probe it is sketched below.
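A rough diagnostic sketch, assuming the trained flow exposes AffineCoupling submodules with a param_map attribute and the "sigmoid" scale map exactly as in the code quoted above, and assuming the model provides a sample() method: hook each param_map and log the smallest scale produced while sampling.

    import torch

    # Diagnostic sketch: log the smallest coupling scale seen while sampling.
    # Assumes `model` is the trained flow, that its AffineCoupling layers expose
    # `param_map`, and that channels 1::2 of param_map's output parametrise the
    # scale, as in the forward() code quoted above.
    min_scales = []

    def log_min_scale(module, inputs, output):
        scale = torch.sigmoid(output[:, 1::2, ...] + 2)
        min_scales.append(scale.min().item())

    hooks = []
    for name, module in model.named_modules():
        if module.__class__.__name__ == "AffineCoupling":
            hooks.append(module.param_map.register_forward_hook(log_min_scale))

    with torch.no_grad():
        model.sample(16)  # assumption: the flow provides a sample() method

    for h in hooks:
        h.remove()

    print("smallest scale in any coupling layer:", min(min_scales))
    print("largest single-layer multiplier:     ", 1.0 / min(min_scales))

If the theory is right, the smallest scale should be orders of magnitude smaller after training on HSV data than after training on RGB data.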

Why am I writing all this? Well, scientific progress is possible because a given work builds on previous work, and future work can build on that work. Maybe if somebody else comes up with the idea of using HSV for image modelling with Normalizing Flows, they will read this post first and not waste their time as I did. 😄 😄 I didn't know of a better place to write it.

Conclusions:

  1. RGB is most likely better than HSV for Normalising Flows.

  2. Wouldn't it be better to change

 scale = torch.sigmoid(scale_ + 2)

to, for example,

 scale = torch.sigmoid(scale_) * 3.75 + 0.25

This way z can be multiplied by at most $4$ and at least $1 \over 4$. When inverting, it is divided by at least $1 \over 4$ and at most $4$, which is equivalent to multiplying by $4$ or $1 \over 4$. Why is it torch.sigmoid(scale_ + 2) anyway? Is there any specific reason for that? Why is the 2 inside the sigmoid, not outside? The best would be if AffineCoupling took a callable as a parameter; that would allow for maximum flexibility (a rough sketch is given after this list).
Normalising Flows are supposed to transform any distribution into a normal distribution. If the model crashes after seeing a bimodal distribution, maybe something needs to be changed in the architecture.

  3. Why does GlowBlock only allow passing the number of hidden channels to __init__? Why can't the user create their own module for param_map and pass it to GlowBlock's __init__? The current implementation limits the user to a convolutional neural network with 2 hidden layers in each GlowBlock. Passing a callable would allow far more flexibility.
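A minimal sketch of what conclusions 2 and 3 propose, with hypothetical names and not the library's actual API: a coupling layer whose scale is bounded to (0.25, 4.0), so neither direction can blow up, and whose param_map is any module the user passes in.

    import torch
    import torch.nn as nn

    class BoundedAffineCoupling(nn.Module):
        """Hypothetical coupling: the scale is confined to (0.25, 4.0), so neither
        the forward nor the inverse pass can explode, and param_map is any module
        supplied by the user (not a fixed 2-hidden-layer CNN)."""

        def __init__(self, param_map: nn.Module):
            super().__init__()
            self.param_map = param_map

        def _scale_shift(self, z1):
            param = self.param_map(z1)
            shift = param[:, 0::2, ...]
            scale = torch.sigmoid(param[:, 1::2, ...]) * 3.75 + 0.25  # in (0.25, 4.0)
            return scale, shift

        def forward(self, z):
            z1, z2 = z
            scale, shift = self._scale_shift(z1)
            z2 = z2 * scale + shift
            log_det = torch.sum(torch.log(scale), dim=list(range(1, shift.dim())))
            return [z1, z2], log_det

        def inverse(self, z):
            z1, z2 = z
            scale, shift = self._scale_shift(z1)
            z2 = (z2 - shift) / scale  # dividing by at most 4 and at least 0.25
            log_det = -torch.sum(torch.log(scale), dim=list(range(1, shift.dim())))
            return [z1, z2], log_det

    # Usage: any callable mapping z1 to twice as many channels as z2 works.
    param_map = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 6, 3, padding=1)
    )
    coupling = BoundedAffineCoupling(param_map)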