Short description:
When you create a Glow model, train it on HSV images, and then sample from the model, values jump to infinity during sampling.
--------------------------------
Long description:
I am exploring Normalizing Flows. I wanted to create a model based on the Glow architecture and train it to generate images.
Almost everywhere in computer vision, whether it is image classification, object detection, or segmentation, the RGB colorspace is used. The model's input is an RGB array of shape (3,H,W) or (H,W,3) with values ranging from 0.0 to 1.0. I don't know a lot about display technology and computer graphics, but from what I know, RGB is popular mainly because it maps directly to the screen: the hardware already operates in RGB, and an HSV image has to be converted to RGB before it can be displayed. The HSV colorspace, however, seems much more intuitive. It separates color from intensity and brightness, making it easy to adjust hue, saturation, and value independently. So I thought I would try training a model in the HSV colorspace; maybe it would give better results. The model would not have to learn the RGB color space: it could generate pixel brightness and pixel color separately, instead of composing colors from the RGB components. It seemed worth trying.
I found out that if you create a Glow model, train it on images in the HSV colorspace, and then generate an image with the model, values jump to infinity during generation.
I started digging in, debugging, and so on, and found that the problem is in the forward method of the AffineCoupling class (see this file, lines 117 to 147):
```python
def forward(self, z):
    """z is a list of z1 and z2; ``z = [z1, z2]``.
    z1 is left constant and an affine map is applied to z2,
    with parameters depending on z1.

    Args:
        z
    """
    z1, z2 = z
    param = self.param_map(z1)
    if self.scale:
        shift = param[:, 0::2, ...]
        scale_ = param[:, 1::2, ...]
        if self.scale_map == "exp":
            z2 = z2 * torch.exp(scale_) + shift
            log_det = torch.sum(scale_, dim=list(range(1, shift.dim())))
        elif self.scale_map == "sigmoid":
            scale = torch.sigmoid(scale_ + 2)
            z2 = z2 / scale + shift
            log_det = -torch.sum(torch.log(scale), dim=list(range(1, shift.dim())))
        elif self.scale_map == "sigmoid_inv":
            scale = torch.sigmoid(scale_ + 2)
            z2 = z2 * scale + shift
            log_det = torch.sum(torch.log(scale), dim=list(range(1, shift.dim())))
        else:
            raise NotImplementedError("This scale map is not implemented.")
    else:
        z2 = z2 + param
        log_det = zero_log_det_like_z(z2)
    return [z1, z2], log_det
```
param_map is a feed-forward network. z2 is divided by sigmoid(param_map(z1)[:, 1::2, ...] + 2). If param_map produces a negative number of fairly large magnitude in a given dimension, like -20.0, then z2 is divided by $σ(-20.0 + 2.0) \approx 0.0000000152$ in that dimension, i.e. multiplied by about $65{,}659{,}964$ ($1 \over 0.0000000152$). It is enough for param_map to produce a value this low in the same dimension in 5 consecutive layers, and z jumps to infinity in that dimension.
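To put a number on it, here is a minimal standalone sketch of that blow-up (plain Python, no torch; the layer count of 5 is taken from the paragraph above):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# The "sigmoid" scale map: scale = sigmoid(raw + 2). A raw network output
# of -20 gives a scale of about 1.5e-8.
raw = -20.0
scale = sigmoid(raw + 2)
print(scale)  # ~1.52e-08

# During sampling, z2 is divided by such a scale once per coupling layer.
z2 = 1.0
for _ in range(5):
    z2 = z2 / scale
print(z2)  # ~1.2e39, already past float32 max (~3.4e38)
```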
This is shown in this notebook (link). In the notebook I show the following.
When you
a. create a new Glow model,
b. don't train it,
c. sample from the model,
then everything is fine. Without any training, sampling works as it should. Values of z don't jump to infinity.
When you
a. create a new Glow model,
b. train it on RGB data with values ranging from 0.0 to 1.0,
c. sample from the model,
then everything is fine. After training on RGB data in [0, 1] (with noise added), sampling works as it should. Values of z don't jump to infinity.
When you
a. create a new Glow model,
b. train it on HSV data with values ranging from 0.0 to 1.0,
c. sample from the model,
then values jump to infinity during sampling. After training on HSV data, sampling no longer works. Values jump to infinity!
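For reference, HSV training data like that in the experiments above can be prepared by converting each RGB pixel channel-wise; a minimal sketch using Python's standard colorsys module (the helper name is mine, not from any library):

```python
import colorsys

def rgb_image_to_hsv(img):
    """img: nested list of shape [H][W][3] with RGB floats in [0, 1].
    Returns the same layout with HSV floats, also in [0, 1]."""
    return [[list(colorsys.rgb_to_hsv(r, g, b)) for (r, g, b) in row]
            for row in img]

img = [[[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]]  # pure red, dark cyan
hsv = rgb_image_to_hsv(img)
print(hsv[0][0])  # [0.0, 1.0, 1.0]: hue 0 (red), full saturation and value
print(hsv[0][1])  # [0.5, 1.0, 0.5]: hue 0.5 (cyan), value 0.5
```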
This raises two questions:
Why does it happen after training on HSV data?
Why doesn't it happen after training on RGB data?
I looked at histograms of each channel of both colorspaces for some example images (histograms of the R (red), G (green), B (blue), H (hue), S (saturation), and V (value) channels). I found that the distributions of the R, G, B, S, and V channels usually look something like a Gaussian or a uniform distribution, but the distribution of the H channel very often resembles a bimodal distribution, with peaks somewhere around 0.1 and 0.9 and the lowest values around 0.5. You can see other histograms in this notebook (link).
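One plausible reason for that bimodality is that hue is a circular quantity: reddish colors sit at both ends of the [0, 1] hue axis. A small sketch with Python's standard colorsys module:

```python
import colorsys

# Two nearly identical reds land at opposite ends of the hue axis,
# because hue wraps around; this alone can make the H histogram bimodal
# for images dominated by reddish colors.
h1, s1, v1 = colorsys.rgb_to_hsv(1.0, 0.02, 0.0)  # red, tinted toward yellow
h2, s2, v2 = colorsys.rgb_to_hsv(1.0, 0.0, 0.02)  # red, tinted toward magenta
print(h1)  # ~0.003: just above 0
print(h2)  # ~0.997: just below 1
```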
In the Glow multiscale model, when using a diagonal Gaussian distribution, each dimension (each channel of each pixel) in latent space has its own Gaussian distribution with its own mean and standard deviation. The simplest way to "transform" a bimodal distribution into a normal distribution would be to just multiply it by a very small positive number, like 0.00001, and set the standard deviation close to zero and the mean to 0.0. It seems like exactly that is happening: the model learns to generate very low numbers in some dimensions of the param map, so the sigmoid of those numbers is close to zero. Unfortunately, when going in the opposite direction, i.e. inverting during sampling, instead of multiplying by a positive number close to zero, we divide by it. This way, after training, the value of z in some dimension jumps to infinity during sampling. Since param_map is a convolutional neural network, in which the output at a given pixel depends on the value of that pixel and its neighbours, the infinity spreads to all dimensions. Of course, that's just a theory, one possible explanation; I don't know how to prove or disprove it.
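The "squash a bimodal distribution into a narrow Gaussian" part of that theory is easy to illustrate numerically; a sketch with synthetic hue-like samples (the peak locations are taken from the histogram observation above):

```python
import random
import statistics

random.seed(0)

# Synthetic hue-like bimodal samples: peaks near 0.1 and 0.9.
xs = [random.gauss(0.1, 0.03) if random.random() < 0.5 else random.gauss(0.9, 0.03)
      for _ in range(10_000)]

# Multiplying by a tiny positive factor makes the data look like a
# zero-mean Gaussian with near-zero standard deviation...
squashed = [x * 1e-5 for x in xs]
print(statistics.pstdev(xs))        # ~0.4: clearly bimodal spread
print(statistics.pstdev(squashed))  # ~4e-6: collapses toward a point mass
# ...but inverting that map means dividing by the same tiny factor,
# which is exactly the blow-up seen during sampling.
```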
Why am I writing all this? Well, scientific progress is possible because a given work builds on previous work, and future work can build on that work. Maybe if somebody else comes up with the idea of using HSV in image modeling with Normalizing Flows, they will read this post first and not waste their time as I did. 😄 😄 I didn't know a better place to write it.
Conclusions:
RGB is most likely better than HSV for Normalizing Flows.
Wouldn't it be better to change
```python
scale = torch.sigmoid(scale_ + 2)
```
to, for example,
```python
scale = torch.sigmoid(scale_) * 3.75 + 0.25
```
This way z can be multiplied by at most $4$ and at least $1 \over 4$. When inverting, it is divided by at least $1 \over 4$ and at most $4$, which is equivalent to multiplying by $4$ or $1 \over 4$. Why is it torch.sigmoid(scale_ + 2) anyway? Is there any specific reason for that? Why is the 2 inside the sigmoid rather than outside? Best of all, AffineCoupling could take a callable as a parameter; that would allow for maximum flexibility.
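A quick numeric check of the proposed map (just a sketch of the idea, not a patch against the library):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bounded_scale(raw):
    # Proposed map: the scale stays in (0.25, 4.0) no matter how extreme
    # the raw network output is, so the inverse pass can never blow up.
    return sigmoid(raw) * 3.75 + 0.25

print(bounded_scale(-20.0))  # ~0.25  (the current map gives sigmoid(-18) ~ 1.5e-8)
print(bounded_scale(0.0))    # 2.125  (midpoint of the range)
print(bounded_scale(20.0))   # ~4.0
```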
Normalizing Flows are supposed to transform any distribution into a normal distribution. If the model crashes after seeing a bimodal distribution, maybe something needs to be changed in the architecture.
Why does GlowBlock only allow passing the number of hidden channels to __init__? Why can't the user create their own module for param_map and pass it to GlowBlock's __init__? The current implementation limits the user to a convolutional neural network with 2 hidden layers in each GlowBlock. Accepting a callable would allow much more flexibility.
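As an illustration of that suggestion, here is a toy scalar coupling layer that takes both param_map and scale_map as callables. All names and the interface are hypothetical sketches, not normflows' actual API:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class AffineCouplingSketch:
    """Toy scalar affine coupling with injectable callables.
    (Hypothetical interface, for illustration only.)"""

    def __init__(self, param_map, scale_map):
        self.param_map = param_map  # z1 -> (shift, raw_scale); stands in for the CNN
        self.scale_map = scale_map  # raw_scale -> strictly positive scale

    def forward(self, z1, z2):
        shift, raw = self.param_map(z1)
        scale = self.scale_map(raw)
        return z1, z2 * scale + shift

    def inverse(self, z1, z2):
        shift, raw = self.param_map(z1)
        scale = self.scale_map(raw)
        return z1, (z2 - shift) / scale

coupling = AffineCouplingSketch(
    param_map=lambda z1: (0.5 * z1, -20.0),            # stand-in for a trained network
    scale_map=lambda raw: sigmoid(raw) * 3.75 + 0.25,  # bounded in (0.25, 4.0)
)
_, y = coupling.forward(1.0, 2.0)
_, z2_back = coupling.inverse(1.0, y)
print(z2_back)  # ~2.0: the round trip stays exact even for raw_scale = -20
```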