Pytorch_cnn_visualization_implementations

This repository implements most of the common CNN visualization techniques in PyTorch.

Feature map visualization

This technique visualizes intermediate feature maps directly, using a single forward pass. The illustrations below use a pre-trained VGG16 model and show the outputs of layer_1, layer_6, layer_15 and layer_29, respectively.

Source code: vis_feature_map.py
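As a minimal sketch of the idea (not the contents of vis_feature_map.py), forward hooks can capture the outputs of chosen layers during one forward pass. The helper name `capture_feature_maps` and the small `toy_features` network are assumptions made for illustration; in practice `toy_features` would be replaced by something like the `features` part of a pre-trained VGG16.

```python
import torch
import torch.nn as nn


def capture_feature_maps(model, image, layer_indices):
    """Run one forward pass and return {layer index: activation tensor}."""
    captured = {}
    handles = []
    for idx in layer_indices:
        # Bind idx as a default argument so every hook writes to its own key.
        def hook(module, inputs, output, idx=idx):
            # clone() protects the copy from later in-place ops (e.g. ReLU(inplace=True)).
            captured[idx] = output.detach().clone()

        handles.append(model[idx].register_forward_hook(hook))

    model.eval()
    with torch.no_grad():
        model(image)

    for handle in handles:
        handle.remove()  # leave the model without hooks attached
    return captured


# Hypothetical stand-in for a pre-trained feature extractor (random weights, no download).
toy_features = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)
image = torch.rand(1, 3, 32, 32)
maps = capture_feature_maps(toy_features, image, layer_indices=[1, 4])
for idx, fmap in maps.items():
    print(idx, tuple(fmap.shape))
```

Each captured tensor has shape (1, channels, height, width), so a single channel such as `maps[1][0, 0]` can be plotted as a grayscale image.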

Kernels visualization

We can also visualize the raw convolutional filter weights directly. This works best for the first convolutional layer, whose kernels operate on the RGB input: the results show that the first layer learns simple features such as edges and colored blobs. Raw filter weights at higher layers can be visualized too, but they operate on feature maps rather than pixels, so the resulting images are hard to interpret.

Source code: vis_filter.py
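The following sketch shows one way to pull out first-layer kernels and rescale them for display. It is an illustration under assumptions, not the code in vis_filter.py: the helper name `first_conv_filters` and the randomly initialized `toy_model` are hypothetical, and per-filter min-max scaling is just one reasonable normalization choice.

```python
import torch
import torch.nn as nn


def first_conv_filters(model):
    """Return the weights of the first nn.Conv2d in the model, scaled to [0, 1] per filter."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            w = module.weight.detach().clone()  # (out_channels, in_channels, kH, kW)
            flat = w.reshape(w.size(0), -1)
            lo = flat.min(dim=1).values.view(-1, 1, 1, 1)
            hi = flat.max(dim=1).values.view(-1, 1, 1, 1)
            # Min-max normalize each filter independently so it can be shown as an image.
            return (w - lo) / (hi - lo + 1e-8)
    raise ValueError("model has no Conv2d layer")


# Hypothetical stand-in for a first layer such as AlexNet's (random weights).
toy_model = nn.Sequential(nn.Conv2d(3, 8, 11, stride=4), nn.ReLU())
filters = first_conv_filters(toy_model)
print(tuple(filters.shape))
```

Because the first layer has three input channels, each normalized filter can be displayed as a small RGB patch after moving the channel axis last, for example with `filters[0].permute(1, 2, 0)`.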

AlexNet
ResNet50
DenseNet121

Saliency map

A saliency map, also known as post-hoc attention, can be created with three closely related methods: vanilla backpropagation, guided backpropagation and deconv backpropagation.

All these methods produce visualizations to show which inputs a neural network is using to make a particular prediction.

The common idea is to compute the gradient of the prediction score with respect to the input pixels while keeping the weights fixed. This gradient measures how important each pixel of the input image is for the specific class.

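The principle behind the saliency map starts from a linear score model for class $c$:

$$ S_c(I)=w_c^TI+b_c $$

Here the magnitude of each element of $w_c$ indicates how important the corresponding pixel of $I$ is for class $c$. In the case of deep ConvNets, the class score $S_c(I)$ is a highly non-linear function of $I$, so this reasoning does not apply directly. However, given an image $I_0$, we can approximate $S_c(I)$ with a linear function in the neighborhood of $I_0$ by computing the first-order Taylor expansion:

$$ S_c(I) \approx w^TI+b $$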

where $w$ is the derivative of $S_c$ with respect to the image $I$ at the point $I_0$:

$$ w=\frac{\partial{S_c}}{\partial{I}}|_{I=I_0} $$

So the saliency map can be thought of as a matrix of importance weights over the input image pixels.
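The derivative $w$ can be obtained with a single backward pass. The sketch below is an assumption-laden illustration rather than the repository's code: `vanilla_saliency` and `toy_classifier` are hypothetical names, and reducing the three color channels by taking the maximum absolute gradient is one common way to get a single-channel map.

```python
import torch
import torch.nn as nn


def vanilla_saliency(model, image, target_class):
    """Return |dS_c/dI| at I_0, reduced over color channels, as an (H, W) map."""
    model.eval()
    image = image.detach().clone().requires_grad_(True)
    score = model(image)[0, target_class]  # class score S_c(I_0)
    # Gradient with respect to the image only; the weights are left untouched.
    grad, = torch.autograd.grad(score, image)
    return grad.abs().max(dim=1).values[0]


# Hypothetical stand-in classifier with random weights (no download).
toy_classifier = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(4, 10),
)
image = torch.rand(1, 3, 16, 16)
saliency = vanilla_saliency(toy_classifier, image, target_class=3)
print(tuple(saliency.shape))
```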

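The following figure shows the only difference between these three methods, which is how they backpropagate through the ReLU module. Vanilla backpropagation passes the gradient wherever the forward input was positive, deconv passes it wherever the incoming gradient is positive, and guided backpropagation requires both conditions.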
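The three ReLU backward rules can be written side by side as a small function on tensors. This is a sketch for illustration only: the name `relu_backward` and the `mode` strings are assumptions, and a full implementation would typically install these rules through backward hooks on each ReLU module.

```python
import torch


def relu_backward(forward_input, grad_output, mode):
    """Gradient passed back through a ReLU under the three saliency variants."""
    pos_input = (forward_input > 0).to(grad_output.dtype)  # where the ReLU was active
    pos_grad = (grad_output > 0).to(grad_output.dtype)     # where the incoming gradient is positive
    if mode == "vanilla":
        return grad_output * pos_input
    if mode == "deconvnet":
        return grad_output * pos_grad
    if mode == "guided":
        return grad_output * pos_input * pos_grad
    raise ValueError(f"unknown mode: {mode}")


x = torch.tensor([1.0, -1.0, 2.0, -2.0])  # inputs seen by the ReLU in the forward pass
g = torch.tensor([3.0, 3.0, -3.0, -3.0])  # gradient arriving from the layer above
for mode in ("vanilla", "deconvnet", "guided"):
    print(mode, relu_backward(x, g, mode))
```

Only the first element, with both a positive input and a positive gradient, survives under all three rules.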

Original image
vanilla backpropagation (color image)
vanilla backpropagation (gray image)
guided backpropagation (color image)
guided backpropagation (gray image)
deconv backpropagation (color image)
deconv backpropagation (gray image)

Gradient Ascent

This technique generates a synthetic image that maximally activates a neuron. The objective function is:

$$ \arg\max_I\ (S_c(I)-\lambda\|I\|_2^2) $$

where $I$ is the input image. We initialize $I=0$ and then repeat the following three steps until convergence or until the maximum number of iterations is reached:

  • Pass the image $I$ through the model and compute the score $S_c(I)$ of the target class

  • Compute the objective and backpropagate to get its gradient with respect to the image pixels

  • Make a small update to the image in the direction of the gradient
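The three steps above can be sketched as a short loop. This is a minimal illustration under assumptions, not the repository's script: the function name `class_gradient_ascent`, the default step size and regularization weight, and the randomly initialized `toy_classifier` are all hypothetical choices.

```python
import torch
import torch.nn as nn


def class_gradient_ascent(model, target_class, shape=(1, 3, 32, 32),
                          steps=20, lr=1.0, l2_weight=1e-3):
    """Maximize S_c(I) - lambda * ||I||_2^2 over the image I, starting from I = 0."""
    model.eval()
    image = torch.zeros(shape, requires_grad=True)
    history = []
    for _ in range(steps):
        # Step 1: forward pass and score of the target class.
        score = model(image)[0, target_class]
        # Step 2: objective and its gradient with respect to the pixels only.
        objective = score - l2_weight * image.pow(2).sum()
        history.append(objective.item())
        grad, = torch.autograd.grad(objective, image)
        # Step 3: small update in the ascent direction; model weights are never changed.
        with torch.no_grad():
            image += lr * grad
    return image.detach(), history


# Hypothetical stand-in classifier with random weights (no download).
toy_classifier = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(4, 10),
)
synthetic, history = class_gradient_ascent(toy_classifier, target_class=3)
print(tuple(synthetic.shape), len(history))
```

Setting `l2_weight=0` gives the unregularized variant; an L1 penalty could be substituted by replacing `image.pow(2).sum()` with `image.abs().sum()`.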

Paper: Gradient Ascent - arXiv 2013

The following figures visualize three different classes, each generated with no regularization, L1 regularization and L2 regularization.

Source code: gradient_ascent_specific_class.py

No Regularization L1 Regularization L2 Regularization
class=52 (snake)
class=77 (spider)
class=231 (sheepdog)

We can also use gradient ascent to visualize an intermediate layer instead of the model output. The only difference is that we maximize the mean activation of a specific filter, so the objective function becomes:

$$ \arg\max_I\ (M_{ij}(I)-\lambda\|I\|_2^2) $$

where $M_{ij}(I)$ is the mean activation of filter $j$ in layer $i$.

Source code: gradient_ascent_intermediate_layer.py
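A sketch of the intermediate-layer objective follows. It assumes the network is an `nn.Sequential` so it can be truncated by slicing, and it starts from small random noise, which is an assumption made here rather than something the text specifies. The names `filter_gradient_ascent` and `toy_features` are hypothetical.

```python
import torch
import torch.nn as nn


def filter_gradient_ascent(features, layer_index, filter_index,
                           shape=(1, 3, 32, 32), steps=20, lr=1.0, l2_weight=1e-3):
    """Maximize M_ij(I) - lambda * ||I||_2^2, with M_ij the mean activation of one filter."""
    features.eval()
    truncated = features[: layer_index + 1]  # run the network only up to layer i
    # Assumption: start from small random noise instead of an all-zero image.
    image = (0.1 * torch.randn(shape)).requires_grad_(True)
    history = []
    for _ in range(steps):
        m_ij = truncated(image)[0, filter_index].mean()
        objective = m_ij - l2_weight * image.pow(2).sum()
        history.append(objective.item())
        grad, = torch.autograd.grad(objective, image)
        with torch.no_grad():
            image += lr * grad
    return image.detach(), history


# Hypothetical stand-in feature extractor with random weights.
toy_features = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)
pattern, history = filter_gradient_ascent(toy_features, layer_index=3, filter_index=5)
print(tuple(pattern.shape), len(history))
```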

layer=12, filter=5
layer=24, filter=25

Deep Dream

Deep Dream also uses gradient ascent for visualization. The only difference is that the input is a real image rather than a random input.

Here we use a pre-trained VGG19 model, start from a real image instead of a random one, and maximize the activations of layer 34. The following figures show the results.

Source code: deep_dream.py

Original image
deep dream (one channel: layer=34, filter=45)
deep dream (all channels: layer=34)

Although this works, the quality can be improved by pyramid reconstruction, that is, by running the procedure over several image scales.

Source code: deep_dream_improved.py
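The sketch below illustrates one common form of pyramid reconstruction: dream on a downscaled copy first, then carry the added detail up to each larger scale. It is a simplified illustration, not the contents of deep_dream_improved.py. The function names, the L2-norm objective, the gradient normalization and the default parameters are all assumptions, and `toy_features` stands in for a truncated pre-trained network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def dream(features, image, iterations, lr):
    """Plain gradient ascent on the activation norm, starting from a given image."""
    image = image.detach().clone().requires_grad_(True)
    for _ in range(iterations):
        objective = features(image).norm()
        grad, = torch.autograd.grad(objective, image)
        with torch.no_grad():
            # Normalize the step so its size does not depend on the gradient scale.
            image += lr * grad / (grad.abs().mean() + 1e-8)
    return image.detach()


def deep_dream_pyramid(features, image, num_octaves=3, octave_scale=1.4,
                       iterations=5, lr=0.01):
    """Dream from the coarsest scale to the finest, re-adding the detail at each scale."""
    features.eval()
    octaves = [image]
    for _ in range(num_octaves - 1):
        octaves.append(F.interpolate(octaves[-1], scale_factor=1 / octave_scale,
                                     mode="bilinear", align_corners=False))
    detail = torch.zeros_like(octaves[-1])
    for base in reversed(octaves):  # smallest image first
        # Upsample the detail produced at the previous (smaller) scale.
        detail = F.interpolate(detail, size=base.shape[-2:],
                               mode="bilinear", align_corners=False)
        dreamed = dream(features, base + detail, iterations, lr)
        detail = dreamed - base  # what dreaming added at this scale
    return dreamed


# Hypothetical stand-in for a truncated pre-trained network (random weights).
toy_features = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1),
)
image = torch.rand(1, 3, 64, 64)
dreamed = deep_dream_pyramid(toy_features, image)
print(tuple(dreamed.shape))
```

Restricting the objective to one channel, as in the "one channel" figures, would amount to indexing the activation with a filter index before taking the norm.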

This code is based on the project eriklindernoren/PyTorch-Deep-Dream

Original image
deep dream (one channel: layer=34, filter=45)
deep dream (all channels: layer=34)

Grad CAM

Gradient-weighted Class Activation Mapping (Grad-CAM) uses the gradients of any target concept in a classification network, flowing into the final convolutional layer, to produce a coarse localization map that highlights the regions of the image that are important for predicting that concept.

Source code: grad_cam.py
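A compact sketch of the Grad-CAM computation is shown below. It assumes the network can be split into a convolutional part ending at the target layer (`features`) and the remainder that produces class scores (`classifier`). That split, the function name `grad_cam` and the toy modules are assumptions for illustration and are not taken from grad_cam.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_cam(features, classifier, image, target_class):
    """Return an (H, W) heat map in [0, 1] for the target class."""
    features.eval()
    classifier.eval()
    # Treat the target layer's activations as a leaf so we can differentiate w.r.t. them.
    activations = features(image).detach().requires_grad_(True)  # (1, K, h, w)
    score = classifier(activations)[0, target_class]
    grads, = torch.autograd.grad(score, activations)
    # Channel weights: gradients averaged over the spatial dimensions.
    weights = grads.mean(dim=(2, 3), keepdim=True)
    # Weighted sum of activation maps; ReLU keeps only positive evidence.
    cam = F.relu((weights * activations).sum(dim=1, keepdim=True))
    # Upsample the coarse map to the input resolution and rescale to [0, 1].
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-8)
    return cam[0, 0].detach()


# Hypothetical stand-ins for the two halves of a classifier (random weights).
toy_features = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)
toy_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
image = torch.rand(1, 3, 32, 32)
heat_map = grad_cam(toy_features, toy_head, image, target_class=3)
print(tuple(heat_map.shape))
```

The resulting map can be overlaid on the original image with a color map to produce a heat map figure.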

This code is based on the project jacobgil/pytorch-grad-cam

Original image
layer=35
heat_map