SmoothTaylor is a gradient-based attribution method for deep neural networks, derived from Taylor's theorem. It is proposed as a theoretical bridge between SmoothGrad (Smilkov et al.) and Integrated Gradients (Sundararajan et al.).
In our paper, we compare SmoothTaylor and Integrated Gradients using two empirical quantitative measures, perturbation scores and average total variation, and show that SmoothTaylor generates attribution maps that are smoother and more sensitive.
This repository includes a PyTorch implementation of SmoothTaylor, SmoothGrad and Integrated Gradients.
G. S. W. Goh, S. Lapuschkin, L. Weber, W. Samek, and A. Binder (2021). “Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution”. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 4949–4956. DOI: 10.1109/ICPR48806.2021.9413242.
Links: Paper • Presentation • Poster
Requires Python 3.7 with the standard library and the following packages (versions listed are the ones we tested):
- pytorch 1.4.0
- torchvision 0.5.0
- scikit-image 0.16.2
- pillow 7.0.0
- numpy 1.7.14
- scipy 1.4.1
- tqdm 4.36.1
Tested on Ubuntu with an Intel i7-6700 CPU and an RTX 2080 Ti GPU with CUDA 10.1. CPU-only mode is also possible, but running on a GPU is highly recommended.
The first 1000 images of the ILSVRC2012 ImageNet object recognition validation dataset are used in our paper's experiments. To replicate them with our experiment code, download or place the dataset into a new folder ./data, and put the annotation files (in .xml format) in the subfolder ./data/annotations and the raw images in the subfolder ./data/images. Note: the required resource files and pre-processing steps for ImageNet are already provided in ./rsc and in ./attribution/constants.py.
# the ILSVRC2012 ImageNet validation dataset should be laid out like this
data/
  -annotations/
    -ILSVRC2012_val_{img#}.xml
  -images/
    -ILSVRC2012_val_{img#}.JPEG
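Before running the experiments, it can help to confirm that every image has a matching annotation file. The following is a minimal sketch that only assumes the folder layout described above; the helper name check_dataset_layout is hypothetical and is not part of this repository.

```python
from pathlib import Path


def check_dataset_layout(root="data"):
    """Return (num_images, num_annotations, unmatched_stems) for the layout above."""
    root = Path(root)
    images = {p.stem for p in (root / "images").glob("ILSVRC2012_val_*.JPEG")}
    annotations = {p.stem for p in (root / "annotations").glob("ILSVRC2012_val_*.xml")}
    # A stem found in only one of the two folders points to a missing file.
    unmatched = sorted(images ^ annotations)
    return len(images), len(annotations), unmatched


n_img, n_ann, unmatched = check_dataset_layout("data")
print(f"{n_img} images, {n_ann} annotation files, {len(unmatched)} unmatched")
```

An empty unmatched list with equal counts suggests the two subfolders are consistent with each other.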
You may also create your own dataset using PyTorch's torch.utils.data.Dataset wrapper to use in your own code.
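As a rough illustration of the map-style interface such a dataset needs (__len__ and __getitem__), here is a dependency-free sketch. The class name ValFolderDataset is hypothetical, and the XML parsing assumes PASCAL VOC-style annotations in which object/name holds the WordNet ID; neither is taken from this repository. In real code the class would subclass torch.utils.data.Dataset and return a transformed image tensor rather than a file path.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


class ValFolderDataset:
    """Map-style dataset over <root>/images and <root>/annotations (hypothetical helper)."""

    def __init__(self, root="data", limit=None):
        root = Path(root)
        # limit=1000 would mirror "the first 1000 images" when files sort by name.
        self.image_paths = sorted((root / "images").glob("*.JPEG"))[:limit]
        self.annotation_dir = root / "annotations"

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        image_path = self.image_paths[index]
        xml_path = self.annotation_dir / (image_path.stem + ".xml")
        # Assumption: VOC-style XML, <annotation><object><name>WNID</name>...</object>.
        name = ET.parse(str(xml_path)).getroot().find("./object/name")
        label = name.text if name is not None else None
        return image_path, label
```

Returning paths keeps the sketch free of image libraries; loading and normalising the image would go in __getitem__.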
In our experiments, we applied attribution to the following deep neural image classifiers:
- DenseNet121 (Huang et al., 2017), and
- ResNet152 (He et al., 2015).
Both are pretrained on the ILSVRC2012 ImageNet dataset, and we use the instances from the default torchvision URL paths. You may use other pretrained image classifier models that are implemented in PyTorch; just remember to add the name and instance to the MODELS dictionary in ./attribution/constants.py.
# the current default mapping is:
from torchvision import models
MODELS = {'densenet121': models.densenet121,
          'resnet152': models.resnet152}
To replicate our experiments, please follow the steps in this section.
- First, save the classification outputs of the images using the pre-trained image classifiers:
    python experiment_classify.py [-m MODEL_NAME] [-b BATCH_SIZE]
  Arguments:
    -m MODEL_NAME: use densenet121 or resnet152
    -b BATCH_SIZE (optional): number of images per batch (default: 128)
  The classification output should be saved in a new folder ./output/[MODEL_NAME].
- Perform the neural network attribution. We implemented three gradient-based attribution methods here:
  - SmoothTaylor
      python experiment_smooth_taylor.py [-m MODEL_NAME] [-b BATCH_SIZE] [-s NOISE_SCALE] [-r NUM_ROOTS]
    Arguments:
      -m MODEL_NAME: use densenet121 or resnet152
      -b BATCH_SIZE (optional): number of images per batch (default: 50)
      -s NOISE_SCALE (optional): magnitude of the noise added to the image (default: 5e-1)
      -r NUM_ROOTS (optional): number of noise inputs to use (default: 150)
  - Integrated Gradients
      python experiment_ig.py [-m MODEL_NAME] [-b BATCH_SIZE] [-k STEPS] [-z BASELINE_TYPE] [-n NUM_NOISE]
    Arguments:
      -m MODEL_NAME: use densenet121 or resnet152
      -b BATCH_SIZE (optional): number of images per batch (default: 50)
      -k STEPS (optional): number of steps along the path (default: 50)
      -z BASELINE_TYPE (optional): baseline type [use zero or noise] (default: zero)
      -n NUM_NOISE (optional): number of noise baselines to use (default: 1)
  - SmoothGrad
      python experiment_grad.py [-m MODEL_NAME] [-b BATCH_SIZE] [-s] [-p NOISE_SCALE] [-n NUM_NOISE]
    Arguments:
      -m MODEL_NAME: use densenet121 or resnet152
      -b BATCH_SIZE (optional): number of images per batch (default: 50)
      -s (optional): whether to use SmoothGrad (default: False)
      -p NOISE_SCALE (optional): percentage noise scale (default: 15)
      -n NUM_NOISE (optional): number of noise inputs to use (default: 50)
  The heatmaps should be saved in a new folder ./heatmaps, with hyperparameter values as subfolder names, e.g. ./heatmaps/[ATTRIBUTION_METHOD]/[MODEL_NAME]/...
- Evaluate the attribution methods by comparing their heatmaps, using two quantitative evaluation metrics:
  - Perturbation Scores for sensitivity
      python experiment_perturbations.py [-m MODEL_NAME] [-a ANALYZER] [-b BATCH_SIZE] [-z BASELINE] [-n NUM_NOISE] [-s NOISE_SCALE] [-r NUM_ROOTS] [-k KERNEL_SIZE] [-pt NUM_PERTURBS] [-l NUM_REGIONS] [-an] [-af ADAPTIVE_FUNCTION]
    Arguments:
      -m MODEL_NAME: use densenet121 or resnet152
      -a ANALYZER: attribution method [use grad, smooth-grad, smooth-taylor, or ig]
      -b BATCH_SIZE (optional): number of images per batch (default: 50)
      -z BASELINE (optional): IG baseline used [use zero or noise] (default: zero)
      -n NUM_NOISE (optional): number of noise baselines in IG (default: 1)
      -s NOISE_SCALE (optional): magnitude of noise scale for smoothing (default: 5e-1)
      -r NUM_ROOTS (optional): number of noise inputs for smoothing (default: 150)
      -k KERNEL_SIZE (optional): size of the window of each perturbation (default: 15)
      -pt NUM_PERTURBS (optional): number of random perturbations to evaluate (default: 50)
      -l NUM_REGIONS (optional): number of regions to perturb (default: 30)
      -an (optional): use adaptive noise (default: False)
      -af ADAPTIVE_FUNCTION (optional): objective function for adaptive noising [use aupc or autvc] (default: aupc)
  - Average Total Variation for noisiness
      python experiment_total_variation.py [-m MODEL_NAME] [-a ANALYZER] [-z BASELINE] [-n NUM_NOISE] [-s NOISE_SCALE] [-r NUM_ROOTS] [-ds DOWNSCALE] [-wms WIDTH_MIN_SIZE] [-hms HEIGHT_MIN_SIZE] [-lp LP_NORM] [-an] [-af ADAPTIVE_FUNCTION]
    Arguments:
      -m MODEL_NAME: use densenet121 or resnet152
      -a ANALYZER: attribution method [use grad, smooth-grad, smooth-taylor, or ig]
      -z BASELINE (optional): IG baseline used [use zero or noise] (default: zero)
      -n NUM_NOISE (optional): number of noise baselines in IG (default: 1)
      -s NOISE_SCALE (optional): magnitude of noise scale for smoothing (default: 5e-1)
      -r NUM_ROOTS (optional): number of noise inputs for smoothing (default: 150)
      -ds DOWNSCALE (optional): factor by which to downscale the heatmap (default: 1.5)
      -wms WIDTH_MIN_SIZE (optional): minimum width for downscaling (default: 30)
      -hms HEIGHT_MIN_SIZE (optional): minimum height for downscaling (default: 30)
      -lp LP_NORM (optional): norm used to calculate total variation (default: 1)
      -an (optional): use adaptive noise (default: False)
      -af ADAPTIVE_FUNCTION (optional): objective function for adaptive noising [use aupc or autvc] (default: aupc)
- Generate SmoothTaylor heatmaps with the adaptive noising hyperparameter tuning technique:
    python experiment_adaptive_noising.py [-m MODEL_NAME] [-b BATCH_SIZE] [-r NUM_ROOTS] [-f OBJ_FUNCTION] [-ds DOWNSCALE] [-wms WIDTH_MIN_SIZE] [-hms HEIGHT_MIN_SIZE] [-lp LP_NORM] [-k KERNEL_SIZE] [-p NUM_PERTURBS] [-l NUM_REGIONS] [-lr LEARNING_RATE] [-y LEARNING_DECAY] [-c MAX_STOP_COUNT] [-x MAX_ITERATION]
  Arguments:
    -m MODEL_NAME: use densenet121 or resnet152
    -b BATCH_SIZE (optional): number of images per batch (default: 50)
    -r NUM_ROOTS (optional): number of noise inputs for smoothing (default: 150)
    -f OBJ_FUNCTION (optional): objective function for adaptive noising [use aupc or autvc] (default: aupc)
    -ds DOWNSCALE (optional): factor by which to downscale the heatmap (default: 1.5)
    -wms WIDTH_MIN_SIZE (optional): minimum width for downscaling (default: 30)
    -hms HEIGHT_MIN_SIZE (optional): minimum height for downscaling (default: 30)
    -lp LP_NORM (optional): norm used to calculate total variation (default: 1)
    -k KERNEL_SIZE (optional): size of the window of each perturbation (default: 15)
    -p NUM_PERTURBS (optional): number of random perturbations to evaluate (default: 50)
    -l NUM_REGIONS (optional): number of regions to perturb (default: 30)
    -lr LEARNING_RATE (optional): learning rate for the variable update (default: 0.1)
    -y LEARNING_DECAY (optional): decay rate of the learning rate (default: 0.9)
    -c MAX_STOP_COUNT (optional): maximum stop count before the search terminates (default: 3)
    -x MAX_ITERATION (optional): maximum number of search iterations (default: 20)
  Perform evaluation (see Step 3 above) if required.
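The steps above can be chained for one model and one SmoothTaylor setting. The sketch below only assembles the command lines from the flags documented in this section and prints them; nothing is executed. The helper name build_pipeline is hypothetical, and it is an assumption that the scripts are run from the repository root and that the evaluation scripts must receive the same -s and -r values as the attribution step in order to locate the heatmaps.

```python
def build_pipeline(model="densenet121", noise_scale="5e-1", num_roots=150):
    """Return the SmoothTaylor workflow as a list of argv lists (nothing is run)."""
    roots = str(num_roots)
    smoothing = ["-s", noise_scale, "-r", roots]
    return [
        # Step 1: save the classification outputs.
        ["python", "experiment_classify.py", "-m", model],
        # Step 2: compute the SmoothTaylor heatmaps.
        ["python", "experiment_smooth_taylor.py", "-m", model] + smoothing,
        # Step 3: evaluate sensitivity and noisiness of those heatmaps.
        ["python", "experiment_perturbations.py", "-m", model, "-a", "smooth-taylor"] + smoothing,
        ["python", "experiment_total_variation.py", "-m", model, "-a", "smooth-taylor"] + smoothing,
    ]


for argv in build_pipeline("resnet152"):
    print(" ".join(argv))
```

Each printed line can be pasted into a shell, or each argv list passed to subprocess.run(argv, check=True) to stop at the first failing step.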
For a clearer explanation of what each hyperparameter in the arguments means, please refer to our paper.
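As a rough intuition for STEPS, NOISE_SCALE and NUM_ROOTS, the NumPy sketch below applies both methods to a toy function with a known gradient. It is not the repository's PyTorch implementation. It assumes that Integrated Gradients is approximated by a Riemann sum over STEPS points on the straight path from the baseline, and that SmoothTaylor averages the first-order Taylor term (x - z) * grad f(z) over NUM_ROOTS roots z drawn as x plus Gaussian noise with standard deviation NOISE_SCALE. The toy function and all names are hypothetical.

```python
import numpy as np

# Toy differentiable "model" with an analytic gradient, standing in for a class score.
W = np.array([0.5, 1.0, 2.0])


def f(x):
    return np.sum(W * x ** 2, axis=-1)


def grad_f(x):
    return 2.0 * W * x


def integrated_gradients(x, baseline=None, steps=50):
    """Midpoint Riemann sum of the gradients along the straight path baseline -> x."""
    baseline = np.zeros_like(x) if baseline is None else baseline
    alphas = (np.arange(steps) + 0.5) / steps
    points = baseline + alphas[:, None] * (x - baseline)
    return (x - baseline) * grad_f(points).mean(axis=0)


def smooth_taylor(x, noise_scale=0.5, num_roots=150, seed=0):
    """Average of (x - z) * grad f(z) over noisy roots z = x + noise."""
    rng = np.random.default_rng(seed)
    roots = x + noise_scale * rng.standard_normal((num_roots,) + x.shape)
    return ((x - roots) * grad_f(roots)).mean(axis=0)


x = np.array([1.0, -2.0, 0.5])
ig = integrated_gradients(x, steps=50)
st = smooth_taylor(x, noise_scale=0.5, num_roots=150)
print("Integrated Gradients:", ig)
print("sum of IG vs f(x) - f(0):", ig.sum(), f(x) - f(np.zeros_like(x)))
print("SmoothTaylor:", st)
```

With a real network, grad_f would be one backward pass per path point or root, which is why the cost of each method scales with STEPS or NUM_ROOTS.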
This work is licensed under the MIT License. See LICENSE for details.
If you find our code or paper useful, please cite our paper:
@inproceedings{GohLWSB21Understanding,
author = {Goh, Gary S. W. and Lapuschkin, Sebastian and Weber, Leander and Samek, Wojciech and Binder, Alexander},
title = {Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution},
booktitle = {2020 25th International Conference on Pattern Recognition, (ICPR)},
pages = {4949--4956},
publisher = {IEEE},
year = {2021},
address = {Virtual Event / Milan, Italy},
doi = {10.1109/ICPR48806.2021.9413242},
arxiv = {2004.10484}
}
If you find any bugs or have any questions, please email [email protected].