
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning #67

howardyclo commented Mar 7, 2020

Model uncertainty

Model or epistemic uncertainty captures uncertainty in the model parameters. It is higher in regions with little or no training data and lower in regions with more training data. Therefore, model uncertainty can be explained away given enough training data.

Bayesian Neural Network

  • Place a prior distribution (e.g., Gaussian) over the model weights w. By Bayes' rule, we obtain a posterior distribution over the weights p(w|D) instead of a point estimate of w.
  • Bayesian prediction: p(y*|x*, D) = E_{w~p(w|D)}[p(y*|x*, w)] = ∫p(y*|x*, w)p(w|D)dw, i.e., we marginalize over the posterior (so-called marginalization or Bayesian model averaging). In practice the integral cannot be computed exactly, so we draw several samples of w from p(w|D) and average the predictions (i.e., "ensembles", or approximate Bayesian marginalization; see the Monte Carlo estimate sketched after this list).
  • Bayesian inference: computing the analytical solution of p(w|D) is intractable, so we approximate p(w|D) with variational inference (i.e., minimize KL(q_θ(w|D) || p(w|D))).
  • Advantages: robustness to over-fitting, model uncertainty quantification.
  • Disadvantages: computationally expensive (variational inference is needed to learn the parameters), the number of parameters doubles (learning a weight becomes learning its mean and variance), and training takes longer to converge.
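For concreteness, the approximate Bayesian marginalization mentioned above is just a Monte Carlo estimate (my notation; q_θ(w) is the approximating distribution from variational inference, T the number of weight samples):

```latex
% Monte Carlo approximation of the Bayesian predictive distribution.
% q_\theta(w) approximates p(w|D); \hat{w}_t are samples drawn from it.
p(y^* \mid x^*, D)
  = \int p(y^* \mid x^*, w)\, p(w \mid D)\, dw
  \approx \int p(y^* \mid x^*, w)\, q_\theta(w)\, dw
  \approx \frac{1}{T} \sum_{t=1}^{T} p(y^* \mid x^*, \hat{w}_t),
  \qquad \hat{w}_t \sim q_\theta(w)
```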

Mathematical Findings: Dropout as Bayesian Approximation

  • A neural network of arbitrary depth and non-linearities, with dropout applied before every weight layer, is mathematically equivalent to an approximation of the probabilistic deep Gaussian process (marginalized over its covariance function parameters). Note: Gaussian processes (GPs) model distributions over functions. The findings carry over to other variants of dropout as well (e.g., drop-connect, multiplicative Gaussian noise).

  • The dropout objective minimizes the KL-divergence between an approximate distribution and the posterior of a deep Gaussian process (marginalized over its finite-rank covariance function parameters), i.e., the dropout objective is the same as doing variational inference! (A sketch of this correspondence follows this list.)

  • A deep GP can be approximated by placing a variational distribution (i.e., the approximating distribution q(w) of the posterior p(w|X, Y)) over each component of a spectral decomposition of the GP's covariance functions. This spectral decomposition maps each layer of the deep GP to a layer of explicitly represented hidden units.
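A rough sketch of the correspondence as I understand it from the paper (my conventions: p_i is the dropout rate of layer i, M_i the variational parameter matrix, E the per-example loss, λ the weight-decay coefficient):

```latex
% Approximating distribution: each column of M_i is kept with prob. 1 - p_i.
W_i = M_i \, \mathrm{diag}\!\big([z_{i,j}]_{j}\big),
  \qquad z_{i,j} \sim \mathrm{Bernoulli}(1 - p_i)

% Minimizing the usual dropout loss with L2 weight decay is then (up to
% constants) a Monte Carlo estimate of the variational objective:
\mathcal{L}_{\text{dropout}}
  \;\propto\; \frac{1}{N} \sum_{n=1}^{N} E\big(y_n, \hat{y}(x_n, \hat{w}_n)\big)
  \;+\; \lambda \sum_{i} \big(\lVert M_i \rVert_2^2 + \lVert b_i \rVert_2^2\big)
```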

Obtaining Model Uncertainty by "MC Dropout"

By performing T stochastic (dropout-enabled) forward passes through the network, we can use the variance of the T predictions as the model uncertainty; the average of the T predictions can be viewed as an ensemble prediction.
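A minimal MC dropout sketch in PyTorch (the model architecture and helper below are illustrative assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

# Illustrative model only: dropout applied before every weight layer,
# as the equivalence above requires.
model = nn.Sequential(
    nn.Dropout(p=0.1),
    nn.Linear(1, 64),
    nn.ReLU(),
    nn.Dropout(p=0.1),
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, T=100):
    """Run T stochastic forward passes with dropout kept active."""
    model.train()  # keep dropout sampling enabled at test time (MC dropout)
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(T)])  # shape (T, N, out_dim)
    # Predictive mean = ensemble prediction; variance = model uncertainty.
    return preds.mean(dim=0), preds.var(dim=0)

x_star = torch.randn(8, 1)  # dummy test inputs
mean, var = mc_dropout_predict(model, x_star, T=50)
```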

Determining the best dropout rate

The best dropout rate can simply be found by grid search: make average predictions with MC dropout under different dropout rates, and pick the rate whose averaged predictions perform best, e.g., lowest validation error (please refer to the code).
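A hypothetical grid-search sketch (train_model, x_val, and y_val are assumed helpers/data; mc_dropout_predict is as sketched above; the actual code may differ):

```python
import torch

best_rate, best_err = None, float("inf")
for p in [0.05, 0.1, 0.2, 0.5]:
    model = train_model(dropout_rate=p)               # assumed training routine
    mean, _ = mc_dropout_predict(model, x_val, T=50)  # averaged MC prediction
    err = torch.mean((mean - y_val) ** 2).item()      # validation error of the average
    if err < best_err:
        best_rate, best_err = p, err
print(f"best dropout rate: {best_rate} (val MSE {best_err:.4f})")
```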

Different non-linearities result in different uncertainty estimates

Dropout’s uncertainty draws its properties from the underlying GP, in which different covariance functions correspond to different uncertainty estimates. ReLU and Tanh approximate different GP covariance functions (see appendix 3.1).
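Roughly (my notation), the GP covariance function induced by a non-linearity σ(·), with a prior p(w), p(b) over a single hidden unit's weight and bias, is:

```latex
% Covariance function induced by the non-linearity \sigma (sketch).
K(x, y) = \int p(w)\, p(b)\, \sigma(w^{\top} x + b)\, \sigma(w^{\top} y + b)\, dw\, db
```

So swapping ReLU for Tanh changes K, and hence the shape of the resulting uncertainty estimates.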
