
Marc LELARGE and Andrei BURSUC | Deep Learning Do It Yourself | 15.2 Uncertainty estimation - MCDropout

15.2 Towards deep learning for the real world


i.e., beyond cats and dogs


Andrei Bursuc


Motivation


Deep Learning is great:

  • conceptually simple and modular
  • scales well with data
  • awesome software tools
  • huge community and interest
  • potentially real world impact

... but has several problems

  • uninterpretable black-boxes
  • needs a lot of data
  • mostly empirical
  • what does a model not know?
  • can be fooled easily

The world is a complex environment

Covering this diversity with (sufficient) data and labels is highly challenging


Dealing with uncertainty


Why should I care about uncertainty?


Motivation

In May 2016, the first fatality involving an assisted driving system occurred when the perception system mistook the white side of a trailer for bright sky.


Motivation

An image classification system erroneously identifies two African Americans as gorillas, raising concerns of racial discrimination.

What do we mean by uncertainty?

Return a distribution over predictions instead of a single prediction:

  • classification: output a label and its confidence
  • regression: output a mean and a variance


Good uncertainty estimates tell us when we can trust the predictions of our model.

What do we mean by Out-of-Distribution Robustness?

I.I.D.: $p_{\text{test}}(\mathbf{x}, y) = p_{\text{train}}(\mathbf{x}, y)$
(I.I.D. = Independent and Identically Distributed)

O.O.D.: $p_{\text{test}}(\mathbf{x}, y) \neq p_{\text{train}}(\mathbf{x}, y)$

Examples of dataset shift:

  • covariate shift: the distribution of features $p(\mathbf{x})$ changes while $p(y \vert \mathbf{x})$, i.e., the labeling, stays fixed
  • open-set recognition: new classes may appear at test time
  • label shift: the distribution of labels $p(y)$ changes while $p(\mathbf{x} \vert y)$ stays fixed
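
As a rough illustration of covariate shift, the toy sketch below keeps the labeling rule fixed and only changes where the inputs are sampled from. The functions `true_label`, `classifier` and `accuracy`, the sampling ranges and the threshold are all hypothetical choices made for this example, not something taken from the lecture.

```python
import random


def true_label(x):
    # Fixed labeling rule p(y | x): identical at train and test time.
    return int(abs(x) > 1.0)


def classifier(x):
    # Hypothetical model that happens to be perfect on the training range [0, 3].
    return int(x > 1.0)


def accuracy(xs):
    # Fraction of inputs on which the model agrees with the true labeling rule.
    return sum(classifier(x) == true_label(x) for x in xs) / len(xs)


rng = random.Random(0)
x_iid = [rng.uniform(0.0, 3.0) for _ in range(10_000)]       # drawn from p_train(x)
x_shifted = [rng.uniform(-3.0, 0.0) for _ in range(10_000)]  # p_test(x) != p_train(x)

acc_iid = accuracy(x_iid)
acc_ood = accuracy(x_shifted)
print(f"i.i.d. accuracy: {acc_iid:.2f}, shifted accuracy: {acc_ood:.2f}")
```

The model is unchanged and the labeling rule is unchanged; only $p(\mathbf{x})$ moved, yet accuracy drops to roughly one third because the model never saw negative inputs.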

Varying corruption intensity for dataset shift

Samples from ImageNet-C

D. Hendrycks & T. Dietterich, Benchmarking Neural Network Robustness to Common Corruptions and Perturbations, ICLR 2019


Varying corruption intensity for dataset shift

Corruption types for ImageNet-C

D. Hendrycks & T. Dietterich, Benchmarking Neural Network Robustness to Common Corruptions and Perturbations, ICLR 2019


Neural nets do not generalize under covariate shift






  • Accuracy drops with increasing shift on ImageNet-C



  • Uncertainty quality degrades as well: the model makes overconfident errors

Y. Ovadia et al., Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift, NeurIPS 2019


Neural nets assign high confidence predictions to OOD data

Example images where the model assigns $>99.5\%$ confidence

A. Nguyen et al., Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, CVPR 2015


Neural nets assign high confidence predictions to OOD data

J.Z. Liu et al., Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness, arXiv 2020


$$\text{Calibration Error} = \vert \underbrace{\text{Confidence}}_{\text{predicted probability of correctness}} - \underbrace{\text{Accuracy}}_{\text{observed frequency of correctness}} \vert$$


Calibration

  • Calibration: of all the times your model predicts something with $90\%$ confidence, is it right $90\%$ of the time?


Calibration of weather forecasts

Nate Silver, The Signal and the Noise


Most neural networks output probability distributions, e.g., over object categories. Are these calibrated?


Measuring calibration: Expected Calibration Error


$$\text{ECE} = \sum_{b=1}^{B}\frac{n_b}{N}\vert \text{acc}(b) -\text{conf}(b) \vert$$

  • Bin the predicted probabilities into $B$ bins
  • Compute the within-bin accuracy $\text{acc}(b)$ and the within-bin mean predicted confidence $\text{conf}(b)$
  • Average the calibration error across bins, weighting each bin by its share $n_b / N$ of the points
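
The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming equal-width bins on $[0, 1]$ and taking as input the confidence of the predicted class plus a 0/1 correctness flag per sample. The function name `expected_calibration_error` and its arguments are choices made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum_b (n_b / N) * |acc(b) - conf(b)| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bins are [lo, hi); a confidence of exactly 1.0 goes to the last bin.
        idx = min(n_bins - 1, int(conf * n_bins))
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(o for _, o in members) / len(members)
        ece += len(members) / n * abs(acc - avg_conf)
    return ece


# Four predictions at 95% confidence (3 correct), two at 55% (1 correct).
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 1, 0, 1, 0]
ece = expected_calibration_error(confidences, correct)
print(round(ece, 4))  # (4/6) * |0.75 - 0.95| + (2/6) * |0.50 - 0.55| = 0.15
```

Note that the result depends on the number of bins, so ECE values are only comparable when computed with the same binning.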

Calibration




  • Most neural networks output probability distributions, e.g., over object categories. Are these calibrated?

C. Guo et al., On Calibration of Modern Neural Networks, ICML 2017

Why is this happening now?

The effect of network depth (far left), width (middle left), Batch Normalization (middle right), and weight decay (far right) on miscalibration, as measured by ECE (lower is better).

C. Guo et al., On Calibration of Modern Neural Networks, ICML 2017

We kind of got too good at training these beasts

Applications

  • Autonomous vehicles: dataset shift arises from changes in location, weather and time of day; use model uncertainty to decide when to trust the model and when to hand over to a human

  • Healthcare: use model uncertainty to decide whether to trust the model or call a doctor; reject low-quality inputs

  • Chatbots: detect unknown sentences

  • Active Learning: use model uncertainty to decide which training examples are worth labeling

  • Bayesian Optimization: optimize an expensive black-box function by finding which configurations to explore next

  • Reinforcement Learning: use uncertainty for exploration vs. exploitation trade-off
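
Two of these uses can be sketched very simply: deferring to a human when confidence is low, and picking the least confident samples for labeling. The snippet below is only an illustration. It uses the maximum class probability as a crude confidence score, and the names `trust_or_defer` and `least_confident` as well as the 0.9 threshold are assumptions made for this example.

```python
def trust_or_defer(probs, threshold=0.9):
    """Return the predicted class index, or None to hand over to a human."""
    confidence = max(probs)
    if confidence < threshold:
        return None
    return probs.index(confidence)


def least_confident(prob_rows, k):
    """Active learning: indices of the k samples with the lowest top probability."""
    order = sorted(range(len(prob_rows)), key=lambda i: max(prob_rows[i]))
    return order[:k]


# Predicted class probabilities for a small pool of unlabeled samples.
pool = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
]

decisions = [trust_or_defer(p) for p in pool]
to_label = least_confident(pool, k=2)
print(decisions)  # [0, None, None, None]
print(to_label)   # [3, 1]
```

Both rules are only as good as the confidence they rely on, which is why calibration matters before deploying them.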


Sources of uncertainty


There are two main types of uncertainty, each with its own peculiarities

Case 1

Uncertainty caused by sensor quality or natural randomness that our data cannot explain away.

Aleatoric / Data uncertainty

  • aleator (lat.) = dice player

  • cannot be reduced by collecting more data, but can be learned

  • useful for:
    • large-data regimes, where model uncertainty is low
    • real-time processing, since it is cheaper to compute than model uncertainty

Case 1'

Similar-looking objects also fall into this category

Aleatoric uncertainty

Distinct classes

Credit: A. Malinin

Overlapping classes


In urban scenes, this type of uncertainty is frequently caused by similar-looking classes:

  • pedestrian - cyclist - person on a kick scooter
  • road - sidewalk
Aleatoric uncertainty

Low entropy

High entropy

Credit: A. Malinin
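
The low-entropy versus high-entropy contrast can be made concrete with the Shannon entropy of the predicted class distribution. The sketch below is only illustrative: the two probability vectors are made-up examples and `entropy` is a helper written for this note.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution; 0 * log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


confident = [0.98, 0.01, 0.01]   # distinct classes: probability mass on one label
ambiguous = [1 / 3, 1 / 3, 1 / 3]  # overlapping classes: mass spread evenly

h_low = entropy(confident)
h_high = entropy(ambiguous)
print(f"low entropy: {h_low:.3f}, high entropy: {h_high:.3f}")
```

The entropy is maximal, $\log K$ for $K$ classes, when the prediction is uniform, and zero when all the mass sits on a single class.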

In layman's terms, data uncertainty is the "known unknown"

Case 2

Lack of knowledge about the process that generated the data

Epistemic/Knowledge uncertainty

  • episteme (gr.) = knowledge

  • can be reduced by gathering more data, and disappears given enough data

  • useful for:
    • detecting samples far from the training distribution
    • small datasets with little annotated data

Case 2

  • Epistemic uncertainty decreases as you gather more data points:


Slide credit: Marcin Mozejko


Image credit: Marcin Mozejko

Case 2'

Let us consider a neural network model trained on pictures of several dog breeds.

  • We ask the model to decide on a dog breed using a photo of a cat.
  • What would you want the model to do?
Out-of-distribution uncertainty


In layman's terms, knowledge uncertainty is the "unknown unknown"

Epistemic uncertainty

Unseen classes

Credit: A. Malinin

Unseen variations of seen classes


"Our model exhibits in (d) increased aleatoric uncertainty on object boundaries and for objects far from the camera. Epistemic uncertainty accounts for our ignorance about which model generated our collected data. In (e) our model exhibits increased epistemic uncertainty for semantically and visually challenging pixels. The bottom row shows a failure case of the segmentation model when the model fails to segment the footpath due to increased epistemic uncertainty, but not aleatoric uncertainty."

A. Kendall and Y. Gal, What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, NeurIPS 2017.


Measuring the quality of the uncertainty can be challenging due to lack of ground truth, i.e., no “right answer” in some cases


MC-Dropout


Dropout

  • First "deep" regularization technique
  • Remove units at random during the forward pass on each sample
  • Put them all back during test

Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Srivastava et al., JMLR 2014
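
To make the mechanics concrete, here is a minimal pure-Python sketch of dropout on a list of activations. It assumes the "inverted" variant, in which surviving units are rescaled by $1/(1-p)$ during training so that nothing needs to change at test time; this is the behavior of torch.nn.Dropout. The function `dropout` below and its `rng` argument are written for this illustration only.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p, rescale survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)  # test mode: all units are put back, unchanged
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
train_out = dropout([1.0] * 8, p=0.75, rng=rng)         # entries are 0.0 or 4.0
test_out = dropout([1.0] * 8, p=0.75, training=False)   # all ones
print(train_out)
print(test_out)
```

Because of the rescaling, the expected value of each activation is the same in training and test mode.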


Dropout

Interpretation

  • Reduces the network's dependency on individual neurons and distributes the representation
  • More redundant representation of data

Ensemble interpretation

  • Equivalent to training a large ensemble of shared-parameter, binary-masked models
  • Each model is only trained on a single data point
  • A network with dropout can be interpreted as an ensemble of $2^N$ models with heavy weight sharing, where $N$ is the number of units that can be dropped (Goodfellow et al., 2013)

Dropout

>>> import torch
>>> from torch import nn
>>> x = torch.full((3, 5), 1.0).requires_grad_()
>>> x
tensor([[ 1., 1., 1., 1., 1.],
        [ 1., 1., 1., 1., 1.],
        [ 1., 1., 1., 1., 1.]])
>>> dropout = nn.Dropout(p = 0.75)
>>> y = dropout(x)  # each entry is zeroed with probability 0.75; survivors are scaled by 1 / (1 - 0.75) = 4
>>> y
tensor([[ 0., 0., 4., 0., 4.],
        [ 0., 4., 4., 4., 0.],
        [ 0., 0., 4., 0., 0.]])
>>> l = y.norm(2, 1).sum()  # sum of the row-wise L2 norms
>>> l.backward()
>>> x.grad  # gradients are non-zero only for the units that were kept
tensor([[ 0.0000, 0.0000, 2.8284, 0.0000, 2.8284],
        [ 0.0000, 2.3094, 2.3094, 2.3094, 0.0000],
        [ 0.0000, 0.0000, 4.0000, 0.0000, 0.0000]])
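
The gradient values printed above can be checked by hand. For a kept unit, $y_j = 4 x_j$, and the derivative of the row norm with respect to $y_j$ is $y_j / \lVert y \rVert_2$, so the gradient with respect to $x_j$ is $4 \, y_j / \lVert y \rVert_2$. The short sketch below redoes this arithmetic in plain Python, with the dropout mask copied from the session above; variable names are chosen for this example.

```python
import math

p = 0.75
scale = 1.0 / (1.0 - p)  # kept units are multiplied by 4

# Dropout mask read off the output y above (1 = kept, 0 = dropped).
mask = [
    [0, 0, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

grad = []
for row in mask:
    y = [m * scale for m in row]             # forward pass with x = 1 everywhere
    norm = math.sqrt(sum(v * v for v in y))  # row-wise L2 norm
    # Chain rule: d(norm)/dx_j = (y_j / norm) * (m_j * scale)
    grad.append([round(v / norm * m * scale, 4) for v, m in zip(y, row)])

for row in grad:
    print(row)
```

Dropped units receive exactly zero gradient, which is why each training step only updates the sub-network selected by the mask.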
Dropout

For a given network

model = nn.Sequential(nn.Linear(10, 100), nn.ReLU(),
                      nn.Linear(100, 50), nn.ReLU(),
                      nn.Linear(50, 2))

we can simply add dropout layers

model = nn.Sequential(nn.Linear(10, 100), nn.ReLU(),
                      nn.Dropout(),
                      nn.Linear(100, 50), nn.ReLU(),
                      nn.Dropout(),
                      nn.Linear(50, 2))
Dropout

A model using dropout has to be set in train or test mode

The method nn.Module.train(mode) recursively sets the training flag on all sub-modules; model.eval() is equivalent to model.train(False).

>>> from torch import nn
>>> dropout = nn.Dropout()
>>> model = nn.Sequential(nn.Linear(3, 10), dropout, nn.Linear(10, 3))
>>> dropout.training
True
>>> model.train(False)
Sequential (
  (0): Linear (3 -> 10)
  (1): Dropout (p = 0.5)
  (2): Linear (10 -> 3)
)
>>> dropout.training
False
Dropout

A model using dropout has to be set in train or test mode

>>> import torch
>>> from torch import nn
>>> dropout = nn.Dropout()
>>> model = nn.Sequential(nn.Linear(3, 10), dropout, nn.Linear(10, 3))
>>> x = torch.full((1, 3), 1.0)
>>> model.train()  # dropout active: two forward passes give different outputs
Sequential (
  (0): Linear (3 -> 10)
  (1): Dropout (p = 0.5)
  (2): Linear (10 -> 3)
)
>>> model(x)
tensor([[ 0.5360, -0.5225, -0.5129]], grad_fn=<ThAddmmBackward>)
>>> model(x)
tensor([[ 0.6134, -0.6130, -0.5161]], grad_fn=<ThAddmmBackward>)
>>>
>>> model.eval()  # dropout disabled: the output is deterministic
Sequential (
  (0): Linear (3 -> 10)
  (1): Dropout (p = 0.5)
  (2): Linear (10 -> 3)
)
>>> model(x)
tensor([[ 0.5772, -0.0944, -0.1168]], grad_fn=<ThAddmmBackward>)
>>> model(x)
tensor([[ 0.5772, -0.0944, -0.1168]], grad_fn=<ThAddmmBackward>)

How can we get uncertainties from standard networks?

Standard Neural Network

Bayesian Neural Network

Y. Gal and Z. Ghahramani, Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, ICML 2016


Standard Neural Network

Bayesian Neural Network

Image credit: Eric Ma


From Bayesian Neural Networks to Dropout

Gal and Ghahramani build upon the ensembling view of dropout and show that training a network with dropout on a standard classification or regression objective implicitly performs variational inference, approximating the posterior distribution over the weights.


Uncertainty estimates from dropout

Proper epistemic uncertainty estimates at $\mathbf{x}$ can be obtained in a principled way using Monte-Carlo integration:

  • Draw $T$ sets of network parameters $\hat{\theta}_t$ from $q(\theta;\nu)$.
  • Compute the predictions for the $T$ networks, $\{ f(\mathbf{x};\hat{\theta}_t) \}_{t=1}^T$.
  • Approximate the predictive mean and variance as follows:

$$\begin{aligned} \mathbb{E}_{p(y|\mathbf{x},\mathbf{X},\mathbf{Y})}\left[y\right] &\approx \frac{1}{T} \sum_{t=1}^T f(\mathbf{x};\hat{\theta}_t) \\ \mathbb{V}_{p(y|\mathbf{x},\mathbf{X},\mathbf{Y})}\left[y\right] &\approx \sigma^2 + \frac{1}{T} \sum_{t=1}^T f(\mathbf{x};\hat{\theta}_t)^2 - \hat{\mathbb{E}}\left[y\right]^2 \end{aligned}$$
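
The two estimators translate almost literally into code. The sketch below is a toy, framework-free illustration: `mc_predictive_moments` implements the formulas for a scalar output, while `dropout_forward`, its weights and the value of `sigma2` are hypothetical stand-ins for a real network with dropout kept active at prediction time.

```python
import random


def mc_predictive_moments(stochastic_forward, x, T=100, sigma2=0.0):
    """Monte-Carlo estimate of the predictive mean and variance at x.

    Each call to stochastic_forward(x) plays the role of f(x; theta_t)
    for a fresh sample theta_t, e.g. one forward pass with a new dropout mask.
    """
    samples = [stochastic_forward(x) for _ in range(T)]
    mean = sum(samples) / T
    second_moment = sum(s * s for s in samples) / T
    variance = sigma2 + second_moment - mean ** 2
    return mean, variance


# Toy "network": one layer of four units with dropout, then a sum.
rng = random.Random(0)
weights = [0.5, -1.0, 2.0, 1.5]


def dropout_forward(x, p=0.5):
    keep = 1.0 - p
    # A new random mask is drawn on every call (inverted dropout scaling).
    return sum(w * x / keep for w in weights if rng.random() < keep)


mean, variance = mc_predictive_moments(dropout_forward, 1.0, T=5000, sigma2=0.01)
print(f"mean = {mean:.2f}, variance = {variance:.2f}")
```

With a deterministic forward pass the spread term vanishes and the variance reduces to `sigma2`, which is a quick sanity check that dropout is actually active when sampling.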

Yarin Gal's demo.


Uncertainty estimates from dropout

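import numpy as np
import torch
from torch import nn

class SimpleModel(nn.Module):
    def __init__(self, p, decay):
        super().__init__()
        self.dropout_p = p    # dropout probability
        self.decay = decay    # weight decay used during training
        self.f = nn.Sequential(
            nn.Linear(1, 20),
            nn.ReLU(),
            nn.Dropout(p=self.dropout_p),
            nn.Linear(20, 20),
            nn.ReLU(),
            nn.Dropout(p=self.dropout_p),
            nn.Linear(20, 1)
        )

    def forward(self, X):
        return self.f(X)

def uncertainty_estimate(X, model, N, iters=200, l2=0.01):
    # X: 1-D float tensor of inputs; N: number of training points
    # (N was an undefined global in the original snippet).
    model.train()  # keep dropout active at prediction time
    with torch.no_grad():
        # one column per stochastic forward pass
        outputs = np.hstack([model(X[:, np.newaxis]).numpy()
                             for _ in range(iters)])
    y_mean = outputs.mean(axis=1)
    y_variance = outputs.var(axis=1)
    # model precision tau; l2 is assumed here to be the squared prior length-scale
    tau = l2 * (1. - model.dropout_p) / (2. * N * model.decay)
    y_variance += (1. / tau)
    y_std = np.sqrt(y_variance)
    return y_mean, y_std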

Results

Y. Gal and Z. Ghahramani, Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, ICML 2016

Pixel-wise depth regression

A. Kendall and Y. Gal, What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, NeurIPS 2017.


Combining heteroscedastic and epistemic uncertainty

Semantic Segmentation performance on CamVid

A. Kendall and Y. Gal, What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, NeurIPS 2017


Combining heteroscedastic and epistemic uncertainty

Monocular Depth Regression Performance

A. Kendall and Y. Gal, What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, NeurIPS 2017


Comparing heteroscedastic and epistemic uncertainty

Aleatoric vs. Epistemic Uncertainty for Out of Dataset Examples

  • Aleatoric uncertainty remains constant, while epistemic uncertainty increases for out-of-dataset examples!

A. Kendall and Y. Gal, What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, NeurIPS 2017


Applications

Multiple follow-up papers by Gal and friends:

  • Concrete Dropout: learn the dropout probability of each layer using the Concrete / Gumbel-Softmax trick
  • Active Learning with MC Dropout: select samples using uncertainty
  • MC Dropout for RNNs: same dropout mask across time-steps
  • Data efficiency in RL
  • Stochasticity via BatchNorm perturbation