layout: true .center.footer[Marc LELARGE and Andrei BURSUC | Deep Learning Do It Yourself | 15.2 Uncertainty estimation - MCDropout] --- class: center, middle # 15.2 Towards deep learning for the real world .hidden[ ## .italic[i.e.], beyond cats and dogs ]
.center.big.bold[Andrei Bursuc] --- class: center, middle # Towards deep learning for the real world ## .italic[i.e.], beyond cats and dogs
.center.big.bold[Andrei Bursuc] --- class: middle, center # Motivation --- class: middle .grid[ .kol-6-12[ .big.green[Deep Learning is great:] - conceptually simple and modular - scales well with data - awesome software tools - huge community and interest - potential real-world impact ] ] --- class: middle count: false .grid[ .kol-6-12[ .big.green[Deep Learning is great:] - conceptually simple and modular - scales well with data - awesome software tools - huge community and interest - potential real-world impact ] .kol-6-12[ .big.red[ ... but has several problems] - uninterpretable black-boxes - needs a lot of data - mostly empirical - what does a model not know? - can be fooled easily ] ] --- class: middle, center ## The world is a complex environment .center.width-80[] .center[Covering this diversity with (sufficient) data and labels is highly challenging] --- class: middle, center ## Dealing with uncertainty .center.width-40[] --- class: middle, center .bigg[Why should I care about uncertainty?] --- # Motivation In May 2016, there was the **first fatality** from an assisted driving system, caused by the perception system mistaking the white side of a trailer for bright sky. .left-column[.center.width-100[]] .right-column[.center.width-60[]] --- # Motivation .center.width-50[] An image classification system erroneously identifies two African Americans as gorillas, raising concerns of racial discrimination. --- class: middle, center .bigg[What do we mean by uncertainty?] --- class: middle ## What do we mean by uncertainty? .grid[ .kol-8-12[ .bigger[ Return a distribution over predictions instead of a single prediction: - *classification*: output a label and its confidence - *regression*: output a mean and a variance ] ] .kol-4-12[ .center.width-90[] ] ] --- class: middle, center .big[Good uncertainty estimates tell us *when we can trust the predictions of our model*.] --- class: middle ## What do we mean by Out-of-Distribution Robustness? .grid[ .kol-6-12[ .big[ **I.I.D**] $\text{ }p\_{\text{test}}(\mathbf{x}, y) = p\_{\text{train}}(\mathbf{x}, y)$ (I.I.D. = Independent and Identically Distributed) ] .kol-6-12[ .big[ **O.O.D**] $\text{ }p\_{\text{test}}(\mathbf{x}, y) \neq p\_{\text{train}}(\mathbf{x}, y)$ ] ] .hidden[ Examples of dataset shift: - *covariate shift*: the distribution of features $p\(\mathbf{x})$ changes while the labeling $p\(y \vert \mathbf{x})$ stays fixed - *open-set recognition*: new classes may appear at test time - *label shift*: the distribution of labels $p\(y)$ changes while $p\(\mathbf{x} \vert y)$ stays fixed ] --- count: false class: middle ## What do we mean by Out-of-Distribution Robustness? .grid[ .kol-6-12[ .big[ **I.I.D**] $\text{ }p\_{\text{test}}(\mathbf{x}, y) = p\_{\text{train}}(\mathbf{x}, y)$ (I.I.D. = Independent and Identically Distributed) ] .kol-6-12[ .big[ **O.O.D**] $\text{ }p\_{\text{test}}(\mathbf{x}, y) \neq p\_{\text{train}}(\mathbf{x}, y)$ ] ] Examples of dataset shift: - *covariate shift*: the distribution of features $p\(\mathbf{x})$ changes while the labeling $p\(y \vert \mathbf{x})$ stays fixed - *open-set recognition*: new classes may appear at test time - *label shift*: the distribution of labels $p\(y)$ changes while $p\(\mathbf{x} \vert y)$ stays fixed --- class: middle ## Varying corruption intensity for dataset shift .center.width-100[] .caption[Samples from ImageNet-C] .citation[D. Hendrycks & T.
Dietterich, Benchmarking Neural Network Robustness to Common Corruptions and Perturbations, ICLR 2019 ] --- class: middle ## Varying corruption intensity for dataset shift .center.width-60[] .caption[Corruption types for ImageNet-C] .citation[D. Hendrycks & T. Dietterich, Benchmarking Neural Network Robustness to Common Corruptions and Perturbations, ICLR 2019 ] --- class: middle ## Neural nets do not generalize under covariate shift .grid[ .kol-4-12[
- **Accuracy drops** with increasing shift on ImageNet-C
- **Uncertainty quality degrades**: the model makes overconfident errors ] .kol-8-12[ .center.width-80[] ] ] .citation[Y. Ovadia et al., Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift, NeurIPS 2019 ] --- class: middle ## Neural nets assign high-confidence predictions to OOD data .center.width-100[] .caption[Example images where the model assigns ${>}99.5\%$ confidence] .citation[A. Nguyen et al., Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, CVPR 2015 ] --- class: middle ## Neural nets assign high-confidence predictions to OOD data .center.width-80[] .citation[J.Z. Liu et al., Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness, arXiv 2020 ] --- class: middle, center .bigger[ $$\text{Calibration Error} = \vert \underbrace{\text{Confidence}}\_{\text{predicted probability of correctness}} - \underbrace{\text{Accuracy}}\_{\text{observed frequency of correctness}} \vert $$ ] --- class: middle ## Calibration - _Calibration_: Of the times your model predicts something with $90\%$ confidence, is it right $90\%$ of the time?
.center.width-80[] .caption[Calibration of weather forecasts] .citation[Nate Silver, The Signal and the Noise] --- class: middle .big[Most neural networks output probability distributions, e.g., over object categories. Are these calibrated?] --- class: middle ## Measuring calibration: Expected Calibration Error .grid[ .kol-6-12[
.bigger[ $$\text{ECE} = \sum\_{b=1}^{B}\frac{n\_b}{N}\vert \text{acc}(b) - \text{conf}(b) \vert$$ ] ] .kol-6-12[ - Bin the probabilities into $B$ bins - Compute the within-bin accuracy and within-bin predicted confidence - Average the calibration error across bins, weighted by the number of points $n\_b$ in each bin (a NumPy sketch of this procedure appears later in the deck) ] ] --- class: middle ## Calibration .grid[ .kol-6-12[
- Most neural networks output probability distributions, e.g., over object categories. Are these calibrated? ] .kol-6-12[ .center.width-90[] ] ] .citation[C. Guo et al., On Calibration of Modern Neural Networks, ICML 2017] --- ## Why is this happening now? .center.width-100[] .caption[The effect of network depth (far left), width (middle left), Batch Normalization (middle right), and weight decay (far right) on miscalibration, as measured by ECE (lower is better).] .citation[C. Guo et al., On Calibration of Modern Neural Networks, ICML 2017] -- count: false .center.big.red[We kind of got too good at training these beasts] --- class: middle ## Applications - **Autonomous vehicles**: dataset shift: location, weather, time of day; use model uncertainty to decide when to trust model or hand-over to human .hidden[ - **Healthcare**: model uncertainty for trusting the model or calling doctor; reject low-quality inputs - **Chatbots**: detect unknown sentences - **Active Learning**: use model uncertainty to decide which training examples are worth labeling - **Bayesian Optimization**: optimize an expensive black-box function by finding which configurations to explore next - **Reinforcement Learning**: use uncertainty for exploration vs. exploitation trade-off ] --- count: false class: middle ## Applications - **Autonomous vehicles**: dataset shift: location, weather, time of day; use model uncertainty to decide when to trust model or hand-over to human - **Healthcare**: model uncertainty for trusting the model or calling doctor; reject low-quality inputs .hidden[ - **Chatbots**: detect unknown sentences - **Active Learning**: use model uncertainty to decide which training examples are worth labeling - **Bayesian Optimization**: optimize an expensive black-box function by finding which configurations to explore next - **Reinforcement Learning**: use uncertainty for exploration vs. exploitation trade-off ] --- count: false class: middle ## Applications - **Autonomous vehicles**: dataset shift: location, weather, time of day; use model uncertainty to decide when to trust model or hand-over to human - **Healthcare**: model uncertainty for trusting the model or calling doctor; reject low-quality inputs - **Chatbots**: detect unknown sentences .hidden[ - **Active Learning**: use model uncertainty to decide which training examples are worth labeling - **Bayesian Optimization**: optimize an expensive black-box function by finding which configurations to explore next - **Reinforcement Learning**: use uncertainty for exploration vs. exploitation trade-off ] --- count: false class: middle ## Applications - **Autonomous vehicles**: dataset shift: location, weather, time of day; use model uncertainty to decide when to trust model or hand-over to human - **Healthcare**: model uncertainty for trusting the model or calling doctor; reject low-quality inputs - **Chatbots**: detect unknown sentences - **Active Learning**: use model uncertainty to decide which training examples are worth labeling .hidden[ - **Bayesian Optimization**: optimize an expensive black-box function by finding which configurations to explore next - **Reinforcement Learning**: use uncertainty for exploration vs. 
exploitation trade-off ] --- count: false class: middle ## Applications - **Autonomous vehicles**: dataset shift: location, weather, time of day; use model uncertainty to decide when to trust model or hand-over to human - **Healthcare**: model uncertainty for trusting the model or calling doctor; reject low-quality inputs - **Chatbots**: detect unknown sentences - **Active Learning**: use model uncertainty to decide which training examples are worth labeling - **Bayesian Optimization**: optimize an expensive black-box function by finding which configurations to explore next .hidden[ - **Reinforcement Learning**: use uncertainty for exploration vs. exploitation trade-off ] --- count: false class: middle ## Applications - **Autonomous vehicles**: dataset shift: location, weather, time of day; use model uncertainty to decide when to trust model or hand-over to human - **Healthcare**: model uncertainty for trusting the model or calling doctor; reject low-quality inputs - **Chatbots**: detect unknown sentences - **Active Learning**: use model uncertainty to decide which training examples are worth labeling - **Bayesian Optimization**: optimize an expensive black-box function by finding which configurations to explore next - **Reinforcement Learning**: use uncertainty for exploration vs. exploitation trade-off --- class: center, middle # Sources of uncertainty --- class: middle, center .big[There are two main types of uncertainty, each with its own peculiarities] --- class: middle, center # Case 1 --- # Case 1 Problems caused by sensor quality or natural randomness that cannot be explained away by our data. .center.width-90[] -- count: false .center.big[__Aleatoric / Data uncertainty__] -- count: false - _.italic[aleator]_ (lat.) = dice player -- count: false - cannot be reduced, but can be learned - useful for: + large-data settings, where model uncertainty is low + real-time processing, cheaper to compute than model uncertainty --- # Case 1' Similar-looking objects also fall into this category .grid[ .kol-6-12[ .center.width-80[] ] .kol-6-12[ .center.width-80[] ] ] --- class: middle .grid[ .kol-4-12[
Similar-looking objects also fall into this category ] .kol-8-12[ .center.width-60[] ] ] --- # Aleatoric uncertainty ### Distinct classes .center.width-40[] .credit[Credit: A. Malinin] -- ### Overlapping classes .center.width-40[] --- class: middle .big[In urban scenes, this type of uncertainty is frequently caused by similar-looking classes: - .italic[pedestrian - cyclist - person on a scooter] - .italic[road - sidewalk] ] --- # Aleatoric uncertainty .center.width-70[] .credit[Credit: A. Malinin] --- # Aleatoric uncertainty .center.width-70[] .credit[Credit: A. Malinin] --- # Aleatoric uncertainty .center.width-100[] .grid[ .kol-6-12[ .center.big[Low entropy] ] .kol-6-12[ .center.big[High entropy] ] ] .credit[Credit: A. Malinin] --- class: middle .big[In layman's terms, data uncertainty is the __known unknown__] --- class: middle, center # Case 2 --- # Case 2 Lack of knowledge about the process that generated the data .center.width-90[] -- count: false .center.big[__Epistemic/Knowledge uncertainty__] -- count: false - _.italic[episteme]_ (gr.) = knowledge -- count: false - disappears given enough data - useful for: + detecting samples far from the training distribution + small datasets with little annotated data --- # Case 2 - Epistemic error decreases when you gather more points:
.center.width-60[] .credit[Slide credit: Marcin Mozejko] --- class: middle .center.width-50[] .credit[Image credit: Marcin Mozejko] --- # Case 2' Let us consider a neural network model trained on pictures of several dog breeds. .left-column[ .center.width-70[] ] .right-column[ ] --- count: false # Case 2' Let us consider a neural network model trained on pictures of several dog breeds. .left-column[ .center.width-70[] ] .right-column[ .center.width-70[] ] .reset-column[ ] - We ask the model to decide on a dog breed using a photo of a cat. - What would you want the model to do? --- count: false # Case 2' Let us consider a neural network model trained on pictures of several dog breeds. .left-column[ .center.width-70[] ] .right-column[ .center.width-70[] ] .reset-column[ ] .center.big[__Out-of-distribution uncertainty__] --- class: middle .big[In layman's terms, knowledge uncertainty is the __unknown unknown__] --- # Epistemic uncertainty .center.width-70[] .credit[Credit: A. Malinin] --- # Epistemic uncertainty ### Unseen classes .center.width-60[] .credit[Credit: A. Malinin] -- ### Unseen variations of seen classes .center.width-60[] --- class: middle .center.width-70[] .caption["Our model exhibits in (d) increased .bold[aleatoric uncertainty on object boundaries and for objects far from the camera]. .bold[Epistemic uncertainty accounts for our ignorance about which model generated our collected data]. In (e) our model exhibits increased epistemic uncertainty for semantically and visually challenging pixels. The bottom row shows a failure case of the segmentation model when the model fails to segment the footpath due to increased epistemic uncertainty, but not aleatoric uncertainty."] .citation[A. Kendall and Y. Gal, What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, NeurIPS 2017.]
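---

class: middle

## Measuring calibration in practice

Before turning to model uncertainty estimators, here is a minimal NumPy sketch of the ECE binning procedure described on the calibration slides (the helper, its argument names, and the equal-width binning are illustrative assumptions, not code from the cited papers):

```py
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    # confidences: predicted probability of the argmax class, shape (N,)
    # correct: 1.0 if the prediction was right, 0.0 otherwise, shape (N,)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.sum() == 0:
            continue
        acc_b = correct[in_bin].mean()        # within-bin accuracy
        conf_b = confidences[in_bin].mean()   # within-bin confidence
        ece += in_bin.mean() * abs(acc_b - conf_b)  # weight = n_b / N
    return ece
```

A perfectly calibrated model gives ECE $= 0$; the miscalibrated modern networks reported by Guo et al. earlier in the deck give substantially larger values.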
--- class: middle, center .bigger[Measuring the quality of the uncertainty can be challenging due to **lack of ground truth**, i.e., no “right answer” in some cases ] --- class: middle, center # MC-Dropout --- # Dropout - First "deep" regularization technique - Remove units at random during the forward pass on each sample - Put them all back during test .center.width-60[] .citation.tiny[Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Srivastava et al., JMLR 2014] --- # Dropout ## Interpretation - Reduces the network's dependency on individual neurons and distributes the representation - More redundant representation of the data ## Ensemble interpretation - Equivalent to training a large ensemble of shared-parameter, binary-masked models - Each model is only trained on a single data point - __A network with dropout can be interpreted as an ensemble of $2^N$ models with heavy weight sharing__ (Goodfellow et al., 2013) --- # Dropout .grid[ .kol-1-12[ ] .kol-10-12[ ```py >>> x = torch.full((3, 5), 1.0).requires_grad_() >>> x tensor([[ 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1.]]) >>> dropout = nn.Dropout(p = 0.75) >>> y = dropout(x) >>> y tensor([[ 0., 0., 4., 0., 4.], [ 0., 4., 4., 4., 0.], [ 0., 0., 4., 0., 0.]]) >>> l = y.norm(2, 1).sum() >>> l.backward() >>> x.grad tensor([[ 0.0000, 0.0000, 2.8284, 0.0000, 2.8284], [ 0.0000, 2.3094, 2.3094, 2.3094, 0.0000], [ 0.0000, 0.0000, 4.0000, 0.0000, 0.0000]]) ``` ] ] --- # Dropout .grid[ .kol-1-12[ ] .kol-10-12[ ```py >>> x = torch.full((3, 5), 1.0).requires_grad_() >>> x tensor([[ 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1.]]) >>> dropout = nn.Dropout(p = 0.75) >>> y = dropout(x) *>>> y *tensor([[ 0., 0., 4., 0., 4.], * [ 0., 4., 4., 4., 0.], * [ 0., 0., 4., 0., 0.]]) >>> l = y.norm(2, 1).sum() >>> l.backward() >>> x.grad tensor([[ 0.0000, 0.0000, 2.8284, 0.0000, 2.8284], [ 0.0000, 2.3094, 2.3094, 2.3094, 0.0000], [ 0.0000, 0.0000, 4.0000, 0.0000, 0.0000]]) ``` ] ] --- # Dropout For a given network .grid[ .kol-1-12[ ] .kol-10-12[ ```py model = nn.Sequential(nn.Linear(10, 100), nn.ReLU(), nn.Linear(100, 50), nn.ReLU(), nn.Linear(50, 2)); ``` ] ] -- we can simply add dropout layers .grid[ .kol-1-12[ ] .kol-10-12[ ```py model = nn.Sequential(nn.Linear(10, 100), nn.ReLU(), * nn.Dropout(), nn.Linear(100, 50), nn.ReLU(), * nn.Dropout(), nn.Linear(50, 2)); ``` ] ] --- # Dropout A model using dropout has to be set in __train__ or __test__ mode --- # Dropout A model using dropout has to be set in __train__ or __test__ mode The method `nn.Module.train(mode)` recursively sets the `training` flag of all sub-modules. .grid[ .kol-1-12[ ] .kol-10-12[ ```py >>> dropout = nn.Dropout() >>> model = nn.Sequential(nn.Linear(3, 10), dropout, nn.Linear(10, 3)) >>> dropout.training True >>> model.train(False) Sequential ( (0): Linear (3 -> 10) (1): Dropout (p = 0.5) (2): Linear (10 -> 3) ) >>> dropout.training False ``` ] ] --- # Dropout A model using dropout has to be set in __train__ or __test__ mode .grid[ .kol-1-12[ ] .kol-10-12[ ```py >>> dropout = nn.Dropout() >>> model = nn.Sequential(nn.Linear(3, 10), dropout, nn.Linear(10, 3)) >>> x = torch.full((1, 3), 1.0) *>>> model.train() Sequential ( (0): Linear (3 -> 10) (1): Dropout (p = 0.5) (2): Linear (10 -> 3) ) >>> model(x) *tensor([[ 0.5360, -0.5225, -0.5129]], grad_fn=
<AddmmBackward>) >>> model(x) *tensor([[ 0.6134, -0.6130, -0.5161]], grad_fn=
<AddmmBackward>) ``` ] ] --- # Dropout A model using dropout has to be set in __train__ or __test__ mode .grid[ .kol-1-12[ ] .kol-10-12[ ```py >>> dropout = nn.Dropout() >>> model = nn.Sequential(nn.Linear(3, 10), dropout, nn.Linear(10, 3)) >>> x = torch.full((1, 3), 1.0) >>> model.train() Sequential ( (0): Linear (3 -> 10) (1): Dropout (p = 0.5) (2): Linear (10 -> 3) ) >>> model(x) tensor([[ 0.5360, -0.5225, -0.5129]], grad_fn=
<AddmmBackward>) >>> model(x) tensor([[ 0.6134, -0.6130, -0.5161]], grad_fn=
<AddmmBackward>) >>> *>>> model.eval() Sequential ( (0): Linear (3 -> 10) (1): Dropout (p = 0.5) (2): Linear (10 -> 3) ) >>> model(x) *tensor([[ 0.5772, -0.0944, -0.1168]], grad_fn=
<AddmmBackward>) >>> model(x) *tensor([[ 0.5772, -0.0944, -0.1168]], grad_fn=
<AddmmBackward>) ``` ] ] --- class: middle ## How can we get uncertainties from standard networks? .left-column[ ### .center[Standard Neural Network] .center.width-80[] ] .right-column[ ### .center[Bayesian Neural Network] .center.width-80[] ] .reset-column[ ] .citation[Dropout as Bayesian approximation: representing model uncertainty in deep learning, Y. Gal, ICML 2016] --- class: middle .grid[ .kol-1-2[ ### .center[Standard Neural Network] .center.width-100[] ] .kol-1-2[ ### .center[Bayesian Neural Network] .center.width-100[] ] ] .credit[Image credit: Eric Ma] --- class: middle ## From Bayesian Neural Networks to Dropout Gal and Ghahramani build upon the ensembling view of Dropout and show that when **training a network with dropout** with a standard classification or regression objective, one *is actually implicitly doing variational inference* to approximate the posterior distribution of the weights. .center.width-40[] --- class: middle ## Uncertainty estimates from dropout Proper epistemic uncertainty estimates at $\mathbf{x}$ can be obtained in a principled way using Monte-Carlo integration: - Draw $T$ sets of network parameters $\hat{\theta}\_t$ from $q(\theta;\nu)$. - Compute the predictions for the $T$ networks, $\\{ f(\mathbf{x};\hat{\theta}\_t) \\}\_{t=1}^T$. - Approximate the predictive mean and variance as follows: $$ \begin{aligned} \mathbb{E}\_{p(y|\mathbf{x},\mathbf{X},\mathbf{Y})}\left[y\right] &\approx \frac{1}{T} \sum\_{t=1}^T f(\mathbf{x};\hat{\theta}\_t) \\\\ \mathbb{V}\_{p(y|\mathbf{x},\mathbf{X},\mathbf{Y})}\left[y\right] &\approx \sigma^2 + \frac{1}{T} \sum\_{t=1}^T f(\mathbf{x};\hat{\theta}\_t)^2 - \hat{\mathbb{E}}\left[y\right]^2 \end{aligned} $$ --- class: middle, center .center.width-60[] Yarin Gal's [demo](http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html). --- class: middle ## Uncertainty estimates from dropout .grid[ .kol-6-12[ ```line-numbers class SimpleModel(nn.Module): def __init__(self, p, decay): super(SimpleModel, self).__init__() self.dropout_p = p self.decay = decay self.f = nn.Sequential( nn.Linear(1,20), nn.ReLU(), nn.Dropout(p=self.dropout_p), nn.Linear(20, 20), nn.ReLU(), nn.Dropout(p=self.dropout_p), nn.Linear(20,1) ) def forward(self, X): return self.f(X) ``` ] .kol-6-12[ ```line-numbers def uncertainty_estimate(X, model, n_train, iters=200, l2=0.01): model.train() outputs = np.hstack([model(X[:, np.newaxis]).data.numpy() \ for i in range(iters)]) y_mean = outputs.mean(axis=1) y_variance = outputs.var(axis=1) tau = l2 * (1. - model.dropout_p) / (2. * n_train * model.decay) y_variance += (1. / tau) y_std = np.sqrt(y_variance) return y_mean, y_std ``` ] ] --- class: middle ## Results .center.width-80[] .citation[Y. Gal, Dropout as Bayesian approximation: representing model uncertainty in deep learning, ICML 2016] --- class: middle ## Results .center.width-80[] .citation[Y. Gal, Dropout as Bayesian approximation: representing model uncertainty in deep learning, ICML 2016] --- class: middle ## Pixel-wise depth regression .center.width-55[] .citation[A. Kendall and Y. Gal, What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, NeurIPS 2017.] --- class: middle ## Combining heteroscedastic and epistemic uncertainty .center.width-80[] .caption[Semantic Segmentation performance on CamVid] .citation[What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, A. Kendall and Y.
Gal, NeurIPS 2017] --- class: middle ## Combining heteroscedastic and epistemic uncertainty .center[Monocular Depth Regression Performance] .center.width-80[] .citation[What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, A. Kendall and Y. Gal, NeurIPS 2017] --- class: middle ## Comparing heteroscedastic and epistemic uncertainty .center[Aleatoric vs. Epistemic Uncertainty for Out of Dataset Examples] .center.width-60[] - Aleatoric uncertainty remains constant while epistemic uncertainty increases for out-of-dataset examples! .citation[What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, A. Kendall and Y. Gal, NeurIPS 2017] --- class: middle ## Applications Multiple follow-up papers by Gal and friends: - __Concrete Dropout__: learn the Dropout probability of each layer using the Concrete/Gumbel-Softmax trick - __Active Learning with MC Dropout__: select samples to label using uncertainty - __MC Dropout for RNNs__: same dropout mask across time-steps - __Data efficiency in RL__ - Stochasticity via __BatchNorm__ perturbation
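---

class: middle

## Recap: MC-Dropout inference in a few lines

A minimal sketch for a classifier (the helper below is illustrative, not code from the cited papers; it assumes `model` contains `nn.Dropout` layers and outputs logits):

```py
import torch
import torch.nn.functional as F

def mc_dropout_predict(model, x, T=50):
    model.eval()
    # keep only the dropout layers stochastic at test time
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        # T stochastic forward passes, each with a different dropout mask
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(T)])
    mean_probs = probs.mean(dim=0)  # Monte-Carlo estimate of the predictive distribution
    entropy = -(mean_probs.clamp_min(1e-12).log() * mean_probs).sum(dim=-1)
    return mean_probs, entropy
```

The entropy of the averaged prediction captures the total uncertainty; thresholding it is a simple way to decide when to trust the model and when to defer to a human.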