class: center, middle ### Deep Learning - MAP583 2019-2020 ## Part 6: Under the hood .bold[Andrei Bursuc ]
.width-10[![](images/logo_valeo.png)] url: https://abursuc.github.io/slides/polytechnique/06_under_hood.html .citation[ With slides from A. Karpathy, F. Fleuret, J. Johnson, S. Yeung, G. Louppe, Y. Avrithis ...] --- class: center, middle # GPUs .center[
] --- # CPU vs GPU
.left[ .center[CPU]
.center[
] ] .right[ .center[GPU]
.center[
] ] --- # CPU vs GPU .left-column[ - CPU: + fewer cores; each core is faster and more powerful + useful for sequential tasks ] .right-column[ - GPU: + more cores; each core is slower and weaker + great for parallel tasks ] .reset-column[] .center[
] --- # CPU vs GPU .left-column[ - CPU: + fewer cores; each core is faster and more powerful + useful for sequential tasks ] .right-column[ - GPU: + more cores; each core is slower and weaker + great for parallel tasks ] .reset-column[] .center[
] .credit[Figure credit: J. Johnson] --- # CPU vs GPU - SP = single precision, 32 bits / 4 bytes - DP = double precision, 64 bits / 8 bytes .center[
] --- # CPU vs GPU .center[
] .citation[Benchmarking State-of-the-Art Deep Learning Software Tools, Shi et al., 2016] --- # CPU vs GPU - more benchmarks available at [https://github.com/jcjohnson/cnn-benchmarks](https://github.com/jcjohnson/cnn-benchmarks) .center[
] .credit[Figure credit: J. Johnson] --- count: false # CPU vs GPU - more benchmarks available at [https://github.com/jcjohnson/cnn-benchmarks](https://github.com/jcjohnson/cnn-benchmarks) .center[
] .credit[Figure credit: J. Johnson] --- # System .center[
] .credit[Figure credit: F. Fleuret] --- count: false # System .center[
] .credit[Figure credit: F. Fleuret] --- count: false # System .center[
] .credit[Figure credit: F. Fleuret] --- count: false # System .center[
] .credit[Figure credit: F. Fleuret] --- count: false # System .center[
] .credit[Figure credit: F. Fleuret] --- count: false # System .center[
] .credit[Figure credit: F. Fleuret] --- count: false # System .center[
] .credit[Figure credit: F. Fleuret]

---

# GPU

- NVIDIA GPUs are programmed through CUDA (.purple[Compute Unified Device Architecture])
- The alternative is OpenCL, supported by several manufacturers but with significantly less investment behind it than CUDA
- NVIDIA and CUDA dominate the field by far, though some alternatives are starting to emerge: Google TPUs, embedded devices.

---

# Libraries

- BLAS (_Basic Linear Algebra Subprograms_): vector/matrix products; cuBLAS is the implementation for NVIDIA GPUs
- LAPACK (_Linear Algebra Package_): linear system solving, eigendecomposition, etc.
- cuDNN (_NVIDIA CUDA Deep Neural Network library_): computations specific to deep learning on NVIDIA GPUs.

---

# GPU usage in pytorch

- Tensors of `torch.cuda` types live in GPU memory. Operations on them are performed by the GPU and the resulting tensors are stored in its memory.
- Operations cannot mix different tensor types (CPU vs. GPU, or different numerical types), with the exception of `copy_()`.
- Moving data between the CPU and GPU memories is far slower than moving it inside the GPU memory.

---

# GPU usage in pytorch

- The `Tensor` method `cuda()` / `.to('cuda')` returns a clone on the GPU if the tensor is not already there, or the tensor itself if it is, keeping the bit precision.
- The method `cpu()` / `.to('cpu')` makes a clone on the CPU if needed.
- Both leave the original tensor unchanged (see the sketch on the next slide).
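---

# GPU usage in pytorch

A minimal sketch of the device API described above (not from the original slides; assumes PyTorch 1.x with a CUDA-capable GPU):

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.randn(1024, 1024)   # allocated in CPU memory
x_gpu = x.to(device)          # clone on the GPU; x itself is left unchanged
y_gpu = x_gpu @ x_gpu         # computed by the GPU, result stays in GPU memory
y = y_gpu.cpu()               # clone back to CPU memory only when needed
```

The same `.to(device)` pattern applies to modules: `model.to(device)` moves all parameters and buffers, and the inputs must live on the same device as the model.

---

class: center, middle

# Understanding and visualizing CNNs

.center[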
] --- # What happens inside a CNN? .left-column[
.center[Visualize first-layer filters/weights] ] .right-column[.center[
]] .reset-column[] .center[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014] --- count: false # What happens inside a CNN? .left-column[
.center[Visualize first-layer filters/weights] ] .right-column[.center[
]] .reset-column[ ] .left[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014] --- count: false # What happens inside a CNN? .left-column[ - Visualize behavior in higher layers - We can visualize filters at higher layers, but they are less intuitive ] .right-column[.center[
]] .reset-column[ ]
.center[
] --- count: false # What happens inside a CNN? .left-column[
.center[Visualize first-layer filters/weights] ] .right-column[.center[
]] .reset-column[ ] .left[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014] --- count: false # What happens inside a CNN? .left-column[
.center[Visualize first-layer filters/weights] ] .right-column[.center[
]] .reset-column[ ] .left[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014] --- count: false # What happens inside a CNN? .left-column[
.center[Visualize first-layer filters/weights] ] .right-column[.center[
]] .reset-column[ ] .left[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014] --- count: false # What happens inside a CNN? .left-column[
.center[Visualize first-layer filters/weights] ] .right-column[.center[
]] .reset-column[ ] .left[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014] --- count: false # What happens inside a CNN? .left-column[ - 4096d "signature" for an image (layer right before the classifier) - Visualize with t-SNE: [here](http://cs.stanford.edu/people/karpathy/cnnembed/) ] .right-column[.center[
]] .reset-column[ ] .center[
] --- # Feature evolution during training - For a particular neuron (that generates a feature map) - Pick the strongest activation during training - For epochs 1, 2, 5, 10, 20, 30, 40, 64
.center[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014]

---

# Some words of caution

These specific neurons firing on specific patterns or classes, .italic[i.e.] _cat-neurons_, might give us the impression that we understand the behavior of neural networks.

--
count: false

However, recent results show that when these neurons are removed, the performance of the network does not decrease noticeably.

.center.width-60[![](images/part6/selectivity_figure.jpg)]

.citation[A. Morcos et al., On the importance of single directions for generalization, ICLR 2018]
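---

# Visualize layer activations/feature maps

A minimal sketch (not from the original slides) of grabbing intermediate feature maps with a forward hook, assuming a pretrained torchvision AlexNet:

```python
import torch
import torchvision

model = torchvision.models.alexnet(pretrained=True).eval()

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()   # feature maps of this layer
    return hook

# register a hook on the first convolutional layer
model.features[0].register_forward_hook(save_activation('conv1'))

x = torch.randn(1, 3, 224, 224)               # a dummy input image
with torch.no_grad():
    model(x)

print(activations['conv1'].shape)             # e.g. torch.Size([1, 64, 55, 55])
```

Each channel of `activations['conv1']` can then be displayed as a small grayscale image, similar to the maps on the following slides.

---

# Visualize layer activations/feature maps

## AlexNet

.center[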
] .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Visualize layer activations/feature maps ## AlexNet .center[
] .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Visualize layer activations/feature maps ## AlexNet .center[
] .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Visualize layer activations/feature maps ## AlexNet .center[
] .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Visualize layer activations/feature maps ## AlexNet .center[
] .center[
] .credit[Figure credit: F. Fleuret]

---

# Occlusion sensitivity

An approach to understand the behavior of a network is to look at the output of the network "around" an image.

We can get a simple estimate of the importance of a part of the input image by computing the difference between:

1. the value of the maximally responding output unit on the image, and
2. the value of the same unit with that part occluded.

---
count: false

# Occlusion sensitivity

An approach to understand the behavior of a network is to look at the output of the network "around" an image.

We can get a simple estimate of the importance of a part of the input image by computing the difference between:

1. the value of the maximally responding output unit on the image, and
2. the value of the same unit with that part occluded.

.red[This is computationally intensive since it requires as many forward passes as there are locations of the occlusion mask, ideally one per pixel.]

(A minimal code sketch follows after the example.)

---

# Occlusion sensitivity

.center.width-70[![](images/part6/occlusion_zeiler.png)]

.citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014]
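---

# Occlusion sensitivity

A rough sketch of the procedure (not from the original slides), assuming a classifier `model`, an input batch `x` of shape `(1, 3, H, W)` in `[0, 1]`, and a stride coarse enough to keep the number of forward passes reasonable:

```python
import torch

def occlusion_map(model, x, size=32, stride=16, fill=0.5):
    model.eval()
    with torch.no_grad():
        ref = model(x)                              # scores on the original image
        target = ref.argmax(dim=1).item()           # maximally responding output unit
        _, _, H, W = x.shape
        heatmap = torch.zeros((H - size) // stride + 1, (W - size) // stride + 1)
        for i, top in enumerate(range(0, H - size + 1, stride)):
            for j, left in enumerate(range(0, W - size + 1, stride)):
                occluded = x.clone()
                occluded[:, :, top:top + size, left:left + size] = fill
                out = model(occluded)
                # importance = drop of the same unit when this part is occluded
                heatmap[i, j] = ref[0, target] - out[0, target]
    return heatmap
```

Upsampling `heatmap` to the input resolution and overlaying it on the image gives maps similar to the one on the previous slide.

---

# Visualize arbitrary neurons

DeepVis toolbox

[https://www.youtube.com/watch?v=AgkfIQ4IGaM](https://www.youtube.com/watch?v=AgkfIQ4IGaM)

.center[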
]

---

# Maximum response samples

What does a convolutional network see?

Convolutional networks can be inspected by looking for input images $\mathbf{x}$ that maximize the activation $\mathbf{h}\_{\ell,d}(\mathbf{x})$ of a chosen convolutional kernel at layer $\ell$ and index $d$ in the layer filter bank.

Such images can be found by gradient ascent on the input space (a minimal sketch follows on the next slide):

$$\begin{aligned} \mathcal{L}\_{\ell,d}(\mathbf{x}) &= ||\mathbf{h}\_{\ell,d}(\mathbf{x})||\_2\\\\ \mathbf{x}\_0 &\sim U[0,1]^{C \times H \times W } \\\\ \mathbf{x}\_{t+1} &= \mathbf{x}\_t + \gamma \nabla\_{\mathbf{x}} \mathcal{L}\_{\ell,d}(\mathbf{x}\_t) \end{aligned}$$
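---

# Maximum response samples

A minimal sketch (not from the original slides) of this gradient ascent on the input, with a hypothetical choice of layer $\ell$ and channel $d$:

```python
import torch
import torchvision

model = torchvision.models.alexnet(pretrained=True).eval()
for p in model.parameters():
    p.requires_grad_(False)                   # we only need gradients w.r.t. the input

layer, d, gamma = model.features[8], 7, 1.0   # hypothetical layer / channel / step size
feats = {}
layer.register_forward_hook(lambda m, i, o: feats.update(h=o))

x = torch.rand(1, 3, 224, 224, requires_grad=True)   # x_0 ~ U[0,1]^(C x H x W)

for _ in range(100):
    model(x)
    loss = feats['h'][0, d].norm(p=2)         # L_{l,d}(x) = ||h_{l,d}(x)||_2
    loss.backward()
    with torch.no_grad():
        x += gamma * x.grad                   # x_{t+1} = x_t + gamma * grad
        x.grad.zero_()
```

In practice the result is usually regularized (jitter, blurring, an $L_2$ penalty) to obtain cleaner images such as those on the following slides.

---

# Maximum response samples

.center[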
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- # Many more visualization techniques .center[
] --- # Grad-CAM .center.width-90[![](images/part6/gradcam_1.png)] .citation[R.R. Selvaraju et al., Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, ICCV 2017] --- # Grad-CAM .center.width-90[![](images/part6/gradcam_2.png)] .citation[R.R. Selvaraju et al., Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, ICCV 2017] --- # Other resources DrawNet [http://people.csail.mit.edu/torralba/research/drawCNN/drawNet.html](http://people.csail.mit.edu/torralba/research/drawCNN/drawNet.html) .center[
] --- # Other resources Basic CNNs [http://scs.ryerson.ca/~aharley/vis/](http://scs.ryerson.ca/~aharley/vis/) .center[
] --- # Other resources Keras-JS [https://transcranial.github.io/keras-js/](https://transcranial.github.io/keras-js/) .center[
] --- # Other resources TensorFlow playground [http://playground.tensorflow.org](http://playground.tensorflow.org) .center[
] --- class: center, middle # Adversarial attacks --- class: middle .center.width-70[
]

---

class: middle

$$\begin{aligned} &\min ||\mathbf{r}||\_2 \\\\ \text{s.t. } &f(\mathbf{x}+\mathbf{r})=y'\\\\ &\mathbf{x}+\mathbf{r} \in [0,1]^p \end{aligned}$$

where
- $y'$ is some target label, different from the original label $y$ associated with $\mathbf{x}$,
- $f$ is a trained neural network.

(A small sketch of this attack follows the example below.)

---

class: middle

.center.width-70[
] .center[(Left) Original images $\mathbf{x}$. (Middle) Noise $\mathbf{r}$. (Right) Modified images $\mathbf{x}+\mathbf{r}$.
The modified images are all classified as 'Ostrich'. (Szegedy et al, 2013)]

---
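A rough sketch (not from the original slides) of this targeted attack, relaxing the constraint into a penalized objective and optimizing the perturbation $\mathbf{r}$ by gradient descent; `model`, `x` (an image batch in $[0,1]$) and `target` (a tensor such as `torch.tensor([y_prime])`) are assumed:

```python
import torch
import torch.nn.functional as F

def targeted_perturbation(model, x, target, c=1.0, steps=200, lr=0.01):
    r = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([r], lr=lr)
    for _ in range(steps):
        x_adv = (x + r).clamp(0, 1)                       # keep x + r in [0, 1]^p
        logits = model(x_adv)
        # small perturbation + classification as the target label y'
        loss = r.norm(p=2) + c * F.cross_entropy(logits, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + r).clamp(0, 1).detach()
```

Szegedy et al. used box-constrained L-BFGS with a search over $c$; any gradient-based optimizer illustrates the idea.

---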
Even simpler, take a step along the direction of the sign of the gradient at each pixel: $$\mathbf{r} = \epsilon\, \text{sign}(\nabla\_\mathbf{x} \ell(y', f(\mathbf{x}))) $$ where $\epsilon$ is the magnitude of the perturbation. --
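A minimal sketch (not from the original slides) of this fast gradient sign step, here written with the true label $y$ so that the step increases the loss:

```python
import torch

def fgsm(model, x, y, eps=0.007):
    x = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```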
.center.width-70[
] .center[The panda on the right is classified as a 'Gibbon'. (Goodfellow et al, 2014)] --- # Not just for neural networks Many other machine learning models are subject to adversarial examples, including: - Linear models - Logistic regression - Softmax regression - Support vector machines - Decision trees - Nearest neighbors --- # Fooling neural networks
.center.width-80[![](images/part6/fooling-exp.png)] .citation[Nguyen et al, 2014] --- class: middle .center.width-40[
] .citation[Nguyen et al, 2014] --- # One pixel attacks .center.width-30[
] .citation[Su et al, 2017] --- # Universal adversarial perturbations .center.width-30[
] .citation[Moosavi-Dezfooli et al, 2016] --- # Fooling deep structured prediction models .center.width-100[![](images/part6/houdini1.png)] .citation[Cisse et al, 2017] --- class: middle .center.width-70[
] .citation[Cisse et al, 2017] --- # Attacks in the real world .center[
] --- # Attacks in the real world .center[
]

---

# Security threat

Adversarial attacks pose a **security threat** to machine learning systems deployed in the real world.

Examples include:
- fooling real classifiers trained by remotely hosted APIs (e.g., Google),
- fooling malware detector networks,
- obfuscating speech data,
- displaying adversarial examples in the physical world to fool systems that perceive them through a camera.

---

class: middle

.center.width-90[![](images/part6/attack-av.png)]

.credit[Credits: [Adversarial Examples and Adversarial Training](https://berkeley-deep-learning.github.io/cs294-dl-f16/slides/2016_10_5_CS294-131.pdf) (Goodfellow, 2016)]

---

class: middle

# Adversarial defenses

---

# Defenses
.center.width-80[![](images/part6/defenses.png)] .credit[Credits: [Adversarial Examples and Adversarial Training](https://berkeley-deep-learning.github.io/cs294-dl-f16/slides/2016_10_5_CS294-131.pdf) (Goodfellow, 2016)] --- # Failed defenses
"In this paper we evaluate ten proposed defenses and demonstrate that none of them are able to withstand a white-box attack. We do this by constructing defense-specific loss functions that we minimize with a strong iterative attack algorithm. With these attacks, on CIFAR an adversary can create imperceptible adversarial examples for each defense. By studying these ten defenses, we have drawn two lessons: existing defenses lack thorough security evaluations, and adversarial examples are much more difficult to detect than previously recognized." .pull-right[(Carlini and Wagner, 2017)]
.center[Adversarial attacks and defenses remain an **open research problem**.] --- # Recap ## CPU vs GPU ## Visualization of CNN activations and most salient patterns ## Adversarial attacks --- class: end-slide, center count: false The end.