class: center, middle ### Deep Learning - MAP583 2019-2020 ## Part 6: Under the hood .bold[Andrei Bursuc ]
.width-10[![](images/logo_valeo.png)] url: https://abursuc.github.io/slides/polytechnique/06_under_hood.html .citation[ With slides from A. Karpathy, F. Fleuret, J. Johnson, S. Yeung, G. Louppe, Y. Avrithis ...] --- class: center, middle # GPUs .center[
] --- # CPU vs GPU
.left[ .center[CPU]
.center[
] ] .right[ .center[GPU]
.center[
] ] --- # CPU vs GPU .left-column[ - CPU: + fewer cores; each core is faster and more powerful + useful for sequential tasks ] .right-column[ - GPU: + more cores; each core is slower and weaker + great for parallel tasks ] .reset-column[] .center[
] --- # CPU vs GPU .left-column[ - CPU: + fewer cores; each core is faster and more powerful + useful for sequential tasks ] .right-column[ - GPU: + more cores; each core is slower and weaker + great for parallel tasks ] .reset-column[] .center[
] .credit[Figure credit: J. Johnson] --- # CPU vs GPU - SP = single precision, 32 bits / 4 bytes - DP = double precision, 64 bits / 8 bytes .center[
] --- # CPU vs GPU .center[
] .citation[Benchmarking State-of-the-Art Deep Learning Software Tools, Shi et al., 2016] --- # CPU vs GPU - more benchmarks available at [https://github.com/jcjohnson/cnn-benchmarks](https://github.com/jcjohnson/cnn-benchmarks) .center[
] .credit[Figure credit: J. Johnson] --- count: false # CPU vs GPU - more benchmarks available at [https://github.com/jcjohnson/cnn-benchmarks](https://github.com/jcjohnson/cnn-benchmarks) .center[
] .credit[Figure credit: J. Johnson] --- # System .center[
] .credit[Figure credit: F. Fleuret] --- count: false # System .center[
] .credit[Figure credit: F. Fleuret] --- count: false # System .center[
] .credit[Figure credit: F. Fleuret] --- count: false # System .center[
] .credit[Figure credit: F. Fleuret] --- count: false # System .center[
] .credit[Figure credit: F. Fleuret] --- count: false # System .center[
] .credit[Figure credit: F. Fleuret] --- count: false # System .center[
] .credit[Figure credit: F. Fleuret]

---

# GPU

- NVIDIA GPUs are programmed through CUDA (.purple[Compute Unified Device Architecture])
- The alternative is OpenCL, supported by several manufacturers but with significantly less investment behind it than CUDA
- NVIDIA and CUDA dominate the field by far, though some alternatives are starting to emerge: Google TPUs, embedded devices.

---

# Libraries

- BLAS (_Basic Linear Algebra Subprograms_): vector/matrix products; cuBLAS is the implementation for NVIDIA GPUs
- LAPACK (_Linear Algebra Package_): linear system solving, eigendecomposition, etc.
- cuDNN (_NVIDIA CUDA Deep Neural Network library_): computations specific to deep learning on NVIDIA GPUs.

---

# GPU usage in pytorch

- Tensors of `torch.cuda` types live in GPU memory. Operations on them are performed by the GPU and the resulting tensors are stored in its memory.
- Operations cannot mix different tensor types (CPU vs. GPU, or different numerical types), with the exception of `copy_()`.
- Moving data between the CPU and GPU memories is far slower than moving it inside the GPU memory.

---

# GPU usage in pytorch

- The `Tensor` method `cuda()` / `.to('cuda')` returns a clone on the GPU if the tensor is not already there, or the tensor itself if it is, keeping the bit precision.
- The method `cpu()` / `.to('cpu')` makes a clone on the CPU if needed.
- Both leave the original tensor unchanged (see the sketch on the next slide).
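---

# GPU usage in pytorch

A minimal sketch of the device API described above (not from the original slides; assumes PyTorch 1.x with a CUDA-capable GPU):

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.randn(1024, 1024)   # allocated in CPU memory
x_gpu = x.to(device)          # clone on the GPU; x itself is left unchanged
y_gpu = x_gpu @ x_gpu         # computed by the GPU, result stays in GPU memory
y = y_gpu.cpu()               # clone back to CPU memory only when needed
```

The same `.to(device)` pattern applies to modules: `model.to(device)` moves all parameters and buffers, and the inputs must live on the same device as the model.

---

class: center, middle

# Understanding and visualizing CNNs

.center[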
] --- # What happens inside a CNN? .left-column[
.center[Visualize first-layer filters/weights] ] .right-column[.center[
]] .reset-column[] .center[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014] --- count: false # What happens inside a CNN? .left-column[
.center[Visualize first-layer filters/weights] ] .right-column[.center[
]] .reset-column[ ] .left[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014] --- count: false # What happens inside a CNN? .left-column[ - Visualize behavior in higher layers - We can visualize filters at higher layers, but they are less intuitive ] .right-column[.center[
]] .reset-column[ ]
.center[
] --- count: false # What happens inside a CNN? .left-column[
.center[Visualize first-layer filters/weights] ] .right-column[.center[
]] .reset-column[ ] .left[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014] --- count: false # What happens inside a CNN? .left-column[
.center[Visualize first-layer filters/weights] ] .right-column[.center[
]] .reset-column[ ] .left[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014] --- count: false # What happens inside a CNN? .left-column[
.center[Visualize first-layer filters/weights] ] .right-column[.center[
]] .reset-column[ ] .left[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014] --- count: false # What happens inside a CNN? .left-column[
.center[Visualize first-layer filters/weights] ] .right-column[.center[
]] .reset-column[ ] .left[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014] --- count: false # What happens inside a CNN? .left-column[ - 4096d "signature" for an image (layer right before the classifier) - Visualize with t-SNE: [here](http://cs.stanford.edu/people/karpathy/cnnembed/) ] .right-column[.center[
]] .reset-column[ ] .center[
] --- # Feature evolution during training - For a particular neuron (that generates a feature map) - Pick the strongest activation during training - For epochs 1, 2, 5, 10, 20, 30, 40, 64
.center[
] .citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014]

---

# Some words of caution

These specific neurons firing on specific patterns or classes, .italic[i.e.] _cat-neurons_, might give us the impression that we understand the behavior of neural networks.

--
count: false

However, recent results show that when these neurons are removed, the performance of the network does not decrease noticeably.

.center.width-60[![](images/part6/selectivity_figure.jpg)]

.citation[A. Morcos et al., On the importance of single directions for generalization, ICLR 2018]
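---

# Visualize layer activations/feature maps

A minimal sketch (not from the original slides) of grabbing intermediate feature maps with a forward hook, assuming a pretrained torchvision AlexNet:

```python
import torch
import torchvision

model = torchvision.models.alexnet(pretrained=True).eval()

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()   # feature maps of this layer
    return hook

# register a hook on the first convolutional layer
model.features[0].register_forward_hook(save_activation('conv1'))

x = torch.randn(1, 3, 224, 224)               # a dummy input image
with torch.no_grad():
    model(x)

print(activations['conv1'].shape)             # e.g. torch.Size([1, 64, 55, 55])
```

Each channel of `activations['conv1']` can then be displayed as a small grayscale image, similar to the maps on the following slides.

---

# Visualize layer activations/feature maps

## AlexNet

.center[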
] .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Visualize layer activations/feature maps ## AlexNet .center[
] .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Visualize layer activations/feature maps ## AlexNet .center[
] .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Visualize layer activations/feature maps ## AlexNet .center[
] .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Visualize layer activations/feature maps ## AlexNet .center[
] .center[
] .credit[Figure credit: F. Fleuret]

---

# Occlusion sensitivity

An approach to understand the behavior of a network is to look at the output of the network "around" an image.

We can get a simple estimate of the importance of a part of the input image by computing the difference between:

1. the value of the maximally responding output unit on the image, and
2. the value of the same unit with that part occluded.

---
count: false

# Occlusion sensitivity

An approach to understand the behavior of a network is to look at the output of the network "around" an image.

We can get a simple estimate of the importance of a part of the input image by computing the difference between:

1. the value of the maximally responding output unit on the image, and
2. the value of the same unit with that part occluded.

.red[This is computationally intensive since it requires as many forward passes as there are locations of the occlusion mask, ideally one per pixel.]

(A minimal code sketch follows after the example.)

---

# Occlusion sensitivity

.center.width-70[![](images/part6/occlusion_zeiler.png)]

.citation[M. Zeiler & R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014]
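---

# Occlusion sensitivity

A rough sketch of the procedure (not from the original slides), assuming a classifier `model`, an input batch `x` of shape `(1, 3, H, W)` in `[0, 1]`, and a stride coarse enough to keep the number of forward passes reasonable:

```python
import torch

def occlusion_map(model, x, size=32, stride=16, fill=0.5):
    model.eval()
    with torch.no_grad():
        ref = model(x)                              # scores on the original image
        target = ref.argmax(dim=1).item()           # maximally responding output unit
        _, _, H, W = x.shape
        heatmap = torch.zeros((H - size) // stride + 1, (W - size) // stride + 1)
        for i, top in enumerate(range(0, H - size + 1, stride)):
            for j, left in enumerate(range(0, W - size + 1, stride)):
                occluded = x.clone()
                occluded[:, :, top:top + size, left:left + size] = fill
                out = model(occluded)
                # importance = drop of the same unit when this part is occluded
                heatmap[i, j] = ref[0, target] - out[0, target]
    return heatmap
```

Upsampling `heatmap` to the input resolution and overlaying it on the image gives maps similar to the one on the previous slide.

---

# Visualize arbitrary neurons

DeepVis toolbox

[https://www.youtube.com/watch?v=AgkfIQ4IGaM](https://www.youtube.com/watch?v=AgkfIQ4IGaM)

.center[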
]

---

# Maximum response samples

What does a convolutional network see?

Convolutional networks can be inspected by looking for input images $\mathbf{x}$ that maximize the activation $\mathbf{h}\_{\ell,d}(\mathbf{x})$ of a chosen convolutional kernel at layer $\ell$ and index $d$ in the layer filter bank.

Such images can be found by gradient ascent on the input space (a minimal sketch follows on the next slide):

$$\begin{aligned} \mathcal{L}\_{\ell,d}(\mathbf{x}) &= ||\mathbf{h}\_{\ell,d}(\mathbf{x})||\_2\\\\ \mathbf{x}\_0 &\sim U[0,1]^{C \times H \times W } \\\\ \mathbf{x}\_{t+1} &= \mathbf{x}\_t + \gamma \nabla\_{\mathbf{x}} \mathcal{L}\_{\ell,d}(\mathbf{x}\_t) \end{aligned}$$
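---

# Maximum response samples

A minimal sketch (not from the original slides) of this gradient ascent on the input, with a hypothetical choice of layer $\ell$ and channel $d$:

```python
import torch
import torchvision

model = torchvision.models.alexnet(pretrained=True).eval()
for p in model.parameters():
    p.requires_grad_(False)                   # we only need gradients w.r.t. the input

layer, d, gamma = model.features[8], 7, 1.0   # hypothetical layer / channel / step size
feats = {}
layer.register_forward_hook(lambda m, i, o: feats.update(h=o))

x = torch.rand(1, 3, 224, 224, requires_grad=True)   # x_0 ~ U[0,1]^(C x H x W)

for _ in range(100):
    model(x)
    loss = feats['h'][0, d].norm(p=2)         # L_{l,d}(x) = ||h_{l,d}(x)||_2
    loss.backward()
    with torch.no_grad():
        x += gamma * x.grad                   # x_{t+1} = x_t + gamma * grad
        x.grad.zero_()
```

In practice the result is usually regularized (jitter, blurring, an $L_2$ penalty) to obtain cleaner images such as those on the following slides.

---

# Maximum response samples

.center[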
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- count: false # Maximum response samples .center[
] .credit[Figure credit: F. Fleuret] --- # Many more visualization techniques .center[
] --- # Grad-CAM .center.width-90[![](images/part6/gradcam_1.png)] .citation[R.R. Selvaraju et al., Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, ICCV 2017] --- # Grad-CAM .center.width-90[![](images/part6/gradcam_2.png)] .citation[R.R. Selvaraju et al., Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, ICCV 2017] --- # Other resources DrawNet [http://people.csail.mit.edu/torralba/research/drawCNN/drawNet.html](http://people.csail.mit.edu/torralba/research/drawCNN/drawNet.html) .center[
] --- # Other resources Basic CNNs [http://scs.ryerson.ca/~aharley/vis/](http://scs.ryerson.ca/~aharley/vis/) .center[
] --- # Other resources Keras-JS [https://transcranial.github.io/keras-js/](https://transcranial.github.io/keras-js/) .center[
] --- # Other resources TensorFlow playground [http://playground.tensorflow.org](http://playground.tensorflow.org) .center[
] --- class: center, middle # Adversarial attacks --- class: middle .center.width-70[
]

---

class: middle

$$\begin{aligned} &\min ||\mathbf{r}||\_2 \\\\ \text{s.t. } &f(\mathbf{x}+\mathbf{r})=y'\\\\ &\mathbf{x}+\mathbf{r} \in [0,1]^p \end{aligned}$$

where
- $y'$ is some target label, different from the original label $y$ associated with $\mathbf{x}$,
- $f$ is a trained neural network.

(A small sketch of this attack follows the example below.)

---

class: middle

.center.width-70[
] .center[(Left) Original images $\mathbf{x}$. (Middle) Noise $\mathbf{r}$. (Right) Modified images $\mathbf{x}+\mathbf{r}$.
The modified images are all classified as 'Ostrich'. (Szegedy et al, 2013)]

---
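A rough sketch (not from the original slides) of this targeted attack, relaxing the constraint into a penalized objective and optimizing the perturbation $\mathbf{r}$ by gradient descent; `model`, `x` (an image batch in $[0,1]$) and `target` (a tensor such as `torch.tensor([y_prime])`) are assumed:

```python
import torch
import torch.nn.functional as F

def targeted_perturbation(model, x, target, c=1.0, steps=200, lr=0.01):
    r = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([r], lr=lr)
    for _ in range(steps):
        x_adv = (x + r).clamp(0, 1)                       # keep x + r in [0, 1]^p
        logits = model(x_adv)
        # small perturbation + classification as the target label y'
        loss = r.norm(p=2) + c * F.cross_entropy(logits, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + r).clamp(0, 1).detach()
```

Szegedy et al. used box-constrained L-BFGS with a search over $c$; any gradient-based optimizer illustrates the idea.

---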
Even simpler, take a step along the direction of the sign of the gradient at each pixel: $$\mathbf{r} = \epsilon\, \text{sign}(\nabla\_\mathbf{x} \ell(y', f(\mathbf{x}))) $$ where $\epsilon$ is the magnitude of the perturbation. --
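A minimal sketch (not from the original slides) of this fast gradient sign step, here written with the true label $y$ so that the step increases the loss:

```python
import torch

def fgsm(model, x, y, eps=0.007):
    x = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```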
.center.width-70[
] .center[The panda on the right is classified as a 'Gibbon'. (Goodfellow et al, 2014)] --- # Not just for neural networks Many other machine learning models are subject to adversarial examples, including: - Linear models - Logistic regression - Softmax regression - Support vector machines - Decision trees - Nearest neighbors --- # Fooling neural networks
.center.width-80[![](images/part6/fooling-exp.png)] .citation[Nguyen et al, 2014] --- class: middle .center.width-40[
] .citation[Nguyen et al, 2014] --- # One pixel attacks .center.width-30[
] .citation[Su et al, 2017] --- # Universal adversarial perturbations .center.width-30[
] .citation[Moosavi-Dezfooli et al, 2016] --- # Fooling deep structured prediction models .center.width-100[![](images/part6/houdini1.png)] .citation[Cisse et al, 2017] --- class: middle .center.width-70[
] .citation[Cisse et al, 2017] --- # Attacks in the real world .center[
] --- # Attacks in the real world .center[
]

---

# Security threat

Adversarial attacks pose a **security threat** to machine learning systems deployed in the real world.

Examples include:
- fooling real classifiers trained by remotely hosted APIs (e.g., Google),
- fooling malware detector networks,
- obfuscating speech data,
- displaying adversarial examples in the physical world to fool systems that perceive them through a camera.

---

class: middle

.center.width-90[![](images/part6/attack-av.png)]

.credit[Credits: [Adversarial Examples and Adversarial Training](https://berkeley-deep-learning.github.io/cs294-dl-f16/slides/2016_10_5_CS294-131.pdf) (Goodfellow, 2016)]

---

class: middle

# Adversarial defenses

---

# Defenses
.center.width-80[![](images/part6/defenses.png)] .credit[Credits: [Adversarial Examples and Adversarial Training](https://berkeley-deep-learning.github.io/cs294-dl-f16/slides/2016_10_5_CS294-131.pdf) (Goodfellow, 2016)] --- # Failed defenses
"In this paper we evaluate ten proposed defenses and demonstrate that none of them are able to withstand a white-box attack. We do this by constructing defense-specific loss functions that we minimize with a strong iterative attack algorithm. With these attacks, on CIFAR an adversary can create imperceptible adversarial examples for each defense. By studying these ten defenses, we have drawn two lessons: existing defenses lack thorough security evaluations, and adversarial examples are much more difficult to detect than previously recognized." .pull-right[(Carlini and Wagner, 2017)]
.center[Adversarial attacks and defenses remain an **open research problem**.] --- # Recap ## CPU vs GPU ## Visualization of CNN activations and most salient patterns ## Adversarial attacks --- class: end-slide, center count: false The end.