layout: true

.center.footer[Marc LELARGE and Andrei BURSUC | Deep Learning Do It Yourself | 14.5 ResNets]

---

class: center, middle, title-slide
count: false

## Going Deeper
# 14.5 ResNets
.bold[Andrei Bursuc]
url: https://dataflowr.github.io/website/

.citation[With slides from A. Karpathy, F. Fleuret, G. Louppe, C. Ollion, O. Grisel, Y. Avrithis ...]

---

class: middle, center

## How deep can we go now?

---

# A saturation point

If we continue stacking more layers on a CNN:

.center.width-90[]

--
count: false

.center[.red[Deeper models are harder to optimize]]

---

.left-column[
# ResNet
]

.right-column[
.center.width-50[]
]

A block learns the residual with respect to the identity: $y = x + \mathcal{F}(x)$

.center.width-40[]

.citation[K. He et al., Deep residual learning for image recognition, CVPR 2016.]

--
count: false

- Good optimization properties

---

.left-column[
# ResNet
]

.right-column[
.center.width-50[]
]

Even deeper models: 34, 50, 101, 152 layers

.citation[K. He et al., Deep residual learning for image recognition, CVPR 2016.]

---

.left-column[
# ResNet
]

.right-column[
.center.width-50[]
]

ResNet50 compared to VGG:

- Superior accuracy in all vision tasks
**5.25%** top-5 error vs **7.1%**

.citation[K. He et al., Deep residual learning for image recognition, CVPR 2016.]

--
count: false

- Fewer parameters
**25M** vs **138M**

--
count: false

- Lower computational complexity
**3.8B FLOPs** vs **15.3B FLOPs**

--
count: false

- Fully convolutional until the last layer

---

# ResNet

## Performance on ImageNet

.center.width-90[]

---

# ResNet

## Results

.center.width-100[]

---

# ResNet

## Results

.center.width-60[]

---

# ResNet

In PyTorch:

```py
def make_resnet_block(num_feature_maps, kernel_size=3):
    return nn.Sequential(
        nn.Conv2d(num_feature_maps, num_feature_maps,
                  kernel_size=kernel_size,
                  padding=(kernel_size - 1) // 2),
        nn.BatchNorm2d(num_feature_maps),
        nn.ReLU(inplace=True),
        nn.Conv2d(num_feature_maps, num_feature_maps,
                  kernel_size=kernel_size,
                  padding=(kernel_size - 1) // 2),
        nn.BatchNorm2d(num_feature_maps),
    )
```

---

# ResNet

In PyTorch:

```py
def __init__(self, num_residual_blocks, num_feature_maps):
    ...
    self.resnet_blocks = nn.ModuleList()
    for k in range(num_residual_blocks):
        self.resnet_blocks.append(make_resnet_block(num_feature_maps, 3))
    ...
```

```py
def forward(self, x):
    ...
    for b in self.resnet_blocks:
*       x = x + b(x)
    ...
    return x
```

---

For ResNet-50 and deeper, additional modifications are needed to keep the number of parameters and the amount of computation manageable

.center.width-70[]

Such a block requires $2 \times (3 \times 3 \times 256 + 1) \times 256 \simeq 1.2M$ parameters

.credit[Credits: F. Fleuret, [EE-559 Deep Learning](https://fleuret.org/dlc/), EPFL]

--
count: false

Address this problem with a __bottleneck__ block

.center.width-100[]

.center[$256 \times 64 + (3 \times 3 \times 64 + 1) \times 64 + 64 \times 256 \simeq 70K$ parameters]
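---

# ResNet

A minimal PyTorch sketch of such a bottleneck block, in the style of `make_resnet_block` above (the helper name `make_bottleneck_block` and its default widths are illustrative, not code from the paper; the parameter count matches the estimate above up to the biases of the $1 \times 1$ convolutions):

```py
import torch.nn as nn

def make_bottleneck_block(num_feature_maps=256, bottleneck_maps=64):
    return nn.Sequential(
        # 1x1 conv: reduce 256 -> 64 channels
        nn.Conv2d(num_feature_maps, bottleneck_maps, kernel_size=1),
        nn.BatchNorm2d(bottleneck_maps),
        nn.ReLU(inplace=True),
        # 3x3 conv on the cheap 64-channel representation
        nn.Conv2d(bottleneck_maps, bottleneck_maps,
                  kernel_size=3, padding=1),
        nn.BatchNorm2d(bottleneck_maps),
        nn.ReLU(inplace=True),
        # 1x1 conv: expand 64 -> 256 channels
        nn.Conv2d(bottleneck_maps, num_feature_maps, kernel_size=1),
        nn.BatchNorm2d(num_feature_maps),
    )
```

As before, the block is used inside a residual connection, `x = x + b(x)`.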
---

# Stochastic Depth Networks

- Dropout at the level of entire layers: residual blocks are randomly dropped during training
- Allows training networks of up to 1K layers

.center.width-70[]

.citation[G. Huang et al., Deep Networks with Stochastic Depth, ECCV 2016]

---

# DenseNet

- Copying feature maps to upper layers via skip-connections
- Better reuse of parameters and avoidance of redundancy

.center.width-30[]

.center.width-70[]

.citation[G. Huang et al., Densely Connected Convolutional Networks, CVPR 2017]

---

# Squeeze-and-Excitation Networks

.center.width-70[]
.caption[SE block]

- Re-weights individual feature maps channel-wise

.citation[J. Hu et al., Squeeze-and-Excitation Networks, CVPR 2018]

---

# Squeeze-and-Excitation Networks

.center.width-50[]
.caption[SE block]

.citation[J. Hu et al., Squeeze-and-Excitation Networks, CVPR 2018]
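---

# Squeeze-and-Excitation Networks

A minimal PyTorch sketch of the squeeze-and-excitation operation (the class name `SEBlock` is illustrative; the reduction ratio of 16 is an assumption following the paper's default):

```py
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, num_channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(num_channels, num_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(num_channels // reduction, num_channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # squeeze: global average pooling over the spatial dimensions
        s = x.mean(dim=(2, 3))
        # excitation: one weight in [0, 1] per feature map
        w = self.fc(s)
        # re-weight each feature map channel-wise
        return x * w[:, :, None, None]
```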
---

# Visualizing loss surfaces

.center.width-70[]

.citation[H. Li et al., Visualizing the Loss Landscape of Neural Nets, ICLR workshop 2018]

---

# Visualizing loss surfaces

.left-column[
.center.width-100[]
]

.right-column[
- ResNet-20/56/110: vanilla
- ResNet-*-noshort: no skip connections
- ResNet-18/34/50: wide
]

.citation[H. Li et al., Visualizing the Loss Landscape of Neural Nets, ICLR workshop 2018]

---

# Visualizing loss surfaces

.center.width-50[]

.citation[H. Li et al., Visualizing the Loss Landscape of Neural Nets, ICLR workshop 2018]

---

class: end-slide, center
count: false

The end.