layout: true

.center.footer[Marc LELARGE and Andrei BURSUC | Deep Learning Do It Yourself | 14.5 ResNets]

---

class: center, middle, title-slide
count: false

## Going Deeper
# 14.5 ResNets
.bold[Andrei Bursuc]
url: https://dataflowr.github.io/website/

.citation[With slides from A. Karpathy, F. Fleuret, G. Louppe, C. Ollion, O. Grisel, Y. Avrithis ...]

---

class: middle, center

## How deep can we go now?

---

# A saturation point

If we continue stacking more layers on a CNN:

.center.width-90[]

--
count: false

.center[.red[Deeper models are harder to optimize]]

---

.left-column[
# ResNet
]

.right-column[
.center.width-50[]
]

A block learns the residual with respect to the identity: $y = x + \mathcal{F}(x)$

.center.width-40[]

.citation[K. He et al., Deep residual learning for image recognition, CVPR 2016.]

--
count: false

- Good optimization properties

---

.left-column[
# ResNet
]

.right-column[
.center.width-50[]
]

Even deeper models: 34, 50, 101, 152 layers

.citation[K. He et al., Deep residual learning for image recognition, CVPR 2016.]

---

.left-column[
# ResNet
]

.right-column[
.center.width-50[]
]

ResNet50 compared to VGG:

- Superior accuracy in all vision tasks
**5.25%** top-5 error vs **7.1%**

.citation[K. He et al., Deep residual learning for image recognition, CVPR 2016.]

--
count: false

- Fewer parameters
**25M** vs **138M**

--
count: false

- Lower computational complexity
**3.8B FLOPs** vs **15.3B FLOPs**

--
count: false

- Fully convolutional until the last layer

---

# ResNet

## Performance on ImageNet

.center.width-90[]

---

# ResNet

## Results

.center.width-100[]

---

# ResNet

## Results

.center.width-60[]

---

# ResNet

In PyTorch:

```py
def make_resnet_block(num_feature_maps, kernel_size=3):
    return nn.Sequential(
        nn.Conv2d(num_feature_maps, num_feature_maps,
                  kernel_size=kernel_size,
                  padding=(kernel_size - 1) // 2),
        nn.BatchNorm2d(num_feature_maps),
        nn.ReLU(inplace=True),
        nn.Conv2d(num_feature_maps, num_feature_maps,
                  kernel_size=kernel_size,
                  padding=(kernel_size - 1) // 2),
        nn.BatchNorm2d(num_feature_maps),
    )
```

---

# ResNet

In PyTorch:

```py
def __init__(self, num_residual_blocks, num_feature_maps):
    ...
    self.resnet_blocks = nn.ModuleList()
    for k in range(num_residual_blocks):
        self.resnet_blocks.append(make_resnet_block(num_feature_maps, 3))
    ...
```

```py
def forward(self, x):
    ...
    for b in self.resnet_blocks:
*       x = x + b(x)
    ...
    return x
```

---

For ResNet-50 and deeper, additional modifications are needed to keep the number of parameters and the amount of computation manageable

.center.width-70[]

Such a block requires $2 \times (3 \times 3 \times 256 + 1) \times 256 \simeq 1.2M$ parameters

.credit[Credits: F. Fleuret, [EE-559 Deep Learning](https://fleuret.org/dlc/), EPFL]

--
count: false

Address this problem with a __bottleneck__ block

.center.width-100[]

.center[$256 \times 64 + (3 \times 3 \times 64 + 1) \times 64 + 64 \times 256 \simeq 70K$ parameters]
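---

# ResNet

A minimal PyTorch sketch of such a bottleneck block, in the style of `make_resnet_block` above (the helper name `make_bottleneck_block` and its default widths are illustrative, not code from the paper; the parameter count matches the estimate above up to the biases of the $1 \times 1$ convolutions):

```py
import torch.nn as nn

def make_bottleneck_block(num_feature_maps=256, bottleneck_maps=64):
    return nn.Sequential(
        # 1x1 conv: reduce 256 -> 64 channels
        nn.Conv2d(num_feature_maps, bottleneck_maps, kernel_size=1),
        nn.BatchNorm2d(bottleneck_maps),
        nn.ReLU(inplace=True),
        # 3x3 conv on the cheap 64-channel representation
        nn.Conv2d(bottleneck_maps, bottleneck_maps,
                  kernel_size=3, padding=1),
        nn.BatchNorm2d(bottleneck_maps),
        nn.ReLU(inplace=True),
        # 1x1 conv: expand 64 -> 256 channels
        nn.Conv2d(bottleneck_maps, num_feature_maps, kernel_size=1),
        nn.BatchNorm2d(num_feature_maps),
    )
```

As before, the block is used inside a residual connection, `x = x + b(x)`.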
---

# Stochastic Depth Networks

- Dropout at the level of entire layers: residual blocks are randomly dropped during training
- Allows training networks of up to 1K layers

.center.width-70[]

.citation[G. Huang et al., Deep Networks with Stochastic Depth, ECCV 2016]

---

# DenseNet

- Copying feature maps to upper layers via skip-connections
- Better reuse of parameters and avoidance of redundancy

.center.width-30[]

.center.width-70[]

.citation[G. Huang et al., Densely Connected Convolutional Networks, CVPR 2017]

---

# Squeeze-and-Excitation Networks

.center.width-70[]
.caption[SE block]

- Re-weights individual feature maps channel-wise

.citation[J. Hu et al., Squeeze-and-Excitation Networks, CVPR 2018]

---

# Squeeze-and-Excitation Networks

.center.width-50[]
.caption[SE block]

.citation[J. Hu et al., Squeeze-and-Excitation Networks, CVPR 2018]
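---

# Squeeze-and-Excitation Networks

A minimal PyTorch sketch of the squeeze-and-excitation operation (the class name `SEBlock` is illustrative; the reduction ratio of 16 is an assumption following the paper's default):

```py
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, num_channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(num_channels, num_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(num_channels // reduction, num_channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # squeeze: global average pooling over the spatial dimensions
        s = x.mean(dim=(2, 3))
        # excitation: one weight in [0, 1] per feature map
        w = self.fc(s)
        # re-weight each feature map channel-wise
        return x * w[:, :, None, None]
```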
---

# Visualizing loss surfaces

.center.width-70[]

.citation[H. Li et al., Visualizing the Loss Landscape of Neural Nets, ICLR workshop 2018]

---

# Visualizing loss surfaces

.left-column[
.center.width-100[]
]

.right-column[
- ResNet-20/56/110: vanilla
- ResNet-*-noshort: no skip connections
- ResNet-18/34/50: wide
]

.citation[H. Li et al., Visualizing the Loss Landscape of Neural Nets, ICLR workshop 2018]

---

# Visualizing loss surfaces

.center.width-50[]

.citation[H. Li et al., Visualizing the Loss Landscape of Neural Nets, ICLR workshop 2018]

---

class: end-slide, center
count: false

The end.