Implementation of ResNet
In this part, we will build a ResNet model from scratch using PyTorch. We will implement the ResNet-50 architecture.
Packages
First, let's import the necessary libraries.
import torch
import torch.nn as nn
from torchvision import datasets, transforms, utils
from torchvision.transforms import functional as F
import matplotlib.pyplot as plt
import numpy as np
Bottleneck Block
In order to build the ResNet model more easily, we will first implement the Bottleneck block. The Bottleneck block takes 4 parameters: in_channels, out_channels, stride, and downsample.
in_channels is the number of channels entering the block, out_channels is the number of channels used by the two inner convolutions (the block actually outputs out_channels * expansion channels), and stride is the stride of the second (3x3) convolutional layer. downsample handles the shortcut path when its shape does not match the block's output, either because the channel counts differ or because a stride greater than 1 shrinks the feature map. As the code below shows, it is applied just before the final addition in the forward pass: when the shapes already match, the shortcut is an identity mapping and the input is added directly to the output; when they do not, downsample transforms the input to the matching shape first, and the transformed input is added instead. (A small sanity check after the class definition demonstrates both cases.)
Note that the expansion parameter is set to 4, which means that the block outputs 4 times out_channels channels. This parameter makes it easy to adapt the code to the deeper ResNet variants: ResNet-50, ResNet-101, and ResNet-152 all use this Bottleneck block with expansion set to 4 and differ only in how many blocks each layer stacks. The shallower ResNet-18 and ResNet-34 instead use a simpler two-convolution BasicBlock whose expansion is 1 (a sketch of it follows for comparison). In our case, with the Bottleneck block and the block counts given later, the model will be a ResNet-50.
A batch normalization layer is added after each convolutional layer, and a ReLU activation follows the first two batch normalizations; the third ReLU is applied after the residual addition. Batch normalization helps mitigate the vanishing gradient problem and speeds up training.
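For comparison, here is a minimal sketch of the BasicBlock used by ResNet-18 and ResNet-34. This is an illustrative aside and is not used anywhere else in this section:
class BasicBlock(nn.Module):
    expansion = 1  # BasicBlock does not widen its output channels

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)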
class Bottleneck(nn.Module):
    expansion = 4  # the block outputs expansion * out_channels channels

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # 1x1 conv reduces the channel count
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # 3x3 conv does the spatial work; the stride (if any) is applied here
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 conv expands the channels back up by the expansion factor
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        # transform the shortcut when its shape does not match the output
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity  # the residual connection
        out = self.relu(out)
        return out
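To see both shortcut cases in action, here is a small sanity check (illustrative; the tensor shapes are typical of what ResNet-50 sees internally):
# identity shortcut: in_channels already equals out_channels * expansion
block = Bottleneck(256, 64)               # 256 == 64 * 4
x = torch.randn(1, 256, 56, 56)
print(block(x).shape)                      # torch.Size([1, 256, 56, 56])

# downsample shortcut: a 1x1 conv + BN matches channels and spatial size
downsample = nn.Sequential(
    nn.Conv2d(256, 512, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(512),
)
block = Bottleneck(256, 128, stride=2, downsample=downsample)
print(block(x).shape)                      # torch.Size([1, 512, 28, 28])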
ResNet Model
With the Bottleneck block implemented, we can now build the ResNet model. After an initial 7x7 convolution and max-pooling stem, the model consists of 4 layers, each containing several Bottleneck blocks; the number of blocks in each layer is specified by the layers parameter. The num_classes parameter is the number of classes in the dataset; in our case it is 10, since we will use the CIFAR-10 dataset.
The _make_layer function creates one layer out of multiple Bottleneck blocks. Its if statement checks whether the shortcut of the layer's first block needs a downsample: this is the case when the stride is not 1 or when the incoming channel count differs from out_channels * expansion. When it does, a 1x1 convolutional layer followed by batch normalization is added as the downsample so the shortcut matches the block's output dimensions; the remaining blocks of the layer keep matching shapes and need no downsample. (The short loop below traces this channel bookkeeping layer by layer.)
In the end, we give the ResNet model block counts of [3, 4, 6, 3] for the 4 layers, which is the configuration of ResNet-50.
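Before the full model code, here is a small standalone loop (illustrative, not part of the model) that traces the channel bookkeeping _make_layer performs across the four layers:
in_ch = 64  # channels leaving the stem
for out_ch, stride in [(64, 1), (128, 2), (256, 2), (512, 2)]:
    needs_downsample = stride != 1 or in_ch != out_ch * 4  # same check as _make_layer
    print(f"{in_ch} -> {out_ch * 4}, downsample needed: {needs_downsample}")
    in_ch = out_ch * 4  # the next layer starts from the expanded channel count
# 64 -> 256, downsample needed: True
# 256 -> 512, downsample needed: True
# 512 -> 1024, downsample needed: True
# 1024 -> 2048, downsample needed: True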
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes):
        super(ResNet, self).__init__()
        self.in_channels = 64
        # the stem: 7x7 conv and max-pooling reduce the spatial size by 4x
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # the 4 layers; each after the first halves the spatial size via stride=2
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        # global average pooling and the final classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        # the first block of a layer needs a downsample on its shortcut
        # whenever it changes the spatial size or the channel count
        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion),
            )
        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * block.expansion
        # the remaining blocks keep matching shapes, so no downsample is needed
        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

def resnet50():
    # [3, 4, 6, 3] Bottleneck blocks per layer is the ResNet-50 configuration
    return ResNet(Bottleneck, [3, 4, 6, 3], 10)

model = resnet50()
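As a quick sanity check (illustrative), we can forward a dummy batch of CIFAR-10-sized images and confirm the output has one logit per class:
x = torch.randn(4, 3, 32, 32)   # a dummy batch of 4 CIFAR-10-sized images
logits = model(x)
print(logits.shape)             # torch.Size([4, 10]): one logit per class
print(sum(p.numel() for p in model.parameters()))  # roughly 23.5 million parameters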