MobileNet V1 and V2 explained, with PyTorch code

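For a fuller explanation, see the article "MobileNet, from V1 to V3", which is very accessible and highly recommended. This post is excerpted from that article with minor changes; the code is my own.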

MobileNet V1

The essence in one sentence: MobileNetV1 replaces the standard convolution layers of VGG with depthwise separable convolutions!

A depthwise separable convolution splits a standard convolution into a depthwise convolution and a pointwise convolution.

In a standard convolution, to preserve useful information, we usually increase the number of channels while the convolution shrinks the feature map.

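A depthwise separable convolution performs this in two steps. The first step is the depthwise convolution: each input channel is filtered by its own 3×3 kernel, so the feature map can shrink (via the stride) while the channel count stays the same. Shrinking the map without adding channels would lose information, so the second step is the pointwise convolution: a 1×1 kernel mixes the channels and increases their number. This avoids the information loss, and the output has the same shape as that of a standard convolution.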
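The two steps can be sketched in PyTorch as follows. This is a minimal illustration rather than part of the network code in this post; the channel counts (32 in, 64 out) and the 56×56 input are assumed values chosen only for the example.

```python
import torch
import torch.nn as nn

in_ch, out_ch = 32, 64  # illustrative channel counts (assumed)

# Standard 3x3 convolution: shrinks the map and widens the channels in one step.
standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1, bias=False)

# Depthwise 3x3: groups=in_ch gives each input channel its own filter.
depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=2, padding=1,
                      groups=in_ch, bias=False)
# Pointwise 1x1: mixes channels and changes their number.
pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)


def count_params(*modules):
    """Total number of learnable parameters in the given modules."""
    return sum(p.numel() for m in modules for p in m.parameters())


x = torch.randn(1, in_ch, 56, 56)
y_standard = standard(x)
y_separable = pointwise(depthwise(x))

standard_params = count_params(standard)
separable_params = count_params(depthwise, pointwise)

print(tuple(y_standard.shape), tuple(y_separable.shape))
print(standard_params, separable_params)
```

Both paths produce a tensor of the same shape, but the separable version needs far fewer weights. The values themselves differ, since the two are different functions.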

This approach uses fewer parameters and fewer operations, yet its results are only slightly worse.

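The ratio of the cost of a depthwise separable convolution to that of a standard convolution is 1/N + 1/K^2, where N is the number of output channels and K is the kernel size. We usually use 3×3 kernels, so the cost drops to roughly one-ninth to one-eighth of the original.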
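The one-ninth to one-eighth figure can be checked with a few lines of arithmetic. The sketch below counts multiply-adds for one layer; the helper names and the layer sizes (equal input and output channels, a 14×14 output map) are assumptions made for the example.

```python
def conv_costs(k, m, n, f):
    """Multiply-adds for a k x k conv with m input channels,
    n output channels and an f x f output feature map."""
    standard = k * k * m * n * f * f
    depthwise = k * k * m * f * f   # one k x k filter per input channel
    pointwise = m * n * f * f       # 1x1 conv that mixes channels
    return standard, depthwise + pointwise


def cost_ratio(k, m, n, f):
    """Separable cost divided by standard cost; equals 1/n + 1/k**2."""
    standard, separable = conv_costs(k, m, n, f)
    return separable / standard


for channels in (64, 256, 1024):
    print(channels, round(cost_ratio(k=3, m=channels, n=channels, f=14), 4))
```

The ratio does not depend on the input channel count or the feature map size, and it approaches 1/9 as the number of output channels grows.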

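Figure: standard convolution vs. depthwise separable convolution

Figure: parameter count and computation of standard vs. depthwise separable convolution

Figure: building blocks of standard vs. depthwise separable convolution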

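Compared with standard convolution, depthwise separable convolution cuts parameters and computation to roughly one-ninth to one-eighth, while accuracy drops by only about 1%.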

Figure: V1 network structure

The code is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F


class Block(nn.Module):
    '''Depthwise conv + Pointwise conv'''

    def __init__(self, in_planes, out_planes, stride=1):
        super(Block, self).__init__()
        # Depthwise convolution: channel count unchanged, stride shrinks the feature map
        self.conv1 = nn.Conv2d(in_planes, in_planes, kernel_size=3, stride=stride,
                               padding=1, groups=in_planes, bias=False)
        self.bn1 = nn.BatchNorm2d(in_planes)
        # Pointwise convolution: increases the channel count
        self.conv2 = nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=1,
                               padding=0, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        return out


class MobileNet(nn.Module):
    # An int means (out_planes, stride=1); a tuple means (out_planes, stride)
    cfg = [64, (128, 2), 128, (256, 2), 256, (512, 2), 512, 512, 512, 512, 512, (1024, 2), 1024]

    def __init__(self, num_classes=10):
        super(MobileNet, self).__init__()
        # First, one standard convolution
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        # Then a stack of depthwise separable convolutions
        self.layers = self._make_layers(in_planes=32)
        self.linear = nn.Linear(1024, num_classes)

    def _make_layers(self, in_planes):
        layers = []
        for x in self.cfg:
            out_planes = x if isinstance(x, int) else x[0]
            stride = 1 if isinstance(x, int) else x[1]
            layers.append(Block(in_planes, out_planes, stride))
            in_planes = out_planes
        return nn.Sequential(*layers)

    def forward(self, x):
        # One standard convolution
        out = F.relu(self.bn1(self.conv1(x)))
        # Stacked depthwise separable convolutions
        out = self.layers(out)
        # Average pooling reduces the 7x7 feature map to 1x1 (assumes a 224x224 input)
        out = F.avg_pool2d(out, 7)
        # Flatten
        out = out.view(out.size(0), -1)
        # Fully connected layer
        out = self.linear(out)
        # Softmax layer
        output = F.softmax(out, dim=1)
        return output


'''Test'''
# def test():
#     net = MobileNet()
#     x = torch.randn(1, 3, 224, 224)
#     y = net(x)
#     print(y.size())
#     print(y)
#     print(torch.max(y, dim=1))
#
# test()
# net = MobileNet()
# print(net)

MobileNet V2

Recap of MobileNet V1:

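The core idea of V1 is the depthwise separable convolution. For the same input and output channel counts, it needs several times fewer parameters and operations than a standard convolution, which speeds up the network.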

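A 3×3 depthwise convolution first extracts features, then a 1×1 convolution expands the channels. A MobileNetV1 built by stacking such blocks cuts a substantial amount of parameters and computation, runs faster, and still achieves results close to those of standard convolution. It all looks very promising.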

But!

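In practice, people found that the depthwise kernels are easily ruined during training: after training, quite a few of the depthwise kernels turn out to be empty (all zeros).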

The authors blame ReLU, an activation function that otherwise looks perfectly innocent.

Applying ReLU to low-dimensional features easily loses information, whereas applying it to high-dimensional features loses very little.
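A rough way to see this effect is the toy experiment below, which is an assumed illustration and not taken from the original article. It maps random 2-D inputs into n dimensions with a random linear layer, applies ReLU, and counts how often at least two outputs stay positive. With two positive outputs (and generic weights) the 2-D input can still be solved for; with fewer, part of the input is gone. The function name and sample counts are hypothetical.

```python
import random


def recoverable_fraction(n_units, n_samples=2000, seed=0):
    """Fraction of random 2-D inputs that keep >= 2 positive ReLU outputs
    after a random linear map into n_units dimensions."""
    rng = random.Random(seed)
    # One random weight row per output unit.
    rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_units)]
    kept = 0
    for _ in range(n_samples):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        # A unit survives ReLU only if its pre-activation is positive.
        active = sum(1 for a, b in rows if a * x[0] + b * x[1] > 0)
        if active >= 2:
            kept += 1
    return kept / n_samples


for n in (2, 3, 5, 15, 30):
    print(n, recoverable_fraction(n))
```

With only two units, fewer than half of the inputs survive in recoverable form; with a few dozen units, almost all of them do.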

1. Linear bottleneck

Since ReLU causes the information loss, replace ReLU with a linear activation.

Of course we cannot make every activation layer linear, so we quietly replace only the last ReLU6 of the block with a linear output.

2. Expansion layer

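A depthwise convolution cannot change the number of channels: however many channels come in, that many go out. If the input has few channels, the depthwise (DW) convolution can only work in a low-dimensional space, which does not work well, so we need to "expand" the channels. Since a pointwise (PW) convolution, i.e. a 1×1 convolution, can both raise and lower the dimension, we can place a PW convolution before the DW convolution to raise the dimension (by an expansion factor t, with t=6) and then extract features in that higher-dimensional space.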

3. Inverted residuals

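Comparing the ResNet block with the MobileNetV2 block, both use the 1×1 -> 3×3 -> 1×1 pattern and both use a shortcut. The differences are:

ResNet first reduces the dimension (to 0.25×), convolves, then expands it again.
MobileNetV2 first expands the dimension (to 6×), convolves, then reduces it again.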
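The channel widths inside the two blocks make the contrast concrete. The helper names and the example widths (256 for ResNet, 24 for MobileNetV2) are assumptions chosen for illustration.

```python
def resnet_bottleneck_channels(channels, reduction=0.25):
    """Channel widths through a ResNet bottleneck: wide -> narrow -> narrow -> wide."""
    mid = int(channels * reduction)
    return [channels, mid, mid, channels]


def inverted_residual_channels(channels, expansion=6):
    """Channel widths through a MobileNetV2 block: narrow -> wide -> wide -> narrow."""
    mid = channels * expansion
    return [channels, mid, mid, channels]


print(resnet_bottleneck_channels(256))  # [256, 64, 64, 256]
print(inverted_residual_channels(24))   # [24, 144, 144, 24]
```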

The V2 block is thus exactly the reverse of the ResNet block, so the authors named it "inverted residuals", the term that appears in the paper's title.

Figure: the V2 block

Figure: comparison of V1 and V2

On the left is the V1 block, which has no shortcut and ends with a ReLU6.

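On the right is the V2 block, which adds a 1×1 expansion, introduces a shortcut, and replaces the final ReLU with a linear output. With stride 1, a 1×1 convolution first raises the dimension, a depthwise convolution then extracts features, and a linear pointwise convolution finally lowers the dimension; the input is added to the output to form a residual structure. With stride 2, the input and output sizes do not match, so no shortcut is added; everything else is the same.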
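The block described above can be sketched as a single PyTorch module. This is a minimal version under stated assumptions, not the reference implementation: the class name `InvertedResidual` and its arguments are chosen for this example, and the shortcut is additionally restricted to blocks whose input and output channel counts match, since the addition requires identical shapes.

```python
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    """1x1 expand -> 3x3 depthwise -> 1x1 linear projection, with optional shortcut."""

    def __init__(self, in_ch, out_ch, stride=1, expansion=6):
        super().__init__()
        hidden = in_ch * expansion
        # Shortcut only when spatial size and channel count are both preserved.
        self.use_shortcut = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # Expansion: pointwise conv raises the dimension, followed by ReLU6.
            nn.Conv2d(in_ch, hidden, kernel_size=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # Depthwise conv: groups=hidden gives one 3x3 filter per channel.
            nn.Conv2d(hidden, hidden, kernel_size=3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # Linear bottleneck: pointwise projection with no activation after it.
            nn.Conv2d(hidden, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out


x = torch.randn(2, 24, 56, 56)
same = InvertedResidual(24, 24, stride=1)   # shortcut is used
down = InvertedResidual(24, 32, stride=2)   # no shortcut
y_same, y_down = same(x), down(x)
print(tuple(y_same.shape), tuple(y_down.shape))
```

Keeping the whole block in one `nn.Sequential` makes the order of the three convolutions easy to read, and the missing activation after the last batch norm is the linear bottleneck.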

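Figure: V2 network structure

Figure: comparison of experimental results

Figure: network structure diagram

The corresponding code is as follows: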

import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    def __init__(self, x):
        super().__init__()
        # x = (in, out, in, out, in, out, stride, residual)
        self.cfg = x
        # 1x1 pointwise convolution: expands the channels
        self.conv1x1_1 = nn.Sequential(
            nn.Conv2d(self.cfg[0], self.cfg[1], kernel_size=1, padding=0, stride=1, bias=False),
            nn.BatchNorm2d(self.cfg[1]),
            nn.ReLU6()
        )
        # 3x3 depthwise convolution: groups equals the channel count
        self.conv3x3 = nn.Sequential(
            nn.Conv2d(self.cfg[2], self.cfg[3], kernel_size=3, padding=1, stride=self.cfg[6],
                      groups=self.cfg[2], bias=False),
            nn.BatchNorm2d(self.cfg[3]),
            nn.ReLU6()
        )
        # 1x1 pointwise convolution: reduces the channels; linear, so no ReLU6 here
        self.conv1x1_2 = nn.Sequential(
            nn.Conv2d(self.cfg[4], self.cfg[5], kernel_size=1, padding=0, stride=1, bias=False),
            nn.BatchNorm2d(self.cfg[5])
        )

    def forward(self, x):
        output = self.conv1x1_1(x)
        output = self.conv3x3(output)
        output = self.conv1x1_2(output)
        if self.cfg[7] == 1:
            # Shortcut: add the block input to its output
            output = output + x
        return output


class MobileNetV2(nn.Module):
    cfg = [
        # in-out-in-out-in-out-stride-residual
        (32, 32, 32, 32, 32, 16, 1, 0),
        (16, 96, 96, 96, 96, 24, 2, 0),
        (24, 144, 144, 144, 144, 24, 1, 1),  # add1
        (24, 144, 144, 144, 144, 32, 2, 0),
        (32, 192, 192, 192, 192, 32, 1, 1),  # add2
        (32, 192, 192, 192, 192, 32, 1, 1),  # add3
        (32, 192, 192, 192, 192, 64, 2, 0),  # stride 2 at this stage, as in the paper's architecture table
        (64, 384, 384, 384, 384, 64, 1, 1),  # add4
        (64, 384, 384, 384, 384, 64, 1, 1),  # add5
        (64, 384, 384, 384, 384, 64, 1, 1),  # add6
        (64, 384, 384, 384, 384, 96, 1, 0),  # stride 1 at this stage, as in the paper's architecture table
        (96, 576, 576, 576, 576, 96, 1, 1),  # add7
        (96, 576, 576, 576, 576, 96, 1, 1),  # add8
        (96, 576, 576, 576, 576, 160, 2, 0),
        (160, 960, 960, 960, 960, 160, 1, 1),  # add9
        (160, 960, 960, 960, 960, 160, 1, 1),  # add10
        (160, 960, 960, 960, 960, 320, 1, 0),
    ]

    def __init__(self, in_channel=3, NUM_CLASSES=10):
        super().__init__()
        # First, one standard convolution
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channel, 32, kernel_size=3, padding=1, stride=2, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU6()
        )
        # Depthwise separable convolutions + inverted residuals
        self.layers = self._make_layers()
        # Then a pointwise (1x1) convolution
        self.conv2 = nn.Sequential(
            nn.Conv2d(320, 1280, kernel_size=1, padding=0, stride=1, bias=False),
            nn.BatchNorm2d(1280),
            nn.ReLU6()
        )
        # Global average pooling: reduces the 7x7 map to 1x1 (assumes a 224x224 input)
        self.pool = nn.AvgPool2d(kernel_size=7)
        # Finally, the fully connected layer
        self.linear = nn.Sequential(
            nn.Linear(1280, NUM_CLASSES)
        )

    def _make_layers(self):
        layers = []
        for x in self.cfg:
            layers.append(Bottleneck(x))
        return nn.Sequential(*layers)

    def forward(self, x):
        output = self.conv1(x)
        output = self.layers(output)
        output = self.conv2(output)
        output = self.pool(output)
        output = output.view(output.size(0), -1)
        output = self.linear(output)
        return output


'''Test'''
# def test():
#     net = MobileNetV2()
#     x = torch.randn(1, 3, 224, 224)
#     y = net(x)
#     print(y.size())
#
# test()
# net = MobileNetV2()
# print(net)