【YOLOV5-5.x 源码解读】torch_utils.py_综合

前言
0、导入需要的包
1、torch_distributed_zero_first
2、init_torch_seeds
3、git_describe、date_modified、select_device
- 3.1、git_describe
- 3.2、date_modified
- 3.3、select_device
4、intersect_dicts
5、time_synchronized
6、profile、model_info
- 6.1、profile（没用到）
- 6.2、model_info
7、initialize_weights
8、find_modules（没用到）
9、sparsity、prune
- 9.1、sparsity
- 9.2、prune
10、fuse_conv_and_bn
11、load_classifier
12、scale_img
13、de_parallel
14、is_parallel、copy_attr、ModelEMA类
- 14.1、is_parallel
- 14.2、copy_attr
- 14.3、class ModelEMA
总结

前言

源码： YOLOv5源码.
导航: 【YOLOV5-5.x 源码讲解】整体项目文件导航.
注释版全部项目文件已上传至GitHub: yolov5-5.x-annotations.

这个文件主要是基于torch的一些实用工具类，整个项目的文件都可能会用到，并不涉及太多的矩阵操作，大多都是一些torch相关工具的使用。废话不多说，下面开始介绍每个函数。

0、导入需要的包

import datetime    # 时间模块 基于time进行了封装 更高级
import logging     # 日志功能生成模块
import math        # 数学函数模块
import os          # 与操作系统进行交互的模块
import platform    # 提供获取操作系统相关信息的模块
import subprocess  # 子进程定义及操作的模块
import time        # 时间模块 更底层
from contextlib import contextmanager  # 用于进行上下文管理的模块
from copy import deepcopy  # 实现深度复制的模块
from pathlib import Path   # Path将str转换为Path对象 使字符串路径易于操作的模块# 以下是一些基本的torch相关的类
import torch
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
import torchvisiontry:import thop  # 用于Pytorch模型的FLOPS计算工具模块
except ImportError:thop = None
logger = logging.getLogger(__name__)  # 初始化日志

1、torch_distributed_zero_first

这个函数是用来处理模型进行分布式训练时的同步问题，用在train.py中如：

在这里插入图片描述

torch_distributed_zero_first函数代码：

@contextmanager    # 这个是上下文管理器
def torch_distributed_zero_first(local_rank: int):"""train.py用于处理模型进行分布式训练时同步问题基于torch.distributed.barrier()函数的上下文管理器，为了完成数据的正常同步操作（yolov5中拥有大量的多线程并行操作）Decorator to make all processes in distributed training wait for each local_master to do something.:params local_rank: 代表当前进程号 0代表主进程 1、2、3代表子进程"""if local_rank not in [-1, 0]:# 如果执行create_dataloader()函数的进程不是主进程，即rank不等于0或者-1，# 上下文管理器会执行相应的torch.distributed.barrier()，设置一个阻塞栅栏，# 让此进程处于等待状态，等待所有进程到达栅栏处（包括主进程数据处理完毕）；dist.barrier()yield  # yield语句 中断后执行上下文代码，然后返回到此处继续往下执行if local_rank == 0:# 如果执行create_dataloader()函数的进程是主进程，其会直接去读取数据并处理，# 然后其处理结束之后会接着遇到torch.distributed.barrier()，# 此时，所有进程都到达了当前的栅栏处，这样所有进程就达到了同步，并同时得到释放。dist.barrier()

2、init_torch_seeds

这个函数用于初始化随机种子并确定训练模式，会被用在general.py中的init_seeds联合一起作几个随机数种子的初始化操作：
在这里插入图片描述

init_torch_seeds函数代码：

def init_torch_seeds(seed=0):"""用在general.py的init_seeds函数用于初始化随机种子并确定训练模式Speed-reproducibility tradeoff https://pytorch.org/docs/stable/notes/randomness.html"""# 为CPU设置随机种子，方便下次复现实验结果 to seed the RNG for all devices (both CPU and CUDA)torch.manual_seed(seed)# benchmark模式会自动寻找最优配置 但由于计算的随机性 每次网络进行前向传播时会有差异# 避免这种差异的方法就是将deterministic设置为True(表明每次卷积的高效算法相同)# 速度与可重复性之间的权衡 涉及底层卷积算法优化if seed == 0:# slower, more reproducible 慢 但是具有可重复性 适用于网络的输入数据在每次iteration都变化的话cudnn.benchmark, cudnn.deterministic = False, Trueelse:# faster, less reproducible 快 但是不可重复 适用于网络的输入数据维度或类型上变化不大cudnn.benchmark, cudnn.deterministic = True, False

3、git_describe、date_modified、select_device

下面三个函数是一起完成自动选择系统设备的操作，在select_device函数中会调用git_describe函数和date_modified函数。下面我依次介绍这三个函数。

3.1、git_describe

这个函数用于返回path文件可读的git描述，用于select_device函数中。

git_describe函数代码：

def git_describe(path=Path(__file__).parent):"""用在select_device用于返回path文件可读的git描述 return human-readable git description i.e. v5.0-5-g3e25f1ehttps://git-scm.com/docs/git-describepath: 需要在git中查询（文件描述）的文件名 默认当前文件的父路径"""# path must be a directorys = f'git -C {
      path} describe --tags --long --always'try:# 创建一个子进程在命令行执行 s(git) 命令(返回path文件的描述) 返回执行结果(path文件的描述)return subprocess.check_output(s, shell=True, stderr=subprocess.STDOUT).decode()[:-1]except subprocess.CalledProcessError as e:# 发生异常 path not a git repository 返回''return ''

3.2、date_modified

这个函数是返回人类可读的修改日期，用于select_device函数中。

date_modified函数代码：

def date_modified(path=__file__):"""返回path文件人类可读的修改日期return human-readable file modification date, i.e. '2021-3-26':params path: 文件名 默认当前文件"""t = datetime.datetime.fromtimestamp(Path(path).stat().st_mtime)return f'{
      t.year}-{
      t.month}-{
      t.day}'

3.3、select_device

这个函数才是主角，用于自动选择本机模型训练的设备，并输出日志信息。广泛用于train.py、test.py、detect.py等文件中，如：
在这里插入图片描述
select_device函数代码：

def select_device(device='', batch_size=None):"""广泛用于train.py、test.py、detect.py等文件中用于选择模型训练的设备 并输出日志信息:params device: 输入的设备 device = 'cpu' or '0' or '0,1,2,3':params batch_size: 一个批次的图片个数"""# git_describe(): 返回当前文件父文件的描述信息(yolov5) date_modified(): 返回当前文件的修改日期# s: 之后要加入logger日志的显示信息s = f'YOLOv5 ? {
      git_describe() or date_modified()} torch {
      torch.__version__} '  # string# 如果device输入为cpu cpu=True device.lower(): 将device字符串全部转为小写字母cpu = device.lower() == 'cpu'if cpu:# 如果cpu=True 就强制(force)使用cpu 令torch.cuda.is_available() = Falseos.environ['CUDA_VISIBLE_DEVICES'] = '-1'elif device:# 如果输入device不为空 device=GPU 直接设置 CUDA environment variable = device 加入CUDA可用设备os.environ['CUDA_VISIBLE_DEVICES'] = device# 检查cuda的可用性 如果不可用则终止程序assert torch.cuda.is_available(), f'CUDA unavailable, invalid device {
      device} requested'# 输入device为空 自行根据计算机情况选择相应设备 先看GPU 没有就CPU# 如果cuda可用 且 输入device != cpu 则 cuda=True 反正cuda=Falsecuda = not cpu and torch.cuda.is_available()if cuda:# devices: 如果cuda可用 返回所有可用的gpu设备 i.e. 0,1,6,7 如果不可用就返回 '0'devices = device.split(',') if device else '0'# n: 所有可用的gpu设备数量 device countn = len(devices)# 检查是否有gpu设备 且 batch_size是否可以能被显卡数目整除 check batch_size is divisible by device_countif n > 1 and batch_size:# 如果不能则关闭程序assert batch_size % n == 0, f'batch-size {
      batch_size} not multiple of GPU count {
      n}'space = ' ' * (len(s) + 1)   # 定义等长的空格# 满足所有条件 s加上所有显卡的信息for i, d in enumerate(devices):# p: 每个可用显卡的相关属性p = torch.cuda.get_device_properties(i)# 显示信息s加上每张显卡的属性信息s += f"{
      '' if i == 0 else space}CUDA:{
      d} ({
      p.name}, {
      p.total_memory / 1024 ** 2}MB)\n"  # bytes to MBelse:# cuda不可用显示信息s就加上CPUs += 'CPU\n'# 将显示信息s加入logger日志文件中logger.info(s.encode().decode('ascii', 'ignore') if platform.system() == 'Windows' else s)  # emoji-safe# 如果cuda可用就返回第一张显卡的的名称 如: GeForce RTX 2060 反之返回CPU对应的名称return torch.device('cuda:0' if cuda else 'cpu')

4、intersect_dicts

这个函数用于筛选字典中的键值对，将db中的键值对复制给da,但是除了exclude中的键值对。这个函数被用字train.py中载入预训练模型时，筛选预训练权重中的键值对，如下：
在这里插入图片描述
intersect_dicts函数代码：

def intersect_dicts(da, db, exclude=()):"""用于train.py中载入预训练模型时，筛选预训练权重中的键值对用于筛选字典中的键值对 将db中的键值对复制给da,但是除了exclude中的键值对"""# 返回字典da中的键值对 要求键k在字典db中且全部都不在exclude中 同时da中值的shape对应db中值的shape(相同)return {
    k: v for k, v in da.items() if k in dband not any(x in k for x in exclude) and v.shape == db[k].shape}

5、time_synchronized

这个函数用于在进行分布式操作时，精确的获取当前时间。这个函数被广泛的用于整个项目的各个文件中，只要涉及获取当前时间的操作，就需要调用这个函数，如推理的时候计算推理+NMS所花费时间 = t2 - t1：
在这里插入图片描述
time_synchronized函数代码：

def time_synchronized():"""这个函数被广泛的用于整个项目的各个文件中，只要涉及获取当前时间的操作，就需要调用这个函数精确计算当前时间 并返回当前时间https://blog.csdn.net/qq_23981335/article/details/105709273pytorch-accurate time先进行torch.cuda.synchronize()添加同步操作 再返回time.time()当前时间为什么不直接使用time.time()取时间，而要先执行同步操作，再取时间？说一下这样子做的原因:在pytorch里面，程序的执行都是异步的。如果time.time(), 测试的时间会很短，因为执行完end=time.time()程序就退出了而先加torch.cuda.synchronize()会先同步cuda的操作，等待gpu上的操作都完成了再继续运行end = time.time()这样子测试时间会准确一点"""if torch.cuda.is_available():torch.cuda.synchronize()return time.time()

6、profile、model_info

这两个函数主要用于输出模型的一些信息，如所有层数量, 模型总参数量等。

6.1、profile（没用到）

这个函数是用来输出某个网络结构（操作opts）的一些信息：总参数浮点计算量前向传播时间反向传播时间输入变量的shape 输出变量的shape。但是这个函数貌似没用到，在下面计算model_info时也是调用thop中的profile函数直接执行的，并没用用这里写的这个函数，所以这个函数如果不想看的话关系是不大的。

profile函数代码：

def profile(x, ops, n=100, device=None):"""输出某个网络结构(操作ops)的一些信息: 总参数 浮点计算量 前向传播时间 反向传播时间 输入变量的shape 输出变量的shape:params x: 输入tensor x:params ops: 操作ops(某个网络结构):params n: 执行多少轮ops:params device: 执行设备"""# profile a pytorch module or list of modules. Example usage:# x = torch.randn(16, 3, 640, 640) # input# m1 = lambda x: x * torch.sigmoid(x)# m2 = nn.SiLU()# profile(x, [m1, m2], n=100) # profile speed over 100 iterations# 选择设备device = device or torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')# 将x变量送入选择的设备上x = x.to(device)# 表明需要计算tensor x的梯度x.requires_grad = True# 打印当前设备的信息 浮点计算量GFLOPs 前向传播时间 反向传播时间 输入相关时间 输出相关时间print(torch.__version__, device.type, torch.cuda.get_device_properties(0) if device.type == 'cuda' else '')print(f"\n{
      'Params':>12s}{
      'GFLOPs':>12s}{
      'forward (ms)':>16s}{
      'backward (ms)':>16s}{
      'input':>24s}{
      'output':>24s}")for m in ops if isinstance(ops, list) else [ops]:# 确保ops中所有的操作都是在device设备中运行# hasattr(m, 'to'): 判断对象m没有to属性m = m.to(device) if hasattr(m, 'to') else m  # device# 确保操作m和tensor x是处于相同的精度 默认x是Float32的 half可以将精度减半m = m.half() if hasattr(m, 'half') and isinstance(x, torch.Tensor) and x.dtype is torch.float16 else m  # type# 初始化前向传播时间dtf 反向传播时间dtb 以及t变量用于记录三个时刻的时间(后面有写)dtf, dtb, t = 0., 0., [0., 0., 0.]  # dt forward, backwardtry:# 计算在输入为tensor x, 操作为m条件下的浮点计算量GFLOPsflops = thop.profile(m, inputs=(x,), verbose=False)[0] / 1E9 * 2except:flops = 0for _ in range(n):  # 执行100次 算平均 更准确t[0] = time_synchronized()     # 操作m前向传播前一时刻的时间y = m(x)                       # 操作m前向传播t[1] = time_synchronized()     # 操作m前向传播后一时刻的时间 = 操作m反向传播前一时刻的时间try:_ = y.sum().backward()     # 操作m反向传播t[2] = time_synchronized() # 操作m反向传播后一时刻的时间except:  # 如果没有反向传播t[2] = float('nan')dtf += (t[1] - t[0]) * 1000 / n  # 操作m平均每次前向传播所用时间dtb += (t[2] - t[1]) * 1000 / n  # 操作m平均每次反向传播所用时间# s_in: 输入变量的shapes_in = tuple(x.shape) if isinstance(x, torch.Tensor) else 'list'# s_out: 输出变量的shapes_out = tuple(y.shape) if isinstance(y, torch.Tensor) else 'list'# p: m操作(某个网络结构)的总参数 parametersp = sum(list(x.numel() for x in m.parameters())) if isinstance(m, nn.Module) else 0# 输出每个操作(某个网络结构)的信息: 总参数 浮点计算量 前向传播时间 反向传播时间 输入变量的shape 输出变量的shapeprint(f'{
      p:12}{
      flops:12.4g}{
      dtf:16.4g}{
      dtb:16.4g}{
      str(s_in):>24s}{
      str(s_out):>24s}')

6.2、model_info

这个函数是用来输出模型的所有信息的，这些信息包括：所有层数量, 模型总参数量, 需要求梯度的总参数量, img_size大小的model的浮点计算量GFLOPs。这个函数会被yolo.py文件的Model类的info函数调用，如下：
在这里插入图片描述
这样需要显示Model相关的信息时，只需要调用self.info即可。而这个函数调用profile的函数并不是上面自己写的，而是直接调的thop包。

model_info函数代码：

def model_info(model, verbose=False, img_size=640):"""用于yolo.py文件的Model类的info函数输出模型的所有信息 包括: 所有层数量, 模型总参数量, 需要求梯度的总参数量, img_size大小的model的浮点计算量GFLOPs:params model: 模型:params verbose: 是否输出每一层的参数parameters的相关信息:params img_size: int or list i.e. img_size=640 or img_size=[640, 320]"""# n_p: 模型model的总参数 number parametersn_p = sum(x.numel() for x in model.parameters())# n_g: 模型model的参数中需要求梯度(requires_grad=True)的参数量 number gradientsn_g = sum(x.numel() for x in model.parameters() if x.requires_grad)if verbose:# 表头: 'layer', 'name', 'gradient', 'parameters', 'shape', 'mu', 'sigma'# 第几层 层名 bool是否需要求梯度 当前层参数量 当前层参数shape 当前层参数均值 当前层参数方差print('%5s %40s %9s %12s %20s %10s %10s' % ('layer', 'name', 'gradient', 'parameters', 'shape', 'mu', 'sigma'))# 按表头输出每一层的参数parameters的相关信息for i, (name, p) in enumerate(model.named_parameters()):name = name.replace('module_list.', '')print('%5g %40s %9s %12g %20s %10.3g %10.3g' %(i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std()))try:  # FLOPsfrom thop import profile  # 导入计算浮点计算量FLOPs的工具包# stride 模型的最大下采样率 有[8, 16, 32] 所以stride=32stride = max(int(model.stride.max()), 32) if hasattr(model, 'stride') else 32# 模拟一样输入图片 shape=(1, 3, 32, 32) 全是0img = torch.zeros((1, model.yaml.get('ch', 3), stride, stride), device=next(model.parameters()).device)  # input# 调用profile计算输入图片img=(1, 3, 32, 32)时当前模型的浮点计算量GFLOPs stride GFLOPs# profile求出来的浮点计算量是FLOPs /1E9 => GFLOPs *2是因为profile函数默认求的就是模型为float64时的浮点计算量# 而我们传入的模型一般都是float32 所以乘以2(可以点进profile看他定义的add_hooks函数)flops = profile(deepcopy(model), inputs=(img,), verbose=False)[0] / 1E9 * 2# expand img_size -> [img_size, img_size]=[640, 640]img_size = img_size if isinstance(img_size, list) else [img_size, img_size]# 根据img=(1, 3, 32, 32)的浮点计算量flops推算出640x640的图片的浮点计算量GFLOPs# 不直接计算640x640的图片的浮点计算量GFLOPs可能是为了高效性吧, 这样算可能速度更快fs = ', %.1f GFLOPs' % (flops * img_size[0] / stride * img_size[1] / stride)except (ImportError, Exception):fs = ''# 添加日志信息# Model Summary: 所有层数量, 模型总参数量, 需要求梯度的总参数量, img_size大小的model的浮点计算量GFLOPslogger.info(f"Model Summary: {
      len(list(model.modules()))} layers, {
      n_p} parameters, {
      n_g} gradients{
      fs}")

7、initialize_weights

这个函数是用来初始化模型权重的，会在yolo.py的Model类中的init函数被调用，如下：
在这里插入图片描述
initialize_weights函数代码：

def initialize_weights(model):"""在yolo.py的Model类中的init函数被调用用于初始化模型权重"""for m in model.modules():t = type(m)if t is nn.Conv2d:   # 如果是二维卷积就跳过 或者使用何凯明初始化pass  # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')elif t is nn.BatchNorm2d:  # 如果是BN层 就设置相关参数: eps和momentumm.eps = 1e-3m.momentum = 0.03elif t in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6]:# 如果是这几类激活函数 inplace插值就赋为True# inplace = True 指进行原地操作 对于上层网络传递下来的tensor直接进行修改 不需要另外赋值变量# 这样可以节省运算内存，不用多储存变量m.inplace = True

8、find_modules（没用到）

这个函数是用于找到模型model中类型是mclass的层结构的索引，但是这个函数没用到，所以大可不看。

find_modules函数代码：

def find_modules(model, mclass=nn.Conv2d):"""用于找到模型model中类型是mclass的层结构的索引 Finds layer indices matching module class 'mclass':params model: 模型:params mclass: 层结构类型 默认nn.Conv2d"""return [i for i, m in enumerate(model.module_list) if isinstance(m, mclass)]

9、sparsity、prune

这两个函数是用于模型剪枝的，虽然作者并没有在它的代码中使用，但我们自己是可以在test.py和detect.py中进行模型剪枝实验的。

9.1、sparsity

这个函数是用来计算模型的稀疏程度sparsity，返回模型整体的稀疏性。会被prune剪枝函数中被调用。

sparsity函数代码：

def sparsity(model):"""用于求模型model的稀疏程度sparsity   Return global model sparsity"""# 初始化模型的总参数个数a(前向+反向) 模型参数中值为0的参数个数ba, b = 0., 0.# model.parameters()返回模型model的参数 返回一个生成器 需要用for循环或者next()来获取参数# for循环取出每一层的前向传播和反向传播的参数for p in model.parameters():a += p.numel()b += (p == 0).sum()# b / a 即可以反应模型的稀疏程度return b / a

9.2、prune

这个函数是用于对模型进行剪枝的，通过调用sparsity函数计算模型的稀疏性进行剪枝，以增加模型的稀疏性。关于函数的用法作者并没有写在它的代码中，不过并不妨碍我们自己实验它，具体用法可以查看下面标注的链接。它主要是可以用在两个地方：

1、test.py:
在这里插入图片描述 2、detect.py

prune函数代码：

def prune(model, amount=0.3):"""可以用于test.py和detect.py中进行模型剪枝对模型model进行剪枝操作 以增加模型的稀疏性 使用prune工具将参数稀疏化https://github.com/ultralytics/yolov5/issues/304:params model: 模型:params amount: 随机裁剪(总参数量 x amount)数量的参数"""import torch.nn.utils.prune as prune  # 导入用于剪枝的工具包print('Pruning model... ', end='')# 模型的迭代器 返回的是所有模块的迭代器 同时产生模块的名称(name)以及模块本身(m)for name, m in model.named_modules():if isinstance(m, nn.Conv2d):# 对当前层结构m, 随机裁剪(总参数量 x amount)数量的权重(weight)参数prune.l1_unstructured(m, name='weight', amount=amount)  # prune# 彻底移除被裁剪的的权重参数prune.remove(m, 'weight')  # make permanent# 输出模型的稀疏度 调用sparsity函数计算当前模型的稀疏度print(' %.3g global sparsity' % sparsity(model))

10、fuse_conv_and_bn

这个函数其实是一个增强方式，思想就是：在推理测试阶段，将卷积层和BN层进行融合，以加速推理。具体的证明可以看看下面代码标注的几篇博客/知乎，讲的很清楚，公式证明也不难。这个函数主要会在yolo.py中Model类的fuse函数中调用，如：
在这里插入图片描述
这样在推理阶段，只需要执行 model = model.fuse() ，即可完成融合操作，非常方便简单。

fuse_conv_and_bn函数代码：

def fuse_conv_and_bn(conv, bn):"""在yolo.py中Model类的fuse函数中调用融合卷积层和BN层(测试推理使用) Fuse convolution and batchnorm layers方法: 卷积层还是正常定义, 但是卷积层的参数w,b要改变 通过只改变卷积参数, 达到CONV+BN的效果w = w_bn * w_conv b = w_bn * b_conv + b_bn (可以证明)https://tehnokv.com/posts/fusing-batchnorm-and-conv/https://github.com/ultralytics/yolov3/issues/807https://zhuanlan.zhihu.com/p/94138640:params conv: torch支持的卷积层:params bn: torch支持的bn层"""fusedconv = nn.Conv2d(conv.in_channels,conv.out_channels,kernel_size=conv.kernel_size,stride=conv.stride,padding=conv.padding,groups=conv.groups,bias=True).requires_grad_(False).to(conv.weight.device)# prepare filters# w_conv: 卷积层的w参数 直接clone conv的weight即可w_conv = conv.weight.clone().view(conv.out_channels, -1)# w_bn: bn层的w参数(可以自己推到公式) torch.diag: 返回一个以input为对角线元素的2D/1D 方阵/张量?w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var)))# w = w_bn * w_conv torch.mm: 对两个矩阵相乘fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape))# prepare spatial bias# b_conv: 卷积层的b参数 如果不为None就直接读取conv.bias即可b_conv = torch.zeros(conv.weight.size(0), device=conv.weight.device) if conv.bias is None else conv.bias# b_bn: bn层的b参数(可以自己推到公式)b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps))# b = w_bn * b_conv + b_bn (w_bn not forgot)fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)return fusedconv

11、load_classifier

这个函数用于在检测后可能需要进行二次分类，比如一个场景：在一张图片中有猫和车牌，我们用yolo将车牌和猫分别做好了分类和定位，但是如果我接下来还要分出是什么猫或者车牌数字是什么，这些都是二次任务，需要再写函数才能实现。这个函数会在detect.py中调用：
在这里插入图片描述
load_classifier函数代码：

def load_classifier(name='resnet101', n=2):"""在detect.py中调用用于检测结束后可能需要第二次分类 直接修改torchvision中的预训练模型的分类类别即可:params name: 分类模型名字 默认resnet101:params n: 分类模型的分类类别数 需要在加载了预训练模型后将model的最后一层的类别数改为n"""## 加载torchvision中已经写好的pretrained模型 reshape为n类输出model = torchvision.models.__dict__[name](pretrained=True)# ResNet model properties# input_size = [3, 224, 224]# input_space = 'RGB'# input_range = [0, 1]# mean = [0.485, 0.456, 0.406]# std = [0.229, 0.224, 0.225]# 将加载的预训练模型的最后一层的分类类别数改为n Reshape output to n classes# 总体的过程 = 将fc层的权重和偏置清0 + 修改类别个数为nfilters = model.fc.weight.shape[1]model.fc.bias = nn.Parameter(torch.zeros(n), requires_grad=True)model.fc.weight = nn.Parameter(torch.zeros(n, filters), requires_grad=True)model.fc.out_features = n# 返回reshape后的模型 进行二次分类return model

这里只是简单的改了一下分类类别，实际上如果改了这个的话，肯定还需要修改loss函数优化它的，我就不往下延申了。

12、scale_img

这个函数是用于对图片进行缩放操作。第一时间比较奇怪，这种数据增强的操作怎么会写在这里呢？不是应该写在datasets.py中吗？其实这里的scale_img是专门用于yolo.py文件中Model类的forward_augment函数中的。为什么模型部分需要对输入图片进行scale shape呢？作者有提到，这是一种Test Time Augmentation(TTA)操作，就是在测试时也使用数据增强，也算是一种增强的方式吧，但是这种方式感觉用的并不多，需要多多实验。
在这里插入图片描述
scale_img函数代码：

def scale_img(img, ratio=1.0, same_shape=False, gs=32):"""用于yolo.py文件中Model类的forward_augment函数中实现对图片的缩放操作:params img: 原图:params ratio: 缩放比例 默认=1.0 原图:params same_shape: 缩放之后尺寸是否是要求的大小(必须是gs=32的倍数):params gs: 最大的下采样率 32 所以缩放后的图片的shape必须是gs=32的倍数"""# img(16,3,256,416)# scales img(bs,3,y,x) by ratio constrained to gs-multipleif ratio == 1.0:   # 如果缩放比例ratio为1.0 直接返回原图return img`在这里插入代码片`else:  # 如果缩放比例ratio不为1.0 则开始根据缩放比例ratio进行缩放# h, w: 原图的高和宽h, w = img.shape[2:]# s: 放缩后图片的新尺寸 new sizes = (int(h * ratio), int(w * ratio))# 直接使用torch自带的F.interpolate(上采样下采样函数)插值函数进行resize# F.interpolate: 可以给定size或者scale_factor来进行上下采样# mode='bilinear': 双线性插值 nearest:最近邻# align_corner: 是否对齐 input 和 output 的角点像素(corner pixels)img = F.interpolate(img, size=s, mode='bilinear', align_corners=False)if not same_shape:# 缩放之后要是尺寸和要求的大小(必须是gs=32的倍数)不同 再对其不相交的部分进行pad# 而pad的值就是imagenet的mean# Math.ceil(): 向上取整 这里除以gs向上取整再乘以gs是为了保证h、w都是gs的倍数h, w = [math.ceil(x * ratio / gs) * gs for x in (h, w)]# pad img shape to gs的倍数 填充值为 imagenet meanreturn F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447)

13、de_parallel

这个函数用于判断单卡还是多卡(能否并行) 多卡返回model.module 单卡返回model（具体原因看下面代码注释）。这个函数用在train.py中, 用于加载和保存模型(参数)，如：
在这里插入图片描述

de_parallel函数代码：

def de_parallel(model):"""用在train.py中, 用于加载和保存模型(参数)判断单卡还是多卡(能否并行) 多卡返回model.module 单卡返回model"""# De-parallelize a model: returns single-GPU model if model is of type DP or DDP# 如果model支持并行(多卡)就返回model.module 不支持并行就返回model# 用在tain中保存模型 因为多卡训练的时候直接用model.state_dict()进行保存的模型, 每个层参数的名称前面会加上module,# 这时候再用单卡(gpu) model_dict加载model.state_dict()参数时会出现名称不匹配的情况,# 因此多卡保存模型时注意使用model.module.state_dict() 即返回model.module 单卡返回model即可return model.module if is_parallel(model) else model

14、is_parallel、copy_attr、ModelEMA类

14.1、is_parallel

这个函数用于判断模型是否支持并行，在ModelEMA类中调用。

is_parallel函数代码：

def is_parallel(model):"""在ModelEMA类中调用用于判断模型是否支持并行 Returns True if model is of type DP or DDP"""return type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel)

14.2、copy_attr

这个函数可以将b对象的属性值赋值给a对象（key键必须相同，然后才能赋值），常用于模型赋值，如 model -> ema（ModelEMA类就是这么干的）。这个函数会在两个地方用到，一个是ModelEMA类中，另一个是yolo.py文件中的Model类的autoshape函数如：
在这里插入图片描述
copy_attr函数代码：

def copy_attr(a, b, include=(), exclude=()):"""在ModelEMA函数和yolo.py中Model类的autoshape函数中调用复制b的属性(这个属性必须在include中而不在exclude中)给a:params a: 对象a(待赋值):params b: 对象b(赋值):params include: 可以赋值的属性:params exclude: 不能赋值的属性"""# Copy attributes from b to a, options to only include [...] and to exclude [...]# __dict__返回一个类的实例的属性和对应取值的字典for k, v in b.__dict__.items():if (len(include) and k not in include) or k.startswith('_') or k in exclude:continueelse:# 将对象b的属性k赋值给asetattr(a, k, v)

14.3、class ModelEMA

最后一个类 ModelEMA的定义，这个类非常重要，是一种非常常见的提高模型鲁棒性的增强trock，被广泛的使用。全名：Model Exponential Moving Average 模型的指数加权平均方法，是一种给予近期数据更高权重的平均方法，利用滑动平均的参数来提高模型在测试数据上的健壮性/鲁棒性，一般用于测试集。

这个类的定义用到了很多的数学知识，包括公式的推导，不是我这里几句话就能讲明白的，下面代码的注释中有我收集的几个吴恩达视频、博客、知乎中关于这个方法的原理的解释，仔细看下再会过头来看这段代码应该就差不多了。

这个函数被用在train.py中的test.run（测试）阶段：
在这里插入图片描述
更新参数也需要更新ema：

class ModelEMA类代码：

class ModelEMA:"""用在train.py中的test.run（测试）阶段模型的指数加权平均方法(Model Exponential Moving Average)是一种给予近期数据更高权重的平均方法 利用滑动平均的参数来提高模型在测试数据上的健壮性/鲁棒性 一般用于测试集https://www.bilibili.com/video/BV1FT4y1E74V?p=63https://www.cnblogs.com/wuliytTaotao/p/9479958.htmlhttps://zhuanlan.zhihu.com/p/68748778https://zhuanlan.zhihu.com/p/32335746https://github.com/ultralytics/yolov5/issues/608https://github.com/rwightman/pytorch-image-models/blob/master/timm/utils/model_ema.py"""def __init__(self, model, decay=0.9999, updates=0):"""train.pymodel:decay: 衰减函数参数默认0.9999 考虑过去10000次的真实值updates: ema更新次数"""# 创建ema模型 Create EMAself.ema = deepcopy(model.module if is_parallel(model) else model).eval()  # FP32 EMA# if next(model.parameters()).device.type != 'cpu':# self.ema.half() # FP16 EMAself.updates = updates  # ema更新次数 number of EMA updates# self.decay: 衰减函数 输入变量为x decay exponential ramp (to help early epochs)self.decay = lambda x: decay * (1 - math.exp(-x / 2000))# 所有参数取消设置梯度(测试 model.val)for p in self.ema.parameters():p.requires_grad_(False)def update(self, model):# 更新ema的参数 Update EMA parameterswith torch.no_grad():self.updates += 1  # ema更新次数 + 1d = self.decay(self.updates)  # 随着更新次数 更新参数贝塔(d)# msd: 模型配置的字典 model state_dict msd中的数据保持不变 用于训练msd = model.module.state_dict() if is_parallel(model) else model.state_dict()# 遍历模型配置字典 如: k=linear.bias v=[0.32, 0.25] ema中的数据发生改变 用于测试for k, v in self.ema.state_dict().items():# 这里得到的v: 预测值if v.dtype.is_floating_point:v *= d    # 公式左边 decay * shadow_variable# .detach() 使对应的Variables与网络隔开而不参与梯度更新v += (1. - d) * msd[k].detach()  # 公式右边 (1?decay) * variabledef update_attr(self, model, include=(), exclude=('process_group', 'reducer')):# 调用上面的copy_attr函数 从model中复制相关属性值到self.ema中copy_attr(self.ema, model, include, exclude)

总结

好了，torch_utils.py文件就介绍到这里了。如果有看注释还是不懂的函数建议百度查一下，有查的这个过程会印象更深一点。如果是逻辑上的问题建议debug调试一下。

写于2021-07-29晚20：00

【YOLOV5-5.x 源码解读】torch_utils.py

目录

前言