PyTorch Primer: Installation, Training, Testing, Visualization, Network Architecture, Fine-tuning, and Loss

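Contents

Installation
Usage
- Using the GPU / CPU
Data
- One-hot encoding
- PyTorch tensors
Loading models
- Saving and loading models
- Loading onto the CPU / GPU
Training
- Mind the order of these steps
Testing
- Testing a single image
  - Image preprocessing
  - Testing
- Testing in batches with a DataLoader
  - Building the testloader
  - Testing
Visualization
- Inspecting the model structure
- Printing model parameters
Fine-tuning
- Loading weights into only some layers
Networks
- A simple multi-task network
Loss functions
- Cross entropy
- Summing losses before backpropagation
- Increasing the loss during training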

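Installation

Installing PyTorch:

Install miniconda, then activate the virtual environment:

$ conda activate xxx

The steps depend on the CUDA version.

For CUDA 11:
Install with pip. Download the torch and torchvision .whl files that match your setup (avoid the py36 builds, which are buggy), then install them with pip install.
For CUDA 10:
Install PyTorch with pip (do not install it with conda after entering the virtual environment; that may raise errors):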

$ pip install torch torchvision

For CUDA 9.0 (reference: https://blog.csdn.net/qxqxqzzz/article/details/103538647):

$ conda create -n xxxpj python=3.6 theano pygpu pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0
Or:
$ conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch
Or (note: this command also installs the latest compatible Python automatically, e.g. Python 3.7.4):
$ conda install pytorch torchvision cudatoolkit=9.0 -c pytorch

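Is the PyTorch download too slow? Download the package manually (make sure it matches your Python version):

Go to https://anaconda.org/pytorch/pytorch/files and download the build you need.
If your browser goes through a proxy, click the download link in the browser directly; that is the fastest way.
The build I downloaded: https://anaconda.org/pytorch/pytorch/1.1.0/download/linux-64/pytorch-1.1.0-py3.6_cuda9.0.176_cudnn7.5.1_0.tar.bz2
# then install the local file with conda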
$ conda install --use-local /home/xxx/pytorch-1.1.0-py3.6_cuda9.0.176_cudnn7.5.1_0.tar.bz2

References: https://zhuanlan.zhihu.com/p/35740229
https://zhuanlan.zhihu.com/p/73741240

Usage

Using the GPU / CPU

Check which CUDA version PyTorch was built against:

>>> import torch
>>> torch.version.cuda
'10.1.243'

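Check whether a GPU is available and which model it is, or verify that the GPU setup is correct:

>>> import torch
# is a GPU available?
>>> torch.cuda.is_available()
# how many GPUs are there?
>>> torch.cuda.device_count()
# model name of the first GPU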
>>> torch.cuda.get_device_name(0)

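Selecting the CPU or a GPU

import os
import torch
# the three options below are alternatives; pick one
# use the CPU only (hide all GPUs)
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
# use a GPU picked automatically (auto_select_gpu is listed below)
from utee import misc
args.gpu = misc.auto_select_gpu(utility_bound=0, num_gpu=args.ngpu, selected_gpus=args.gpu)
# use one specific GPU
# the variable only takes effect if it is set before CUDA is initialized; on its own it may not work, so using both lines is recommended
os.environ['CUDA_VISIBLE_DEVICES'] = '3'
# add this line as well, especially when using fastai (if the variable above did take effect, the only visible GPU has index 0)
torch.cuda.set_device(3)
Alternatively, see: https://blog.csdn.net/qxqxqzzz/article/details/107720675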
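A device-agnostic alternative to the environment-variable options above is to build a torch.device once and move the model and tensors to it. This is only a minimal sketch: the nn.Linear model and the tensor shapes are placeholders, not part of the original notes.

```python
import torch
import torch.nn as nn

# Fall back to the CPU when no GPU is visible.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)   # placeholder model
x = torch.rand(8, 4, device=device)  # placeholder input batch
y = model(x)

print(device, y.shape)
```

The same script then runs unchanged on machines with and without a GPU.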

Automatically choosing a suitable GPU, the auto_select_gpu() function:

def auto_select_gpu(mem_bound=500, utility_bound=0, gpus=(0, 1, 2, 3, 4, 5, 6, 7), num_gpu=1, selected_gpus=None):
    # mem_bound: memory-usage limit (MiB) per GPU; a GPU above it is treated as unavailable
    # utility_bound: utilization limit (%) per GPU; a GPU above it is treated as unavailable
    import sys
    import os
    import subprocess
    import re
    import time
    import numpy as np
    # quit if the visible devices were already set elsewhere
    if 'CUDA_VISIBLE_DEVICES' in os.environ:
        sys.exit(0)
    if selected_gpus is None:
        mem_trace = []  # memory usage, sampled 5 times
        utility_trace = []  # fluctuating GPU utilization, sampled 5 times
        # GPU utilization is not the same as memory usage, see: https://blog.csdn.net/Bruce_0712/article/details/63683787
        for i in range(5):  # sample 5 times
            info = subprocess.check_output('nvidia-smi', shell=True).decode('utf-8')
            mem = [int(s[:-5]) for s in re.compile(r'\d+MiB\s/').findall(info)]
            utility = [int(re.compile(r'\d+').findall(s)[0]) for s in re.compile(r'\d+%\s+Default').findall(info)]
            mem_trace.append(mem)
            utility_trace.append(utility)
            time.sleep(0.1)
        mem = np.mean(mem_trace, axis=0)  # axis=0: mean over the samples, one value per GPU
        utility = np.mean(utility_trace, axis=0)
        assert(len(mem) == len(utility))
        nGPU = len(utility)
        ideal_gpus = [i for i in range(nGPU) if mem[i] <= mem_bound and utility[i] <= utility_bound and i in gpus]
        if len(ideal_gpus) < num_gpu:
            print("No sufficient resource, available: {}, require {} gpu".format(ideal_gpus, num_gpu))
            sys.exit(0)
        else:
            selected_gpus = list(map(str, ideal_gpus[:num_gpu]))
    else:
        selected_gpus = selected_gpus.split(',')
    print("Setting GPU: {}".format(selected_gpus))
    os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(selected_gpus)
    return selected_gpus

Using several specific GPUs at once: I have not found a good way yet; setting CUDA_VISIBLE_DEVICES alone may not work.
Further reading: https://oldpan.me/archives/pytorch-to-use-multiple-gpus
Data parallelism: https://juejin.im/post/5d555d55f265da039f12b1c0
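One approach that may help, sketched here under assumptions: set CUDA_VISIBLE_DEVICES before the first CUDA call (ideally before importing torch), after which the visible cards are renumbered from 0, and then wrap the model in nn.DataParallel. The GPU ids "2,3" and the nn.Linear model are placeholders chosen for illustration.

```python
import os

# Must happen before CUDA is initialized; afterwards the two physical
# cards (hypothetical ids 2 and 3) are seen as cuda:0 and cuda:1.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2,3")

import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # placeholder model
n_gpu = torch.cuda.device_count()
if n_gpu > 1:
    # replicate the model on all visible GPUs and split each batch among them
    model = nn.DataParallel(model, device_ids=list(range(n_gpu)))
device = torch.device("cuda:0" if n_gpu > 0 else "cpu")
model = model.to(device)

out = model(torch.rand(32, 16, device=device))
print(n_gpu, out.shape)
```

If the variable is set after CUDA has already been initialized in the process, it is ignored, which may explain why setting it alone sometimes appears to have no effect.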

Data

One-hot encoding

Converting ground-truth labels to one-hot encoding for multi-label learning in PyTorch:

>>> import torch
>>> t = torch.tensor([[1],[0],[1]])
>>> torch.zeros(3, 2).scatter_(1, t, 1)
tensor([[0., 1.],
        [1., 0.],
        [0., 1.]])
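For the single-label case, torch.nn.functional.one_hot gives the same result as the scatter_ call above, while scatter_ extends naturally to the multi-label case, where several positions per row are set to 1. A small sketch; the toy label data is made up for illustration.

```python
import torch
import torch.nn.functional as F

# single-label case: one class index per sample
labels = torch.tensor([1, 0, 1])
one_hot = F.one_hot(labels, num_classes=2).float()

# multi-label case: each row lists the indices of its active classes
# (hypothetical toy data, padded to equal length by repeating an index)
active = torch.tensor([[0, 2], [1, 1], [2, 3]])
multi_hot = torch.zeros(3, 4).scatter_(1, active, 1.0)

print(one_hot)
print(multi_hot)
```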

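PyTorch tensors

Related posts: the NumPy primer, and Python multi-dimensional arrays and matrices.
Look closely at the structure:

Row vector of shape (1, 5): it looks as if iterating would visit every element (I call this "apparently multi-dimensional, really one-dimensional"), but iterating over it yields a single item, because iteration runs over the first dimension, whose length is 1 (the five values belong to one row).

# a row-vector tensor with 1 row and 5 columns
>>> a = torch.tensor([[1,2,3,4,5]])
>>> a.shape
torch.Size([1, 5])
# the length of a row vector is 1
>>> len(a)
1
# the outer level holds a single item
>>> for i,j in enumerate(a):
...     print(i,j)
...
0 tensor([1, 2, 3, 4, 5])
>>> a[0]
tensor([1, 2, 3, 4, 5])
# to reach individual elements, index the second dimension
>>> a[:,1]
tensor([2])
# or reshape into a column vector first
>>> a.reshape((5,1))[0]
tensor([1])
# or transpose into a column vector
>>> a.t()[0]
tensor([1])

Column vector of shape (5, 1): it can be iterated directly (I call this "truly multi-dimensional"), because every element sits in its own row (the tensor has two levels).

# a column-vector tensor with 5 rows and 1 column
>>> b = torch.tensor([[1], [2], [3], [4], [5]])
>>> b.shape
torch.Size([5, 1])
# the length of a column vector is the number of elements
>>> len(b)
5

Computing accuracy from element-wise equality (two tensors of shape (n, 1)): transposing one of them changes the result, so comparing a row vector with a column vector is not the same as comparing two column vectors.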
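The difference comes from broadcasting. A minimal sketch with made-up predictions and labels: comparing two (n, 1) tensors is element-wise, whereas comparing an (n, 1) tensor with a (1, n) tensor compares every pair and silently produces an (n, n) result, which gives a meaningless accuracy.

```python
import torch

pred = torch.tensor([[1], [0], [1], [1]])    # shape (4, 1)
label = torch.tensor([[1], [1], [1], [0]])   # shape (4, 1)

# same shape: element-wise comparison, result has shape (4, 1)
same = pred.eq(label)
acc = same.float().mean().item()

# column vs. row: broadcasting compares every pair, result has shape (4, 4)
pairwise = pred.eq(label.t())
wrong_acc = pairwise.float().mean().item()

print(same.shape, acc)
print(pairwise.shape, wrong_acc)
```

Checking that both tensors have the same shape before calling eq() avoids this pitfall.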

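Loading models

Saving and loading models

# save and load the entire model
torch.save(model_object, 'model.pth')
# on PyTorch 2.6 and later, pass weights_only=False to load a fully pickled model
model = torch.load('model.pth')
# save and load only the model parameters
torch.save(model_object.state_dict(), 'params.pth')
model_object.load_state_dict(torch.load('params.pth'))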
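When only the parameters are saved, the network structure has to be rebuilt in code before the weights can be loaded. A self-contained round-trip sketch; the small nn.Sequential model and the temporary file path are placeholders.

```python
import os
import tempfile

import torch
import torch.nn as nn

def make_model():
    # placeholder architecture; any nn.Module works the same way
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

src = make_model()
path = os.path.join(tempfile.mkdtemp(), "params.pth")
torch.save(src.state_dict(), path)

dst = make_model()                      # rebuild the structure first
dst.load_state_dict(torch.load(path))   # then fill in the weights
dst.eval()

x = torch.rand(3, 4)
with torch.no_grad():
    same = torch.allclose(src(x), dst(x))
print(same)
```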

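Loading onto the CPU / GPU

Loading a model onto the CPU:

device = 'cpu'
model = model.to(device)
backbone_pth = os.path.join(args.modelRt, 'backbone-%s.pth' % args.iterNum)
# this map_location keeps every tensor on the CPU, even if it was saved from a GPU
model.load_state_dict(torch.load(backbone_pth, map_location = lambda storage, loc: storage))
model.eval() # switch to evaluation mode for testing

Loading a model onto the GPU:

rank=0
model = model.to(rank)
backbone_pth = os.path.join(args.modelRt, 'backbone-%s.pth' % args.iterNum)
model.load_state_dict(torch.load(backbone_pth, map_location=torch.device(rank)))

Switching between loading the weights on the CPU and on the GPU only requires changing map_location and the device passed to model.to().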
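One way to avoid maintaining two code paths is to pick the device once and pass it to both map_location and model.to(). A sketch under assumptions: the nn.Linear model stands in for the real backbone, and the checkpoint file name is hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)  # placeholder for the real backbone
ckpt = os.path.join(tempfile.mkdtemp(), "backbone.pth")  # hypothetical file name
torch.save(model.state_dict(), ckpt)

# map_location remaps the saved tensors to `device` while loading,
# so a checkpoint written on a GPU machine also loads on a CPU-only one
state = torch.load(ckpt, map_location=device)
model.load_state_dict(state)
model = model.to(device).eval()

print(next(model.parameters()).device)
```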

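Training

Once the layers are defined, the network can be used right away, because each layer has a default initialization; explicit initialization is optional. https://blog.csdn.net/luo3300612/article/details/97675312
Computing accuracy and loss during training and testing:
https://www.cnblogs.com/yqpy/p/11497199.htm
Accumulate the statistics over all batches rather than reporting a single batch!
The loss that gets printed is averaged per batch, not per sample.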
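To report a per-sample average over an epoch, each batch's mean loss can be weighted by the batch size before it is accumulated. A minimal sketch; the random (outputs, labels) pairs stand in for what a model and DataLoader would produce and are purely illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()  # returns the mean loss of a batch

# hypothetical stand-in for (outputs, labels) pairs from a DataLoader;
# the last batch is smaller, as is common in practice
batches = [(torch.randn(n, 3), torch.randint(0, 3, (n,))) for n in (8, 8, 4)]

loss_sum, correct, total = 0.0, 0, 0
for outputs, labels in batches:
    loss = criterion(outputs, labels)
    loss_sum += loss.item() * labels.size(0)   # undo the per-batch mean
    correct += (outputs.argmax(dim=1) == labels).sum().item()
    total += labels.size(0)

epoch_loss = loss_sum / total   # average per sample
epoch_acc = correct / total
print(epoch_loss, epoch_acc)
```

Averaging the per-batch means directly would give the smaller last batch too much weight.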

A basic training snippet for image classification:

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

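Mind the order of these steps:

optimizer.zero_grad() -> loss.backward() -> optimizer.step()

Some open questions:

Does every forward pass build a computation graph? Do several forward passes build several graphs that all occupy memory?
Does loss.backward() free the computation graph, and with it the memory, while keeping the gradients?
Do optimizer.step() and optimizer.zero_grad() clear the gradients?
So is the way to accumulate gradients without using extra memory to run forward and backward several times, and call step and zero_grad only once every few steps?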
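In short: each forward pass builds a new graph; backward() frees that graph's intermediate buffers by default (unless retain_graph=True) and adds the new gradients into each parameter's .grad; optimizer.step() uses the gradients but does not clear them; only zero_grad() does. Gradient accumulation therefore looks roughly like the sketch below, where the model, the random mini-batches and accum_steps are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(10, 2)                    # placeholder model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
accum_steps = 4                           # hypothetical: 4 mini-batches per update

# hypothetical mini-batches standing in for a DataLoader
batches = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(8)]

num_updates = 0
optimizer.zero_grad()
for i, (inputs, labels) in enumerate(batches):
    loss = criterion(net(inputs), labels) / accum_steps  # keep the gradient scale
    loss.backward()            # frees this batch's graph, adds into .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()       # update with the accumulated gradient
        optimizer.zero_grad()  # only this call clears the gradients
        num_updates += 1

print(num_updates)
```

Dividing the loss by accum_steps makes the accumulated gradient match that of one large batch averaged over all its samples.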

Testing

At test time, apply exactly the same preprocessing to the images as during training; otherwise the results will be wrong.

Testing a single image

Image preprocessing

import cv2
import torch
# load image; for the exact preprocessing steps, refer to the model's training code
img = cv2.imread('xxx/samples/xxx.jpg') # (height, width, 3)
img = img[:,:,::-1] # bgr to rgb
w,h = img.shape[1], img.shape[0]
# preprocessing: resize the image, take a fixed-size (input_size) center crop, then normalize
## resize
input_size = 224
expand_size = int(input_size/0.875)
## equals to transforms.Resize(int), resize short side to int, keep aspect ratio
if w >= h:
    ratio = w / h
    w_ = expand_size * ratio
    h_ = expand_size
else:
    ratio = h / w
    w_ = expand_size
    h_ = expand_size * ratio
w_, h_ = int(w_), int(h_)
img = cv2.resize(img, (w_, h_)) # note: dsize is ordered (width, height): https://pythonexamples.org/python-opencv-cv2-resize-image/
## center square crop, equals to transforms.CenterCrop(int)
w, h = img.shape[1], img.shape[0] # size of the resized image
midx, midy = int(w/2), int(h/2)
cropx, cropy = int(input_size/2),int(input_size/2)
img = img[midy-cropy:midy+cropy, midx-cropx:midx+cropx]
## normalize; convert to a torch tensor first for convenience
mean = torch.tensor([0.485*255,0.456*255,0.406*255]).view(1,3,1,1)
std = torch.tensor([0.229*255,0.224*255,0.225*255]).view(1,3,1,1)
# HWC -> CHW, 'float32' and expand dims to (1, 3, H, W) so that mean and std broadcast per channel
img_batch = torch.from_numpy(img.copy()).permute(2,0,1).float().unsqueeze(0)
img_batch = img_batch.sub_(mean).div_(std)

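Testing

# load model
state = torch.load('xxx/checkpoint.pth.tar') # contains the weights only, not the network structure plus weights
model = Model()
model = torch.nn.DataParallel(model).cuda()
model.load_state_dict(state['state_dict']) # do not assign the return value of this call to model
model.eval() # evaluation mode: BatchNorm uses its running statistics and Dropout is disabled
# prediction
with torch.no_grad():
    pred = model(img_batch)
    maxk = 1
    _,pred = pred.topk(maxk,1,True,True)
    pred = pred.t()
    print(pred.item()) # get the value from tensor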

Testing in batches with a DataLoader

Building the testloader

import numpy as np
import torch
import torchvision.transforms as transforms
import torchvision.datasets as datasets
# the same preprocessing, implemented with transforms or PIL
## reference: https://github.com/d-li14/mobilenetv2.pytorch/blob/master/utils/dataloaders.py
def fast_collate(batch):
    imgs = [img[0] for img in batch]
    targets = torch.tensor([target[1] for target in batch], dtype=torch.int64)
    w = imgs[0].size[0]
    h = imgs[0].size[1]
    tensor = torch.zeros( (len(imgs), 3, h, w), dtype=torch.uint8 )
    for i, img in enumerate(imgs):
        nump_array = np.asarray(img, dtype=np.uint8)
        if(nump_array.ndim < 3):
            nump_array = np.expand_dims(nump_array, axis=-1)
        nump_array = np.rollaxis(nump_array, 2)
        tensor[i] += torch.from_numpy(nump_array)
    return tensor, targets

testdir = 'xxx/samples'
testset = datasets.ImageFolder(
    testdir,
    transforms.Compose([
        transforms.Resize(int(input_size/0.875)),
        transforms.CenterCrop(input_size),
    ]))
testloader = torch.utils.data.DataLoader(
    testset, sampler = None,
    batch_size = 1, shuffle = False,  # for simplicity, each batch holds a single image
    num_workers = 5, worker_init_fn = None, pin_memory = True,
    collate_fn = fast_collate)

Testing

# prediction
with torch.no_grad():
    for data in testloader:
        imgs, labels = data
        labels = labels.cuda()
        imgs = imgs.float().sub_(mean).div_(std)
        maxk = 1
        pred = model(imgs)
        _,pred = pred.topk(maxk,1,True,True)
        pred = pred.t()
        correct = pred.eq(labels.view(1,-1).expand_as(pred))
        print(labels.item(),pred.item(),correct.item())

Another basic snippet for testing in batches:

net = torch.load('model.pth')
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

Visualization

Inspecting the model structure

# visualize network
import torch
from torchviz import make_dot
from arcface import model
# use the plain network definition class here without loading weights first; otherwise the resulting PDF is blank
net = model.Backbone(num_layers=50, drop_ratio=0.6)
x = torch.rand(8, 3, 112, 112)
y = net(x)
# if the model returns a list, concatenate all of its elements before visualizing
c = torch.cat(list(y), 1)
g = make_dot(c)
g.render('net_arch', view=False)  # saves net_arch.pdf; with view=True the PDF is opened automatically, with view=False it is not

https://blog.csdn.net/qq_27825451/article/details/96856217

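Printing model parameters:

The difference between state_dict() and named_parameters() in PyTorch:

state_dict() contains all parameters plus the persistent buffers (for example the BatchNorm running statistics)
named_parameters() yields only the nn.Parameter objects, i.e. the tensors an optimizer can update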
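The difference is easy to see on a small network with a BatchNorm layer. A minimal sketch; the two-layer nn.Sequential is just an illustrative model.

```python
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.BatchNorm2d(4))

param_names = [name for name, _ in net.named_parameters()]
state_names = list(net.state_dict().keys())
# entries that appear in state_dict() but are not parameters
buffers_only = [name for name in state_names if name not in param_names]

print(param_names)
print(buffers_only)
```

The extra state_dict() entries are the BatchNorm buffers, which are saved with the model but never touched by the optimizer.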

Fine-tuning

Loading weights into only some layers

# only load the weights in arc face original model weights file, ignore the multi-task branches
pretrained_dict = torch.load(save_path/'model_{}'.format(fixed_str))
# print('Loaded ArcFace weights')
MTL_dict = self.model.state_dict()
# 1. filter out unnecessary keys
arcface_dict = {
    k: v for k, v in pretrained_dict.items() if k in MTL_dict}
# 2. overwrite entries in the existing state dict
MTL_dict.update(arcface_dict)
self.model.load_state_dict(MTL_dict)
# print('Loaded to MTL model')

https://blog.csdn.net/LXX516/article/details/80124768
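The same filter-and-update pattern can be tried on toy models. In this sketch the Backbone and MultiTask classes are hypothetical stand-ins for the pretrained ArcFace model and the multi-task model; the extra shape check is an added safeguard, not part of the snippet above.

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):           # hypothetical pretrained model
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)

class MultiTask(nn.Module):          # hypothetical model with an extra branch
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)
        self.branch = nn.Linear(4, 2)

pretrained_dict = Backbone().state_dict()
model = MultiTask()
model_dict = model.state_dict()

# keep only entries whose name and shape both match the new model
matched = {k: v for k, v in pretrained_dict.items()
           if k in model_dict and v.shape == model_dict[k].shape}
model_dict.update(matched)
model.load_state_dict(model_dict)

print(sorted(matched))
```

When all shared layers have identical shapes, model.load_state_dict(pretrained_dict, strict=False) is a shorter alternative that skips missing and unexpected keys.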

Networks

A simple multi-task network

import torch.nn as nn

class FC(nn.Module):
    def __init__(self, bitsPerAttr=1, ):
        super(FC, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, bitsPerAttr),
        )

    def forward(self, x):
        output = self.layers(x)
        return output

class MTLnet(nn.Module):
    def __init__(self):
        super(MTLnet, self).__init__()
        # shared trunk; the layer sizes are placeholders and must be adapted so that
        # its output has 1024 features per sample, which is what FC expects
        self.sharedlayer = nn.Sequential(
            nn.Conv2d(3, 2, kernel_size=3, padding=1),
            nn.BatchNorm2d(2),
            nn.ReLU(),
            nn.Linear(112, 64),
            nn.ReLU(),
            nn.Dropout()
        )
        # a single tower, not used in forward()
        self.tower = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 32),
            nn.ReLU(),
            nn.Linear(32, 2),
            #nn.AdaptiveAvgPool2d((1, 1)),
        )
        # one FC head per task (40 tasks)
        self.towers = nn.ModuleList([FC() for _ in range(40)])

    def forward(self, x):
        h_shared = self.sharedlayer(x)
        out = [tower(h_shared) for tower in self.towers]
        return out

https://github.com/hosseinshn/Basic-Multi-task-Learning/blob/master/MTL-Pytorch.ipynb
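For experimenting, here is a smaller variant of the same idea (one shared trunk, one head per task) whose layer sizes are chosen so that a forward pass runs end to end. All class names and dimensions in this sketch are illustrative assumptions, not the original architecture.

```python
import torch
import torch.nn as nn

class Head(nn.Module):
    """One small classifier per task (sizes are illustrative)."""
    def __init__(self, in_features=64, out_features=1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_features, 32), nn.ReLU(), nn.Linear(32, out_features))

    def forward(self, x):
        return self.layers(x)

class TinyMTLNet(nn.Module):
    def __init__(self, num_tasks=40):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3, padding=1),
            nn.BatchNorm2d(4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),   # fixed 4 x 4 map for any input size
            nn.Flatten(),                   # -> 4 * 4 * 4 = 64 features
        )
        self.heads = nn.ModuleList([Head(64) for _ in range(num_tasks)])

    def forward(self, x):
        h = self.shared(x)                        # computed once, shared by all tasks
        return [head(h) for head in self.heads]   # one output per task

net = TinyMTLNet()
outs = net(torch.rand(2, 3, 112, 112))
print(len(outs), outs[0].shape)
```

Using nn.ModuleList rather than a plain Python list ensures the heads' parameters are registered with the model and seen by the optimizer.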

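Loss functions

Cross entropy

With the cross-entropy loss in PyTorch, the target does not need to be one-hot encoded; an integer class index per sample is enough.

args.gpu = 0
criterion = nn.CrossEntropyLoss().cuda(args.gpu)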
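A small CPU-only check makes this concrete: nn.CrossEntropyLoss takes raw logits and integer class indices, and gives the same value as a manual computation with a one-hot target. The logits and targets below are made-up numbers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3]])   # raw scores, no softmax applied
target = torch.tensor([0, 1])              # class indices, dtype int64

loss = criterion(logits, target)

# the same value computed by hand from a one-hot target
one_hot = F.one_hot(target, num_classes=3).float()
manual = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(loss.item(), manual.item())
```

Because the loss applies log-softmax internally, the network's last layer should output raw logits rather than probabilities.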

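Summing losses before backpropagation

The losses computed on the outputs of different layers can be summed and then backpropagated in a single pass.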
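A minimal sketch of this pattern with two task heads on top of one shared layer; the layer sizes, the random data and the equal weighting of the two losses are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
shared = nn.Linear(6, 4)                                     # placeholder shared layer
heads = nn.ModuleList([nn.Linear(4, 3) for _ in range(2)])   # two task heads
criterion = nn.CrossEntropyLoss()

x = torch.randn(5, 6)
targets = [torch.randint(0, 3, (5,)) for _ in heads]

h = shared(x)
losses = [criterion(head(h), t) for head, t in zip(heads, targets)]
total_loss = sum(losses)      # still a tensor attached to the graph
total_loss.backward()         # one backward pass reaches every branch

print(total_loss.item(), shared.weight.grad.shape)
```

After the single backward() call, the shared layer holds the sum of the gradients coming from both heads.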

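Increasing the loss during training

Multiply the loss by -1 before calling backward(), so that the optimizer maximizes it instead of minimizing it.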
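A toy illustration of the sign flip: the quantity below has its maximum at w = 3, and minimizing its negative with plain SGD drives w towards that maximum. The function, learning rate and step count are arbitrary choices for the sketch.

```python
import torch

w = torch.tensor([0.0], requires_grad=True)   # toy parameter
optimizer = torch.optim.SGD([w], lr=0.1)

def objective(w):
    # toy quantity we want to make larger; its maximum is at w = 3
    return -(w - 3.0) ** 2

before = objective(w).item()
for _ in range(50):
    optimizer.zero_grad()
    loss = -objective(w).sum()   # the minus sign turns descent into ascent
    loss.backward()
    optimizer.step()
after = objective(w).item()

print(before, after, w.item())
```

In practice the quantity being maximized should be bounded, otherwise the negated loss can decrease without limit.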

References:
https://pytorch.org/tutorials/beginner/pytorch_with_examples.html
https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html