Using OpenVINO (1): Converting and Quantizing (INT8) a Classification Network Model

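0. Preface

The previous post covered installing the OpenVINO toolkit on Ubuntu 18.04. This post puts that environment to use with a simple end-to-end example. I have also written a tutorial on building ResNet in PyTorch (see "Resnet网络结构详解与模型的搭建" if you are interested), so the example here uses a ResNet34 trained in PyTorch. The workflow is:

Convert the PyTorch model to ONNX format (not covered here; follow the tutorial on the official PyTorch site)
Convert the ONNX model to OpenVINO IR format (float32)
Quantize the IR model from float32 to int8
Run inference on the quantized IR model with the OpenVINO Inference Engine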

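1. Using the OpenVINO environment in PyCharm

If you do not use PyCharm and run your Python scripts straight from a terminal, skip this step.
Once OpenVINO is installed and configured, every new terminal loads the OpenVINO resources automatically, because the installation step wrote the command that loads them into the current user's ~/.bashrc. That is why importing the openvino package works in a Python session started from a terminal. In a PyCharm launched from its desktop icon, however, importing openvino fails with a package-not-found error, because a PyCharm started that way does not load the environment from ~/.bashrc. There are many ways to solve this; I offer a crude one here.
PyCharm started from the icon does not pick up ~/.bashrc, but PyCharm started from a terminal (which has already loaded the OpenVINO environment) inherits that environment and can import openvino normally.
If you do not know where PyCharm's launch script is, find it with: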

sudo find / -name pycharm.sh

Once found, change to the folder that contains pycharm.sh and run the launch script:

./pycharm.sh

You can now debug code that imports the openvino package in PyCharm.
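To check from inside PyCharm (or any other interpreter) whether the environment was actually inherited, a small probe like the one below can help. It is only a sketch: the function names are my own, and it assumes the OpenVINO setup script exposes the package through PYTHONPATH and its libraries through LD_LIBRARY_PATH.

```python
import importlib.util
import os


def openvino_available() -> bool:
    """Return True if the `openvino` package can be located by this interpreter."""
    return importlib.util.find_spec("openvino") is not None


def report_environment() -> dict:
    """Collect the values that usually reveal whether the OpenVINO environment is loaded."""
    return {
        "openvino_importable": openvino_available(),
        # Both values are empty strings when the variable is not set.
        "PYTHONPATH": os.environ.get("PYTHONPATH", ""),
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", ""),
    }


if __name__ == "__main__":
    for key, value in report_environment().items():
        print("{}: {}".format(key, value))
```

Running it once from a terminal session and once from an icon-launched PyCharm should make the difference between the two environments visible.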

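2. Converting the ONNX model to IR format

Convert the ONNX model to IR format. I recommend FP32: FP16 makes the model a little smaller but gave no speedup here. The ONNX model used in this example is a resnet34 that I built in PyTorch and exported to ONNX.
First change to the <INSTALL_DIR>/deployment_tools/model_optimizer folder: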

cd ~/intel/openvino/deployment_tools/model_optimizer

Then run mo.py to do the conversion. If you do not want the preprocessing written into the network, use:

python mo.py --input_model ~/my_project/resnet34.onnx --output_dir ~/openvino_samples/ --input_shape [1,3,224,224] --data_type FP32

The terminal prints the following during conversion:
[Figure: terminal output of the ONNX-to-IR conversion]

To write the preprocessing into the network (here this means only subtracting the mean and dividing by the std), use:

python mo.py --input_model ~/my_project/resnet34.onnx --output_dir ~/openvino_samples/ --input_shape [1,3,224,224] --data_type FP32 --mean_values [123.675,116.28,103.53] --scale_values [58.395,57.12,57.375]

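Here --input_model is the path of the ONNX file to convert, --input_shape is the shape of the input image, --data_type is the data type after conversion, --mean_values is the mean subtracted during preprocessing, and --scale_values is the std that the image is divided by during preprocessing. See the official documentation for the other parameters.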
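The mean and scale values passed to mo.py are the familiar ImageNet statistics expressed on the 0..255 pixel scale rather than the 0..1 scale used in PyTorch-style preprocessing. The sketch below shows the conversion and checks that both forms normalize a pixel to the same value; the helper names are mine.

```python
# Per-channel statistics on the [0, 1] scale, as used in PyTorch-style preprocessing.
MEAN_01 = [0.485, 0.456, 0.406]
STD_01 = [0.229, 0.224, 0.225]


def to_pixel_scale(values):
    """Convert statistics from the [0, 1] scale to the [0, 255] pixel scale."""
    return [round(v * 255, 3) for v in values]


def normalize_01(pixel, mean, std):
    """Scale the raw pixel to [0, 1] first, then normalize."""
    return (pixel / 255 - mean) / std


def normalize_255(pixel, mean_255, std_255):
    """Normalize the raw pixel value directly with pixel-scale statistics."""
    return (pixel - mean_255) / std_255


mean_values = to_pixel_scale(MEAN_01)
scale_values = to_pixel_scale(STD_01)
print(mean_values)
print(scale_values)
```

Because (x / 255 - m) / s equals (x - 255 m) / (255 s), a model converted with these values expects raw 0..255 input and should not be fed images that were already normalized.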

3. Performance testing with the benchmark script

First change to the <INSTALL_DIR>/deployment_tools/tools/benchmark_tool folder:

cd ~/intel/openvino/deployment_tools/tools/benchmark_tool

Then install the Python packages it needs:

pip install -r requirements.txt

Next, run the benchmark_app.py script:

python ./benchmark_app.py -m ~/openvino_samples/resnet34.xml -d CPU -api async -i ~/openvino_samples/tulip.jpg -progress true -b 1

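A few parameters deserve attention: -m is the path to the .xml file of the IR converted earlier, -d is the device type, -api is the inference mode (sync or async), -i is the path to the input image, -progress controls whether a progress bar is shown, and -b is the input batch size. Many more parameters are listed by: python ./benchmark_app.py -h

The terminal finally prints output similar to the following.
Output with asynchronous inference. Async mode makes full use of the hardware: on the 8-core CPU used for this test, all 8 cores were fully loaded. The output shows that Latency is higher, but Count is also higher.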

[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
Progress: |................................| 100%
[Step 11/11] Dumping statistics report
Count:      3092 iterations
Duration:   60072.90 ms
Latency:    76.20 ms
Throughput: 51.47 FPS
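If you want to compare several runs, the four summary lines are easy to collect programmatically. The following is a minimal sketch that parses a report shaped like the one above; the regular expression only covers these four lines and is my own assumption about the layout, not something taken from the benchmark tool.

```python
import re

REPORT = """\
Count:      3092 iterations
Duration:   60072.90 ms
Latency:    76.20 ms
Throughput: 51.47 FPS
"""

# One summary line: metric name, numeric value, unit.
LINE_RE = re.compile(r"^(Count|Duration|Latency|Throughput):\s+([\d.]+)\s+(\w+)$")


def parse_report(text):
    """Return {metric_name: (value, unit)} for the summary lines found in text."""
    metrics = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, value, unit = match.groups()
            metrics[name] = (float(value), unit)
    return metrics


metrics = parse_report(REPORT)
print(metrics["Throughput"])
```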

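Output with synchronous inference. Sync mode completes each individual inference as quickly as possible; on the same 8-core CPU only 4 cores were fully loaded. The output shows that Latency is lower, but Count is also lower.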

[Step 10/11] Measuring performance (Start inference syncronously, limits: 60000 ms duration)
Progress: |................................| 100%
[Step 11/11] Dumping statistics report
Count:      2640 iterations
Duration:   60003.31 ms
Latency:    21.21 ms
Throughput: 47.15 FPS
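The two reports are consistent with two different ways of deriving throughput: completed iterations divided by elapsed time in async mode, and 1000 ms divided by the per-request latency in sync mode. This is my reading of the numbers above, not something I verified against the benchmark_app source. A quick check:

```python
def async_throughput(iterations, duration_ms, batch_size=1):
    """Frames per second as completed frames divided by wall-clock time."""
    return batch_size * iterations / (duration_ms / 1000.0)


def sync_throughput(latency_ms, batch_size=1):
    """Frames per second when requests run strictly one after another."""
    return batch_size * 1000.0 / latency_ms


async_fps = async_throughput(3092, 60072.90)
sync_fps = sync_throughput(21.21)
print(round(async_fps, 2))
print(round(sync_fps, 2))
```

It also explains why async latency is higher: with 4 requests in flight, each request takes longer on its own, but more requests finish per second overall.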

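4. INT8 inference

When deploying a model to production, it is common to convert the model's data type from float32 to int8 to speed up inference, at the cost of some accuracy. The official OpenVINO documentation offers two ways to quantize a model to int8: the fast and simple DefaultQuantization, and the more precise AccuracyAwareQuantization.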
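To make the accuracy trade-off concrete, here is a generic sketch of symmetric int8 quantization on a handful of values. It illustrates the general idea only and is not the algorithm the OpenVINO tool implements; the sample weights are made up.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale factor."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [-1.0, -0.26, 0.0, 0.4, 0.99]  # hypothetical values
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The rounding error per value is at most half a quantization step, which is where the small accuracy loss comes from.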

4.1 Installing the Post-Training Optimization tool

The environment needs some setup before the OpenVINO quantization tool can be used.

In the ~/intel/openvino/deployment_tools/open_model_zoo/tools/accuracy_checker directory, run:

python setup.py install

In the ~/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit directory, run:

python setup.py install

With the environment set up, the pot command is available directly in the terminal (it is used in the quantization steps below):

pot -c <path_to_config_file>

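4.2 Annotation Converters

Before quantizing, we need to prepare the data that the quantization process uses. Your own dataset must first be converted to a data format that OpenVINO supports. Here I use the ImageNet dataset format (OpenVINO supports most of the commonly used public dataset formats). A classification task needs two files, annotation.txt and labels.txt (later testing showed that it also works without labels.txt).

Below is a script that generates annotation.txt and labels.txt from your own dataset. The example uses a flower classification dataset with five classes, laid out as follows:

├── dataset:            root directory of the dataset
│     ├── daisy:          all images of class daisy
│     ├── dandelion:      all images of class dandelion
│     ├── roses:          all images of class roses
│     ├── sunflowers:     all images of class sunflowers
│     └── tulips:         all images of class tulips

Then generate the required files with the Python script I provide: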

import os
import glob

image_dir = "/your_dataset_dir"
assert os.path.exists(image_dir), "image dir does not exist..."

# collect every .jpg one level below the root, i.e. <root>/<class>/<image>.jpg
img_list = glob.glob(os.path.join(image_dir, "*", "*.jpg"))
assert len(img_list) > 0, "No images(.jpg) were found in image dir..."

# class names are the sub-folder names, sorted so the index assignment is stable
classes_info = [d for d in os.listdir(image_dir)
                if os.path.isdir(os.path.join(image_dir, d))]
classes_info.sort()
classes_dict = {}

# create label file
with open("my_labels.txt", "w") as lw:
    # note: with no background class, the index must start from 0
    for index, c in enumerate(classes_info, start=0):
        txt = "{}:{}".format(index, c)
        if index != len(classes_info) - 1:
            txt += "\n"
        lw.write(txt)
        classes_dict.update({c: str(index)})
print("create my_labels.txt successful...")

# create annotation file
with open("my_annotation.txt", "w") as aw:
    for index, img in enumerate(img_list):
        # the class name is the name of the folder that contains the image
        img_classes = classes_dict[os.path.basename(os.path.dirname(img))]
        txt = "{} {}".format(img, img_classes)
        if index != len(img_list) - 1:
            txt += "\n"
        aw.write(txt)
print("create my_annotation.txt successful...")
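Before handing the annotation file to the converter, it can be worth confirming that every line has the "<image_path> <class_index>" shape and that each class is represented. A small sketch follows; the function name and the demo paths are hypothetical.

```python
import os
import tempfile
from collections import Counter


def count_labels(annotation_path):
    """Count images per class index in a '<image_path> <class_index>' file."""
    counts = Counter()
    with open(annotation_path, "r") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            # Split from the right so image paths containing spaces still work.
            path, _, label = line.rpartition(" ")
            if not path or not label.isdigit():
                raise ValueError("line {}: malformed entry {!r}".format(line_no, line))
            counts[int(label)] += 1
    return counts


# Self-contained demo on a throwaway file with made-up paths.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "my_annotation.txt")
    with open(demo_path, "w") as f:
        f.write("/data/daisy/a.jpg 0\n/data/daisy/b.jpg 0\n/data/tulips/c.jpg 4")
    demo_counts = count_labels(demo_path)

print(dict(demo_counts))
```

On the real dataset you would call count_labels("my_annotation.txt") and expect one entry per class folder.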

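Next, use the convert_annotation tool to convert the dataset annotations (already put into ImageNet format in the previous step). In practice the resulting imagenet.pickle file is only used by the AccuracyAwareQuantization method: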

convert_annotation imagenet --annotation_file ./my_annotation.txt --labels_file ./my_labels.txt --has_background False -o new_annotations -a imagenet.pickle -m imagenet.json

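The official description of the parameters is shown below:
[Figure: official parameter descriptions for the imagenet converter]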

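4.3 Using the `DefaultQuantization` method

With the annotation.txt file prepared, the next step is to configure the JSON file that the fast DefaultQuantization method requires. The ~/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit/configs folder contains some official templates. Here we start from default_quantization_template.json and modify it.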

{
    /* Model parameters */
    "model": {
        "model_name": "resnet34", // Model name
        "model": "/home/wz/openvinotest/resnet34.xml", // Path to model (.xml format)
        "weights": "/home/wz/openvinotest/resnet34.bin" // Path to weights (.bin format)
    },
    /* Parameters of the engine used for model inference */
    "engine": {
        "config": "resnet34.yaml" // Path to Accuracy Checker config
    },
    /* Optimization hyperparameters */
    "compression": {
        "target_device": "CPU", // Target device, the specificity of which will be taken
                                // into account during optimization
        "algorithms": [
            {
                "name": "DefaultQuantization", // Optimization algorithm name
                "params": {
                    "preset": "performance", // Preset [performance, mixed, accuracy] which control the quantization
                                             // mode (symmetric, mixed (weights symmetric and activations asymmetric)
                                             // and fully asymmetric respectively)
                    "stat_subset_size": 300  // Size of subset to calculate activations statistics that can be used
                                             // for quantization parameters calculation
                }
            }
        ]
    }
}

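In this template the parameters we need to set are model_name (a name of your choice for the quantized model), model (the *.xml file converted earlier), weights (the *.bin file converted earlier), and config under engine (which points to a YAML file, explained below). The remaining parameters can mostly stay at their defaults; the official documentation explains each one.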
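If you prefer not to hand-edit a commented template, the same structure can be produced from Python. The sketch below builds a dictionary with the keys shown in the template above and serializes it as plain JSON; build_pot_config is a helper name of my own, and the paths are the example paths from this post.

```python
import json


def build_pot_config(model_name, xml_path, bin_path, ac_config, subset_size=300):
    """Assemble a DefaultQuantization config using the same keys as the template."""
    return {
        "model": {"model_name": model_name, "model": xml_path, "weights": bin_path},
        "engine": {"config": ac_config},
        "compression": {
            "target_device": "CPU",
            "algorithms": [
                {
                    "name": "DefaultQuantization",
                    "params": {"preset": "performance", "stat_subset_size": subset_size},
                }
            ],
        },
    }


config = build_pot_config(
    "resnet34",
    "/home/wz/openvinotest/resnet34.xml",
    "/home/wz/openvinotest/resnet34.bin",
    "resnet34.yaml",
)
config_text = json.dumps(config, indent=4)
print(config_text)
```

Writing config_text to a .json file gives a comment-free configuration with the same fields.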

Next, configure the YAML file just mentioned (engine -> config). The ~/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit/configs/examples/accuracy_checker folder also contains some official templates. Below is a YAML file I adapted from one of them.

models:
  - name: resnet34
    launchers:
      - framework: dlsdk
        device: CPU
        adapter: classification
    datasets:
      - name: classification_dataset
        data_source: /home/wz/openvinotest/data_set/flower_data
        annotation_conversion:
          converter: imagenet
          annotation_file: /home/wz/openvinotest/my_annotation.txt
        reader: opencv_imread  # default setting
        preprocessing:
          - type: resize
            size: 256
            aspect_ratio_scale: greater
          - type: crop
            size: 224
          - type: bgr_to_rgb  # bgr format in opencv
          - type: normalization
            # you may specify precomputed statistics manually or use precomputed values, such as ImageNet as well
            mean: (123.675, 116.28, 103.53)
            std: (58.395, 57.12, 57.375)

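The parameters we need to set are name, data_source, and annotation_file (the annotation.txt file generated above). data_source seems redundant, since the image paths are already recorded in annotation.txt, yet it cannot be omitted.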
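The resize and crop entries in the YAML describe the usual "resize, then center crop" recipe. The sketch below works through the arithmetic for one image size. It assumes that aspect_ratio_scale: greater means the shorter side is scaled to the given size while keeping the aspect ratio, which is my interpretation and should be checked against the Accuracy Checker documentation; the function names and the 500x375 example are mine.

```python
def resize_keep_aspect(width, height, target=256):
    """Scale so that the shorter side equals `target` (assumed meaning of 'greater')."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop-by-crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


resized = resize_keep_aspect(500, 375)
box = center_crop_box(*resized)
print(resized, box)
```

Under that assumption a 500x375 image becomes 341x256, and the 224x224 crop is taken from its center.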
With these files configured, run the pot tool on the JSON file prepared above to quantize the model:

pot -c ./default_quantization_resnet34.json

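4.4 Using the `AccuracyAwareQuantization` method

The official documentation describes the algorithm; what follows is my own brief understanding of it (my English is poor, so please point out any mistakes).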

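First, AccuracyAwareQuantization runs the DefaultQuantization method described above to quantize the whole model.
Next, it compares the model before and after quantization on the validation set we provide and finds the samples whose predictions differ the most.
If the accuracy drop exceeds the limit we set, the model is switched to mixed-precision mode.
All layers are ranked by how much their quantization reduces accuracy, with the layers that hurt accuracy most placed first.
The layer with the greatest impact is reverted to its original precision (I converted from float32, so it is reverted to float32), and accuracy is recomputed on the whole validation set.
If the accuracy drop now meets the requirement, the process stops; otherwise the layers are ranked again and the steps above are repeated.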
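The loop described in these steps can be mimicked with a toy model. The sketch below is only an illustration of the control flow as I understand it, not the tool's implementation: it pretends each layer contributes a fixed, independent accuracy drop, which real networks do not guarantee, and the layer names and numbers are invented.

```python
def accuracy_aware_sketch(baseline_acc, layer_drops, maximal_drop):
    """Revert the most harmful layers to float until the total drop is acceptable.

    layer_drops maps a layer name to the accuracy lost by quantizing that layer.
    """
    quantized = dict(layer_drops)  # start with every layer quantized
    reverted = []
    while quantized:
        drop = sum(quantized.values())
        if drop <= maximal_drop:
            break
        # Rank the remaining layers by impact and restore the worst one.
        worst = max(quantized, key=quantized.get)
        reverted.append(worst)
        del quantized[worst]
    final_acc = baseline_acc - sum(quantized.values())
    return reverted, final_acc


# Hypothetical layer names and per-layer drops, for illustration only.
drops = {"conv1": 0.020, "layer2.0.conv1": 0.004, "fc": 0.008, "layer4.1.conv2": 0.001}
reverted, final_acc = accuracy_aware_sketch(0.93, drops, maximal_drop=0.01)
print(reverted, round(final_acc, 3))
```

In this toy run two layers are restored to float before the remaining drop falls under the limit; the real tool measures accuracy on the validation set after each change instead of summing fixed numbers.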
To use the method, first configure the JSON file it requires. The ~/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit/configs folder contains some official templates. Here we start from accuracy_aware_quantization_template.json and modify it.

/* This configuration file is the fastest way to get started with the accuracy aware
   quantization algorithm. It contains only mandatory options with commonly used values.
   All other options can be considered as an advanced mode and requires deep knowledge
   of the quantization process. An overall description of all possible parameters can
   be found in the accuracy_aware_quantization_spec.json */
{
    /* Model parameters */
    "model": {
        "model_name": "resnet34a", // Model name
        "model": "/home/wz/openvinotest/resnet34p.xml", // Path to model (.xml format)
        "weights": "/home/wz/openvinotest/resnet34p.bin" // Path to weights (.bin format)
    },
    /* Parameters of the engine used for model inference */
    "engine": {
        "config": "resnet34.yaml" // Path to Accuracy Checker config
    },
    /* Optimization hyperparameters */
    "compression": {
        "target_device": "CPU", // Target device, the specificity of which will be taken
                                // into account during optimization
        "algorithms": [
            {
                "name": "AccuracyAwareQuantization", // Optimization algorithm name
                "params": {
                    "preset": "performance", // Preset [performance, mixed, accuracy] which control the quantization
                                             // mode (symmetric, mixed (weights symmetric and activations asymmetric)
                                             // and fully asymmetric respectively)
                    "stat_subset_size": 300, // Size of subset to calculate activations statistics that can be used
                                             // for quantization parameters calculation
                    "maximal_drop": 0.01 // Maximum accuracy drop which has to be achieved after the quantization
                }
            }
        ]
    }
}

Below is the YAML file that the JSON config points to. Its annotation entry is the *.pickle file we converted earlier:

models:
  - name: resnet34
    launchers:
      - framework: dlsdk
        device: CPU
        adapter: classification
    datasets:
      - name: classification_dataset
        data_source: /home/wz/data_set/flower_data
        annotation: /home/wz/openvinotest/imagenet.pickle
        reader: opencv_imread  # default setting
        preprocessing:
          - type: resize
            size: 256
            aspect_ratio_scale: greater
          - type: crop
            size: 224
          - type: bgr_to_rgb  # bgr format in opencv
          - type: normalization
            # you may specify precomputed statistics manually or use precomputed values, such as ImageNet as well
            mean: (123.675, 116.28, 103.53)
            std: (58.395, 57.12, 57.375)
        metrics:
          - name: accuracy@top1
            type: accuracy
            top_k: 1
          - name: accuracy@top5
            type: accuracy
            top_k: 5

With these files configured, run the pot tool on the JSON file prepared above to quantize the model:

pot -c ./accuracy_aware_quantization_resnet34.json

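The figure below shows the terminal output during quantization. The tool first quantizes with DefaultQuantization; accuracy drops very little afterwards, so it stops right there without any further adjustment by accuracy_aware_quantization. (The model is fairly simple, so quantization went smoothly.)
[Figure: terminal output of the INT8 quantization]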

4.5 Benchmark test (int8)

First, the benchmark figures for float32:

[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
Progress: |................................| 100%
[Step 11/11] Dumping statistics report
Count:      2932 iterations
Duration:   60094.71 ms
Latency:    78.73 ms
Throughput: 48.79 FPS

Now the benchmark figures after quantization (int8):

[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
Progress: |................................| 100%
[Step 11/11] Dumping statistics report
Count:      5820 iterations
Duration:   60036.01 ms
Latency:    40.42 ms
Throughput: 96.94 FPS
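Putting the two reports side by side, the gain from int8 on this machine can be worked out directly from the printed numbers:

```python
# Values copied from the two async benchmark reports above.
fp32 = {"latency_ms": 78.73, "throughput_fps": 48.79}
int8 = {"latency_ms": 40.42, "throughput_fps": 96.94}

throughput_gain = int8["throughput_fps"] / fp32["throughput_fps"]
latency_ratio = fp32["latency_ms"] / int8["latency_ms"]

print("throughput gain: {:.2f}x".format(throughput_gain))
print("latency ratio:   {:.2f}x".format(latency_ratio))
```

Both ratios come out close to 2, so for this resnet34 on this CPU the int8 model runs at roughly twice the speed of the float32 model.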

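5. Calling the generated IR model through the Python API

Below is a basic way to call an IR model, adapted from the official demo, using the resnet34 classification model converted earlier as the example. You need to set model_xml_path, model_bin_path, image_path, and class_json_path yourself. image_path is the root directory that holds the images; the program finds all *.jpg files in that directory and runs inference on them. class_json_path is the JSON file that maps index to label for the classification task, in the following format: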

{
    "0": "daisy",
    "1": "dandelion",
    "2": "roses",
    "3": "sunflowers",
    "4": "tulips"
}

Here is the Python test code I provide:

import sys
import cv2
import os
import glob
import json
import numpy as np
import logging as log
from openvino.inference_engine import IECore


def main():
    device = "CPU"
    model_xml_path = "./resnet34.xml"
    model_bin_path = "./resnet34.bin"
    image_path = "./"
    class_json_path = './class_indices.json'

    # set log format
    log.basicConfig(format="[ %(levelname)s ] %(message)s", level=log.INFO, stream=sys.stdout)

    assert os.path.exists(model_xml_path), ".xml file does not exist..."
    assert os.path.exists(model_bin_path), ".bin file does not exist..."

    # search *.jpg files
    image_list = glob.glob(os.path.join(image_path, "*.jpg"))
    assert len(image_list) > 0, "no image(.jpg) be found..."

    # load class label
    assert os.path.exists(class_json_path), "class_json_path does not exist..."
    with open(class_json_path, 'r') as json_file:
        class_indict = json.load(json_file)

    # inference engine
    ie = IECore()

    # read IR
    net = ie.read_network(model=model_xml_path, weights=model_bin_path)

    # check supported layers for device
    # (net.layers is kept from the original demo; it may be unavailable in newer releases)
    if device == "CPU":
        supported_layers = ie.query_network(net, "CPU")
        not_supported_layers = [l for l in net.layers.keys() if l not in supported_layers]
        if len(not_supported_layers) > 0:
            log.error("device {} not support layers:\n {}".format(device,
                                                                  ",".join(not_supported_layers)))
            log.error("Please try to specify cpu extensions library path in sample's command line parameters using -l "
                      "or --cpu_extension command line argument")
            sys.exit(1)

    # get input and output name
    input_blob = next(iter(net.input_info))
    output_blob = next(iter(net.outputs))

    # set batch size before loading, so the loaded network picks it up
    batch_size = 1
    net.batch_size = batch_size

    # load model
    exec_net = ie.load_network(network=net, device_name=device)

    # read the input shape expected by the network
    n, c, h, w = net.input_info[input_blob].input_data.shape
    # images = np.ndarray(shape=(n, c, h, w))

    # inference every image
    for i in range(len(image_list)):
        image = cv2.imread(image_list[i])
        if image.shape[:-1] != (h, w):
            image = cv2.resize(image, (w, h))
        # bgr(opencv default format) -> rgb
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # pre-process (skip these two lines if mean/scale were written into the IR by mo.py)
        image = image / 255.
        image = (image - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
        # change data from HWC to CHW
        image = image.transpose((2, 0, 1))
        # add batch dimension
        image = np.expand_dims(image, axis=0)

        # start sync inference
        res = exec_net.infer(inputs={input_blob: image})
        prediction = np.squeeze(res[output_blob])
        # print(prediction)

        # np softmax process
        prediction -= np.max(prediction, keepdims=True)  # subtract the max so the softmax is numerically stable
        prediction = np.exp(prediction) / np.sum(np.exp(prediction), keepdims=True)
        class_index = np.argmax(prediction, axis=0)
        print("prediction: '{}'\nclass:{} probability:{}\n".format(image_list[i],
                                                                     class_indict[str(class_index)],
                                                                     np.around(prediction[class_index], 2)))


if __name__ == '__main__':
    main()
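The softmax step at the end of the script subtracts the maximum before exponentiating. The dependency-free sketch below shows why that matters and adds a simple top-k helper; the logits are made-up values chosen so that a naive exp() would overflow, and top_k is my own helper rather than part of the script above.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, labels, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]
logits = [1000.0, 1002.0, 999.0, 1001.0, 1005.0]  # hypothetical raw outputs
probs = softmax(logits)
best = top_k(probs, labels, k=2)
print(best)
```

Calling math.exp(1005.0) directly raises OverflowError, while the shifted version only ever exponentiates values that are zero or negative.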
