An Example of Deploying a Model with Docker + Gunicorn + Flask

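This article assumes the reader is new to model deployment and walks through, in detail, how to deploy a model with Docker, Gunicorn and Flask. I wrote it partly to share my experience and partly as a study note for my own future reference.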

This article does not cover the basics of Docker, Gunicorn and Flask; these are easy to find online. I also list a few resources I found useful at the end.

1. Files required for deployment and their contents

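The model here classifies digits from the MNIST dataset; the material comes from a GitHub project (https://github.com/cloudxlab/ml). My operating system is Ubuntu 16.04 and my working directory is /home/user/flask_app/. The files and folders in that directory are: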

drwxrwxr-x 2 hzhang hzhang 4096 Mar 26 10:18 model/
drwxrwxr-x 2 hzhang hzhang 4096 Mar 26 10:44 test_images/
-rw-rw-r-- 1 hzhang hzhang   86 Apr 21 09:58 requirements.txt
-rwxrwxrwx 1 hzhang hzhang  446 Apr 21 10:27 gunicorn.sh
-rw-rw-r-- 1 hzhang hzhang 1505 Apr 21 10:46 gunicorn_config.py
-rw-rw-r-- 1 hzhang hzhang  531 Apr 21 10:50 Dockerfile
-rw-rw-r-- 1 hzhang hzhang 1530 Apr 21 15:17 predictions.py

First we need the main program file, predictions.py (from the GitHub project, https://github.com/cloudxlab/ml). Because it is deployed with Flask, I modified it slightly:

import numpy as np
# sklearn.externals.joblib works with the pinned scikit-learn==0.21.3
# (where it is deprecated); newer releases need "import joblib" instead.
from sklearn.externals import joblib
from PIL import Image
from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

# Create flask app
app = Flask(__name__)

# Load the previously trained model from the file
#model = joblib.load("../trained_models/mnist_model.pkl")
model = joblib.load("./model/mnist_model.pkl")

@app.route('/')
def home_endpoint():
    return 'Hello World!'

# /predict is the end point
@app.route('/predict', methods=["POST"])
def predict_image():
    # Read the image uploaded by curl, ajax or postman
    requested_img = request.files.get('file')
    filename = secure_filename(requested_img.filename)

    # Convert the uploaded image to greyscale, because the MNIST
    # training images are greyscale.
    greyscale_img = Image.open(requested_img).convert('L')

    # Resize the uploaded image to 28x28 pixels, because the MNIST
    # training images are 28x28 pixels.
    resized_image = greyscale_img.resize((28,28))

    # Convert the image to an array
    img = np.asarray(resized_image)

    # Flatten the image to a vector of 784 values
    img = img.reshape(784,)

    # Predict the digit using the trained model
    pred = model.predict(img.reshape(1, -1))

    # Get the digit
    result = int(pred.tolist()[0])

    # Return the JSON response
    return jsonify({"digit": result})
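The two reshape calls in predict_image are easy to misread, so here is a minimal sketch of just that step. It assumes only NumPy and uses an all-zero array as a stand-in for the uploaded image (fake_image is a made-up name, not part of the original program):

```python
import numpy as np

# Stand-in for the 28x28 greyscale image that predictions.py obtains from
# Image.open(...).convert('L').resize((28, 28)).
fake_image = np.zeros((28, 28), dtype=np.uint8)

# Flatten the 28x28 grid into one vector of 784 pixel values.
flat = fake_image.reshape(784,)

# scikit-learn estimators expect a 2-D array of shape
# (n_samples, n_features), so one image becomes one row of 784 features.
batch = flat.reshape(1, -1)

print(flat.shape)   # (784,)
print(batch.shape)  # (1, 784)
```

model.predict then returns one prediction per row, which is why the handler takes element [0] of the result.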

Next, prepare the Dockerfile:

FROM python:3.6

# make directories suited to your application
RUN mkdir -p /home/project/app
RUN mkdir -p /home/test/server/bin
RUN mkdir -p /home/project/app/log/

WORKDIR /home/project/app

# copy and install packages for flask
COPY requirements.txt /home/project/app
RUN pip install --no-cache-dir -r requirements.txt

# copy contents from your local to your docker container
COPY . /home/project/app
COPY ./model /home/project/app/model

RUN chmod +x /home/project/app/gunicorn.sh

ENTRYPOINT ["./gunicorn.sh"]

#EXPOSE 5000

The contents of gunicorn.sh are:

#!/bin/bash

touch /home/project/app/log/access_print.log
touch /home/project/app/log/error_print.log

exec gunicorn predictions:app -c gunicorn_config.py \
    --access-logfile=/home/project/app/log/access_print.log \
    --error-logfile=/home/project/app/log/error_print.log

I put the Gunicorn configuration in gunicorn_config.py, whose contents are:

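# gunicorn_config.py
import logging
import logging.handlers
from logging.handlers import WatchedFileHandler
import os
import multiprocessing

bind = '0.0.0.0:5000'      # IP address and port to bind to
backlog = 512              # size of the listen queue
timeout = 300              # worker timeout in seconds
worker_class = 'gevent'    # use gevent workers; 'sync' is also available and is the default
workers = multiprocessing.cpu_count() * 2 + 1    # number of worker processes
threads = 2                # threads per worker process (per the Gunicorn docs, this only affects the gthread worker type)
loglevel = 'debug'         # log level; this applies to the error log, the access log level cannot be set
# Access log format; the error log format cannot be configured.
access_log_format = '%(t)s %(p)s %(h)s "%(r)s" %(s)s %(L)s %(b)s "%(f)s" "%(a)s"'

"""
The meaning of each identifier is: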
h          remote address
l          '-'
u          currently '-', may be user name in future releases
t          date of the request
r          status line (e.g. ``GET / HTTP/1.1``)
s          status
b          response length or '-'
f          referer
a          user agent
T          request time in seconds
D          request time in microseconds
L          request time in decimal seconds
p          process ID
"""
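To see what a line in the access log might look like, the sketch below fills in the format string (written here with balanced quotes around the referer field) using Python's %-style formatting with named keys. The values in sample_atoms are made up for illustration; in a real deployment Gunicorn supplies them for each request.

```python
access_log_format = '%(t)s %(p)s %(h)s "%(r)s" %(s)s %(L)s %(b)s "%(f)s" "%(a)s"'

# Hypothetical values for one request, keyed by the identifiers listed above.
sample_atoms = {
    "t": "[21/Apr/2020:15:20:01 +0000]",  # date of the request
    "p": "<42>",                          # process ID
    "h": "172.17.0.1",                    # remote address
    "r": "GET / HTTP/1.1",                # status line
    "s": "200",                           # status
    "L": "0.001234",                      # request time in decimal seconds
    "b": "12",                            # response length
    "f": "-",                             # referer
    "a": "curl/7.47.0",                   # user agent
}

line = access_log_format % sample_atoms
print(line)
# [21/Apr/2020:15:20:01 +0000] <42> 172.17.0.1 "GET / HTTP/1.1" 200 0.001234 12 "-" "curl/7.47.0"
```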

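In my experience, the IP address in the config file should not be the IP address of your host machine, otherwise Gunicorn fails to start. Inside the container the host's address is not assigned to any network interface, so it cannot be bound; 0.0.0.0 listens on all of the container's interfaces.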
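As a small sanity check for this rule, the sketch below tests whether a bind string uses the wildcard address. is_wildcard_bind is a hypothetical helper of my own, not part of Gunicorn, and it only handles the IPv4 'host:port' form used in this article; 192.168.1.10 is a made-up host address.

```python
import ipaddress

def is_wildcard_bind(bind):
    """Return True if a 'host:port' bind string uses the unspecified address."""
    # Split on the last colon so that the port is separated from the host.
    host, _, _port = bind.rpartition(":")
    return ipaddress.ip_address(host).is_unspecified

print(is_wildcard_bind("0.0.0.0:5000"))       # True
print(is_wildcard_bind("192.168.1.10:5000"))  # False
```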

The packages that need to be installed in the Docker image for this project are listed in requirements.txt:

Flask==1.0.2
numpy==1.16.2
scikit-learn==0.21.3
pillow==6.0.0
gunicorn==19.9.0
gevent
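Note that gevent is the only package without a pinned version, so the version installed depends on when the image is built. A minimal sketch for spotting such entries follows; unpinned is a hypothetical helper, and it only treats "==" as a pin:

```python
import re

REQUIREMENTS = """\
Flask==1.0.2
numpy==1.16.2
scikit-learn==0.21.3
pillow==6.0.0
gunicorn==19.9.0
gevent
"""

def unpinned(requirements_text):
    """Return the names of requirements that have no '==' version pin."""
    names = []
    for raw in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "==" not in line:
            # Keep only the package name, cutting at any specifier or extra.
            names.append(re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0])
    return names

print(unpinned(REQUIREMENTS))  # ['gevent']
```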

2. Deployment steps:

1. First, list the Docker images already present on the local machine:

> docker images
REPOSITORY                             TAG                 IMAGE ID            CREATED             SIZE
python                                 3.7                 8e3336637d81        5 weeks ago         919MB
python                                 3.6-slim            d3ae39a2a3a1        7 weeks ago         174MB
prakhar1989/static-site                latest              f01030e1dcf3        4 years ago         134MB

2. Then, in the working directory, run the following command (the new image is named ml_mnist):

> docker build -t ml_mnist .

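Do not forget the final "." in the command; it stands for the current directory, which is used as the build context. Then list the Docker images again: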

REPOSITORY                             TAG                 IMAGE ID            CREATED             SIZE
ml_mnist                               latest              2873b006c4e3        4 hours ago         1.14GB
python                                 3.7                 8e3336637d81        5 weeks ago         919MB
python                                 3.6-slim            d3ae39a2a3a1        7 weeks ago         174MB
prakhar1989/static-site                latest              f01030e1dcf3        4 years ago         134MB

As you can see, the new image "ml_mnist" has been created.

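3. Use the new image to create the container we need. We name the container digit_deploy and have the access log and error log written under /home/user/, so that they can still be read from that directory after the container exits.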

> docker run -p 8888:5000 -v /home/user/:/home/project/app/log --name digit_deploy 2873b006c4e3

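Here /home/user/ is a path on the local machine, while /home/project/app/log is a path inside the container. Note that the last argument of the command is the image ID. After this command runs, the container is created and started. The option -p 8888:5000 maps host port 8888 to container port 5000.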
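Both -p and -v take a "host:container" pair, with the host side first. The sketch below splits the two values from the command above to make that order explicit. The helper names are my own, and the parsing only covers the simple two-part form used here; Docker accepts longer forms that this sketch does not handle.

```python
def parse_port_mapping(spec):
    """Split a 'host_port:container_port' value as passed to -p."""
    host_port, container_port = spec.split(":")
    return int(host_port), int(container_port)

def parse_volume(spec):
    """Split a 'host_path:container_path' value as passed to -v."""
    host_path, container_path = spec.split(":")
    return host_path, container_path

host_port, container_port = parse_port_mapping("8888:5000")
host_dir, container_dir = parse_volume("/home/user/:/home/project/app/log")

print(f"host port {host_port} -> container port {container_port}")
print(f"host dir {host_dir} -> container dir {container_dir}")
```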

3. Testing access to the running container:

> curl http://0.0.0.0:8888/
Hello World!

Note that curl http://0.0.0.0:5000/ will fail. Port 5000 is the container's port, and it is reachable from the host only through the mapped port 8888.

Running curl http://192.168.x.xx:8888/ also succeeds, where 192.168.x.xx is the host's address on the company's internal LAN.

> curl -F file=@test_images/2.png http://0.0.0.0:8888/predict
{"digit":2}
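The same upload can be made from Python. Below is a minimal sketch using only the standard library; build_multipart and predict are hypothetical helpers written for this article, and the form field name "file" matches what predictions.py reads with request.files.get('file').

```python
import json
import urllib.request
import uuid

def build_multipart(field, filename, data, content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"

def predict(image_path, url="http://0.0.0.0:8888/predict"):
    """POST an image to the /predict endpoint and return the parsed JSON."""
    with open(image_path, "rb") as fh:
        body, ctype = build_multipart("file", image_path.split("/")[-1], fh.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With the container running, call: predict("test_images/2.png")
```

With the container from step 3 running, predict("test_images/2.png") should return the same JSON as the curl call above, parsed into a Python dict.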

If you have any questions, feel free to leave a comment. If you found this article useful, please give it a like ^_^

This article is mainly based on "A Simple Docker+Gunicorn+Flask Example" (一个简单的Docker+Gunicorn+Flask示例), from cong_da_da's blog on CSDN.

On Docker (the first link may require a VPN to access):

https://docker-curriculum.com


On Gunicorn: https://blog.csdn.net/y472360651/article/details/78538188