Published: 2023-12-06 13:27:04



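      • 1. MNIST and LeNet
      • 2. Training the model with the caffe command-line interface
        • 2.1 Preparing the dataset
        • 2.2 Defining the LeNet model
        • 2.3 Training LeNet
      • 3. Training the model with the pycaffe interface
        • 3.1 Setting up the environment
        • 3.2 Working in Jupyter Notebook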

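1. MNIST and LeNet

MNIST is a dataset for handwritten digit recognition. It is a subset built from the much larger collection of the National Institute of Standards and Technology (NIST).

LeNet is a convolutional neural network designed by LeCun for classifying handwritten digits.

This article relies on the environment set up in the earlier post "[DeepLearning][Caffe] Compiling caffe and the pycaffe interface in a virtual Python environment".

2. Training the model with the caffe command-line interface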

Training LeNet on MNIST with Caffe

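2.1 Preparing the dataset

1. Change into the caffe-master directory. $CAFFE_ROOT is the absolute path of caffe-master.


2. Download and extract the dataset. This places train-images-idx3-ubyte, train-labels-idx1-ubyte, t10k-images-idx3-ubyte, and t10k-labels-idx1-ubyte in $CAFFE_ROOT/data/mnist/.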

$ ./data/mnist/get_mnist.sh

3. Convert the raw dataset to LMDB format. The training set is written to $CAFFE_ROOT/examples/mnist/mnist_train_lmdb and the test set to $CAFFE_ROOT/examples/mnist/mnist_test_lmdb.

$ ./examples/mnist/create_mnist.sh
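
The four files downloaded in step 2 use the IDX binary format, which starts with a big-endian header. As a rough illustration of what the conversion script has to read, the sketch below parses such a header. The function name read_idx_header and the synthetic header bytes are made up for this example; the real files are not opened.

```python
import struct

# Magic numbers of the IDX format used by the raw MNIST files.
IDX_IMAGES_MAGIC = 2051  # idx3-ubyte: image tensors
IDX_LABELS_MAGIC = 2049  # idx1-ubyte: label vectors


def read_idx_header(raw):
    """Return (magic, dims) parsed from the first bytes of an IDX file."""
    magic, = struct.unpack(">I", raw[:4])
    ndim = magic & 0xFF  # the low byte stores the number of dimensions
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    return magic, dims


# Synthetic header shaped like train-images-idx3-ubyte (no pixel data attached).
fake_header = struct.pack(">IIII", IDX_IMAGES_MAGIC, 60000, 28, 28)
magic, dims = read_idx_header(fake_header)
print(magic, dims)
```

On a real file, the same function could be applied to the first 16 bytes read in binary mode.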
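2.2 Defining the LeNet model

The caffe command-line tool uses Google Protobuf to define both the model structure and the optimization method. The settings caffe supports are listed in src/caffe/proto/caffe.proto.

1. The LeNet model structure is defined in examples/mnist/lenet_train_test.prototxt.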


name: "LeNet"

Define the data layer, which reads data from lmdb:

layer {
  name: "mnist"
  type: "Data"
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "mnist_train_lmdb"
    backend: LMDB
    batch_size: 64
  }
  top: "data"
  top: "label"
}

This layer has name mnist and type Data. It reads from source, backend specifies the storage format, and batch_size is 64. The scale 0.00390625 = 1/256 normalizes pixel values to the range [0, 1). top denotes the layer's output; this layer produces two blobs, a data blob and a label blob.
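
To make the effect of scale concrete, here is a minimal sketch in plain Python, not Caffe code. The sample pixel values are chosen only for illustration.

```python
SCALE = 0.00390625  # value of transform_param.scale in the data layer

raw_pixels = [0, 1, 128, 255]  # sample 8-bit intensities, chosen for illustration
scaled = [p * SCALE for p in raw_pixels]
print(scaled)
```

The largest 8-bit value, 255, maps to 255/256, which is why the interval is open at 1.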


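layer {
  name: "conv1"
  type: "Convolution"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "data"
  top: "conv1"
}

This layer takes the data blob as input and produces num_output output channels (20 filters). The kernel size is kernel_size and the stride is stride. weight_filler and bias_filler define how the weights and biases are initialized. lr_mult is a per-parameter learning-rate multiplier: 1 means the weights use the learning rate defined by the solver, and 2 means the biases use twice that rate. bottom denotes the layer's input and top its output.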
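
As a small worked example of lr_mult, the sketch below multiplies the solver's base_lr (0.01 in lenet_solver.prototxt) by the two multipliers of conv1. It ignores the learning-rate schedule, so the numbers describe only the start of training; the dictionary names are made up for illustration.

```python
BASE_LR = 0.01  # base_lr from lenet_solver.prototxt

# lr_mult values of conv1, in the order the param blocks appear.
LR_MULT = {"weights": 1, "biases": 2}

effective_lr = {name: BASE_LR * mult for name, mult in LR_MULT.items()}
print(effective_lr)
```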


layer {
  name: "pool1"
  type: "Pooling"
  pooling_param {
    kernel_size: 2
    stride: 2
    pool: MAX
  }
  bottom: "conv1"
  top: "pool1"
}



layer {
  name: "ip1"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "pool2"
  top: "ip1"
}

This layer is also simple to define: it has num_output outputs, and the rest is similar to the layers above.


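layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}

This layer defines the activation function; type gives the kind of activation. If the layer supports in-place operation, bottom and top can share the same name, which saves memory.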


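layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
}

Finally, this layer defines the loss function; type gives the kind of loss. It takes two blobs to compute the loss value and, as written here, declares no output. When backpropagation starts, this layer produces the gradient with respect to the preceding layer.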
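
The idea behind a softmax loss can be sketched for a single example in plain Python. This is an illustration of the math under simplifying assumptions (one example, no batch averaging), not Caffe's implementation, and the function name softmax_with_loss is made up here.

```python
import math


def softmax_with_loss(scores, label):
    """Return (loss, gradient w.r.t. scores) for one example."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[label])
    # Gradient of the loss with respect to each score: p_k - 1[k == label].
    grad = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return loss, grad


# Ten equal scores: an untrained 10-class model that is maximally unsure.
loss, grad = softmax_with_loss([0.0] * 10, label=7)
print(loss)  # equals ln(10), roughly 2.30
```

This is consistent with the initial losses of about 2.37 and 2.44 that the notebook in section 3 reports before any training step.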

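Defining layer rules:

layer {
  // ...layer definition...
  include: { phase: TRAIN }
}

In this example, the layer is included only during training. If TRAIN is changed to TEST, the layer is included only during testing. By default, a layer with no include rule is always part of the network. This is why lenet_train_test.prototxt defines two Data layers (with different batch_size values), one for the training phase and one for the testing phase. Likewise, the Accuracy layer appears in the model only during testing.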
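
The selection logic described above can be modelled in a few lines. This is a simplified sketch: it handles only a single include phase per layer, whereas Caffe's rule matching has more options. The names mnist_train and mnist_test are invented so that the two Data layers can be told apart.

```python
# Each entry is (layer name, phase named by its include rule, or None if it has none).
# "mnist_train" and "mnist_test" are hypothetical names for the two Data layers.
LAYERS = [
    ("mnist_train", "TRAIN"),
    ("mnist_test", "TEST"),
    ("conv1", None),
    ("ip2", None),
    ("accuracy", "TEST"),
    ("loss", None),
]


def layers_for_phase(layers, phase):
    """Keep layers without a rule plus layers whose rule names this phase."""
    return [name for name, rule in layers if rule is None or rule == phase]


train_layers = layers_for_phase(LAYERS, "TRAIN")
test_layers = layers_for_phase(LAYERS, "TEST")
print(train_layers)
print(test_layers)
```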

2. The optimization method for training LeNet is defined in examples/mnist/lenet_solver.prototxt.

# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
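
The "inv" policy decays the learning rate as base_lr * (1 + gamma * iter) ^ (-power). The sketch below evaluates that formula with the values from the solver file; the function name inv_learning_rate is made up for illustration.

```python
BASE_LR = 0.01
GAMMA = 0.0001
POWER = 0.75


def inv_learning_rate(iteration):
    """Learning rate under lr_policy "inv": base_lr * (1 + gamma * iter) ^ (-power)."""
    return BASE_LR * (1.0 + GAMMA * iteration) ** (-POWER)


for it in (0, 100, 5000, 10000):
    print(it, inv_learning_rate(it))
```

At iteration 100 this gives about 0.00992565, which agrees with the "lr =" value printed in the training log in section 2.3.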
2.3 Training LeNet

Run the following script to start training LeNet:

$ ./examples/mnist/train_lenet.sh


#!/usr/bin/env sh
set -e

./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt $@

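After the training script starts, the command line prints information about the training process. It begins with hardware information.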

I1107 20:52:36.823984 10432 caffe.cpp:204] Using GPUs 0
I1107 20:52:37.016600 10432 caffe.cpp:209] GPU 0: GeForce GTX 1070

The solver is initialized from examples/mnist/lenet_solver.prototxt:

I1107 20:52:38.889734 10432 solver.cpp:45] Initializing solver from parameters:

The training net is created from examples/mnist/lenet_train_test.prototxt:

I1107 20:52:38.925832 10432 solver.cpp:102] Creating training net from net file: examples/mnist/lenet_train_test.prototxt
I1107 20:52:38.926434 10432 net.cpp:296] The NetState phase (0) differed from the phase (1) specified by a rule in layer mnist
I1107 20:52:38.926487 10432 net.cpp:296] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I1107 20:52:38.926750 10432 net.cpp:53] Initializing net from parameters:

Layers are then initialized one by one. For example, below is the output for creating conv1. This information is very useful for debugging.

I1107 20:52:38.967473 10432 net.cpp:86] Creating Layer conv1
I1107 20:52:38.967494 10432 net.cpp:408] conv1 <- data
I1107 20:52:38.967540 10432 net.cpp:382] conv1 -> conv1
I1107 20:52:43.570137 10432 net.cpp:124] Setting up conv1
I1107 20:52:43.570161 10432 net.cpp:131] Top shape: 64 20 24 24 (737280)
I1107 20:52:43.570166 10432 net.cpp:139] Memory required for data: 3150080
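
The "Memory required for data" figure can be reproduced by hand. The sketch below assumes that it is a running total over the top blobs created so far (data, label, conv1 in the training net) at 4 bytes per element; the blob shapes are the ones the notebook in section 3 reports for batch size 64.

```python
BYTES_PER_FLOAT = 4  # assumes single-precision blobs

# Top shapes created so far in the training net, in creation order.
TOP_SHAPES = [
    ("data", (64, 1, 28, 28)),
    ("label", (64,)),
    ("conv1", (64, 20, 24, 24)),
]


def count(shape):
    """Number of elements in a blob of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


running_bytes = 0
for name, shape in TOP_SHAPES:
    running_bytes += count(shape) * BYTES_PER_FLOAT
    print(name, count(shape), running_bytes)
```

Under these assumptions the conv1 blob has 737280 elements and the running total is 3150080 bytes, the same numbers as in the log lines above.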


I1107 20:52:43.579761 10432 net.cpp:257] Network initialization done.

The test net is created from examples/mnist/lenet_train_test.prototxt:

I1107 20:52:43.579917 10432 solver.cpp:190] Creating test net (#0) specified by net file: examples/mnist/lenet_train_test.prototxt
I1107 20:52:43.579960 10432 net.cpp:296] The NetState phase (1) differed from the phase (0) specified by a rule in layer mnist
I1107 20:52:43.580045 10432 net.cpp:53] Initializing net from parameters: 


I1107 20:52:43.640388 10432 net.cpp:86] Creating Layer conv1
I1107 20:52:43.640391 10432 net.cpp:408] conv1 <- data
I1107 20:52:43.640398 10432 net.cpp:382] conv1 -> conv1
I1107 20:52:43.642616 10432 net.cpp:124] Setting up conv1
I1107 20:52:43.642632 10432 net.cpp:131] Top shape: 100 20 24 24 (1152000)
I1107 20:52:43.642637 10432 net.cpp:139] Memory required for data: 4922800


I1107 20:52:43.649025 10432 net.cpp:257] Network initialization done.


I1107 20:52:43.649063 10432 solver.cpp:57] Solver scaffolding done.
I1107 20:52:43.649327 10432 caffe.cpp:239] Starting Optimization
I1107 20:52:43.649333 10432 solver.cpp:289] Solving LeNet


I1107 20:52:44.042378 10432 solver.cpp:239] Iteration 100 (561.408 iter/s, 0.178124s/100 iters), loss = 0.210162
I1107 20:52:44.042407 10432 solver.cpp:258]     Train net output #0: loss = 0.210162 (* 1 = 0.210162 loss)
I1107 20:52:44.042412 10432 sgd_solver.cpp:112] Iteration 100, lr = 0.00992565


I1107 20:52:44.659152 10432 solver.cpp:347] Iteration 500, Testing net (#0)
I1107 20:52:44.717660 10443 data_layer.cpp:73] Restarting data prefetching from start.
I1107 20:52:44.719583 10432 solver.cpp:414]     Test net output #0: accuracy = 0.9743
I1107 20:52:44.719601 10432 solver.cpp:414]     Test net output #1: loss = 0.083975 (* 1 = 0.083975 loss)


I1107 20:52:52.069522 10432 solver.cpp:464] Snapshotting to binary proto file examples/mnist/lenet_iter_5000.caffemodel
I1107 20:52:52.153425 10432 sgd_solver.cpp:284] Snapshotting solver state to binary proto file examples/mnist/lenet_iter_5000.solverstate


I1107 20:53:00.412792 10432 solver.cpp:347] Iteration 10000, Testing net (#0)
I1107 20:53:00.473321 10443 data_layer.cpp:73] Restarting data prefetching from start.
I1107 20:53:00.474253 10432 solver.cpp:414]     Test net output #0: accuracy = 0.9911
I1107 20:53:00.474272 10432 solver.cpp:414]     Test net output #1: loss = 0.029594 (* 1 = 0.029594 loss)
I1107 20:53:00.474277 10432 solver.cpp:332] Optimization Done.
I1107 20:53:00.474280 10432 caffe.cpp:250] Optimization Done.
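
Logs like the one above are easy to post-process. As a rough sketch, the code below pulls (iteration, accuracy) pairs out of a few lines copied from this log. The regular expressions are written against this particular output, so treat the format as an assumption that may not hold for other Caffe versions; parse_test_accuracy is a made-up helper name.

```python
import re

# Sample lines copied from the training log above.
LOG = """\
I1107 20:52:44.659152 10432 solver.cpp:347] Iteration 500, Testing net (#0)
I1107 20:52:44.719583 10432 solver.cpp:414]     Test net output #0: accuracy = 0.9743
I1107 20:53:00.412792 10432 solver.cpp:347] Iteration 10000, Testing net (#0)
I1107 20:53:00.474253 10432 solver.cpp:414]     Test net output #0: accuracy = 0.9911
"""

ITER_RE = re.compile(r"Iteration (\d+), Testing net")
ACC_RE = re.compile(r"accuracy = ([0-9.]+)")


def parse_test_accuracy(log_text):
    """Return a list of (iteration, accuracy) pairs found in a Caffe log."""
    results = []
    current_iter = None
    for line in log_text.splitlines():
        match = ITER_RE.search(line)
        if match:
            current_iter = int(match.group(1))
            continue
        match = ACC_RE.search(line)
        if match and current_iter is not None:
            results.append((current_iter, float(match.group(1))))
    return results


print(parse_test_accuracy(LOG))
```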

3. Training the model with the pycaffe interface

Learning LeNet

01-learning-lenet.ipynb is located in the $CAFFE_ROOT/examples/ directory.

3.1 Setting up the environment
$ workon caffe-master
(caffe-master)$ pip install ipykernel
(caffe-master)$ python -m ipykernel install --user --name caffe-master --display-name "caffe-master"
(caffe-master)$ cd $CAFFE_ROOT/examples/
(caffe-master)$ jupyter notebook
3.2 Working in Jupyter Notebook

Once Jupyter Notebook has opened, create a new notebook and name it Solving in Python with LeNet.

In [1]:

from pylab import *
%matplotlib inline

In [2]:

caffe_root = '../'  # this file should be run from {caffe_root}/examples (otherwise change this line)

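PYTHONPATH was already set in "3.3 Configuring the pycaffe interface dependencies" of the environment post, so caffe can be imported directly. If it has not been set, add the pycaffe directory to sys.path before importing.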
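
For the case where PYTHONPATH is not set, a minimal sketch of the sys.path approach is shown here. It assumes the pycaffe package lives in the python/ subdirectory of the Caffe checkout; the variable name pycaffe_dir is made up for this example.

```python
import os
import sys

caffe_root = '../'  # same value as in the cell above

# Assumes the pycaffe package sits in the python/ subdirectory of the Caffe checkout.
pycaffe_dir = os.path.join(caffe_root, 'python')
if pycaffe_dir not in sys.path:
    sys.path.insert(0, pycaffe_dir)

# After this, `import caffe` should resolve if pycaffe has been built.
```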

In [3]:

import caffe

Define the LeNet model in Python.

In [4]:

from caffe import layers as L, params as P

def lenet(lmdb, batch_size):
    # our version of LeNet: a series of linear and simple nonlinear transformations
    n = caffe.NetSpec()

    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)

    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.fc1 =   L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.fc1, in_place=True)
    n.score = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss =  L.SoftmaxWithLoss(n.score, n.label)

    return n.to_proto()

Write the nets out as protobuf text so that Caffe can read them.

In [5]:

with open('mnist/lenet_auto_train.prototxt', 'w') as f:
    f.write(str(lenet('mnist/mnist_train_lmdb', 64)))

with open('mnist/lenet_auto_test.prototxt', 'w') as f:
    f.write(str(lenet('mnist/mnist_test_lmdb', 100)))


In [6]:

!cat mnist/lenet_auto_train.prototxt
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.003921568859368563
  }
  data_param {
    source: "mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 20
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 50
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "fc1"
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "fc1"
  top: "fc1"
}
layer {
  name: "score"
  type: "InnerProduct"
  bottom: "fc1"
  top: "score"
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
}


In [7]:

!cat mnist/lenet_auto_test.prototxt
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.003921568859368563
  }
  data_param {
    source: "mnist/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 20
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 50
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "fc1"
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "fc1"
  top: "fc1"
}
layer {
  name: "score"
  type: "InnerProduct"
  bottom: "fc1"
  top: "score"
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
}
In [8]:

!cat mnist/lenet_auto_solver.prototxt
# The train/test net protocol buffer definition
train_net: "mnist/lenet_auto_train.prototxt"
test_net: "mnist/lenet_auto_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "mnist/lenet"


In [10]:

solver = None  # ignore this workaround for lmdb data (can't instantiate two solvers on the same data)
solver = caffe.SGDSolver('mnist/lenet_auto_solver.prototxt')


In [11]:

# each output is (batch size, feature dim, spatial dim)
[(k, v.data.shape) for k, v in solver.net.blobs.items()]


[('data', (64, 1, 28, 28)),
 ('label', (64,)),
 ('conv1', (64, 20, 24, 24)),
 ('pool1', (64, 20, 12, 12)),
 ('conv2', (64, 50, 8, 8)),
 ('pool2', (64, 50, 4, 4)),
 ('fc1', (64, 500)),
 ('score', (64, 10)),
 ('loss', ())]


In [12]:

# just print the weight sizes (we'll omit the biases)
[(k, v[0].data.shape) for k, v in solver.net.params.items()]


[('conv1', (20, 1, 5, 5)),
 ('conv2', (50, 20, 5, 5)),
 ('fc1', (500, 800)),
 ('score', (10, 500))]
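
The spatial sizes and the 800 inputs of fc1 can be checked by hand. The sketch below traces a 28x28 input through the two convolution and pooling stages, assuming no padding and the kernel sizes and strides from the lenet() definition. The helper names conv_out and pool_out are made up, and the round-up in pool_out reflects how Caffe is understood to size pooling outputs (it makes no difference for these even sizes).

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Spatial output size of pooling, rounding up partial windows."""
    return -(-(size - kernel) // stride) + 1


# (layer name, kind, kernel size, stride), following the lenet() definition.
LAYERS = [
    ("conv1", "conv", 5, 1),
    ("pool1", "pool", 2, 2),
    ("conv2", "conv", 5, 1),
    ("pool2", "pool", 2, 2),
]

size = 28  # MNIST images are 28 x 28
sizes = {"data": size}
for name, kind, kernel, stride in LAYERS:
    if kind == "conv":
        size = conv_out(size, kernel, stride)
    else:
        size = pool_out(size, kernel, stride)
    sizes[name] = size

fc1_inputs = 50 * sizes["pool2"] ** 2  # conv2 has 50 output channels
print(sizes, fc1_inputs)
```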


In [13]:

solver.net.forward()  # train net


{'loss': array(2.3712316, dtype=float32)}


In [14]:

solver.test_nets[0].forward()  # test net (there can be more than one)


{'loss': array(2.4383156, dtype=float32)}


In [15]:

# we use a little trick to tile the first eight images
imshow(solver.net.blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray'); axis('off')
print('train labels:', solver.net.blobs['label'].data[:8])
train labels: [5. 0. 4. 1. 9. 2. 1. 3.]
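
The tiling trick in the cell above swaps the image axis and the row axis and then flattens each row across all images. A plain-Python equivalent on nested lists may make that easier to see; tile_horizontally is a made-up helper and the tiny 2x2 "images" are for illustration only.

```python
def tile_horizontally(images):
    """Place equally sized 2-D images side by side; returns one wide 2-D image."""
    height = len(images[0])
    # Row r of the result is row r of every image, concatenated left to right.
    return [[px for img in images for px in img[r]] for r in range(height)]


# Three tiny 2x2 "images", each filled with its own index.
images = [[[i, i], [i, i]] for i in range(3)]
tiled = tile_horizontally(images)
print(tiled)
```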



In [16]:

imshow(solver.test_nets[0].blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray'); axis('off')
print('test labels:', solver.test_nets[0].blobs['label'].data[:8])
test labels: [7. 2. 1. 0. 4. 1. 4. 9.]



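Inspect the gradients of the first layer's filters (they are filled in only after a backward pass, for example one call to solver.step(1)). Each cell of the 4x5 grid is a 5x5 convolution kernel.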

In [18]:

imshow(solver.net.params['conv1'][0].diff[:, 0].reshape(4, 5, 5, 5).transpose(0, 2, 1, 3).reshape(4*5, 5*5), cmap='gray'); axis('off')


(-0.5, 24.5, 19.5, -0.5)



In [19]:

%%time
niter = 200
test_interval = 25
# losses will also be stored in the log
train_loss = zeros(niter)
test_acc = zeros(int(np.ceil(niter / test_interval)))
output = zeros((niter, 8, 10))

# the main solver loop
for it in range(niter):
    solver.step(1)  # SGD by Caffe

    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data

    # store the output on the first test batch
    # (start the forward pass at conv1 to avoid loading new data)
    solver.test_nets[0].forward(start='conv1')
    output[it] = solver.test_nets[0].blobs['score'].data[:8]

    # run a full test every so often
    # (Caffe can also do this for us and write to a log, but we show here
    # how to do it directly in Python, where more complicated things are easier.)
    if it % test_interval == 0:
        print('Iteration', it, 'testing...')
        correct = 0
        for test_it in range(100):
            solver.test_nets[0].forward()
            correct += sum(solver.test_nets[0].blobs['score'].data.argmax(1)
                           == solver.test_nets[0].blobs['label'].data)
        test_acc[it // test_interval] = correct / 1e4
Iteration 0 testing...
Iteration 25 testing...
Iteration 50 testing...
Iteration 75 testing...
Iteration 100 testing...
Iteration 125 testing...
Iteration 150 testing...
Iteration 175 testing...
CPU times: user 1.91 s, sys: 551 ms, total: 2.46 s
Wall time: 1.41 s
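
The accuracy computed in the loop above is the count of examples whose highest score matches the label, divided by the 10,000 test images (100 batches of 100). A plain-Python sketch of that counting step follows; the helper names and the toy scores are made up for illustration.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def count_correct(score_batch, label_batch):
    """Number of rows whose highest score sits at the labelled class."""
    return sum(1 for scores, label in zip(score_batch, label_batch)
               if argmax(scores) == int(label))


# Toy batch of 3 examples over 3 classes; values are made up for illustration.
scores = [[0.1, 2.0, -1.0], [3.0, 0.0, 0.5], [0.2, 0.1, 0.9]]
labels = [1.0, 0.0, 1.0]  # labels are stored as floats here, hence int(label)
correct = count_correct(scores, labels)
accuracy = correct / len(labels)
print(correct, accuracy)
```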


In [20]:

_, ax1 = subplots()
ax2 = ax1.twinx()
ax1.plot(arange(niter), train_loss)
ax2.plot(test_interval * arange(len(test_acc)), test_acc, 'r')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
ax2.set_title('Test Accuracy: {:.2f}'.format(test_acc[-1]))


Text(0.5, 1.0, 'Test Accuracy: 0.94')


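Look at the first 8 test images and LeNet's predictions for them. Each prediction is a one-dimensional vector of length 10 holding the confidence scores for the digits 0 to 9. These are the raw outputs of the fully connected layer, before any softmax. For each image, the first panel is the test image itself (the first one is a handwritten 7) and the second panel shows the scores over the first 50 training iterations. The horizontal axis is the iteration at which the model was evaluated and the vertical axis is the predicted label 0 to 9. Pixel brightness encodes the confidence score: the brighter, the higher the score. As the iterations increase, the row for label 7 becomes brighter and brighter, and in the final score vector 7 is the brightest.

In [21]:

for i in range(8):
    figure(figsize=(2, 2))
    imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
    figure(figsize=(10, 2))
    imshow(output[:50, i].T, interpolation='nearest', cmap='gray')
    xlabel('iteration')
    ylabel('label')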


From top to bottom, these 8 test images get progressively harder. The eighth test image, a 9, looks a lot like a 4, and LeNet gives high confidence scores to both 4 and 9 for it.

The results after applying softmax are shown below. The predictions stand out much more clearly.

In [22]:

for i in range(8):
    figure(figsize=(2, 2))
    imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
    figure(figsize=(10, 2))
    imshow(exp(output[:50, i].T) / exp(output[:50, i].T).sum(0), interpolation='nearest', cmap='gray')
    xlabel('iteration')
    ylabel('label')



In [23]:

train_net_path = 'mnist/custom_auto_train.prototxt'
test_net_path = 'mnist/custom_auto_test.prototxt'
solver_config_path = 'mnist/custom_auto_solver.prototxt'

### define net
def custom_net(lmdb, batch_size):
    # define your own net!
    n = caffe.NetSpec()

    # keep this data layer for all networks
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)

    # EDIT HERE to try different networks
    # this single layer defines a simple linear classifier
    # (in particular this defines a multiway logistic regression)
    n.score =   L.InnerProduct(n.data, num_output=10, weight_filler=dict(type='xavier'))

    # EDIT HERE this is the LeNet variant we have already tried
    # n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    # n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    # n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    # n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    # n.fc1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    # EDIT HERE consider L.ELU or L.Sigmoid for the nonlinearity
    # n.relu1 = L.ReLU(n.fc1, in_place=True)
    # n.score = L.InnerProduct(n.fc1, num_output=10, weight_filler=dict(type='xavier'))

    # keep this loss layer for all networks
    n.loss =  L.SoftmaxWithLoss(n.score, n.label)

    return n.to_proto()

with open(train_net_path, 'w') as f:
    f.write(str(custom_net('mnist/mnist_train_lmdb', 64)))
with open(test_net_path, 'w') as f:
    f.write(str(custom_net('mnist/mnist_test_lmdb', 100)))

### define solver
from caffe.proto import caffe_pb2
s = caffe_pb2.SolverParameter()

# Set a seed for reproducible experiments:
# this controls for randomization in training.
s.random_seed = 0xCAFFE

# Specify locations of the train and (maybe) test networks.
s.train_net = train_net_path
s.test_net.append(test_net_path)  # register the test net so that solver.test_nets[0] exists
s.test_interval = 500  # Test after every 500 training iterations.
s.test_iter.append(100) # Test on 100 batches each time we test.

s.max_iter = 10000     # no. of times to update the net (training iterations)

# EDIT HERE to try different solvers
# solver types include "SGD", "Adam", and "Nesterov" among others.
s.type = "SGD"

# Set the initial learning rate for SGD.
s.base_lr = 0.01  # EDIT HERE to try different learning rates
# Set momentum to accelerate learning by
# taking weighted average of current and previous updates.
s.momentum = 0.9
# Set weight decay to regularize and prevent overfitting
s.weight_decay = 5e-4

# Set `lr_policy` to define how the learning rate changes during training.
# This is the same policy as our default LeNet.
s.lr_policy = 'inv'
s.gamma = 0.0001
s.power = 0.75
# EDIT HERE to try the fixed rate (and compare with adaptive solvers)
# `fixed` is the simplest policy that keeps the learning rate constant.
# s.lr_policy = 'fixed'

# Display the current training loss and accuracy every 1000 iterations.
s.display = 1000

# Snapshots are files used to store networks we've trained.
# We'll snapshot every 5K iterations -- twice during training.
s.snapshot = 5000
s.snapshot_prefix = 'mnist/custom_net'

# Train on the GPU
s.solver_mode = caffe_pb2.SolverParameter.GPU

# Write the solver to a temporary file and return its filename.
with open(solver_config_path, 'w') as f:
    f.write(str(s))

### load the solver and create train and test nets
solver = None  # ignore this workaround for lmdb data (can't instantiate two solvers on the same data)
solver = caffe.get_solver(solver_config_path)

### solve
niter = 250  # EDIT HERE increase to train for longer
test_interval = niter / 10
# losses will also be stored in the log
train_loss = zeros(niter)
test_acc = zeros(int(np.ceil(niter / test_interval)))

# the main solver loop
for it in range(niter):
    solver.step(1)  # SGD by Caffe

    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data

    # run a full test every so often
    # (Caffe can also do this for us and write to a log, but we show here
    # how to do it directly in Python, where more complicated things are easier.)
    if it % test_interval == 0:
        print('Iteration', it, 'testing...')
        correct = 0
        for test_it in range(100):
            solver.test_nets[0].forward()
            correct += sum(solver.test_nets[0].blobs['score'].data.argmax(1)
                           == solver.test_nets[0].blobs['label'].data)
        test_acc[int(it // test_interval)] = correct / 1e4

_, ax1 = subplots()
ax2 = ax1.twinx()
ax1.plot(arange(niter), train_loss)
ax2.plot(test_interval * arange(len(test_acc)), test_acc, 'r')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
ax2.set_title('Custom Test Accuracy: {:.2f}'.format(test_acc[-1]))
Iteration 0 testing...
Iteration 25 testing...
Iteration 50 testing...
Iteration 75 testing...
Iteration 100 testing...
Iteration 125 testing...
Iteration 150 testing...
Iteration 175 testing...
Iteration 200 testing...
Iteration 225 testing...


Text(0.5, 1.0, 'Custom Test Accuracy: 0.88')
