
Learning and Applying the Deep Learning Framework TensorFlow, Part 6 (Comparing the SGD, ADAM, Adadelta, Momentum, and RMSProp Optimizers)


You have probably seen that animated meme image comparing optimizers; everyone knows the ranking it suggests:

Adadelta > NAG > Momentum > RMSProp > Adagrad > SGD

But I think it depends on the situation. For example, the post at http://blog.51cto.com/12568470/1898367 ("Common optimization algorithms and their TensorFlow parameters") argues that in actual work ADAM is the best choice in practice. But who can say for sure? Every engineer works on different problems, so the practical experience they accumulate differs as well.

So some people suggest: while debugging, train with a fast optimizer; when it is time to publish, try every optimizer once and report the best result.

 

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the MNIST dataset
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
# Size of each mini-batch and number of batches per epoch
batch_size = 100
n_batch = mnist.train.num_examples // batch_size

# Define two placeholders
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

# Create a simple single-layer softmax network
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
prediction = tf.nn.softmax(tf.matmul(x, W) + b)

# Quadratic cost
# loss = tf.reduce_mean(tf.square(y - prediction))
# Cross-entropy cost (note: softmax_cross_entropy_with_logits expects raw logits,
# so passing the softmax output applies softmax twice; it still trains, though)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=prediction))

# Gradient descent; swap this line for the optimizers compared below
# (define the train op before the initializer so optimizer slot variables get initialized)
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
# Initialize variables
init = tf.global_variables_initializer()

# correct_prediction is a list of booleans; argmax returns the index of the largest value in a 1-D tensor
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(prediction, 1))
# Accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(21):
        for batch in range(n_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys})
        acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
        print("Iter " + str(epoch) + ",Testing Accuracy " + str(acc))
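Following the advice above, one way to "try every optimizer" is to build one train op per optimizer on the same loss and rerun the training loop for each. A minimal sketch, reusing the placeholders, loss, accuracy, n_batch and batch_size defined in the script above (the learning rates are simply the ones used in the runs below, not recommendations):

optimizers = {
    "SGD":      tf.train.GradientDescentOptimizer(0.2),
    "Adam":     tf.train.AdamOptimizer(1e-3),
    "Adadelta": tf.train.AdadeltaOptimizer(1.0),
    "Momentum": tf.train.MomentumOptimizer(1e-3, 0.9),
    "RMSProp":  tf.train.RMSPropOptimizer(0.003, 0.9),
}
results = {}
for name, opt in optimizers.items():
    train_op = opt.minimize(loss)                    # adds this optimizer's ops to the graph
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())  # resets W, b and the optimizer's slot variables
        for epoch in range(21):
            for _ in range(n_batch):
                batch_xs, batch_ys = mnist.train.next_batch(batch_size)
                sess.run(train_op, feed_dict={x: batch_xs, y: batch_ys})
        results[name] = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
print(results)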
  • SGD

optimizer = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate)

Iter 0,Testing Accuracy 0.825
Iter 1,Testing Accuracy 0.8798
Iter 2,Testing Accuracy 0.8994
Iter 3,Testing Accuracy 0.9047
Iter 4,Testing Accuracy 0.9076
Iter 5,Testing Accuracy 0.9104
Iter 6,Testing Accuracy 0.9121
Iter 7,Testing Accuracy 0.9127
Iter 8,Testing Accuracy 0.9147
Iter 9,Testing Accuracy 0.9166
Iter 10,Testing Accuracy 0.9174
Iter 11,Testing Accuracy 0.9167
Iter 12,Testing Accuracy 0.9183
Iter 13,Testing Accuracy 0.9183
Iter 14,Testing Accuracy 0.9202
Iter 15,Testing Accuracy 0.9197
Iter 16,Testing Accuracy 0.9204
Iter 17,Testing Accuracy 0.9213
Iter 18,Testing Accuracy 0.921
Iter 19,Testing Accuracy 0.9213
Iter 20,Testing Accuracy 0.9217
  • ADAM

   optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, epsilon=1e-08)

train_step_AdamOptimizer = tf.train.AdamOptimizer(1e-3).minimize(loss)

Iter 0,Testing Accuracy 0.8991
Iter 1,Testing Accuracy 0.9109
Iter 2,Testing Accuracy 0.916
Iter 3,Testing Accuracy 0.9199
Iter 4,Testing Accuracy 0.9229
Iter 5,Testing Accuracy 0.9247
Iter 6,Testing Accuracy 0.926
Iter 7,Testing Accuracy 0.9276
Iter 8,Testing Accuracy 0.928
Iter 9,Testing Accuracy 0.928
Iter 10,Testing Accuracy 0.9289
Iter 11,Testing Accuracy 0.9297
Iter 12,Testing Accuracy 0.9308
Iter 13,Testing Accuracy 0.93
Iter 14,Testing Accuracy 0.9297
Iter 15,Testing Accuracy 0.9304
Iter 16,Testing Accuracy 0.9301
Iter 17,Testing Accuracy 0.9318
Iter 18,Testing Accuracy 0.9306
Iter 19,Testing Accuracy 0.9315
Iter 20,Testing Accuracy 0.9317

  • Adadelta

train_step_AdadeltaOptimizer = tf.train.AdadeltaOptimizer(1e-3).minimize(loss)

Iter 0,Testing Accuracy 0.6779
Iter 1,Testing Accuracy 0.6761
Iter 2,Testing Accuracy 0.6757
Iter 3,Testing Accuracy 0.6762
Iter 4,Testing Accuracy 0.6775
Iter 5,Testing Accuracy 0.6785
Iter 6,Testing Accuracy 0.6791
Iter 7,Testing Accuracy 0.682
Iter 8,Testing Accuracy 0.6854
Iter 9,Testing Accuracy 0.6878
Iter 10,Testing Accuracy 0.6896
Iter 11,Testing Accuracy 0.6919
Iter 12,Testing Accuracy 0.6941
Iter 13,Testing Accuracy 0.6961
Iter 14,Testing Accuracy 0.6973
Iter 15,Testing Accuracy 0.6983
Iter 16,Testing Accuracy 0.6996
Iter 17,Testing Accuracy 0.7001
Iter 18,Testing Accuracy 0.7013
Iter 19,Testing Accuracy 0.7017
Iter 20,Testing Accuracy 0.7023

train_step_AdadeltaOptimizer = tf.train.AdadeltaOptimizer(1).minimize(loss)

Iter 0,Testing Accuracy 0.874
Iter 1,Testing Accuracy 0.8949
Iter 2,Testing Accuracy 0.905
Iter 3,Testing Accuracy 0.9075
Iter 4,Testing Accuracy 0.9102
Iter 5,Testing Accuracy 0.9125
Iter 6,Testing Accuracy 0.9141
Iter 7,Testing Accuracy 0.9154
Iter 8,Testing Accuracy 0.9173
Iter 9,Testing Accuracy 0.9182
Iter 10,Testing Accuracy 0.919
Iter 11,Testing Accuracy 0.9204
Iter 12,Testing Accuracy 0.9213
Iter 13,Testing Accuracy 0.9213
Iter 14,Testing Accuracy 0.9222
Iter 15,Testing Accuracy 0.9228
Iter 16,Testing Accuracy 0.9226
Iter 17,Testing Accuracy 0.9226
Iter 18,Testing Accuracy 0.9232
Iter 19,Testing Accuracy 0.9239
Iter 20,Testing Accuracy 0.9237

As the two AdadeltaOptimizer runs above show, you still have to find a suitable learning rate; otherwise the optimizer barely makes progress. Theory is one thing, practice is another.

AdadeltaOptimizer is an optimizer in the tf.train module that adapts the learning rate automatically. When I first saw it I thought it was impressive, but after using it the loss would not go down at all. I assumed I had used it incorrectly; after a lot of back and forth, it turned out to be the learning-rate setting.

 # set optimizer
optimizer = tf.train.AdadeltaOptimizer()
# set train_op
train_op = slim.learning.create_train_op(loss, optimizer)

The default learning_rate of AdadeltaOptimizer is 0.001, which is very small, so gradient descent proceeds extremely slowly. The fix is simply to raise the learning rate:

 # set optimizer
optimizer = tf.train.AdadeltaOptimizer(learning_rate=1)
# set train_op
train_op = slim.learning.create_train_op(loss, optimizer)

 http://data-science.vip/2017/12/18/TensorFlow%E7%BC%96%E7%A8%8B%E4%B8%AD%E8%B8%A9%E8%BF%87%E7%9A%84%E5%9D%91(1).html  "Pitfalls I've hit in TensorFlow programming (1)"
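For reference, the TF 1.x Adadelta constructor defaults are roughly the following (from memory; verify against the official docs), and the tiny default learning_rate is exactly what causes the apparent stall:

# Defaults of tf.train.AdadeltaOptimizer in TF 1.x (as I recall them)
optimizer = tf.train.AdadeltaOptimizer(learning_rate=0.001, rho=0.95, epsilon=1e-08)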

  • Momentum
    train_step_MomentumOptimizer = tf.train.MomentumOptimizer(1e-3, 0.9).minimize(loss)

Iter 0,Testing Accuracy 0.548
Iter 1,Testing Accuracy 0.6094
Iter 2,Testing Accuracy 0.7094
Iter 3,Testing Accuracy 0.7726
Iter 4,Testing Accuracy 0.791
Iter 5,Testing Accuracy 0.7964
Iter 6,Testing Accuracy 0.8015
Iter 7,Testing Accuracy 0.8056
Iter 8,Testing Accuracy 0.8081
Iter 9,Testing Accuracy 0.811
Iter 10,Testing Accuracy 0.8137
Iter 11,Testing Accuracy 0.8162
Iter 12,Testing Accuracy 0.8184
Iter 13,Testing Accuracy 0.8193
Iter 14,Testing Accuracy 0.8198
Iter 15,Testing Accuracy 0.821
Iter 16,Testing Accuracy 0.8226
Iter 17,Testing Accuracy 0.8236
Iter 18,Testing Accuracy 0.8243
Iter 19,Testing Accuracy 0.825
Iter 20,Testing Accuracy 0.8259
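Incidentally, the NAG entry in the meme ranking at the top corresponds, in TF 1.x, to the same MomentumOptimizer with Nesterov momentum enabled. A one-line sketch using the same (untuned) values as above:

# Nesterov accelerated gradient: MomentumOptimizer with use_nesterov=True
train_step_NAG = tf.train.MomentumOptimizer(1e-3, 0.9, use_nesterov=True).minimize(loss)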

  • RMSProp

    train_step_RMSPropOptimizer = tf.train.RMSPropOptimizer(0.003, 0.9).minimize(loss)

Iter 0,Testing Accuracy 0.9158
Iter 1,Testing Accuracy 0.9216
Iter 2,Testing Accuracy 0.9263
Iter 3,Testing Accuracy 0.9274
Iter 4,Testing Accuracy 0.9275
Iter 5,Testing Accuracy 0.9319
Iter 6,Testing Accuracy 0.9309
Iter 7,Testing Accuracy 0.9286
Iter 8,Testing Accuracy 0.9303
Iter 9,Testing Accuracy 0.9305
Iter 10,Testing Accuracy 0.9316
Iter 11,Testing Accuracy 0.9318
Iter 12,Testing Accuracy 0.933
Iter 13,Testing Accuracy 0.9316
Iter 14,Testing Accuracy 0.9327
Iter 15,Testing Accuracy 0.9315
Iter 16,Testing Accuracy 0.9307
Iter 17,Testing Accuracy 0.9327
Iter 18,Testing Accuracy 0.9332
Iter 19,Testing Accuracy 0.9324
Iter 20,Testing Accuracy 0.9316
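Note that in tf.train.RMSPropOptimizer(0.003, 0.9) the second positional argument is the decay of the squared-gradient moving average, not momentum; momentum is a separate keyword argument. Written out with keywords (defaults from memory; check the docs):

# Equivalent to the call above, with the arguments named explicitly
train_step_RMSPropOptimizer = tf.train.RMSPropOptimizer(learning_rate=0.003, decay=0.9,
                                                        momentum=0.0, epsilon=1e-10).minimize(loss)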

  • Practical experience

ADAM usually gives fairly good results and converges much faster than SGD. L-BFGS is suited to cases where the optimization is done over the full batch. Sometimes several methods can be combined, for example warming up with SGD and then switching to ADAM. For unusual requirements, such as deepbit, where the convergence of two losses needs to be controlled, the slower SGD is a better fit.
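A minimal sketch of the "SGD warm-up, then ADAM" idea, reusing the model above: build one train op per optimizer and pick which one to run based on the epoch (warmup_epochs is a hypothetical choice, not a value from this post):

train_step_sgd  = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
train_step_adam = tf.train.AdamOptimizer(1e-3).minimize(loss)
warmup_epochs = 3  # hypothetical cut-over point

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(21):
        # run SGD for the first few epochs, then switch to ADAM
        step = train_step_sgd if epoch < warmup_epochs else train_step_adam
        for batch in range(n_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(step, feed_dict={x: batch_xs, y: batch_ys})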

 

Constructor parameters for the different optimization algorithms in TensorFlow

SGD

optimizer = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate)

Momentum

optimizer = tf.train.MomentumOptimizer(lr, 0.9)

AdaGrad

optimizer = tf.train.AdagradOptimizer(learning_rate=self.learning_rate)

RMSProp

optimizer = tf.train.RMSPropOptimizer(0.001, 0.9)

ADAM

optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, epsilon=1e-08)

For the remaining constructor parameters, consult the official TensorFlow documentation.

 
