TensorFlow MNIST Handwritten Digit Recognition: Code Notes (3)
- tf.train.GradientDescentOptimizer
- tf.global_variables_initializer()
- mnist.train.next_batch()
- sess.run
- More on the definition of cost
tf.train.GradientDescentOptimizer
Code
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
tf.train.GradientDescentOptimizer(learning_rate, use_locking=False, name='GradientDescent')
Parameters
- learning_rate: A Tensor or a floating point value. The learning rate to use.
- use_locking: If True, use locks for the update operations.
- name: Optional name for the operation; defaults to "GradientDescent".
Example
import tensorflow as tf

x = tf.Variable(2, name='x', dtype=tf.float32)
log_x = tf.log(x)
log_x_squared = tf.square(log_x)

optimizer = tf.train.GradientDescentOptimizer(0.5)
# minimize() handles both operations: gradient computation and parameter update
train = optimizer.minimize(log_x_squared)

init = tf.initialize_all_variables()
with tf.Session() as session:
    session.run(init)
    print("starting at", "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))
    for step in range(10):
        session.run(train)
        print("step", step, "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))
Other optimizers
The optimizer source code is in the file C:\Users\yeping\Anaconda3\Lib\site-packages\tensorflow\train\__init__.py. A quick look shows the following optimizers; noting them down here to study one by one later (a short sketch of swapping them in appears below).
- AdadeltaOptimizer
- AdagradOptimizer
- AdagradDAOptimizer
- AdamOptimizer
- FtrlOptimizer
- GradientDescentOptimizer
- MomentumOptimizer
- Optimizer
- ProximalAdagradOptimizer
- ProximalGradientDescentOptimizer
- RMSPropOptimizer
- SyncReplicasOptimizer
Searching online turned up a blog post that discusses the various optimizers in considerable detail; leaving it here as a reference: SanFanCSgo, 《机器学习:各种优化器Optimizer的总结与比较》 (Machine Learning: A Summary and Comparison of the Various Optimizers).
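All of these optimizers expose the same minimize() interface, so swapping one for another is a one-line change. Below is a minimal sketch that reuses the log(x)^2 toy example from above; the learning rates are illustrative values, not tuned recommendations.

import tensorflow as tf

x = tf.Variable(2.0, name='x')
loss = tf.square(tf.log(x))

# Each optimizer only changes how gradients are applied to the variables;
# building the training op looks the same in every case.
train_sgd = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
train_adam = tf.train.AdamOptimizer(0.001).minimize(loss)
train_momentum = tf.train.MomentumOptimizer(0.1, momentum=0.9).minimize(loss)
train_rmsprop = tf.train.RMSPropOptimizer(0.01).minimize(loss)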
tf.global_variables_initializer()
Variables must be initialized before they are used. The idiomatic way is to use tf.global_variables_initializer(), which collects the initializers of all global variables into a single op; running that op initializes them all.
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
mnist.train.next_batch()
mnist.train.next_batch is a function provided specifically for the MNIST tutorial that ships with TensorFlow. It works by shuffling the training image/label pairs at the start and then, on each call, returning the next batch_size images. Once the end is reached, the image/label pairs are shuffled again and the process repeats. The whole dataset is only reshuffled and reused after all available pairs have been consumed.
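To make this concrete, here is a minimal sketch of what next_batch returns; it assumes the data set is loaded with one_hot=True into a local "MNIST_data/" directory (that path is just an example).

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Each call returns the next batch of image/label pairs from the shuffled training set.
batch_xs, batch_ys = mnist.train.next_batch(100)
print(batch_xs.shape)            # (100, 784): 100 flattened 28x28 images
print(batch_ys.shape)            # (100, 10): 100 one-hot labels
print(mnist.train.num_examples)  # 55000 training examples in total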
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Load the MNIST data set with one-hot labels; "MNIST_data/" is just a local cache directory.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

tf.reset_default_graph()

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.random_normal([784, 10]))
b = tf.Variable(tf.zeros([10]))

learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1
total_batch = int(mnist.train.num_examples/batch_size)

pred = tf.nn.softmax(tf.matmul(x, W) + b)
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs, y: batch_ys})
            avg_cost += c / total_batch
        if (epoch+1) % display_step == 0:
            print("Epoch:", "%04d" % (epoch+1), "cost =", "{:.9f}".format(avg_cost))
    print("finished!")
sess.run
_, c = sess.run([optimizer, cost], feed_dict = {x: batch_xs, y: batch_ys})
First of all, this line of code shows that a Python function can return multiple values. Essentially it is the following assignment (unpacking) syntax:
p, q = [1, 2]
# which is equivalent to
[p, q] = [1, 2]
Second, from the definitions above we can see the dependency relationships among optimizer, cost, x, and y: cost is first computed from x and y, and then optimizer is computed. Of course, the optimizer op itself does not return a result, which is why its value is discarded as _.
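The same pattern can be seen on a toy graph that has nothing to do with MNIST: passing a list of fetches to sess.run evaluates the graph once and returns the values as a list, in the same order, ready to be unpacked.

import tensorflow as tf

a = tf.constant(3.0)
b = tf.constant(4.0)
total = a + b
product = a * b

with tf.Session() as sess:
    # One run() call evaluates both fetches; unpacking works exactly
    # like the p, q = [1, 2] example above.
    s, p = sess.run([total, product])
    print(s, p)  # 7.0 12.0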
More on the definition of cost
It is worth discussing the definition of cost once more.
x = tf.placeholder( tf.float32, [None, 784] )
y = tf.placeholder( tf.float32, [None, 10] )
W = tf.Variable( tf.random_normal([784, 10]) )
b = tf.Variable( tf.zeros([10]) )
pred = tf.nn.softmax( tf.matmul(x, W) + b )
cost = tf.reduce_mean( -tf.reduce_sum( y*tf.log(pred), reduction_indices = 1 ) )
The size of the training data set
The definitions of x and y use None, which indicates that the number of examples is not fixed in advance. This is what makes the later code possible:
_, c = sess.run([optimizer, cost], feed_dict = {
x: batch_xs, y: batch_ys})
Here feed_dict = {x: batch_xs, y: batch_ys} supplies concrete data for x and y; the size of this data set is batch_size = 100.
How cost is computed
Both y and pred are tensors of shape 100 x 10, so y*tf.log(pred) is also a 100 x 10 tensor. tf.reduce_sum( y*tf.log(pred), reduction_indices = 1 ) then collapses dimension 1 by summing over it, leaving a tensor of 100 elements, one per example.
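This shape bookkeeping can be checked straight from the graph; a small sketch using the same placeholder definitions as above (the batch dimension stays None until data is fed in):

import tensorflow as tf

y = tf.placeholder(tf.float32, [None, 10])
pred = tf.placeholder(tf.float32, [None, 10])

per_example = -tf.reduce_sum(y*tf.log(pred), reduction_indices=1)
cost = tf.reduce_mean(per_example)

print(per_example.get_shape().as_list())  # [None]: one cross-entropy value per example
print(cost.get_shape().as_list())         # []: a single scalar averaged over the batch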
Let
r = y*tf.log(pred)
s = -tf.reduce_sum( r, reduction_indices = 1 )
cost = tf.reduce_mean( s )
Then

$$r = \left[ \begin{matrix} r_{1,1} & r_{1,2} & \dots & r_{1,10} \\ r_{2,1} & r_{2,2} & \dots & r_{2,10} \\ \vdots & \vdots & & \vdots \\ r_{100,1} & r_{100,2} & \dots & r_{100,10} \end{matrix} \right]$$

$$s = \left[ \begin{matrix} s_1 \\ s_2 \\ \vdots \\ s_{100} \end{matrix} \right] = -\left[ \begin{matrix} r_{1,1}+r_{1,2}+\dots+r_{1,10} \\ r_{2,1}+r_{2,2}+\dots+r_{2,10} \\ \vdots \\ r_{100,1}+r_{100,2}+\dots+r_{100,10} \end{matrix} \right]$$

Finally,

$$cost = \frac{s_1+s_2+\dots+s_{100}}{100}$$
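The symbolic derivation above can also be checked numerically; a small sketch with an arbitrary 2 x 3 batch (2 examples, 3 classes) instead of 100 x 10, comparing the hand computation with the TensorFlow expression.

import numpy as np
import tensorflow as tf

# A toy batch: 2 one-hot labels and 2 rows of softmax-style predictions.
y_val = np.array([[1., 0., 0.],
                  [0., 1., 0.]], dtype=np.float32)
pred_val = np.array([[0.7, 0.2, 0.1],
                     [0.3, 0.4, 0.3]], dtype=np.float32)

# Hand computation following the formulas: s_i = -(r_i1 + ... + r_i3), cost = mean(s_i).
r = y_val * np.log(pred_val)
s = -np.sum(r, axis=1)
print(np.mean(s))                # average cross-entropy over the batch

# The same expression built as a TensorFlow graph.
y = tf.constant(y_val)
pred = tf.constant(pred_val)
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
with tf.Session() as sess:
    print(sess.run(cost))        # matches the NumPy value above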