Theano logistic regression explained
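Logistic regression is a probabilistic, linear classifier. It is parameterized by a weight matrix W and a bias vector b. Classification is done by projecting an input vector onto a set of hyperplanes, each of which corresponds to one class. The distance from the input to a hyperplane reflects the probability that the input belongs to the corresponding class. Mathematically, the probability that an input x belongs to class i can be written as:

P(Y=i|x, W, b) = softmax_i(W x + b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}

The prediction of the model is the class whose probability is maximal:

y_{pred} = \mathrm{argmax}_i P(Y=i|x, W, b)

The corresponding code is: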
# initialize with 0 the weights W as a matrix of shape (n_in, n_out)
self.W = theano.shared(
    value=numpy.zeros(
        (n_in, n_out),
        dtype=theano.config.floatX
    ),
    name='W',
    borrow=True
)
# initialize the biases b as a vector of n_out 0s
self.b = theano.shared(
    value=numpy.zeros(
        (n_out,),
        dtype=theano.config.floatX
    ),
    name='b',
    borrow=True
)

# symbolic expression for computing the matrix of class-membership
# probabilities
# Where:
# W is a matrix where column-k represent the separation hyperplane for
# class-k
# x is a matrix where row-j represents input training sample-j
# b is a vector where element-k represent the free parameter of
# hyperplane-k
self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
# the output above is a matrix of probabilities: one row per example,
# one column per class

# symbolic description of how to compute prediction as class whose
# probability is maximal
self.y_pred = T.argmax(self.p_y_given_x, axis=1)
# axis=1 takes the argmax along each row (across the class columns), which
# gives one predicted class per example; axis=0 would instead go down each
# column. A 2-D matrix has no axis=2.
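To make the softmax and the `axis=1` argmax concrete, here is a minimal sketch in plain Python rather than Theano. The helper names `softmax_rows` and `argmax_rows` and the score values are made up for illustration; the scores stand in for the result of `T.dot(input, self.W) + self.b` on a minibatch of two examples and three classes.

```python
import math

def softmax_rows(scores):
    """Apply a numerically stable softmax to each row of a list-of-lists matrix."""
    result = []
    for row in scores:
        m = max(row)  # subtracting the row maximum avoids overflow in exp
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result

def argmax_rows(matrix):
    """Index of the largest entry in each row (the role of axis=1 for a 2-D matrix)."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]

# Hypothetical scores for 2 examples (rows) and 3 classes (columns).
scores = [[2.0, 1.0, 0.1],
          [0.5, 2.5, 0.3]]

p_y_given_x = softmax_rows(scores)   # each row sums to 1
y_pred = argmax_rows(p_y_given_x)    # one predicted class per example
print(p_y_given_x)
print(y_pred)
```

Each row of `p_y_given_x` is a probability distribution over the classes, and `y_pred` holds one class index per example, which is why the argmax has to run along the class axis.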
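Learning means minimizing a cost function. For multi-class logistic regression, the negative log-likelihood is commonly used as the cost. Minimizing it is equivalent to maximizing the likelihood of the data set D under the model parameterized by \theta. In plain terms: for a given data set D, we adjust the parameters \theta so that the model assigns the highest possible probability to the correct labels, which is the same as making the cost as small as possible. We start by defining the log-likelihood \mathcal{L} and the cost \ell:

\mathcal{L}(\theta=\{W,b\}, \mathcal{D}) = \sum_{i=0}^{|\mathcal{D}|} \log(P(Y=y^{(i)}|x^{(i)}, W, b))

\ell(\theta=\{W,b\}, \mathcal{D}) = -\mathcal{L}(\theta=\{W,b\}, \mathcal{D})

In code: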
# y.shape[0] is (symbolically) the number of rows in y, i.e.,
# number of examples (call it n) in the minibatch
# T.arange(y.shape[0]) is a symbolic vector which will contain
# [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
# Log-Probabilities (call it LP) with one row per example and
# one column per class LP[T.arange(y.shape[0]),y] is a vector
# v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
# LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
# the mean (across minibatch examples) of the elements in v,
# i.e., the mean log-likelihood across the minibatch.
return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
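At first sight this code differs a little from the formula, so it deserves a closer look. In principle we only need to go through all the inputs and sum up the log-probabilities, so why the index expression [T.arange(y.shape[0]), y]? The reason is that the formula sums \log P(Y=y^{(i)}|x^{(i)}, W, b), that is, only the log-probability of the correct class of each example. T.log(self.p_y_given_x) holds the log-probabilities of all classes, and the index expression picks, for row i, the single entry in column y[i]. The code also takes the mean instead of the sum, so that the learning rate is less dependent on the batch size.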
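The effect of that index expression can be reproduced in plain Python. The sketch below is not Theano code: the probabilities and labels are invented for illustration, and a list comprehension plays the role of `LP[T.arange(y.shape[0]), y]`.

```python
import math

# Hypothetical class probabilities for a minibatch of n = 3 examples and
# 3 classes; each row plays the role of one row of p_y_given_x.
p_y_given_x = [[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.3, 0.3, 0.4]]
y = [0, 1, 2]  # correct class of each example

# LP: matrix of log-probabilities, one row per example, one column per class.
LP = [[math.log(p) for p in row] for row in p_y_given_x]

# Counterpart of LP[T.arange(y.shape[0]), y]: for row i, take column y[i].
v = [LP[i][y[i]] for i in range(len(y))]

# Mean negative log-likelihood across the minibatch.
nll = -sum(v) / len(v)
print(v)
print(nll)
```

Only three of the nine entries of `LP` contribute to the cost: the ones that belong to the correct class of each example.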
import numpy
import theano
import theano.tensor as T


class LogisticRegression(object):
    """Multi-class Logistic Regression Class

    The logistic regression is fully described by a weight matrix :math:`W`
    and bias vector :math:`b`. Classification is done by projecting data
    points onto a set of hyperplanes, the distance to which is used to
    determine a class membership probability.
    """

    def __init__(self, input, n_in, n_out):
        """ Initialize the parameters of the logistic regression

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
                      architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
                     which the datapoints lie

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
                      which the labels lie
        """
        # start-snippet-1
        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # initialize the biases b as a vector of n_out 0s
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # symbolic expression for computing the matrix of class-membership
        # probabilities
        # Where:
        # W is a matrix where column-k represent the separation hyperplane for
        # class-k
        # x is a matrix where row-j represents input training sample-j
        # b is a vector where element-k represent the free parameter of
        # hyperplane-k
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # symbolic description of how to compute prediction as class whose
        # probability is maximal
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        # end-snippet-1

        # parameters of the model
        self.params = [self.W, self.b]

        # keep track of model input
        self.input = input

    def negative_log_likelihood(self, y):
        r"""Return the mean of the negative log-likelihood of the prediction
        of this model under a given target distribution.

        .. math::

            \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
            \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|}
                \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
            \ell (\theta=\{W,b\}, \mathcal{D})

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label

        Note: we use the mean instead of the sum so that
              the learning rate is less dependent on the batch size
        """
        # start-snippet-2
        # y.shape[0] is (symbolically) the number of rows in y, i.e.,
        # number of examples (call it n) in the minibatch
        # T.arange(y.shape[0]) is a symbolic vector which will contain
        # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
        # Log-Probabilities (call it LP) with one row per example and
        # one column per class LP[T.arange(y.shape[0]),y] is a vector
        # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
        # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
        # the mean (across minibatch examples) of the elements in v,
        # i.e., the mean log-likelihood across the minibatch.
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
        # end-snippet-2

    def errors(self, y):
        """Return a float representing the number of errors in the minibatch
        over the total number of examples of the minibatch ; zero one
        loss over the size of the minibatch

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label
        """
        # check if y has same dimension of y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()
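The `errors` method computes the zero-one loss averaged over the minibatch. As a rough plain-Python illustration of that quantity (the function name `zero_one_error` and the label lists are hypothetical, and this is not the Theano implementation):

```python
def zero_one_error(y_pred, y):
    """Fraction of examples whose predicted label differs from the true label."""
    if len(y_pred) != len(y):
        raise TypeError("y should have the same length as y_pred")
    # 1 marks a mistake, 0 a correct prediction (the role of T.neq)
    mistakes = [1 if p != t else 0 for p, t in zip(y_pred, y)]
    return sum(mistakes) / len(mistakes)

# Hypothetical predictions and true labels for a minibatch of 4 examples.
error_rate = zero_one_error([0, 1, 2, 2], [0, 1, 1, 2])
print(error_rate)  # 1 mistake out of 4 examples
```

Unlike the negative log-likelihood, this quantity is not differentiable, so it is used for monitoring the classifier rather than as the training cost.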
# generate symbolic variables for input (x and y represent a
# minibatch)
x = T.matrix('x')  # data, presented as rasterized images
y = T.ivector('y')  # labels, presented as 1D vector of [int] labels

# construct the logistic regression class
# Each MNIST image has size 28*28
classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)
# the cost we minimize during training is the negative log likelihood of
# the model in symbolic format
cost = classifier.negative_log_likelihood(y)
g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
# specify how to update the parameters of the model as a list of
# (variable, update expression) pairs.
updates = [(classifier.W, classifier.W - learning_rate * g_W),
           (classifier.b, classifier.b - learning_rate * g_b)]

# compiling a Theano function `train_model` that returns the cost, but in
# the same time updates the parameter of the model based on the rules
# defined in `updates`
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
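The updates list does two things, each written as a (variable, update expression) pair. The first pair updates W via W = W - learning_rate * g_W, and the second updates b via b = b - learning_rate * g_b.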
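- The input of train_model is the minibatch index, which together with the batch size determines the input x and the labels y.
- The return value is the cost, computed on that minibatch.
- On every call, the function first obtains x and y for the minibatch, then computes the cost, and finally applies the updates in the updates list.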
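Each time train_model is called, it computes and returns the cost of one minibatch and also performs one step of gradient descent on that minibatch. The whole learning algorithm consists of calling it in a loop over the minibatches.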
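The loop described above can be sketched without Theano. The following is a minimal plain-Python version under several assumptions: the data set is a tiny made-up one, `train_step` is a hypothetical stand-in for `train_model`, and the gradient is written out by hand for softmax regression instead of being derived symbolically by `T.grad`.

```python
import math

def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def forward(X, W, b):
    """Class probabilities for each row x of X: softmax(x . W + b)."""
    return [softmax([sum(x[i] * W[i][k] for i in range(len(x))) + b[k]
                     for k in range(len(b))])
            for x in X]

def nll(P, y):
    """Mean negative log-likelihood of the correct labels."""
    return -sum(math.log(P[j][y[j]]) for j in range(len(y))) / len(y)

def train_step(X, y, W, b, learning_rate):
    """One minibatch gradient descent step; returns the cost before the update."""
    P = forward(X, W, b)  # computed once, before any parameter changes
    cost = nll(P, y)
    n = len(y)
    for j in range(n):
        for k in range(len(b)):
            # derivative of the mean cost w.r.t. the score of class k for example j
            delta = (P[j][k] - (1.0 if y[j] == k else 0.0)) / n
            b[k] -= learning_rate * delta
            for i in range(len(W)):
                W[i][k] -= learning_rate * delta * X[j][i]
    return cost

# Hypothetical toy data: 4 examples, 2 features, 2 classes.
train_set_x = [[0.0, 0.1], [1.0, 0.9], [0.2, 0.0], [0.9, 1.1]]
train_set_y = [0, 1, 0, 1]
n_in, n_out, batch_size, learning_rate = 2, 2, 2, 0.5

W = [[0.0] * n_out for _ in range(n_in)]  # zeros, as in the class above
b = [0.0] * n_out

initial_cost = nll(forward(train_set_x, W, b), train_set_y)
n_batches = len(train_set_x) // batch_size
for epoch in range(200):
    for index in range(n_batches):
        start, stop = index * batch_size, (index + 1) * batch_size
        train_step(train_set_x[start:stop], train_set_y[start:stop],
                   W, b, learning_rate)
final_cost = nll(forward(train_set_x, W, b), train_set_y)
print(initial_cost, final_cost)
```

The slicing by `index * batch_size` mirrors the `givens` argument of `train_model`. With all parameters at zero, every class gets probability 0.5, so the initial cost equals log 2, and the loop drives it down from there.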