I want to talk about a very basic method; in particular, it’s a very old algorithm.
It dates back at least 50 years, but it really works.
You won’t be disappointed in this algorithm. It’s simple,
it’s fast and it often competes with the best machine learning methods,
and it’s called logistic regression.
Now, logistic regression uses this loss function over here that we talked about.
You saw this before. It’s the dark blue curve over here.
Now I want you to remember this function as best you can for a little while, so I’m going to put it in the corner over there.
So logistic regression minimizes this; and there’s no regularization for vanilla logistic regression.
It just minimizes this function, which is just the average loss on the training points.
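To make the objective concrete, here is a minimal sketch in Python, assuming the dark blue curve is the logistic loss for labels in {-1, +1} (the exact form isn’t restated in this passage, so that form is an assumption):

```python
import numpy as np

def logistic_loss(y, f):
    """Logistic loss for a label y in {-1, +1} and a score f(x).
    Assumed form of the 'dark blue curve' from the lecture."""
    return np.log(1.0 + np.exp(-y * f))

def average_training_loss(y, scores):
    """The quantity vanilla logistic regression minimizes:
    the average loss over the training points, with no regularization."""
    return np.mean(logistic_loss(y, scores))
```

Note that a confident correct score (large positive `y * f`) drives the loss toward zero, while a confident wrong score makes it grow roughly linearly.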
Now, we have to choose what kind of model f is going to be, and we’ll choose a linear model.
So it’s a weighted sum of the features.
For instance, if we’re trying to predict income,
our model might look like three times the hours a person works,
plus four times the years of experience and so on.
So here,
the first feature x1 is hours and the first weight beta1 is 3, and so on. I can write it here in summation notation.
So f is the sum of the weighted features,
where the weights are called beta. And I’ll also call the betas “coefficients”.
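The weighted sum can be sketched in a couple of lines; the feature values and coefficients below are just the hypothetical income example from the text:

```python
import numpy as np

def linear_score(x, beta):
    """f(x) = sum_j beta_j * x_j — a weighted sum of the features."""
    return np.dot(x, beta)

# Hypothetical income example: 3 * hours + 4 * years of experience.
x = np.array([40.0, 10.0])      # x1 = hours worked, x2 = years of experience
beta = np.array([3.0, 4.0])     # beta1 = 3, beta2 = 4
score = linear_score(x, beta)   # 3*40 + 4*10 = 160
```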
So here, all I did was plug this form of f into that minimization problem for logistic regression.
So now, it’s going to try to find the weights that minimize the sum of the training losses.
And this is what logistic regression does; no more, no less.
It just chooses the coefficients (those betas) to minimize this thing.
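Putting the pieces together, here is one minimal sketch of that minimization: plain gradient descent on the average logistic loss over a linear model. The optimizer choice and step size are assumptions for illustration, not the only way to fit logistic regression:

```python
import numpy as np

def fit_logistic_regression(X, y, lr=0.1, n_steps=2000):
    """Vanilla logistic regression as described: choose the betas that
    minimize the average logistic loss on the training points, with no
    regularization. Labels y are assumed to be in {-1, +1}; X is an
    (n, d) feature matrix."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_steps):
        scores = X @ beta  # f(x_i) = sum_j beta_j * x_ij for each point
        # Gradient of mean(log(1 + exp(-y * f))) with respect to beta:
        # -mean_i( y_i * x_i / (1 + exp(y_i * f_i)) )
        grad = -(X * (y / (1.0 + np.exp(y * scores)))[:, None]).mean(axis=0)
        beta -= lr * grad
    return beta
```

On linearly separable data the coefficients keep growing (there is nothing to stop them without regularization), but the decision boundary they define settles down quickly.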