Let’s start with regression. First, I’ll briefly cover simple linear regression, which uses just one feature; multiple linear regression, which uses many features; and how to evaluate regression models, and we’ll do a case study. Then we’ll talk about outliers and the concept of leverage. Let’s begin with a recap. Regression is for predicting real-valued outcomes – things like “how many customers will arrive at our website next week?”, “how many TVs will we sell next year?”, and “can we predict someone’s income from their click-through information?”. So here’s our training data: the features are here, and the labels, which are real-valued (in this case, income), and then we also have the predictions, f(x), over here on the right. Now, just looking at these two columns, we have to figure out how to evaluate the closeness of the truth, y, to what we predicted, which is f(x).
Now, here’s a first proposal: use y - f(x). As we know, that’s not a great idea. Why not? Because if I choose either y - f(x) or f(x) - y, I’m looking at errors in only one direction, and I want to penalize errors in either direction. I could instead use the absolute value, |y - f(x)|, which makes sure we count deviations in either direction. But that’s not actually what we’re going to use. We’re going to use the squared error. It’s just easier computationally and analytically, because it’s differentiable. But you should think of it as capturing errors in both directions, just like the absolute value on the previous slide: it penalizes how far y is from f in either direction.
So here’s a picture of it. Here we’d want to use f(x) - y, and here is a case where we’d want to use y - f(x), so we could just use the absolute value to capture both of those things – but we’re not going to; we’re actually going to square it and get the squared error. Now, if we add all of these squared errors up, the total is called the sum of squares error, and that is right here. We’ll get back to it in a minute, but I want you to remember that this is a fundamental quantity in regression. What I want to do now is talk about what the model, f, might look like. We’re going to choose f so that it’s a good model, meaning that it minimizes this sum of squares error.
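To make those error measures concrete, here’s a minimal sketch in Python with NumPy (the lecture doesn’t prescribe a language, and the y and f(x) values below are made up purely for illustration):

```python
import numpy as np

# Made-up labels y and predictions f(x) for five training points.
y  = np.array([48.0, 76.0, 55.0, 90.0, 62.0])   # true incomes (in $1000s)
fx = np.array([52.0, 70.0, 55.0, 95.0, 58.0])   # model predictions

signed   = y - fx           # one-directional: misses in opposite directions cancel
absolute = np.abs(y - fx)   # penalizes deviations in either direction
squared  = (y - fx) ** 2    # also penalizes both directions, and is differentiable

sse = np.sum(squared)       # the sum of squares error over the training set
print(signed.sum(), absolute.sum(), sse)
```

Notice that the signed errors can cancel each other out, which is exactly why y - f(x) alone makes a bad penalty; the absolute and squared versions both grow whenever a prediction misses in either direction.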
So let’s talk about simple linear regression. In simple linear regression, we have only one feature. Maybe we’re predicting income based on a single feature, say the number of BusinessWeek clicks the person makes. So now we have to figure out what our function f is going to look like.
So here I’ve put up a very simple function f. It gives everyone a baseline of $100,000 just for existing, and it estimates that for each click they make on the BusinessWeek website, they’re $5,000 richer. This is kind of a silly model, since it predicts that anyone who spends all of their time on BusinessWeek is a gazillionaire, but hey, it’s just an example. For our function that estimates y from x, we’re going to choose a model of this form: a baseline, b0, plus the multiplier for however many BusinessWeek clicks we have (called b1) times the number of clicks, which is x1. In other words, f(x) = b0 + b1*x1. So there’s the formula again. Before we start doing this, all we have is data. We don’t know this $100,000 and we don’t know the $5,000; I’ve just made those up. We have to estimate them from the data. Now remember, we want the sum of squares error to be small. So what we’re going to do is choose the b0 and the b1 that minimize the total error on the training set, and that is the procedure of simple linear regression – least squares regression.
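As a sketch of that fitting step – again in Python with NumPy, with hypothetical click counts and incomes standing in for real training data – the one-feature least squares problem even has a closed-form solution:

```python
import numpy as np

# Hypothetical training data: BusinessWeek clicks (x1) and income in dollars (y).
x = np.array([1.0, 3.0, 4.0, 6.0, 8.0])
y = np.array([108_000.0, 118_000.0, 121_000.0, 127_000.0, 142_000.0])

# Least squares estimates for f(x) = b0 + b1*x1:
#   b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   b0 = mean(y) - b1 * mean(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

def f(x1):
    """The fitted model."""
    return b0 + b1 * x1

sse = np.sum((y - f(x)) ** 2)   # total squared error on the training set
print(f"b0 = {b0:,.0f}, b1 = {b1:,.0f}, SSE = {sse:,.0f}")
```

These are the same coefficients a library routine such as numpy.polyfit(x, y, 1) would return; minimizing the sum of squares over b0 and b1 is exactly what “least squares” means here.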
So let’s pretend I did this, and as it turned out, the model I had before actually wasn’t so good. When I fit it to the data, I got this model instead. And this new model fits pretty well on the data in my training set, but how well does it perform out of sample? I didn’t tell you, but I actually left part of the data out for evaluation, and there it is. So let’s take a look at the errors, and they’re here – not too bad; it looks like we did a pretty good job. So that is the procedure of simple linear regression – least squares regression for a single feature. You don’t need to solve the minimization problem yourself to find b0 and b1; the machine learning algorithm will do it for you, all under the hood.
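Continuing the sketch above, here’s what that out-of-sample check might look like: fit b0 and b1 on the training portion only, then measure the error on held-out points the model never saw (the split and the numbers are, again, hypothetical):

```python
import numpy as np

# Hypothetical data: the first four points for training, the last two held out.
x = np.array([1.0, 3.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([108_000.0, 118_000.0, 121_000.0, 127_000.0, 142_000.0, 149_000.0])
x_train, y_train = x[:4], y[:4]
x_test,  y_test  = x[4:], y[4:]

# Fit on the training set only; np.polyfit(..., deg=1) returns [b1, b0].
b1, b0 = np.polyfit(x_train, y_train, deg=1)

# Out-of-sample errors: how far off are we on the points we left out?
residuals = y_test - (b0 + b1 * x_test)
print("held-out squared errors:", residuals ** 2)
print("held-out SSE:", np.sum(residuals ** 2))
```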