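Logistic Regression
Regression: both the inputs and the output are continuous variables;
Classification: the output is a discrete variable.
The joint probability of the training samples gives the likelihood function. Maximizing it means adjusting the current parameters (the weights w) so that the probability of observing the training data is as large as possible.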
Evaluating the regression function
Set the parameters and describe the joint probability:
$$
\begin{aligned}
&g(w^Tx) = \frac{1}{1+e^{-z}} = \frac{1}{1+e^{-w^Tx}}, \quad z = w^Tx\\
&\begin{cases}
P(y=1) = g(w^Tx)\\
P(y=0) = 1-g(w^Tx)
\end{cases}\\
\Rightarrow\ &P_i(\text{True}) = \left(g(w^Tx_i)\right)^{y_i}\left(1-g(w^Tx_i)\right)^{1-y_i}\\
\Rightarrow\ &L(\vec w) = \prod_{i=1}^{m} P_i(\text{True})\\
\Rightarrow\ &Loss(\vec w) = -\frac{1}{m}\log L(\vec w)
\end{aligned}
$$
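Here y is the true label. P is the probability assigned to each outcome under the current parameters, so it measures how well those parameters fit the data. The loss function then describes how the fit changes as the variable w changes.
Deriving the likelihood function L and the loss function: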
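To make the definitions above concrete, here is a minimal sketch that evaluates the likelihood and the loss for two candidate weight vectors. The toy data, the helper names (`sigmoid`, `dot`, `likelihood`, `loss`) and the constant first feature used as a bias term are all assumptions made for illustration; they do not come from the original notes.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    """Inner product w^T x for plain Python lists."""
    return sum(wj * xj for wj, xj in zip(w, x))

def likelihood(w, xs, ys):
    """L(w): product over samples of g^y * (1 - g)^(1 - y)."""
    result = 1.0
    for x, y in zip(xs, ys):
        p = sigmoid(dot(w, x))
        result *= p ** y * (1.0 - p) ** (1 - y)
    return result

def loss(w, xs, ys):
    """Loss(w) = -(1/m) * log L(w), computed as a sum of logs."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(dot(w, x))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(xs)

# Hypothetical toy data: the first feature is a constant 1 (bias term).
xs = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]]
ys = [1, 1, 0, 0]

w_flat = [0.0, 0.0]   # every sample gets probability 0.5
w_tilt = [0.0, 2.0]   # assigns higher probability to the true labels

for w in (w_flat, w_tilt):
    print(w, likelihood(w, xs, ys), loss(w, xs, ys))
```

A weight vector that gives the true labels a higher probability has a larger likelihood and therefore a smaller loss, which is why minimizing the loss is equivalent to maximizing the likelihood.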
$$
\begin{aligned}
h_\theta(x) &= g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}} \\
L(\theta) &= \prod_{i=1}^{m}\left(h_\theta(x_i)\right)^{y_i}\left(1-h_\theta(x_i)\right)^{1-y_i}\\
\Rightarrow \log L(\theta) &= \sum_{i=1}^{m}\left[y_i\log h_\theta(x_i)+(1-y_i)\log\left(1-h_\theta(x_i)\right)\right]\\
Loss(\theta) &= -\frac{1}{m}\log L(\theta)\\
\Rightarrow \frac{\partial}{\partial\theta_j}Loss(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\left(y_i\frac{1}{h_\theta(x_i)}\frac{\partial}{\partial\theta_j}h_\theta(x_i) - (1-y_i)\frac{1}{1-h_\theta(x_i)}\frac{\partial}{\partial\theta_j}h_\theta(x_i)\right)\\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\frac{1}{h_\theta(x_i)}-(1-y_i)\frac{1}{1-h_\theta(x_i)}\right]\frac{\partial}{\partial\theta_j}h_\theta(x_i)\\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\frac{1}{h_\theta(x_i)}-(1-y_i)\frac{1}{1-h_\theta(x_i)}\right]h_\theta(x_i)\left(1-h_\theta(x_i)\right)\frac{\partial}{\partial\theta_j}\theta^Tx_i\\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\left(1-h_\theta(x_i)\right)-(1-y_i)h_\theta(x_i)\right]\frac{\partial}{\partial\theta_j}\theta^Tx_i\\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\left(1-h_\theta(x_i)\right)-(1-y_i)h_\theta(x_i)\right]{x_i}_j\\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right){x_i}_j
\end{aligned}
$$
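One way to sanity-check the final expression of the derivation is to compare it with a numerical derivative of the loss. The sketch below assumes a small made-up data set and hypothetical helper names (`predict`, `gradient`, `numeric_gradient`); the central-difference check is added here for illustration and is not part of the original derivation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def loss(theta, xs, ys):
    """Loss(theta) = -(1/m) * log L(theta)."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = predict(theta, x)
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return -total / len(xs)

def gradient(theta, xs, ys):
    """Analytic gradient: (1/m) * sum_i (h(x_i) - y_i) * x_ij for each j."""
    m = len(xs)
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        err = predict(theta, x) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / m
    return grad

def numeric_gradient(theta, xs, ys, eps=1e-6):
    """Central finite differences of the loss, one coordinate at a time."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((loss(up, xs, ys) - loss(down, xs, ys)) / (2 * eps))
    return grad

# Hypothetical toy data: the first feature is a constant 1 (bias term).
xs = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]]
ys = [1, 0, 0, 1]
theta = [0.3, -0.7]

analytic = gradient(theta, xs, ys)
numeric = numeric_gradient(theta, xs, ys)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, numeric, max_diff)
```

If the two gradients agree to within a small tolerance, the closed form from the derivation matches the loss it was derived from.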
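Updating the parameters
The derivation above yields the **partial derivative** of the loss with respect to each parameter. When the parameters are actually updated, each one moves against its partial derivative, scaled by the learning rate $\alpha$: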
$$
\theta_j = \theta_j-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right){x_i}_j
$$
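The update rule can be sketched as a plain batch gradient descent loop. The function name `train`, the learning rate, the step count, the zero initialization and the toy data are all assumptions chosen for illustration; real problems usually need tuned values and a stopping criterion.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def loss(theta, xs, ys):
    """Loss(theta) = -(1/m) * log L(theta)."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = predict(theta, x)
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return -total / len(xs)

def train(xs, ys, alpha=0.5, steps=200):
    """Batch gradient descent: theta_j -= alpha * (1/m) * sum_i (h_i - y_i) * x_ij."""
    m = len(xs)
    n = len(xs[0])
    theta = [0.0] * n
    for _ in range(steps):
        # Compute all errors first so every theta_j is updated simultaneously.
        errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
        theta = [
            theta[j] - alpha * sum(e * x[j] for e, x in zip(errors, xs)) / m
            for j in range(n)
        ]
    return theta

# Hypothetical toy data: the first feature is a constant 1 (bias term).
xs = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]]
ys = [1, 1, 0, 0]

initial_loss = loss([0.0, 0.0], xs, ys)
theta = train(xs, ys)
final_loss = loss(theta, xs, ys)
labels = [1 if predict(theta, x) >= 0.5 else 0 for x in xs]
print(theta, initial_loss, final_loss, labels)
```

The errors are computed once per step before any parameter changes, so all components of theta are updated from the same gradient.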
Softmax for multi-class classification
The probability function is expressed as:
$$
h_\theta(x^{(i)}) =
\begin{bmatrix}
p(y^{(i)} = 1 \mid x^{(i)};\theta)\\
p(y^{(i)} = 2 \mid x^{(i)};\theta)\\
\vdots\\
p(y^{(i)} = k \mid x^{(i)};\theta)
\end{bmatrix}
= \frac{1}{\sum_{j=1}^{k}e^{\theta_j^Tx^{(i)}}}
\begin{bmatrix}
e^{\theta_1^Tx^{(i)}}\\
e^{\theta_2^Tx^{(i)}}\\
\vdots\\
e^{\theta_k^Tx^{(i)}}
\end{bmatrix}
$$
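The softmax probability vector can be computed directly from this formula. The sketch below uses a hypothetical function name `softmax_probs` and made-up parameter vectors. It also subtracts the largest score before exponentiating, a common numerical-stability step that is an addition here and does not appear in the formula; it leaves the probabilities unchanged because the same factor cancels in numerator and denominator.

```python
import math

def softmax_probs(thetas, x):
    """Return [p(y=1|x), ..., p(y=k|x)] for k parameter vectors theta_1..theta_k."""
    scores = [sum(t * xj for t, xj in zip(theta, x)) for theta in thetas]
    # Shift by the max score to avoid overflow in exp; the result is unchanged.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical parameters for k = 3 classes and a 2-dimensional input.
thetas = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 1.0]

probs = softmax_probs(thetas, x)
print(probs, sum(probs))
```

The entries are positive and sum to 1, and the class with the largest score $\theta_j^T x$ receives the largest probability.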