Chapter 6: Deep Feedforward Networks
6.1 Example: Learning XOR
Treating XOR as regression on the four points $\mathbb{X} = \{[0,0]^{\top}, [0,1]^{\top}, [1,0]^{\top}, [1,1]^{\top}\}$ with target function $f^{*}$, we minimize the MSE loss
$J(\Theta) = \frac{1}{4} \sum_{x \in \mathbb{X}} \left( f^{*}(x) - f(x; \Theta) \right)^{2}$
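As a concrete instance of this loss (a sketch; the constant-0.5 model here is a hypothetical stand-in, not from the text):

```python
import numpy as np

# Targets f*(x) for the four XOR inputs [0,0], [0,1], [1,0], [1,1].
targets = np.array([0., 1., 1., 0.])

# Evaluate J for a hypothetical model that outputs 0.5 on every input.
preds = np.full(4, 0.5)
J = 0.25 * np.sum((targets - preds) ** 2)
print(J)  # 0.25
```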
Now we must choose our model $f(x; \Theta)$.
Linear model
$f(x; w, b) = x^{\top} w + b$
A linear model cannot represent XOR: the four points are not linearly separable, so no single choice of $w$ and $b$ produces the right output on all of them.
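A quick numerical check (a sketch using NumPy; the least-squares call and variable names are mine). The best linear fit turns out to be $w = 0$, $b = 1/2$, so the model predicts 0.5 on every input and can do no better:

```python
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

# Append a ones column so the bias b is fit as an extra weight.
A = np.hstack([X, np.ones((4, 1))])

# Least squares minimizes the MSE loss J from above.
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = theta[:2], theta[2]
print(w, b)       # w ≈ [0, 0], b ≈ 0.5
print(A @ theta)  # 0.5 everywhere: the midpoint of the targets
```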
Add a hidden layer:
$h = f^{(1)}(x; W, c)$, $\quad y = f^{(2)}(h; w, b)$
$f(x; W, c, w, b) = f^{(2)}(f^{(1)}(x))$
If both layers are linear, say $f^{(1)}(x) = W^{\top} x$ and $f^{(2)}(h) = h^{\top} w$,
we get $f(x) = w^{\top} W^{\top} x$: setting $w' = W w$, the whole network is still just the linear model $f(x) = x^{\top} w'$.
Clearly, we must use a nonlinear layer to represent the features.
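The collapse is nothing more than matrix associativity; a two-line check (a sketch; the random $W$ and $w$ values are my own, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))  # hypothetical hidden-layer weights
w = rng.normal(size=(3,))    # hypothetical output weights

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

# Two stacked linear layers collapse to one: (X W) w == X (W w).
print(np.allclose((X @ W) @ w, X @ (W @ w)))  # True
```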
ReLU (rectified linear unit)
$f(x; W, c, w, b) = w^{\top} \max\{0,\, W^{\top} x + c\} + b$
A solution by hand (with $b = 0$):
$W = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad c = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad w = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$
Calculate, collecting the four inputs as the rows of a design matrix $X$:
$X = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}$
$XW = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{bmatrix}$
Adding $c$ to each row: $XW + c^{\top} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}$
ReLU: $H = \max\left\{0,\, XW + c^{\top}\right\} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}$
$Hw = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$
We obtain $[0, 1, 1, 0]^{\top}$, exactly the XOR targets: the network fits all four points and the loss $J$ is zero.
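A minimal NumPy check of the whole forward pass (a sketch; the array names follow the text, with $b = 0$ as above):

```python
import numpy as np

# The hand-picked parameters from above.
W = np.array([[1., 1.], [1., 1.]])
c = np.array([0., -1.])
w = np.array([1., -2.])
b = 0.0

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

H = np.maximum(0., X @ W + c)  # ReLU; c broadcasts across the four rows
y = H @ w + b
print(y)  # [0. 1. 1. 0.], the XOR truth table

# The loss J(Theta) is exactly zero at this solution.
targets = np.array([0., 1., 1., 0.])
print(0.25 * np.sum((targets - y) ** 2))  # 0.0
```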