Caffe: common layers (overview)

The common layers fall into three types:

Inner Product - fully connected layer.
Dropout
Embed - for learning embeddings of one-hot encoded vector (takes index as input).

1) Inner Product (fully connected) layer
Example:

layer {
  name: "fc8"
  type: "InnerProduct"
  # learning rate and decay multipliers for the weights
  param { lr_mult: 1 decay_mult: 1 }
  # learning rate and decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "fc7"
  top: "fc8"
}

Parameters:

message InnerProductParameter {
  optional uint32 num_output = 1; // The number of outputs for the layer
  optional bool bias_term = 2 [default = true]; // whether to have bias terms
  optional FillerParameter weight_filler = 3; // The filler for the weight
  optional FillerParameter bias_filler = 4; // The filler for the bias
  // The first axis to be lumped into a single inner product computation;
  // all preceding axes are retained in the output.
  // May be negative to index from the end (e.g., -1 for the last axis).
  optional int32 axis = 5 [default = 1];
  // Specify whether to transpose the weight matrix or not.
  // If transpose == true, any operations will be performed on the transpose
  // of the weight matrix. The weight matrix itself is not going to be transposed
  // but rather the transfer flag of operations will be toggled accordingly.
  optional bool transpose = 6 [default = false];
}

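Explanation:

Input: n*c0*h*w

Output: n*c1*1*1
A fully connected layer treats its input as a single vector and also produces a plain vector as output (the width and height of the input blob both become 1).
A fully connected layer is in effect a convolution layer whose kernel is the same size as the input data, so its parameters are largely the same as those of a convolution layer.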
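
To make the shape change concrete, here is a minimal pure-Python sketch of the forward pass, assuming the default axis = 1 so that each sample's c0*h*w values are flattened into one vector. The function name inner_product and the tiny sizes are illustrative only; this is not Caffe's implementation.

```python
def inner_product(batch, weights, bias):
    """Flatten each sample and compute y = W x + b.

    batch:   list of n samples, each a nested list of shape c0 x h x w
    weights: list of c1 rows, each of length c0*h*w
    bias:    list of c1 values
    """
    outputs = []
    for sample in batch:
        # Flatten c0 x h x w into one vector of length c0*h*w.
        x = [v for channel in sample for row in channel for v in row]
        # One dot product per output unit, plus its bias.
        y = [sum(w_i * x_i for w_i, x_i in zip(w_row, x)) + b
             for w_row, b in zip(weights, bias)]
        outputs.append(y)
    return outputs


# n=1, c0=2, h=1, w=2 gives a flattened length of 4; c1=2 outputs.
batch = [[[[1.0, 2.0]], [[3.0, 4.0]]]]
weights = [[1.0, 0.0, 0.0, 0.0],
           [1.0, 1.0, 1.0, 1.0]]
bias = [0.5, 0.0]
result = inner_product(batch, weights, bias)
print(result)  # [[1.5, 10.0]]
```

Each sample ends up as a vector of c1 values, which corresponds to the n*c1*1*1 output blob described above.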

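Layer type: InnerProduct

lr_mult: the learning-rate multiplier. The effective learning rate is this value times base_lr from the solver.prototxt configuration file. If there are two lr_mult entries, the first is for the weights and the second is for the bias. The bias learning rate is usually set to twice the weight learning rate.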
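
As a small worked example of that multiplication, the sketch below uses a hypothetical base_lr of 0.01 (not taken from any real solver file) together with the lr_mult values 1 and 2 from the fc8 example.

```python
def effective_lr(base_lr, lr_mult):
    """Effective learning rate for one parameter blob: base_lr * lr_mult."""
    return base_lr * lr_mult


base_lr = 0.01                         # hypothetical solver setting
weight_lr = effective_lr(base_lr, 1)   # first param block: weights
bias_lr = effective_lr(base_lr, 2)     # second param block: bias
print(weight_lr, bias_lr)
```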

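Required parameters:

num_output: the number of filters, i.e. the number of outputs of the layer

Other parameters:

weight_filler: weight initialization. The default is "constant" with all values 0; the "xavier" algorithm is often used instead, and "gaussian" is also an option.
bias_filler: bias initialization. Usually set to "constant" with all values 0.
bias_term: whether to use a bias term. Defaults to true (enabled).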
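
The three fillers mentioned above can be sketched in plain Python as follows. This is an illustration rather than Caffe's code: the function names are made up, and the "xavier" rule shown (uniform in [-s, s] with s = sqrt(3 / fan_in)) is an assumption about the default behaviour.

```python
import math
import random


def constant_filler(count, value=0.0):
    """Every entry gets the same value (0 by default)."""
    return [value] * count


def gaussian_filler(count, std, mean=0.0, rng=random):
    """Entries drawn from a normal distribution with the given std."""
    return [rng.gauss(mean, std) for _ in range(count)]


def xavier_filler(count, fan_in, rng=random):
    """Assumed rule: uniform in [-s, s] with s = sqrt(3 / fan_in)."""
    scale = math.sqrt(3.0 / fan_in)
    return [rng.uniform(-scale, scale) for _ in range(count)]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
fan_in, num_output = 8, 4       # hypothetical layer sizes
w_xavier = xavier_filler(fan_in * num_output, fan_in, rng)
w_gauss = gaussian_filler(fan_in * num_output, std=0.01, rng=rng)
b = constant_filler(num_output)
```

A constant 0 filler is fine for the bias, but using it for the weights would make every output unit start out identical, which is why a random filler is normally chosen there.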

2) Dropout layer
Example:

layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}

Parameters:

message DropoutParameter {
  optional float dropout_ratio = 1 [default = 0.5]; // dropout ratio
}

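Explanation: during training, on every parameter update Dropout randomly disconnects input neurons with a given probability (the rate). The Dropout layer is used to prevent overfitting.
Dropout is a very general remedy for overfitting in deep neural networks. The procedure is extremely simple, it works well in practice, and it is used in almost every network design. Besides preventing overfitting, Dropout also acts as a form of model ensembling.
Consider σ = 0.5 and a layer with n connections. If the number of forward passes is large enough (in the limit, infinite) and each pass cuts half of the connections, then over the whole of training there are up to 2^n possible combinations of dropped connections, and likewise 2^n combinations of kept connections. This is roughly equivalent to training 2^n models and then ensembling them.
It works as follows. First choose a dropout ratio σ, a hyperparameter in the range (0, 1) that gives the fraction of connections to disconnect at random in the forward pass. Every forward pass randomly disconnects that fraction of connections, and only the remaining weights are updated. Finally, at test/predict time all connections are used, but all the weights are multiplied by 1 - σ. (Caffe's Dropout layer reaches the same expectation the other way round: it scales the kept activations by 1/(1 - σ) during training and passes the input through unchanged at test time.)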
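
The procedure described above can be sketched in a few lines of plain Python. The function names are hypothetical, and the test-time step scales the activations by 1 - σ, which has the same effect on the following linear layer as scaling its weights.

```python
import random


def dropout_train(x, ratio, rng):
    """Training: zero each value independently with probability `ratio`."""
    return [0.0 if rng.random() < ratio else v for v in x]


def dropout_test(x, ratio):
    """Test time: keep everything, scaled by (1 - ratio)."""
    return [v * (1.0 - ratio) for v in x]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
ratio = 0.5
x = [1.0] * 10000

# Fraction of units that survive one training pass; close to 1 - ratio.
kept_fraction = sum(dropout_train(x, ratio, rng)) / len(x)

test_out = dropout_test([2.0, 4.0], ratio)
print(test_out)  # [1.0, 2.0]
```

On average a fraction 1 - σ of the units survives a training pass, so scaling by 1 - σ at test time keeps the expected input to the next layer the same in both modes.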

3) Embed layer

Example:

layer {
  name: "embed1000_to_100"
  type: "Embed"
  bottom: "compact_one_hot_dim1000"
  top: "embed1000_to_100"
  embed_param {
    num_output: 100 # output dimension
    input_dim: 1000
  }
}

Parameters:

message EmbedParameter {
  optional uint32 num_output = 1; // The number of outputs for the layer
  // The input is given as integers to be interpreted as one-hot
  // vector indices with dimension num_input. Hence num_input should be
  // 1 greater than the maximum possible input value.
  optional uint32 input_dim = 2;
  optional bool bias_term = 3 [default = true]; // Whether to use a bias term
  optional FillerParameter weight_filler = 4; // The filler for the weight
  optional FillerParameter bias_filler = 5; // The filler for the bias
}

The example above maps a vocabulary of 1000 words (given as indices 0 to 999) into a 100-dimensional space.
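
The sketch below illustrates why the layer can take an index as input instead of a one-hot vector: selecting row idx of the weight table gives the same result as multiplying the one-hot vector by that table. It assumes a weight table of shape input_dim x num_output, uses tiny sizes (4 and 2) in place of 1000 and 100, and the function names are hypothetical.

```python
def embed(indices, weights, bias=None):
    """Look up one row of `weights` per index, optionally adding a bias."""
    out = []
    for idx in indices:
        row = list(weights[idx])
        if bias is not None:
            row = [r + b for r, b in zip(row, bias)]
        out.append(row)
    return out


def one_hot_matmul(idx, weights):
    """Same result computed the slow way: one-hot vector times weights."""
    input_dim = len(weights)
    num_output = len(weights[0])
    one_hot = [1.0 if i == idx else 0.0 for i in range(input_dim)]
    return [sum(one_hot[i] * weights[i][j] for i in range(input_dim))
            for j in range(num_output)]


# input_dim=4, num_output=2 (stand-ins for 1000 and 100)
weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
looked_up = embed([2, 0], weights)
print(looked_up)  # [[0.5, 0.6], [0.1, 0.2]]
```

The lookup avoids building the one-hot vector at all, which matters when input_dim is large.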