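Notes on the softmax loss family (overview)

1. Introduction

While reading up on face recognition I ran into quite a few loss functions. Drawing on the blog posts I read, I am writing down a rough summary here for easy reference, without digging too deep for now.

2. Main content

2.1 `Softmax loss`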

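The formula is:
$$L = \frac{1}{N}\sum_i L_i = \frac{1}{N}\sum_i -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)$$
Here $N$ is the number of training samples and $f_j$ is the $j$-th element of the class score vector $f$. Since $f$ is the output of the fully connected layer, $f_j$ can be written as (bias omitted):
$$f_j = W_j^T x_i$$
In terms of norms, with $\theta_j$ the angle between $W_j$ and $x_i$, this is:
$$f_j = \left\|W_j\right\|\left\|x_i\right\|\cos(\theta_j), \quad 0 \le \theta_j \le \pi$$

So $L_i$ in the softmax loss becomes:
$$L_i = -\log\left(\frac{e^{\left\|W_{y_i}\right\|\left\|x_i\right\|\cos(\theta_{y_i})}}{\sum_j e^{\left\|W_j\right\|\left\|x_i\right\|\cos(\theta_j)}}\right)$$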
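To make the two equivalent forms of the class score concrete, here is a minimal NumPy sketch. The function names `softmax_loss` and `scores_from_angles` are made up for illustration, and the bias term is assumed to be zero as in the norm form of the score.

```python
import numpy as np

def softmax_loss(W, X, y):
    """Mean softmax (cross-entropy) loss, assuming no bias term.

    W: (C, D) class weight vectors, X: (N, D) features, y: (N,) integer labels.
    """
    f = X @ W.T                                # f[i, j] = W_j^T x_i
    f = f - f.max(axis=1, keepdims=True)       # shift for numerical stability
    log_prob = f - np.log(np.exp(f).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(y)), y].mean()

def scores_from_angles(W, X):
    """The same scores written as ||W_j|| * ||x_i|| * cos(theta_j)."""
    w_norm = np.linalg.norm(W, axis=1)                   # shape (C,)
    x_norm = np.linalg.norm(X, axis=1, keepdims=True)    # shape (N, 1)
    cos_theta = (X @ W.T) / (x_norm * w_norm)            # cosine of each angle
    return x_norm * w_norm * cos_theta
```

For nonzero weights and features, `scores_from_angles(W, X)` matches `X @ W.T` up to floating-point error, which is the point of the norm form: the score depends on the two norms and the angle only.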

2.2 `L-Softmax loss`

This is an improvement on softmax loss, the Large-Margin Softmax loss, from the ICML 2016 paper "Large-Margin Softmax Loss for Convolutional Neural Networks".

Paper: http://proceedings.mlr.press/v48/liud16.pdf

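The L-Softmax loss ($L_i$) is:
$$L_i = -\log\left(\frac{e^{\left\|W_{y_i}\right\|\left\|x_i\right\|\psi(\theta_{y_i})}}{e^{\left\|W_{y_i}\right\|\left\|x_i\right\|\psi(\theta_{y_i})} + \sum_{j \ne y_i} e^{\left\|W_j\right\|\left\|x_i\right\|\cos(\theta_j)}}\right)$$
where
$$\psi(\theta) = (-1)^k\cos(m\theta) - 2k, \quad \theta \in \left[\frac{k\pi}{m}, \frac{(k+1)\pi}{m}\right], \quad k \in \{0, 1, \dots, m-1\}$$
The paper motivates this definition as follows.

Suppose $x$ comes from class 1. For it to be classified correctly we need:
$$W_1^T x > W_2^T x, \quad \text{i.e.} \quad \left\|W_1\right\|\left\|x\right\|\cos(\theta_1) > \left\|W_2\right\|\left\|x\right\|\cos(\theta_2)$$
To make the separation more pronounced, the authors instead require:
$$\left\|W_1\right\|\left\|x\right\|\cos(m\theta_1) > \left\|W_2\right\|\left\|x\right\|\cos(\theta_2), \quad 0 \le \theta_1 \le \frac{\pi}{m}$$

where $m$ is a positive integer. Because $\cos(m\theta_1) \le \cos(\theta_1)$ on this range, the new condition is stricter than the original one, which forces a margin between the classes.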
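The stricter condition can be checked numerically. The sketch below assumes equal weight norms ($\left\|W_1\right\| = \left\|W_2\right\|$), so only the angles matter, and it only handles $\theta_1$ in $[0, \pi/m]$. Both function names are hypothetical.

```python
import math

def softmax_correct(theta1, theta2):
    """Plain softmax criterion for class 1, assuming equal weight norms."""
    return math.cos(theta1) > math.cos(theta2)

def l_softmax_correct(theta1, theta2, m=4):
    """Large-margin criterion cos(m * theta1) > cos(theta2).

    Simplified check: only valid for 0 <= theta1 <= pi / m.
    """
    if not 0.0 <= theta1 <= math.pi / m:
        raise ValueError("theta1 must lie in [0, pi/m] for this simplified check")
    return math.cos(m * theta1) > math.cos(theta2)
```

With `theta1 = 0.6`, `theta2 = 1.0` and `m = 2`, the plain criterion is satisfied but the large-margin one is not; shrinking `theta1` to `0.3` satisfies both. The sample has to sit closer to its own class direction before it counts as correct.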

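Viewed geometrically, the difference between the two losses is that features learned with L-Softmax loss are separated far more clearly than those learned with softmax loss.
[Figure: geometric comparison of softmax loss and L-Softmax loss]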

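2.3 `Center loss`

From the ECCV 2016 paper "A Discriminative Feature Learning Approach for Deep Face Recognition".
Paper: http://ydwen.github.io/papers/WenECCV16.pdf
What is the center loss formula? Put simply, it adds a term $L_C$ (weighted by $\lambda$) on top of the softmax loss $L_S$:
$$L = L_S + \lambda L_C = -\sum_{i=1}^{m}\log\frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^T x_i + b_j}} + \frac{\lambda}{2}\sum_{i=1}^{m}\left\|x_i - c_{y_i}\right\|_2^2$$
$c_{y_i}$ is the feature center of class $y_i$, $x_i$ is the feature fed into the last fully connected layer, $m$ is the mini-batch size, and $n$ is the number of classes.
The gradient of $L_C$ and the update for the centers are:
$$\frac{\partial L_C}{\partial x_i} = x_i - c_{y_i}, \qquad \Delta c_j = \frac{\sum_{i=1}^{m}\delta(y_i = j)\,(c_j - x_i)}{1 + \sum_{i=1}^{m}\delta(y_i = j)}$$
where $\delta(\cdot)$ is 1 when the condition holds and 0 otherwise.

The overall algorithm: in each iteration, compute the joint loss $L = L_S + \lambda L_C$, backpropagate, update the classifier weights $W$, update each center with $c_j \leftarrow c_j - \alpha\,\Delta c_j$, and update the network parameters.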
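As a rough illustration of the center term and the center update, here is a small NumPy sketch. It assumes $L_C = \frac{1}{2}\sum_i \left\|x_i - c_{y_i}\right\|_2^2$ over the mini-batch and a center learning rate `alpha`; the function names and the default `alpha=0.5` are illustrative choices, not taken from this post.

```python
import numpy as np

def center_loss(X, y, centers):
    """L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2 over the mini-batch."""
    diff = X - centers[y]              # each feature minus its own class center
    return 0.5 * np.sum(diff ** 2)

def center_loss_grad(X, y, centers):
    """Gradient of L_C with respect to each feature: x_i - c_{y_i}."""
    return X - centers[y]

def update_centers(X, y, centers, alpha=0.5):
    """One update step c_j <- c_j - alpha * delta_c_j for classes in the batch."""
    new_centers = centers.copy()
    for j in np.unique(y):
        mask = (y == j)
        # delta_c_j = sum over class-j samples of (c_j - x_i) / (1 + count_j)
        delta = (centers[j] - X[mask]).sum(axis=0) / (1 + mask.sum())
        new_centers[j] = centers[j] - alpha * delta
    return new_centers
```

Centers of classes that do not appear in the mini-batch are left unchanged, and the `1 +` in the denominator keeps the update well defined for them.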

2.4 `SphereFace`

From the CVPR 2017 paper "SphereFace: Deep Hypersphere Embedding for Face Recognition".
Paper: https://arxiv.org/abs/1704.08063
Code: https://github.com/wy1iu/sphereface
PyTorch implementation: https://github.com/clcarwin/sphereface_pytorch

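The paper proposes the angular softmax loss (A-Softmax loss), which is based on an angular margin, whereas earlier losses were mostly based on a Euclidean margin. The authors are the same group behind L-Softmax loss, and A-Softmax loss is L-Softmax loss with the additional constraints $\left\|W\right\| = 1$ and $b = 0$.
Start from the softmax loss:
$$L = \frac{1}{N}\sum_i -\log\left(\frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_j e^{W_j^T x_i + b_j}}\right)$$

Applying the two constraints gives the modified softmax loss, where $\theta_{j,i}$ is the angle between $W_j$ and $x_i$:
$$L_{modified} = \frac{1}{N}\sum_i -\log\left(\frac{e^{\left\|x_i\right\|\cos(\theta_{y_i,i})}}{\sum_j e^{\left\|x_i\right\|\cos(\theta_{j,i})}}\right)$$
Adding the same angular margin parameter $m$ as in Large-Margin Softmax loss then gives the A-Softmax loss:
$$L_{ang} = \frac{1}{N}\sum_i -\log\left(\frac{e^{\left\|x_i\right\|\cos(m\theta_{y_i,i})}}{e^{\left\|x_i\right\|\cos(m\theta_{y_i,i})} + \sum_{j \ne y_i} e^{\left\|x_i\right\|\cos(\theta_{j,i})}}\right)$$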

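However, this formula restricts the angle $\theta_{y_i,i}$ between $x_i$ and its target class weight to the range $[0, \pi/m]$. To remove the restriction, $\cos(m\theta_{y_i,i})$ is replaced by a monotonically decreasing function $\psi(\theta_{y_i,i})$, giving a form that can be optimized in a CNN:
$$L_{ang} = \frac{1}{N}\sum_i -\log\left(\frac{e^{\left\|x_i\right\|\psi(\theta_{y_i,i})}}{e^{\left\|x_i\right\|\psi(\theta_{y_i,i})} + \sum_{j \ne y_i} e^{\left\|x_i\right\|\cos(\theta_{j,i})}}\right), \quad \psi(\theta_{y_i,i}) = (-1)^k\cos(m\theta_{y_i,i}) - 2k$$
with $\theta_{y_i,i} \in \left[\frac{k\pi}{m}, \frac{(k+1)\pi}{m}\right]$ and $k \in [0, m-1]$.
The paper also derives lower bounds on the margin: $m_{min} \ge 2+\sqrt{3}$ for binary classification and $m_{min} \ge 3$ for multi-class classification. The authors use $m = 4$.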
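The following NumPy sketch shows how the piecewise function $\psi(\theta) = (-1)^k\cos(m\theta) - 2k$ and the resulting loss could be computed. It is an illustration under the stated constraints ($\left\|W_j\right\| = 1$, $b = 0$), not the authors' implementation, and the names `psi` and `a_softmax_loss` are made up here.

```python
import numpy as np

def psi(theta, m=4):
    """psi(theta) = (-1)^k * cos(m * theta) - 2k on [k*pi/m, (k+1)*pi/m]."""
    # k is the index of the interval containing theta, capped at m - 1 for theta = pi
    k = np.minimum(np.floor(theta * m / np.pi), m - 1)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

def a_softmax_loss(W, X, y, m=4):
    """A-Softmax loss sketch. W: (C, D), X: (N, D), y: (N,) integer labels."""
    W_unit = W / np.linalg.norm(W, axis=1, keepdims=True)    # enforce ||W_j|| = 1
    x_norm = np.linalg.norm(X, axis=1, keepdims=True)
    cos_theta = np.clip((X @ W_unit.T) / x_norm, -1.0, 1.0)
    logits = x_norm * cos_theta                              # ||x_i|| cos(theta_j)
    rows = np.arange(len(y))
    theta_y = np.arccos(cos_theta[rows, y])                  # target-class angle
    logits[rows, y] = x_norm[:, 0] * psi(theta_y, m)         # margin on target only
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, y].mean()
```

With `m=1` the function reduces to the modified softmax loss, since $\psi(\theta) = \cos\theta$ in that case. Larger `m` lowers the target logit and therefore never decreases the loss for the same inputs.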

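Later came F-Norm SphereFace, which also normalizes the feature and rescales it to a fixed norm $s$:
$$L = \frac{1}{N}\sum_i -\log\left(\frac{e^{s\cos(m\theta_{y_i})}}{e^{s\cos(m\theta_{y_i})} + \sum_{j \ne y_i} e^{s\cos\theta_j}}\right)$$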

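2.5 `CosFace`

Once the angular softmax loss of SphereFace is clear, this one is easy to follow: $\cos\theta$ is replaced by $\cos\theta - m$. CosFace's loss, which has the same form as the additive margin softmax loss (AM-Softmax), is:
$$L = \frac{1}{N}\sum_i -\log\left(\frac{e^{s(\cos\theta_{y_i} - m)}}{e^{s(\cos\theta_{y_i} - m)} + \sum_{j \ne y_i} e^{s\cos\theta_j}}\right)$$
where both the weights and the features are normalized and $s$ is a scale factor.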
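A minimal sketch of the additive cosine margin, assuming L2-normalized weights and features and a scale factor `s`. The function name and the default values `s=30.0`, `m=0.35` are illustrative assumptions; the resulting logits would then go into an ordinary softmax cross-entropy.

```python
import numpy as np

def cosface_logits(W, X, y, s=30.0, m=0.35):
    """Logits s * (cos(theta_y) - m) for the target class, s * cos(theta_j) otherwise.

    W: (C, D) class weights, X: (N, D) features, y: (N,) integer labels.
    """
    W_unit = W / np.linalg.norm(W, axis=1, keepdims=True)
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos_theta = X_unit @ W_unit.T                 # cosine similarity to each class
    cos_theta[np.arange(len(y)), y] -= m          # subtract the margin on the target only
    return s * cos_theta
```

The margin is subtracted in cosine space, so the penalty on the target logit is always exactly `s * m` regardless of the angle.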

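2.6 `ArcFace`

Here $\cos\theta$ is replaced by $\cos(\theta + m)$, so the margin is added directly to the angle:

$$L = \frac{1}{N}\sum_i -\log\left(\frac{e^{s\cos(\theta_{y_i} + m)}}{e^{s\cos(\theta_{y_i} + m)} + \sum_{j \ne y_i} e^{s\cos\theta_j}}\right)$$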
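The additive angular margin can be sketched the same way. This assumes normalized weights and features and a scale `s`; the function name and the defaults `s=64.0`, `m=0.5` are illustrative assumptions. It also ignores the case $\theta + m > \pi$, which practical implementations usually treat specially.

```python
import numpy as np

def arcface_logits(W, X, y, s=64.0, m=0.5):
    """Logits s * cos(theta_y + m) for the target class, s * cos(theta_j) otherwise.

    W: (C, D) class weights, X: (N, D) features, y: (N,) integer labels.
    """
    W_unit = W / np.linalg.norm(W, axis=1, keepdims=True)
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos_theta = np.clip(X_unit @ W_unit.T, -1.0, 1.0)
    rows = np.arange(len(y))
    theta_y = np.arccos(cos_theta[rows, y])       # recover the target-class angle
    logits = cos_theta.copy()
    logits[rows, y] = np.cos(theta_y + m)         # add the margin in angle space
    # Simplification: no special handling when theta_y + m exceeds pi.
    return s * logits
```

Unlike the cosine margin, the penalty here depends on the angle: the same `m` changes the target logit by a different amount at different values of `theta_y`.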

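2.7 Decision boundaries

[Figure: decision boundaries of the different losses for binary classification]

As the figure shows, the axes for CosFace are in terms of $\cos\theta$, not directly in angle space.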
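For reference, setting the target logit equal to the competing logit in each loss gives the class-1 decision boundary in the binary case. This is a sketch derived from the loss formulas, assuming normalized weights wherever only angles appear and normalized features wherever a scale $s$ appears; it is not a reproduction of the missing figure.

$$\begin{aligned}
\text{Softmax:} \quad & (W_1 - W_2)^T x + b_1 - b_2 = 0 \\
\text{Modified softmax:} \quad & \left\|x\right\|(\cos\theta_1 - \cos\theta_2) = 0 \\
\text{A-Softmax (SphereFace):} \quad & \left\|x\right\|(\cos(m\theta_1) - \cos\theta_2) = 0 \\
\text{CosFace:} \quad & s(\cos\theta_1 - m - \cos\theta_2) = 0 \\
\text{ArcFace:} \quad & s(\cos(\theta_1 + m) - \cos\theta_2) = 0
\end{aligned}$$

The CosFace boundary is a straight line in the $(\cos\theta_1, \cos\theta_2)$ plane, while the ArcFace boundary is the straight line $\theta_2 = \theta_1 + m$ in the $(\theta_1, \theta_2)$ plane.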

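3. Closing

That is about it. Below are some of the blog posts I consulted while studying.

References:
Improving the loss function: Large-Margin Softmax Loss
Improving the loss function: Center Loss
Face recognition: SphereFace
SphereFace algorithm explained
ArcFace algorithm notes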
