Precise detection of Chinese characters in historical documents with deep reinforcement learning

Published in Pattern Recognition, 2020
DOI: https://doi.org/10.1016/j.patcog.2020.107503

Abstract:
Here, we use deep reinforcement learning for precise character detection, producing tight bounding boxes around the Chinese characters in historical documents. An agent is trained through a Markov Decision Process to learn a control policy that fine-tunes a bounding box step by step.
We introduce a novel fully convolutional network with position-sensitive Region-of-Interest (RoI) pooling (FCPN). The network accepts character patches of arbitrary size as input and fuses position information into the features of actions. In addition, we propose a dense reward function (DRF) that assigns informative rewards according to different actions and environment states, improving the decision-making ability of the agent.
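To make the MDP framing concrete, the sketch below shows the quantity the agent is ultimately judged on (IoU with the ground-truth box) and a per-step reward. The summary does not give the DRF formula, so `step_reward` is only a stand-in: the sign-of-IoU-change reward with a thresholded stop bonus used in earlier DRL object-localization work. The names `iou`, `step_reward`, `tau`, and `eta` are hypothetical, and boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def step_reward(old_box, new_box, gt_box, stopped, tau=0.8, eta=3.0):
    """Hypothetical stand-in reward, NOT the paper's DRF.

    Movement step: +1 if IoU with the ground truth increases, else -1.
    Stop step: +eta if the final IoU reaches tau, else -eta.
    """
    if stopped:
        return eta if iou(new_box, gt_box) >= tau else -eta
    delta = iou(new_box, gt_box) - iou(old_box, gt_box)
    return 1.0 if delta > 0 else -1.0
```

A sign-only reward like this gives the same signal for a large and a tiny improvement; a denser, state- and action-dependent reward is the kind of refinement the DRF is described as providing.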

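Contributions:
1) A new method for precise Chinese character detection that uses a deep reinforcement learning framework to obtain tighter bounding boxes under high IoU thresholds; it can be applied after a text detector.
2) A fully convolutional network with position-sensitive RoI pooling (FCPN) that accepts character patches of arbitrary size during the deep-reinforcement-learning fine-tuning process, so the input does not need to be resized to a fixed shape.
3) A novel dense reward function (DRF) for training. Thanks to its informative rewards, the agent becomes more sensitive to different actions and environment states, so it learns efficiently and makes better decisions.
4) We combine the strengths of Dueling DQN [11], Double DQN [12], and Prioritized Experience Replay [13] into a simple and effective DQN variant for training the agent. The proposed precise detection method outperforms state-of-the-art methods on both the TKH and MTH datasets, with significant improvements under the IoU 0.8 criterion.
5) We also extend the method to scene text detection, where character backgrounds are often complex and hard to distinguish. With a slight modification to the action modeling, we obtain promising results, which demonstrates the effectiveness and generality of this work.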
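Contribution 4 combines standard DQN components, two of which are easy to show in isolation: the Double DQN target, where the online network selects the next action and the target network evaluates it, and proportional prioritized replay, where transitions are sampled with probability proportional to |TD error|^alpha. The NumPy sketch below assumes batched arrays; the function names and defaults (`gamma`, `alpha`, `eps`) are illustrative, not the paper's settings.

```python
import numpy as np


def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN targets for a batch of transitions.

    rewards, dones: shape (B,); dones is 1.0 for terminal transitions.
    q_online_next, q_target_next: Q-values of the next states, shape (B, A).
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    best_next = np.argmax(q_online_next, axis=1)                     # online net selects
    next_values = q_target_next[np.arange(len(rewards)), best_next]  # target net evaluates
    return rewards + gamma * (1.0 - dones) * next_values


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional PER: P(i) = p_i**alpha / sum_k p_k**alpha, with p_i = |delta_i| + eps."""
    priorities = (np.abs(np.asarray(td_errors, dtype=float)) + eps) ** alpha
    return priorities / priorities.sum()
```

Decoupling selection from evaluation reduces the overestimation that a single max over noisy Q-values introduces, while prioritization replays the transitions the agent currently predicts worst more often.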

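Method:
First, characters are roughly detected by a text detector. Then each character is refined with deep reinforcement learning (DRL) to obtain the final precise result.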
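The refinement stage can be sketched as a greedy rollout per coarse box: query the Q-network, take the best action, and stop when the agent chooses the stop action (index 4, the fifth action) or a step budget runs out. The action set here is hypothetical (four actions that each move one edge inward by a fixed `step` in pixels), and `q_fn` is a placeholder for the trained FCPN.

```python
STOP = 4  # the fifth action (index 4) means "stop"


def apply_action(box, action, step=2.0):
    """Hypothetical action set: move one edge of (x1, y1, x2, y2) inward by `step` pixels."""
    x1, y1, x2, y2 = box
    if action == 0:
        x1 += step      # left edge moves right
    elif action == 1:
        y1 += step      # top edge moves down
    elif action == 2:
        x2 -= step      # right edge moves left
    elif action == 3:
        y2 -= step      # bottom edge moves up
    return (x1, y1, x2, y2)


def refine(box, q_fn, max_steps=50, min_size=1.0):
    """Greedy policy rollout: q_fn(box) returns one Q-value per action."""
    for _ in range(max_steps):
        q = q_fn(box)
        action = max(range(len(q)), key=lambda a: q[a])  # first argmax on ties
        if action == STOP:
            break
        new_box = apply_action(box, action)
        if new_box[2] - new_box[0] < min_size or new_box[3] - new_box[1] < min_size:
            break  # refuse to collapse the box
        box = new_box
    return box
```

For example, with a toy oracle `q_fn` that scores each edge move by how far that edge still lies outside a ground-truth box `(10, 10, 50, 50)` (and gives stop a constant 0.5), `refine((0, 0, 60, 60), q_fn)` shrinks the box onto the ground truth and then stops. A practical action set would also need outward moves, since a coarse box can be too tight as well as too loose.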
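As shown in the blue dashed part of the figure, the network takes the originally coarsely detected character region of size w × h as input. The backbone feature extractor consists of two residual blocks, each composed of three convolutional layers. Inspired by the Dueling network [11], two streams are attached to the end of the backbone output: one estimates the state value and the other estimates the advantage of each action, with position-sensitive RoI pooling [21] used to integrate the position information of the actions.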
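The two streams are typically combined as in the original Dueling network: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). Subtracting the mean advantage makes V and A identifiable without changing which action has the highest Q. This sketch assumes FCPN uses the mean-subtracted form; the summary does not say which aggregation variant it uses.

```python
import numpy as np


def dueling_q(value, advantages):
    """Combine the value and advantage streams into Q-values.

    value: shape (B,) or (B, 1); advantages: shape (B, A).
    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    """
    value = np.asarray(value, dtype=float).reshape(-1, 1)
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean(axis=1, keepdims=True)
```

Adding the same constant to every advantage leaves the output unchanged, which is exactly the ambiguity the mean subtraction removes.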
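Detailed structure of the proposed fully convolutional network with position-sensitive RoI pooling. k, s, and p denote the kernel size, stride, and padding, respectively; in the red rounded rectangles, w, h, c, s, and g denote the pooling width, pooling height, output channels, spatial size, and group size, respectively.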
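Position-sensitive RoI pooling [21] is what lets a stream tie each output to a location: each bin (i, j) of a g × g grid averages only its own dedicated score map. The NumPy sketch below is an unoptimized illustration, assuming a feature map with c·g² channels laid out so that output channel c at bin (i, j) reads channel (c·g + i)·g + j, and an RoI in feature-map coordinates that lies inside the map. The helper `action_scores` (hypothetical) then averages each action's bins into one score.

```python
import numpy as np


def ps_roi_pool(features, roi, group_size, out_channels):
    """Position-sensitive RoI average pooling (unoptimized sketch).

    features: shape (out_channels * group_size**2, H, W).
    roi: (x1, y1, x2, y2) in feature-map coordinates, assumed inside the map.
    Returns shape (out_channels, group_size, group_size).
    """
    g, C = group_size, out_channels
    if features.shape[0] != C * g * g:
        raise ValueError("expected out_channels * group_size**2 input channels")
    x1, y1, x2, y2 = roi
    bin_w, bin_h = (x2 - x1) / g, (y2 - y1) / g
    out = np.zeros((C, g, g))
    for i in range(g):              # bin row
        ys = int(np.floor(y1 + i * bin_h))
        ye = max(int(np.ceil(y1 + (i + 1) * bin_h)), ys + 1)
        for j in range(g):          # bin column
            xs = int(np.floor(x1 + j * bin_w))
            xe = max(int(np.ceil(x1 + (j + 1) * bin_w)), xs + 1)
            for c in range(C):
                ch = (c * g + i) * g + j   # score map dedicated to (channel c, bin i, bin j)
                out[c, i, j] = features[ch, ys:ye, xs:xe].mean()
    return out


def action_scores(features, roi, group_size, num_actions):
    """Average each action's g x g pooled bins into a single score."""
    return ps_roi_pool(features, roi, group_size, num_actions).mean(axis=(1, 2))
```

Because the bins are defined relative to the RoI rather than to a fixed grid, the same layer works for character patches of any size, which matches the claim that FCPN needs no fixed-size input.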
The arrows in the window indicate the movement directions; the fifth action denotes stop.
Improvement is measured by F-measure.
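Because the gains are reported as F-measure under a strict IoU criterion (0.8 in contribution 4), a small box error can flip a detection from hit to miss. The sketch below computes F-measure with greedy one-to-one matching in detection order; this matching rule and the name `f_measure` are assumptions, since the summary does not spell out the evaluation protocol.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f_measure(detections, ground_truths, iou_thresh=0.8):
    """Each detection claims the unmatched ground truth with the highest IoU >= iou_thresh."""
    matched = set()
    tp = 0
    for det in detections:
        best_j, best_iou = -1, iou_thresh
        for j, gt in enumerate(ground_truths):
            if j in matched:
                continue
            v = iou(det, gt)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truths) if ground_truths else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, a coarse box (0, 0, 10, 13) around a ground-truth character (0, 0, 10, 10) has IoU 100/130 ≈ 0.77, so `f_measure([(0, 0, 10, 13)], [(0, 0, 10, 10)])` returns 0.0 at the 0.8 threshold, while the exact box scores 1.0.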
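In effect, this method works as a plug-in refinement stage: it takes the coarse detection results of existing methods as input and uses deep reinforcement learning to fine-tune the box coordinates so that they move closer to the ground truth.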