
Tesseract-OCR 3.05: Training Digit Recognition and Merging Data from Multiple Training Runs

Published: 2024-01-05 12:01:32

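A recent project required photographing 3C codes with a handheld device and recognizing them, and we settled on Tesseract-OCR. I knew nothing about it at first. There are plenty of posts online, but following their steps either produced errors or the posts turned out to be clickbait, which was frustrating. I am writing the procedure down here because I will probably need it again.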

My environment

  • Windows 10

  • JDK 1.8

  • Tesseract-OCR 3.05 (installer version 3.05.02)
    Download: https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.02-20180621.exe

  • jTessBoxEditor 2.2.0
    Download: https://sourceforge.net/projects/vietocr/files/jTessBoxEditor

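Software installation

Tesseract-OCR 3.05

Installation is simple: double-click the installer and accept the default directory. After installation, set the TESSDATA_PREFIX environment variable and add the install directory to Path: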

TESSDATA_PREFIX
C:\Program Files (x86)\Tesseract-OCR\tessdata
path
C:\Program Files (x86)\Tesseract-OCR

[Screenshots: TESSDATA_PREFIX and Path environment variable settings]
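As a quick sanity check after setting the variables, something like the following Python sketch can confirm that the tools are reachable. It is not part of the original procedure: the function name check_environment is my own, and the sketch only assumes that the executables used later in this article are found through Path.

```python
import os
import shutil

# Command-line tools used in the training steps of this article.
TOOLS = ["tesseract", "unicharset_extractor", "shapeclustering",
         "mftraining", "cntraining", "combine_tessdata"]


def check_environment(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        problems.append("TESSDATA_PREFIX is not set")
    elif not os.path.isdir(prefix):
        problems.append("TESSDATA_PREFIX is not a directory: " + prefix)
    for tool in TOOLS:
        # shutil.which searches the directories listed in Path.
        if which(tool) is None:
            problems.append(tool + " was not found on Path")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Open a new cmd window before running it, since windows that were already open do not pick up changed environment variables.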

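Installing jTessBoxEditor 2.2.0

Just unzip the archive. jTessBoxEditor needs a Java runtime; installing Java and configuring its environment variables is straightforward and not covered here. Without Java, the program will not start.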

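Steps

The screenshot below shows my directory after everything was finished; num.traineddata is the final trained data file.
The num-1, num-2 and num-3 folders hold the images to learn from. Why three folders? To keep improving recognition accuracy, training is an ongoing process: num-1 holds the data for the first training run, num-2 the data for the second, and so on. The advantage is that box files from earlier runs never have to be corrected again. For each new run, we only generate a tif from the new images, generate its box file, and correct that box file; the results of all runs are then merged into the final trained data file.
[Screenshot: directory after training]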
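The folder-per-run convention maps directly onto the file names used in the rest of this article (num-1 becomes num.font.exp1, and so on). A minimal Python sketch of that mapping is shown here; the helper name discover_runs and its lang/font parameters are illustrative assumptions, not something jTessBoxEditor or Tesseract provides.

```python
import os
import re


def discover_runs(root, lang="num", font="font"):
    """Map each <lang>-N folder under root to its training base name.

    Example: 'num-2' -> 'num.font.exp2'. Entries are ordered by run number.
    """
    pattern = re.compile(r"^" + re.escape(lang) + r"-(\d+)$")
    numbers = []
    for entry in os.listdir(root):
        match = pattern.match(entry)
        # Only directories count as training runs.
        if match and os.path.isdir(os.path.join(root, entry)):
            numbers.append(int(match.group(1)))
    return {"%s-%d" % (lang, n): "%s.%s.exp%d" % (lang, font, n)
            for n in sorted(numbers)}
```

Sorting by the numeric suffix keeps num-10 after num-9, which a plain alphabetical listing would not.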

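Preparing the training data

Everything the project needs to recognize is numeric, so all of my training images contain digits.

[Screenshots: contents of the num-1, num-2 and num-3 training folders]
Directory structure once the data is ready:
[Screenshot: directory structure]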

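Generating the tif files

Use jTessBoxEditor to generate the tif files. To simplify later steps, save the generated tif files in the Scan-OCR directory. To start jTessBoxEditor, go into the unzipped folder and double-click train.bat.

In jTessBoxEditor, click Tools, then Merge TIFF, select all the images in the num-1 folder, and click Open.
Change the save directory to Scan-OCR, enter the file name num.font.exp1.tif, and click Save.
A message confirms that num.font.exp1.tif has been created, and the new file appears in the Scan-OCR directory.
That completes the tif file for the num-1 training data. Repeat the same steps for num-2 and num-3, saving them as num.font.exp2.tif and num.font.exp3.tif.
[Screenshot: directory structure after creating the tif files for all three folders]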

Generating the box files

Generate the three box files from the command line (cmd):

tesseract num.font.exp1.tif num.font.exp1 batch.nochop makebox
tesseract num.font.exp2.tif num.font.exp2 batch.nochop makebox
tesseract num.font.exp3.tif num.font.exp3 batch.nochop makebox

Command output:

D:\Scan-OCR>tesseract num.font.exp1.tif num.font.exp1 batch.nochop makebox
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 4
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 5
Warning. Invalid resolution 1 dpi. Using 70 instead.

D:\Scan-OCR>tesseract num.font.exp2.tif num.font.exp2 batch.nochop makebox
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.

D:\Scan-OCR>tesseract num.font.exp3.tif num.font.exp3 batch.nochop makebox
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 4
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 5
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 6
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 7
Warning. Invalid resolution 1 dpi. Using 70 instead.

D:\Scan-OCR>

After the commands finish, three box files have been generated.
[Screenshot: generated box files]
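The three makebox commands differ only in the run number, so they can be generated rather than typed. The following Python sketch builds the same argument lists as the commands above and also covers the box.train invocation used in the next step; the function names and the runner parameter are my own, and it assumes tesseract is on Path and the tif files are in the working directory.

```python
import subprocess

# Config arguments for the two tesseract invocations used in this article.
MODES = {
    "makebox": ["batch.nochop", "makebox"],
    "box.train": ["nobatch", "box.train"],
}


def tesseract_command(base, mode):
    """Build the argument list for one run, e.g. base='num.font.exp1'."""
    return ["tesseract", base + ".tif", base] + MODES[mode]


def run_for_runs(run_numbers, mode, cwd=".", runner=subprocess.check_call):
    """Run one tesseract command per run number inside cwd."""
    for n in run_numbers:
        runner(tesseract_command("num.font.exp%d" % n, mode), cwd=cwd)


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for n in (1, 2, 3):
        print(" ".join(tesseract_command("num.font.exp%d" % n, "makebox")))
```

Note that rerunning makebox for an earlier run would presumably regenerate its box file and discard corrections already made, so in practice only the newest run number should be passed to run_for_runs with mode "makebox".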

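Correcting characters and positions

Open each tif in jTessBoxEditor, correct the characters and their positions, and save.
[Screenshots: a tif file opened in jTessBoxEditor]
In the editor:
char is the recognized character
x y width height give the character's position; these are what we fine-tune
Two things need checking:
1) whether the character was recognized correctly
2) whether the position is correct (for example, the character 2 in the screenshot had the right char value but the wrong position; the adjusted result is shown below)
[Screenshot: box after adjustment]
Save after making any adjustment. The adjusted information for every page is stored in the corresponding box file; open one in a text editor if you are curious.
The fine-tuning is tedious, repetitive work. Once every image has been adjusted and saved, move on to the next step.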
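Because the training data here is digits only, a corrected box file should contain nothing but 0 to 9, and a small script can flag leftovers that were missed in the editor. The sketch below assumes the Tesseract 3.x box file layout as I understand it, one line per character in the form symbol left bottom right top page, with coordinates measured from the bottom-left corner of the image; the helper names are hypothetical.

```python
from collections import namedtuple

Box = namedtuple("Box", "symbol left bottom right top page")

DIGITS = frozenset("0123456789")


def parse_box_line(line):
    """Parse one box file line: 'symbol left bottom right top page'."""
    symbol, *numbers = line.rstrip("\r\n").rsplit(" ", 5)
    return Box(symbol, *(int(n) for n in numbers))


def suspicious_boxes(lines, allowed=DIGITS):
    """Return (line_number, Box) pairs worth a second look.

    A box is flagged when its symbol is not an allowed character or when
    its rectangle has no area.
    """
    found = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        box = parse_box_line(line)
        empty = box.right <= box.left or box.top <= box.bottom
        if box.symbol not in allowed or empty:
            found.append((number, box))
    return found


if __name__ == "__main__":
    sample = ["2 10 20 30 40 0", "F 35 20 50 40 0"]
    for number, box in suspicious_boxes(sample):
        print("line %d: %r" % (number, box.symbol))
```

The shapeclustering log later in this article lists symbols such as ?, F and ~ next to the digits, which may be the kind of leftover a check like this would catch before training.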

Generating the tr files

tesseract num.font.exp1.tif num.font.exp1 nobatch box.train
tesseract num.font.exp2.tif num.font.exp2 nobatch box.train
tesseract num.font.exp3.tif num.font.exp3 nobatch box.train

Command output:

D:\Scan-OCR>tesseract num.font.exp1.tif num.font.exp1 nobatch box.train
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      11
   Found 11 good blobs.
Generated training data for 2 words
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      13
   Found 13 good blobs.
Generated training data for 3 words
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      12
   Found 12 good blobs.
Generated training data for 1 words
Page 4
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      12
   Found 12 good blobs.
Generated training data for 1 words
Page 5
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      12
   Found 12 good blobs.
Generated training data for 3 words

D:\Scan-OCR>tesseract num.font.exp2.tif num.font.exp2 nobatch box.train
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      12
   Found 12 good blobs.
Generated training data for 2 words
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      12
   Found 12 good blobs.
Generated training data for 1 words
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      12
   Found 12 good blobs.
Generated training data for 1 words

D:\Scan-OCR>tesseract num.font.exp3.tif num.font.exp3 nobatch box.train
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
row xheight=78.5, but median xheight = 11.5
APPLY_BOXES:
   Boxes read from boxfile:      13
   Found 13 good blobs.
Generated training data for 2 words
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      12
   Found 12 good blobs.
Generated training data for 1 words
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      12
   Found 12 good blobs.
Generated training data for 2 words
Page 4
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      12
   Found 12 good blobs.
Generated training data for 1 words
Page 5
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:       6
   Found 6 good blobs.
Generated training data for 1 words
Page 6
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:       6
   Found 6 good blobs.
Generated training data for 1 words
Page 7
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      14
   Found 14 good blobs.
Generated training data for 3 words

D:\Scan-OCR>

After the commands finish, three tr files have been generated.
[Screenshot: generated tr files]
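Before moving on to the clustering steps, it can help to confirm that every run has its tif, box and tr file in place, since the later commands take all of them as input. This is a small Python sketch under the assumption that all files live directly in the Scan-OCR directory; missing_outputs is a hypothetical helper name.

```python
import os


def missing_outputs(root, bases, extensions=(".tif", ".box", ".tr")):
    """List expected files that are absent or empty under root.

    bases are names such as 'num.font.exp1' (without extension).
    """
    missing = []
    for base in bases:
        for ext in extensions:
            path = os.path.join(root, base + ext)
            if not os.path.isfile(path) or os.path.getsize(path) == 0:
                missing.append(base + ext)
    return missing


if __name__ == "__main__":
    bases = ["num.font.exp%d" % n for n in (1, 2, 3)]
    for name in missing_outputs(r"D:\Scan-OCR", bases):
        print("missing or empty:", name)
```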

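Creating the font properties file

Create a font properties file named font_properties. Each line has the format:

<fontname> <italic> <bold> <fixed> <serif> <fraktur>

fontname is the font name and must match the fontname part of [lang].[fontname].exp[num].box.
<italic>, <bold>, <fixed>, <serif> and <fraktur> each take the value 1 or 0, indicating whether the font has that property.
In the Scan-OCR directory, create a file named font_properties, open it in Notepad, and enter the following:

font 0 0 0 0 0

All values are 0 here, meaning the font is not italic, bold, and so on. Note that the font_properties file has no extension.
[Screenshot: font_properties file]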
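Notepad tends to append .txt when saving, so writing the file from a script is one way to guarantee it has no extension. The sketch below assumes the five flags appear in the order italic, bold, fixed, serif, fraktur, which is how I understand the Tesseract 3.x font_properties format; the function names are my own.

```python
def font_properties_line(fontname, italic=False, bold=False, fixed=False,
                         serif=False, fraktur=False):
    """Build one font_properties line, e.g. 'font 0 0 0 0 0'."""
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([fontname] + [str(int(flag)) for flag in flags])


def write_font_properties(path, fontnames):
    """Write one all-zero line per font name to path."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for name in fontnames:
            handle.write(font_properties_line(name) + "\n")


if __name__ == "__main__":
    # 'font' is the fontname part of num.font.exp1.box.
    print(font_properties_line("font"))
```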

Extracting the character set from all box files

Run the following command to generate the unicharset file:

D:\Scan-OCR>unicharset_extractor num.font.exp1.box num.font.exp2.box num.font.exp3.box
Extracting unicharset from num.font.exp1.box
Extracting unicharset from num.font.exp2.box
Extracting unicharset from num.font.exp3.box
Wrote unicharset file ./unicharset.

Generating the shape table

Run the following command to generate the shapetable file:

D:\Scan-OCR>shapeclustering -F font_properties -U unicharset num.font.exp1.tr num.font.exp2.tr num.font.exp3.tr
Reading num.font.exp1.tr ...
Reading num.font.exp2.tr ...
Reading num.font.exp3.tr ...
Bad properties for index 3, char 3: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char 0: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 5, char 9: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 6, char 4: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 7, char 1: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 8, char 2: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 9, char 8: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 10, char 6: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 11, char 7: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 12, char 5: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 13, char ?: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 14, char F: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 15, char 垄: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 16, char ~: 0,255 0,255 0,0 0,0 0,0
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0 1 2 3 4 5 6 7 8 9 10 11 12 13
Stopped with 0 merged, min dist 0.100629
Master shape_table:Number of shapes = 14 max unichars = 1 number with multiple unichars = 0

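Generating the clustered character feature files

Run mftraining, which produces the inttemp and pffmtable files: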

D:\Scan-OCR>mftraining -F font_properties -U unicharset -O unicharset num.font.exp1.tr num.font.exp2.tr num.font.exp3.tr
Read shape table shapetable of 14 shapes
Reading num.font.exp1.tr ...
Reading num.font.exp2.tr ...
Reading num.font.exp3.tr ...
Bad properties for index 3, char 3: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char 0: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 5, char 9: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 6, char 4: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 7, char 1: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 8, char 2: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 9, char 8: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 10, char 6: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 11, char 7: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 12, char 5: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 13, char ?: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 14, char F: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 15, char 垄: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 16, char ~: 0,255 0,255 0,0 0,0 0,0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Done!

Merging all tr files

Run cntraining over all tr files to produce the normproto file:

D:\Scan-OCR>cntraining num.font.exp1.tr num.font.exp2.tr num.font.exp3.tr
Reading num.font.exp1.tr ...
Reading num.font.exp2.tr ...
Reading num.font.exp3.tr ...
Clustering ...
Writing normproto ...
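The four commands above (unicharset_extractor, shapeclustering, mftraining, cntraining) all take the complete list of box or tr files, which is what merges the separate training runs into one result. A Python sketch that builds those argument lists from the run numbers could look like this; the helper names are hypothetical, and the arguments simply mirror the commands shown above.

```python
import subprocess


def clustering_commands(run_numbers, lang="num", font="font"):
    """Build the four merge-step commands for the given training runs."""
    bases = ["%s.%s.exp%d" % (lang, font, n) for n in run_numbers]
    boxes = [base + ".box" for base in bases]
    trs = [base + ".tr" for base in bases]
    common = ["-F", "font_properties", "-U", "unicharset"]
    return [
        ["unicharset_extractor"] + boxes,
        ["shapeclustering"] + common + trs,
        ["mftraining"] + common + ["-O", "unicharset"] + trs,
        ["cntraining"] + trs,
    ]


def run_all(commands, cwd=".", runner=subprocess.check_call):
    """Run the commands in order, stopping at the first failure."""
    for command in commands:
        runner(command, cwd=cwd)


if __name__ == "__main__":
    for command in clustering_commands([1, 2, 3]):
        print(" ".join(command))
```

With this shape, adding a fourth training run only means passing [1, 2, 3, 4]; the earlier box files are reused as they are.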

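Renaming the files

Add a prefix to all of the generated files.
Rename the five files created during training (shapetable, normproto, inttemp, pffmtable and unicharset) with lang. as the prefix (here, num.).
[Screenshot: training files with the prefix added]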
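Renaming five files by hand on every retraining is easy to get wrong, so here is a minimal Python sketch of the step. The helper name add_lang_prefix is an assumption of mine; it uses os.replace so that files left over from a previous training run are overwritten.

```python
import os

# The five files produced by the training steps above.
GENERATED = ["shapetable", "normproto", "inttemp", "pffmtable", "unicharset"]


def add_lang_prefix(root, lang="num"):
    """Rename each generated file in root to '<lang>.<name>'.

    Returns the new names of the files that were actually renamed.
    """
    renamed = []
    for name in GENERATED:
        source = os.path.join(root, name)
        if os.path.isfile(source):
            target = os.path.join(root, lang + "." + name)
            os.replace(source, target)  # overwrites an older target
            renamed.append(lang + "." + name)
    return renamed


if __name__ == "__main__":
    print(add_lang_prefix(r"D:\Scan-OCR"))
```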

Generating the trained data

D:\Scan-OCR>combine_tessdata num.
Combining tessdata files
TessdataManager combined tesseract data files.
Offset for type  0 (num.config                ) is -1
Offset for type  1 (num.unicharset            ) is 140
Offset for type  2 (num.unicharambigs         ) is -1
Offset for type  3 (num.inttemp               ) is 1033
Offset for type  4 (num.pffmtable             ) is 145113
Offset for type  5 (num.normproto             ) is 145252
Offset for type  6 (num.punc-dawg             ) is -1
Offset for type  7 (num.word-dawg             ) is -1
Offset for type  8 (num.number-dawg           ) is -1
Offset for type  9 (num.freq-dawg             ) is -1
Offset for type 10 (num.fixed-length-dawgs    ) is -1
Offset for type 11 (num.cube-unicharset       ) is -1
Offset for type 12 (num.cube-word-dawg        ) is -1
Offset for type 13 (num.shapetable            ) is 147115
Offset for type 14 (num.bigram-dawg           ) is -1
Offset for type 15 (num.unambig-dawg          ) is -1
Offset for type 16 (num.params-model          ) is -1
Output num.traineddata created successfully.

That completes the training. The Scan-OCR directory now contains the trained data file:
[Screenshot: the resulting trained data file]
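In the combine_tessdata log above, the five renamed files have non-negative offsets and every other entry is -1; my reading is that -1 marks a component that was not found. Under that assumption, a short Python sketch can check the captured output for a forgotten rename. The regular expression and function names are my own.

```python
import re

OFFSET_LINE = re.compile(r"Offset for type\s+(\d+) \((\S+)\s*\) is (-?\d+)")

# Components this article's training procedure is expected to supply.
REQUIRED = ("unicharset", "inttemp", "pffmtable", "normproto", "shapetable")


def parse_offsets(output):
    """Map component file name (e.g. 'num.inttemp') to its offset."""
    return {m.group(2): int(m.group(3)) for m in OFFSET_LINE.finditer(output)}


def missing_components(output, lang="num"):
    """Return required components whose offset is -1 or not reported."""
    offsets = parse_offsets(output)
    return [name for name in REQUIRED
            if offsets.get("%s.%s" % (lang, name), -1) == -1]


if __name__ == "__main__":
    log = ("Offset for type  1 (num.unicharset            ) is 140\n"
           "Offset for type  3 (num.inttemp               ) is -1\n")
    print(missing_components(log))
```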

Testing

Copy num.traineddata into the tessdata directory, then run a test:

D:\Scan-OCR\num-1>tesseract 1.png result -l num
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.

[Screenshot: recognition result]
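The copy-and-test step can also be scripted. The sketch below builds the same tesseract command as above and reads back the text file that tesseract writes next to the given output base (result.txt for result); the helper names are hypothetical, and copying into a tessdata folder under Program Files may need an elevated prompt.

```python
import os
import shutil
import subprocess


def install_traineddata(traineddata, tessdata_dir):
    """Copy e.g. num.traineddata into the tessdata directory."""
    target = os.path.join(tessdata_dir, os.path.basename(traineddata))
    shutil.copyfile(traineddata, target)
    return target


def recognize_command(image, output_base, lang="num"):
    """Build the command line used in the test above."""
    return ["tesseract", image, output_base, "-l", lang]


def read_result(output_base):
    """Read the recognized text from '<output_base>.txt'."""
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read().strip()


if __name__ == "__main__":
    # Assumes tesseract is on Path and 1.png is in the current directory.
    subprocess.check_call(recognize_command("1.png", "result"))
    print(read_result("result"))
```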

Complete record of the steps

# Open cmd
C:\Users\ike>d:

D:\>cd Scan-OCR

D:\Scan-OCR>dir
 Volume in drive D has no label.
 Volume Serial Number is E2FB-2A10

 Directory of D:\Scan-OCR

2019/04/04  16:56    <DIR>          .
2019/04/04  16:56    <DIR>          ..
2019/04/04  13:51                14 font_properties
2019/04/04  13:13    <DIR>          num-1
2019/04/04  13:10    <DIR>          num-2
2019/04/04  16:38    <DIR>          num-3
2019/04/04  13:18             1,212 num.font.exp1.box
2019/04/04  13:13           328,713 num.font.exp1.tif
2019/04/04  13:21           140,165 num.font.exp1.tr
2019/04/04  13:19               989 num.font.exp2.box
2019/04/04  13:14           281,499 num.font.exp2.tif
2019/04/04  13:21           126,149 num.font.exp2.tr
2019/04/04  16:38             1,464 num.font.exp3.box
2019/04/04  16:38           369,277 num.font.exp3.tif
2019/04/04  16:40           171,518 num.font.exp3.tr
2019/04/04  16:51           110,676 num.inttemp
2019/04/04  16:53             1,743 num.normproto
2019/04/04  16:51                40 num.pffmtable
2019/04/04  16:51                 4 num.shapetable
2019/04/04  16:56           112,753 num.traineddata
2019/04/04  16:51               150 num.unicharset
              16 File(s)      1,646,366 bytes
               5 Dir(s)  801,806,598,144 bytes free

# Generate the box file
D:\Scan-OCR>tesseract num.font.exp3.tif num.font.exp3 batch.nochop makebox
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 4
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 5
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 6
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 7
Warning. Invalid resolution 1 dpi. Using 70 instead.

# Correct the box file errors for the tif in jTessBoxEditor

# Generate the tr file
D:\Scan-OCR>tesseract num.font.exp3.tif num.font.exp3 nobatch box.train
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
row xheight=78.5, but median xheight = 11.5
APPLY_BOXES:
   Boxes read from boxfile:      13
   Found 13 good blobs.
Generated training data for 2 words
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      12
   Found 12 good blobs.
Generated training data for 1 words
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      12
   Found 12 good blobs.
Generated training data for 2 words
Page 4
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      12
   Found 12 good blobs.
Generated training data for 1 words
Page 5
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:       6
   Found 6 good blobs.
Generated training data for 1 words
Page 6
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:       6
   Found 6 good blobs.
Generated training data for 1 words
Page 7
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
   Boxes read from boxfile:      14
   Found 14 good blobs.
Generated training data for 3 words

# Extract the character set from all box files
D:\Scan-OCR>unicharset_extractor num.font.exp1.box num.font.exp2.box num.font.exp3.box
Extracting unicharset from num.font.exp1.box
Extracting unicharset from num.font.exp2.box
Extracting unicharset from num.font.exp3.box
Wrote unicharset file ./unicharset.

# Generate the shape table and the feature files (shapeclustering, then mftraining)
D:\Scan-OCR>shapeclustering -F font_properties -U unicharset num.font.exp1.tr num.font.exp2.tr num.font.exp3.tr
Reading num.font.exp1.tr ...
Reading num.font.exp2.tr ...
Reading num.font.exp3.tr ...
Bad properties for index 3, char 3: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char 0: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 5, char 9: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 6, char 4: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 7, char 1: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 8, char 2: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 9, char 8: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 10, char 6: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 11, char 7: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 12, char 5: 0,255 0,255 0,0 0,0 0,0
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0 1 2 3 4 5 6 7 8 9
Stopped with 0 merged, min dist 0.087838
Master shape_table:Number of shapes = 10 max unichars = 1 number with multiple unichars = 0

D:\Scan-OCR>mftraining -F font_properties -U unicharset -O unicharset num.font.exp1.tr num.font.exp2.tr num.font.exp3.tr
Read shape table shapetable of 10 shapes
Reading num.font.exp1.tr ...
Reading num.font.exp2.tr ...
Reading num.font.exp3.tr ...
Bad properties for index 3, char 3: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char 0: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 5, char 9: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 6, char 4: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 7, char 1: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 8, char 2: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 9, char 8: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 10, char 6: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 11, char 7: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 12, char 5: 0,255 0,255 0,0 0,0 0,0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Done!

# Merge all tr files
D:\Scan-OCR>cntraining num.font.exp1.tr num.font.exp2.tr num.font.exp3.tr
Reading num.font.exp1.tr ...
Reading num.font.exp2.tr ...
Reading num.font.exp3.tr ...
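Clustering ...
Writing normproto ...

# Rename: add the prefix to all of the generated files.
# Rename the five files created during training (shapetable, normproto, inttemp, pffmtable and unicharset) with lang. as the prefix (here, num.)

# Generate the trained data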
D:\Scan-OCR>combine_tessdata num.
Combining tessdata files
TessdataManager combined tesseract data files.
Offset for type  0 (num.config                ) is -1
Offset for type  1 (num.unicharset            ) is 140
Offset for type  2 (num.unicharambigs         ) is -1
Offset for type  3 (num.inttemp               ) is 290
Offset for type  4 (num.pffmtable             ) is 110966
Offset for type  5 (num.normproto             ) is 111006
Offset for type  6 (num.punc-dawg             ) is -1
Offset for type  7 (num.word-dawg             ) is -1
Offset for type  8 (num.number-dawg           ) is -1
Offset for type  9 (num.freq-dawg             ) is -1
Offset for type 10 (num.fixed-length-dawgs    ) is -1
Offset for type 11 (num.cube-unicharset       ) is -1
Offset for type 12 (num.cube-word-dawg        ) is -1
Offset for type 13 (num.shapetable            ) is 112749
Offset for type 14 (num.bigram-dawg           ) is -1
Offset for type 15 (num.unambig-dawg          ) is -1
Offset for type 16 (num.params-model          ) is -1
Output num.traineddata created successfully.