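A recent project required photographing 3C codes with a handheld device and recognizing them, and we settled on Tesseract-OCR. I knew nothing about it at first. Most of the posts I found online either produced errors when followed step by step or were clickbait, which was frustrating. I am recording the whole process here in case I need it again.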
My environment
- Windows 10
- JDK 1.8
- Tesseract-OCR 3.05.02
  Download: https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.02-20180621.exe
- jTessBoxEditor 2.2.0
  Download: https://sourceforge.net/projects/vietocr/files/jTessBoxEditor
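Software installation
Installing Tesseract-OCR 3.05.02
Installation is simple: double-click the installer and keep the default directory. After installation, configure the following environment variables.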
TESSDATA_PREFIX
C:\Program Files (x86)\Tesseract-OCR\tessdata
path
C:\Program Files (x86)\Tesseract-OCR
TESSDATA_PREFIX setting:
Path setting:
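A quick way to confirm that both variables took effect is to check them from a script. The sketch below is only an illustration: the function name check_tesseract_env is hypothetical and not part of Tesseract. It checks nothing more than that TESSDATA_PREFIX names an existing directory and that a tesseract executable can be found on the search path.

```python
import os
import shutil
from pathlib import Path


def check_tesseract_env(env=None):
    """Return a list of problems found; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems = []

    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        problems.append("TESSDATA_PREFIX is not set")
    elif not Path(prefix).is_dir():
        problems.append(f"TESSDATA_PREFIX is not a directory: {prefix}")

    # shutil.which searches the directories listed in the given PATH string.
    if shutil.which("tesseract", path=env.get("PATH", "")) is None:
        problems.append("tesseract was not found on PATH")

    return problems


# Check the environment of the current process and report anything missing.
for problem in check_tesseract_env():
    print("WARNING:", problem)
```

Open a new cmd window before running it, because environment variable changes are not picked up by windows that were already open.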
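Installing jTessBoxEditor 2.2.0
Just unzip the archive. The tool needs a Java runtime. Installing Java and configuring its environment variables is straightforward and is not covered here, but note that jTessBoxEditor will not start without Java.
Steps
The screenshot here shows my working directory after all steps were completed; num.traineddata is the final trained data file.
The num-1, num-2, and num-3 folders hold the images to train on. Why three folders? To keep improving recognition accuracy, training is an ongoing process: num-1 holds the data for the first training round, num-2 the data for the second, and so on. The benefit is that box files from earlier rounds never have to be corrected again. For each new round we only merge the new images into a tif, generate its box file, and correct that one file. The results of all rounds are then combined to produce the final trained data file.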
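Prepare the training data
Everything this project needs to recognize is digits, so all of my training images contain digits.
Training images in the num-1 folder
Training images in the num-2 folder
Training images in the num-3 folder
Directory structure with the data prepared
Generate the tif files
Use jTessBoxEditor to generate the tif files. To simplify the later steps, save the generated tif files in the Scan-OCR directory. After unzipping the jTessBoxEditor archive, go into the extracted folder and double-click train.bat to start the tool.
In jTessBoxEditor, click Tools, then Merge TIFF, select all images in the num-1 folder, and click Open.
Change the save directory to Scan-OCR, enter num.font.exp1.tif as the file name, and click Save.
After saving, a message confirms that num.font.exp1.tif was created. The new file appears in the Scan-OCR directory.
That completes the tif file for the num-1 training data. Repeat the same steps for num-2 and num-3, saving the results as num.font.exp2.tif and num.font.exp3.tif.
Directory structure after creating the tif files for all three folders:
Generate the box files
Generate the three box files from cmd with these commands: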
tesseract num.font.exp1.tif num.font.exp1 batch.nochop makebox
tesseract num.font.exp2.tif num.font.exp2 batch.nochop makebox
tesseract num.font.exp3.tif num.font.exp3 batch.nochop makebox
Command output:
D:\Scan-OCR>tesseract num.font.exp1.tif num.font.exp1 batch.nochop makebox
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 4
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 5
Warning. Invalid resolution 1 dpi. Using 70 instead.
D:\Scan-OCR>tesseract num.font.exp2.tif num.font.exp2 batch.nochop makebox
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
D:\Scan-OCR>tesseract num.font.exp3.tif num.font.exp3 batch.nochop makebox
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 4
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 5
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 6
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 7
Warning. Invalid resolution 1 dpi. Using 70 instead.
D:\Scan-OCR>
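After the commands finish, three box files are generated: num.font.exp1.box, num.font.exp2.box, and num.font.exp3.box.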
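The three makebox invocations differ only in the experiment number, so they can be generated for any number of training rounds. This is a minimal sketch; makebox_commands is a hypothetical helper name, and the arguments are copied from the commands used in this post.

```python
def makebox_commands(lang, font, exps):
    """Build one `tesseract ... batch.nochop makebox` command per training round."""
    commands = []
    for n in exps:
        base = f"{lang}.{font}.exp{n}"  # for example num.font.exp1
        commands.append(["tesseract", f"{base}.tif", base, "batch.nochop", "makebox"])
    return commands


# Print the commands for the three rounds used in this post.
for cmd in makebox_commands("num", "font", [1, 2, 3]):
    print(" ".join(cmd))
```

To execute them instead of printing, each list can be passed to subprocess.run(cmd, cwd=r"D:\Scan-OCR", check=True).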
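Correct the characters and positions
Open each tif in jTessBoxEditor, correct the characters and their positions, and save.
In the editor:
char is the recognized character
x, y, width, height give the character's position; these are the values we fine-tune
1) Check whether the character was recognized correctly.
2) Check whether the character's position is correct. (In my screenshot, for example, the character 2 had the right char value but the wrong position, which I then adjusted.)
Save after making any adjustment. Note that the adjusted information for every training image is stored in the corresponding box file. Open one if you are curious.
This fine-tuning is tedious, repetitive work. Once every image has been adjusted and saved, move on to the next step.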
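For anyone who does open a box file: each line holds one character followed by five integers. To my understanding, Tesseract 3.x stores them as left, bottom, right, top, and page number, with the origin at the bottom-left corner of the image, so the width and height shown in the editor are derived values. The sketch below parses one line under that assumption; the Box type and the sample line are made up for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def width(self):
        return self.right - self.left

    @property
    def height(self):
        return self.top - self.bottom


def parse_box_line(line):
    """Parse one box-file line: a character followed by five integers."""
    # Split from the right so the character itself is never split.
    char, *numbers = line.rstrip("\r\n").rsplit(" ", 5)
    if len(numbers) != 5:
        raise ValueError(f"not a box line: {line!r}")
    return Box(char, *(int(n) for n in numbers))


# Sample line invented for this example, not taken from a real box file.
box = parse_box_line("2 10 20 30 60 0")
print(box.char, box.width, box.height, box.page)
```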
Generate the tr files
tesseract num.font.exp1.tif num.font.exp1 nobatch box.train
tesseract num.font.exp2.tif num.font.exp2 nobatch box.train
tesseract num.font.exp3.tif num.font.exp3 nobatch box.train
Command output:
D:\Scan-OCR>tesseract num.font.exp1.tif num.font.exp1 nobatch box.train
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 11
Found 11 good blobs.
Generated training data for 2 words
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 13
Found 13 good blobs.
Generated training data for 3 words
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 12
Found 12 good blobs.
Generated training data for 1 words
Page 4
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 12
Found 12 good blobs.
Generated training data for 1 words
Page 5
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 12
Found 12 good blobs.
Generated training data for 3 words
D:\Scan-OCR>tesseract num.font.exp2.tif num.font.exp2 nobatch box.train
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 12
Found 12 good blobs.
Generated training data for 2 words
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 12
Found 12 good blobs.
Generated training data for 1 words
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 12
Found 12 good blobs.
Generated training data for 1 words
D:\Scan-OCR>tesseract num.font.exp3.tif num.font.exp3 nobatch box.train
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
row xheight=78.5, but median xheight = 11.5
APPLY_BOXES:
Boxes read from boxfile: 13
Found 13 good blobs.
Generated training data for 2 words
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 12
Found 12 good blobs.
Generated training data for 1 words
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 12
Found 12 good blobs.
Generated training data for 2 words
Page 4
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 12
Found 12 good blobs.
Generated training data for 1 words
Page 5
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 6
Found 6 good blobs.
Generated training data for 1 words
Page 6
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 6
Found 6 good blobs.
Generated training data for 1 words
Page 7
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 14
Found 14 good blobs.
Generated training data for 3 words
D:\Scan-OCR>
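After the commands finish, three tr files are generated.
Create the font properties file
Create a font properties file named font_properties. Each line has the format:
<fontname> <italic> <bold> <fixed> <serif> <fraktur>
fontname is the font name and must match the name used in [lang].[fontname].exp[num].box.
<italic>, <bold>, <fixed>, <serif>, and <fraktur> are each 1 or 0, indicating whether the font has that property.
In the Scan-OCR directory, create a file named font_properties, open it in Notepad, and enter the following:
font 0 0 0 0 0
All values are 0 here, meaning the font is not italic, bold, and so on. Note that the font_properties file has no extension.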
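Notepad tends to append a .txt extension when saving, so writing the file from a script is one way to avoid that. The sketch below assumes the five flags come in the order italic, bold, fixed, serif, fraktur; the helper names font_properties_line and write_font_properties are hypothetical.

```python
from pathlib import Path


def font_properties_line(fontname, italic=False, bold=False,
                         fixed=False, serif=False, fraktur=False):
    """Format one font_properties entry: the font name followed by five 0/1 flags."""
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([fontname] + ["1" if flag else "0" for flag in flags])


def write_font_properties(directory, lines):
    """Write the entries to a file named font_properties (no extension)."""
    path = Path(directory) / "font_properties"
    # newline="\n" keeps the line endings the same on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


print(font_properties_line("font"))  # font 0 0 0 0 0
```

For this post the call would be write_font_properties(r"D:\Scan-OCR", [font_properties_line("font")]).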
Extract the characters from all box files
Run this command to generate the unicharset file:
D:\Scan-OCR>unicharset_extractor num.font.exp1.box num.font.exp2.box num.font.exp3.box
Extracting unicharset from num.font.exp1.box
Extracting unicharset from num.font.exp2.box
Extracting unicharset from num.font.exp3.box
Wrote unicharset file ./unicharset.
Generate the shapetable file
Run this command to generate the shapetable file:
D:\Scan-OCR>shapeclustering -F font_properties -U unicharset num.font.exp1.tr num.font.exp2.tr num.font.exp3.tr
Reading num.font.exp1.tr ...
Reading num.font.exp2.tr ...
Reading num.font.exp3.tr ...
Bad properties for index 3, char 3: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char 0: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 5, char 9: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 6, char 4: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 7, char 1: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 8, char 2: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 9, char 8: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 10, char 6: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 11, char 7: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 12, char 5: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 13, char ?: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 14, char F: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 15, char 垄: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 16, char ~: 0,255 0,255 0,0 0,0 0,0
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0 1 2 3 4 5 6 7 8 9 10 11 12 13
Stopped with 0 merged, min dist 0.100629
Master shape_table:Number of shapes = 14 max unichars = 1 number with multiple unichars = 0
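The Bad properties lines in this output list ?, F, 垄, and ~ next to the ten digits, and the shape table ends up with 14 shapes. Since the training images only contain digits, this may mean that a few box entries were not corrected. A small check like the one below can flag such entries before unicharset_extractor is run. It is only a sketch: unexpected_chars is a hypothetical helper, and it assumes each box line starts with the character followed by five space-separated integers.

```python
def unexpected_chars(box_lines, allowed="0123456789"):
    """Map each character outside `allowed` to the line numbers where it occurs."""
    allowed_set = set(allowed)
    found = {}
    for lineno, line in enumerate(box_lines, start=1):
        line = line.strip()
        if not line:
            continue
        # The character is everything before the last five fields.
        char = line.rsplit(" ", 5)[0]
        if char not in allowed_set:
            found.setdefault(char, []).append(lineno)
    return found


# Sample lines invented for this example.
sample = ["3 1 2 3 4 0", "F 1 2 3 4 0", "~ 5 6 7 8 1"]
print(unexpected_chars(sample))
```

To check a real file, pass open("num.font.exp1.box", encoding="utf-8") as box_lines.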
Generate the clustered character feature files (inttemp and pffmtable)
D:\Scan-OCR>mftraining -F font_properties -U unicharset -O unicharset num.font.exp1.tr num.font.exp2.tr num.font.exp3.tr
Read shape table shapetable of 14 shapes
Reading num.font.exp1.tr ...
Reading num.font.exp2.tr ...
Reading num.font.exp3.tr ...
Bad properties for index 3, char 3: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char 0: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 5, char 9: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 6, char 4: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 7, char 1: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 8, char 2: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 9, char 8: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 10, char 6: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 11, char 7: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 12, char 5: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 13, char ?: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 14, char F: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 15, char 垄: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 16, char ~: 0,255 0,255 0,0 0,0 0,0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Done!
Merge all tr files (generates normproto)
D:\Scan-OCR>cntraining num.font.exp1.tr num.font.exp2.tr num.font.exp3.tr
Reading num.font.exp1.tr ...
Reading num.font.exp2.tr ...
Reading num.font.exp3.tr ...
Clustering ...
Writing normproto ...
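Rename the files
Add the language prefix to all the generated files.
Rename the five files created during training (shapetable, normproto, inttemp, pffmtable, unicharset) with the prefix lang. (here, num.), for example num.shapetable.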
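Renaming five files by hand on every training round is easy to get wrong, so it can be scripted. A minimal sketch, assuming the five files sit in one directory; add_lang_prefix is a hypothetical helper name.

```python
from pathlib import Path

# The five files produced by the training steps in this post.
TRAINING_OUTPUTS = ("shapetable", "normproto", "inttemp", "pffmtable", "unicharset")


def add_lang_prefix(directory, lang):
    """Rename each training output to <lang>.<name> and return the new names."""
    directory = Path(directory)
    renamed = []
    for name in TRAINING_OUTPUTS:
        source = directory / name
        target = directory / f"{lang}.{name}"
        # replace() overwrites a target left over from an earlier round.
        source.replace(target)
        renamed.append(target.name)
    return renamed
```

Calling add_lang_prefix(r"D:\Scan-OCR", "num") would produce num.shapetable, num.normproto, and so on. It raises FileNotFoundError if one of the five files is missing, which is a useful signal that an earlier step failed.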
Generate the trained data file
D:\Scan-OCR>combine_tessdata num.
Combining tessdata files
TessdataManager combined tesseract data files.
Offset for type 0 (num.config ) is -1
Offset for type 1 (num.unicharset ) is 140
Offset for type 2 (num.unicharambigs ) is -1
Offset for type 3 (num.inttemp ) is 1033
Offset for type 4 (num.pffmtable ) is 145113
Offset for type 5 (num.normproto ) is 145252
Offset for type 6 (num.punc-dawg ) is -1
Offset for type 7 (num.word-dawg ) is -1
Offset for type 8 (num.number-dawg ) is -1
Offset for type 9 (num.freq-dawg ) is -1
Offset for type 10 (num.fixed-length-dawgs ) is -1
Offset for type 11 (num.cube-unicharset ) is -1
Offset for type 12 (num.cube-word-dawg ) is -1
Offset for type 13 (num.shapetable ) is 147115
Offset for type 14 (num.bigram-dawg ) is -1
Offset for type 15 (num.unambig-dawg ) is -1
Offset for type 16 (num.params-model ) is -1
Output num.traineddata created successfully.
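In this output, an offset of -1 appears for every component that was not supplied, while unicharset, inttemp, pffmtable, normproto, and shapetable have real offsets. If one of those five shows -1, the rename step probably missed a file. The sketch below extracts the included components from such a log; included_components is a hypothetical helper, and the regular expression is written against the line format shown here.

```python
import re

# Matches lines such as: Offset for type 1 (num.unicharset ) is 140
OFFSET_RE = re.compile(r"Offset for type\s+\d+\s+\((\S+)\s*\) is (-?\d+)")


def included_components(log_text):
    """Return the component names whose offset is not negative."""
    return [name for name, offset in OFFSET_RE.findall(log_text) if int(offset) >= 0]


# A few lines copied from the combine_tessdata output in this post.
sample = """Offset for type 0 (num.config ) is -1
Offset for type 1 (num.unicharset ) is 140
Offset for type 3 (num.inttemp ) is 1033
Offset for type 13 (num.shapetable ) is 147115
"""
print(included_components(sample))
```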
Training is now complete, and the Scan-OCR directory contains the generated num.traineddata file.
Test
Copy num.traineddata into the tessdata directory, then run a test:
D:\Scan-OCR\num-1>tesseract 1.png result -l num
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
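The copy and the test run can also be scripted. This is a sketch under assumptions: install_traineddata and recognize_command are hypothetical helper names, and the command arguments mirror the test command used in this post.

```python
import shutil
from pathlib import Path


def install_traineddata(traineddata, tessdata_dir):
    """Copy a .traineddata file into the tessdata directory and return its new path."""
    destination = Path(tessdata_dir) / Path(traineddata).name
    shutil.copyfile(traineddata, destination)
    return destination


def recognize_command(image, output_base, lang):
    """Build the tesseract command that writes <output_base>.txt using language `lang`."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


print(" ".join(recognize_command("1.png", "result", "num")))
```

With Tesseract installed, subprocess.run(recognize_command("1.png", "result", "num"), check=True) runs the same test, and the recognized text can then be read from result.txt. Copying into a directory under Program Files may require an elevated prompt.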
Complete step record
# Open cmd
C:\Users\ike>d:
D:\>cd Scan-OCR
D:\Scan-OCR>dir
Volume in drive D has no label.
Volume Serial Number is E2FB-2A10
Directory of D:\Scan-OCR
2019/04/04 16:56 <DIR> .
2019/04/04 16:56 <DIR> ..
2019/04/04 13:51 14 font_properties
2019/04/04 13:13 <DIR> num-1
2019/04/04 13:10 <DIR> num-2
2019/04/04 16:38 <DIR> num-3
2019/04/04 13:18 1,212 num.font.exp1.box
2019/04/04 13:13 328,713 num.font.exp1.tif
2019/04/04 13:21 140,165 num.font.exp1.tr
2019/04/04 13:19 989 num.font.exp2.box
2019/04/04 13:14 281,499 num.font.exp2.tif
2019/04/04 13:21 126,149 num.font.exp2.tr
2019/04/04 16:38 1,464 num.font.exp3.box
2019/04/04 16:38 369,277 num.font.exp3.tif
2019/04/04 16:40 171,518 num.font.exp3.tr
2019/04/04 16:51 110,676 num.inttemp
2019/04/04 16:53 1,743 num.normproto
2019/04/04 16:51 40 num.pffmtable
2019/04/04 16:51 4 num.shapetable
2019/04/04 16:56 112,753 num.traineddata
2019/04/04 16:51 150 num.unicharset
16 File(s) 1,646,366 bytes
5 Dir(s) 801,806,598,144 bytes free
# Generate the box file
D:\Scan-OCR>tesseract num.font.exp3.tif num.font.exp3 batch.nochop makebox
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 4
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 5
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 6
Warning. Invalid resolution 1 dpi. Using 70 instead.
Page 7
Warning. Invalid resolution 1 dpi. Using 70 instead.
# Correct the errors in the tif's boxes with jTessBoxEditor
# Generate the tr file
D:\Scan-OCR>tesseract num.font.exp3.tif num.font.exp3 nobatch box.train
Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Warning. Invalid resolution 1 dpi. Using 70 instead.
row xheight=78.5, but median xheight = 11.5
APPLY_BOXES:
Boxes read from boxfile: 13
Found 13 good blobs.
Generated training data for 2 words
Page 2
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 12
Found 12 good blobs.
Generated training data for 1 words
Page 3
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 12
Found 12 good blobs.
Generated training data for 2 words
Page 4
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 12
Found 12 good blobs.
Generated training data for 1 words
Page 5
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 6
Found 6 good blobs.
Generated training data for 1 words
Page 6
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 6
Found 6 good blobs.
Generated training data for 1 words
Page 7
Warning. Invalid resolution 1 dpi. Using 70 instead.
APPLY_BOXES:
Boxes read from boxfile: 14
Found 14 good blobs.
Generated training data for 3 words
# Extract the characters from all box files
D:\Scan-OCR>unicharset_extractor num.font.exp1.box num.font.exp2.box num.font.exp3.box
Extracting unicharset from num.font.exp1.box
Extracting unicharset from num.font.exp2.box
Extracting unicharset from num.font.exp3.box
Wrote unicharset file ./unicharset.
# Generate the shapetable and the clustered character feature files
D:\Scan-OCR>shapeclustering -F font_properties -U unicharset num.font.exp1.tr num.font.exp2.tr num.font.exp3.tr
Reading num.font.exp1.tr ...
Reading num.font.exp2.tr ...
Reading num.font.exp3.tr ...
Bad properties for index 3, char 3: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char 0: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 5, char 9: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 6, char 4: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 7, char 1: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 8, char 2: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 9, char 8: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 10, char 6: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 11, char 7: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 12, char 5: 0,255 0,255 0,0 0,0 0,0
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0 1 2 3 4 5 6 7 8 9
Stopped with 0 merged, min dist 0.087838
Master shape_table:Number of shapes = 10 max unichars = 1 number with multiple unichars = 0
D:\Scan-OCR>mftraining -F font_properties -U unicharset -O unicharset num.font.exp1.tr num.font.exp2.tr num.font.exp3.tr
Read shape table shapetable of 10 shapes
Reading num.font.exp1.tr ...
Reading num.font.exp2.tr ...
Reading num.font.exp3.tr ...
Bad properties for index 3, char 3: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char 0: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 5, char 9: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 6, char 4: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 7, char 1: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 8, char 2: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 9, char 8: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 10, char 6: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 11, char 7: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 12, char 5: 0,255 0,255 0,0 0,0 0,0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Done!
# Merge all tr files
D:\Scan-OCR>cntraining num.font.exp1.tr num.font.exp2.tr num.font.exp3.tr
Reading num.font.exp1.tr ...
Reading num.font.exp2.tr ...
Reading num.font.exp3.tr ...
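Clustering ...
Writing normproto ...
# Rename: add the language prefix to all the generated files.
# Rename the five files created during training (shapetable, normproto, inttemp, pffmtable, unicharset) with the prefix lang. (here, num.)
# Generate the trained data file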
D:\Scan-OCR>combine_tessdata num.
Combining tessdata files
TessdataManager combined tesseract data files.
Offset for type 0 (num.config ) is -1
Offset for type 1 (num.unicharset ) is 140
Offset for type 2 (num.unicharambigs ) is -1
Offset for type 3 (num.inttemp ) is 290
Offset for type 4 (num.pffmtable ) is 110966
Offset for type 5 (num.normproto ) is 111006
Offset for type 6 (num.punc-dawg ) is -1
Offset for type 7 (num.word-dawg ) is -1
Offset for type 8 (num.number-dawg ) is -1
Offset for type 9 (num.freq-dawg ) is -1
Offset for type 10 (num.fixed-length-dawgs ) is -1
Offset for type 11 (num.cube-unicharset ) is -1
Offset for type 12 (num.cube-word-dawg ) is -1
Offset for type 13 (num.shapetable ) is 112749
Offset for type 14 (num.bigram-dawg ) is -1
Offset for type 15 (num.unambig-dawg ) is -1
Offset for type 16 (num.params-model ) is -1
Output num.traineddata created successfully.
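Because every new training round repeats the same sequence, the commands between box correction and the rename step can be generated from the list of round numbers. The sketch below only builds the command lists, in the order used in this record; training_commands is a hypothetical helper, and the arguments are copied from the commands above.

```python
def training_commands(lang, font, exps):
    """Build the commands that run after the box files have been corrected."""
    bases = [f"{lang}.{font}.exp{n}" for n in exps]
    boxes = [base + ".box" for base in bases]
    trs = [base + ".tr" for base in bases]

    # One box.train run per round produces the .tr files.
    commands = [["tesseract", base + ".tif", base, "nobatch", "box.train"]
                for base in bases]
    commands.append(["unicharset_extractor"] + boxes)
    commands.append(["shapeclustering", "-F", "font_properties",
                     "-U", "unicharset"] + trs)
    commands.append(["mftraining", "-F", "font_properties",
                     "-U", "unicharset", "-O", "unicharset"] + trs)
    commands.append(["cntraining"] + trs)
    return commands


for cmd in training_commands("num", "font", [1, 2, 3]):
    print(" ".join(cmd))
```

The rename step and combine_tessdata num. still follow afterwards. Adding a fourth round means correcting one new box file and passing [1, 2, 3, 4].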