I won't go over the algorithm's principles again here; please refer to:
Naive Bayes Classification Algorithm
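For quick reference, here is a minimal sketch of the core idea. Each class c is scored by log P(c) + Σ log P(x_i | c), and the highest score wins. The sketch assumes categorical attributes, the class label in the last column of each row, and add-one (Laplace) smoothing. The smoothing choice and the names `train` and `predict` are hypothetical and may not match what the script below does.

```python
from collections import Counter, defaultdict
import math


def train(rows):
    """Count class and attribute-value frequencies.

    Assumption: each row is a list whose last element is the class label.
    """
    class_counts = Counter(row[-1] for row in rows)
    # value_counts[label][attr_index][value] -> count
    value_counts = defaultdict(lambda: defaultdict(Counter))
    # distinct values seen for each attribute (used by the smoothing term)
    attr_values = defaultdict(set)
    for row in rows:
        label = row[-1]
        for i, v in enumerate(row[:-1]):
            value_counts[label][i][v] += 1
            attr_values[i].add(v)
    return class_counts, value_counts, attr_values


def predict(model, features):
    """Return the label maximising log P(c) + sum_i log P(x_i | c)."""
    class_counts, value_counts, attr_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # prior log P(c)
        for i, v in enumerate(features):
            count = value_counts[label][i][v]  # Counter returns 0 if unseen
            # add-one (Laplace) smoothing: an assumption, not from the post
            score += math.log((count + 1) / (n_label + len(attr_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (made up for illustration): outlook, temperature, label
data = [
    ["sunny", "hot", "no"],
    ["sunny", "mild", "no"],
    ["rainy", "mild", "yes"],
    ["rainy", "cool", "yes"],
    ["overcast", "hot", "yes"],
]
model = train(data)
print(predict(model, ["rainy", "mild"]))  # → yes
print(predict(model, ["sunny", "hot"]))   # → no
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied. Smoothing keeps a value never seen with a class from zeroing out that class's score.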
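Save the code as a .py file. By default, it uses bayes_train.txt and bayes_test.txt in the data directory next to the script as the training set and the test set, respectively. These paths can be changed in the source code or passed in as command-line arguments. Start the script as follows: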
python bayes.py bayes_train.txt bayes_test.txt
The last two arguments in this command are the paths to the training set and the test set, respectively.
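The path handling described above can be sketched as follows. This is only an illustration: the excerpt does not show how the original script reads `sys.argv`, so `resolve_paths` is a hypothetical helper. It assumes the defaults live in a `data` folder next to the script (located via `argv[0]`) and that each positional argument, when present, overrides the corresponding default.

```python
import os
import sys


def resolve_paths(argv):
    """Return (train_path, test_path) for the Naive Bayes script.

    Hypothetical helper: defaults to data/bayes_train.txt and
    data/bayes_test.txt in the script's directory; argv[1] and argv[2],
    if given, override the training and test paths respectively.
    """
    script_dir = os.path.dirname(os.path.abspath(argv[0]))
    train_path = os.path.join(script_dir, "data", "bayes_train.txt")
    test_path = os.path.join(script_dir, "data", "bayes_test.txt")
    if len(argv) > 1:
        train_path = argv[1]
    if len(argv) > 2:
        test_path = argv[2]
    return train_path, test_path


if __name__ == "__main__":
    print(resolve_paths(sys.argv))
```

Resolving the defaults relative to the script rather than the current working directory means the script finds its data no matter which directory it is launched from.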
The Python source code is as follows:
__author__ = 'Administrator'
import re
import sys

DataLength = 100
Attr_num = 10
Val_num = 5
tr_data = []    # training samples
test_data = []  # test samples
tr_lg = ts_lg = 0
attrs = [0 for i in range(DataLength)]
wd = 0  # number of attributes, including the category {yes, no}
values = [set() for i in range(2)]
val_ls = []
# Attr_num x Val_num table, initialised to zero
pro_p = [[0 for i in range(Val_num)] for j in range(Attr_num)]
pro_n = [[0 for i in range(Val_num)] for j in range(Attr_n