Fixing "UnicodeDecodeError: 'gbk' codec can't decode byte 0xd0 in position 493: illegal multibyte sequen"


The function that originally read the files looked like this:

import codecs, sys

# Read the file content (first line only)
def getContent(fullname):
    f = codecs.open(fullname, 'r')
    content = f.readline()
    f.close()
    return content

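A quick debugging session showed that the error is raised on the fifteenth file, where the reported encoding is 'cp936'. cp936 is the Windows code page for GBK, so this is the codec being used to open the file; the file itself must contain bytes that are not valid GBK.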
[Screenshot: the error found]
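To find every affected file in one pass instead of stepping through the debugger, something like the following sketch may help. The function name `find_undecodable`, the `*.txt` pattern, and the folder layout are assumptions for illustration and are not part of the original experiment.

```python
from pathlib import Path

def find_undecodable(folder, encoding="gbk"):
    """Return (index, filename) pairs for files that fail strict decoding."""
    failures = []
    # Sort the paths so the 1-based index is stable from run to run.
    for index, path in enumerate(sorted(Path(folder).glob("*.txt")), start=1):
        try:
            # Strict decoding raises UnicodeDecodeError on the first bad byte.
            path.read_bytes().decode(encoding)
        except UnicodeDecodeError:
            failures.append((index, path.name))
    return failures
```

Calling `find_undecodable("comments")` on a hypothetical `comments` folder returns the files that need special handling, which is the information the debugging step above was after.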
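Because this experiment imports a large number of review files and no single file matters much, the simplest fix is to pass errors='ignore' when opening them, so that undecodable bytes are dropped during reading instead of aborting the run.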

def getContent(fullname):
    f = codecs.open(fullname, 'r', encoding='gbk', errors='ignore')
    content = f.readline()
    f.close()
    return content
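For comparison, here is a sketch of the same idea using the built-in `open` with a context manager, followed by a small demonstration of what the `ignore` and `replace` error handlers do to one invalid byte. The name `get_content` and the sample bytes are made up for illustration.

```python
def get_content(fullname, encoding="gbk", errors="ignore"):
    """Return the first line of the file, dropping bytes that cannot be decoded."""
    # The with-statement closes the file even if reading raises.
    with open(fullname, "r", encoding=encoding, errors=errors) as f:
        return f.readline()

# One invalid byte (0xff is never valid in GBK) between valid text:
sample = "中文".encode("gbk") + b"\xff" + b" ok"
ignored = sample.decode("gbk", errors="ignore")    # invalid byte is silently dropped
replaced = sample.decode("gbk", errors="replace")  # invalid byte becomes U+FFFD
```

Note that `errors='ignore'` loses data without any warning, which is acceptable here only because individual files are not important; `errors='replace'` at least leaves a visible marker where bytes were lost.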

After running it again, it turned out that four files had a different encoding, and the problem was solved.
[Screenshot: run result]

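Note that this approach is only suitable when there are many files and each one is relatively unimportant. If you want to bring the files that were not read properly back in, you can write a separate method that reads them according to their actual encoding.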
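One possible shape for such a method is a reader that tries several candidate encodings in order. This is only a sketch: the function name `read_with_fallback` and the candidate list `("utf-8-sig", "gbk")` are assumptions, since the actual encodings of the four files are not stated above.

```python
def read_with_fallback(fullname, encodings=("utf-8-sig", "gbk")):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    with open(fullname, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            # Strict decoding: any invalid byte rejects this candidate.
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode {fullname}")
```

UTF-8 is tried first because it is strict and rarely accepts text written in another encoding, whereas GBK will often decode foreign bytes into garbled but "valid" characters. A clean decode therefore suggests, but does not prove, that the guess is right.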

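Alternatively, locate the .txt files whose encoding differs, open each one, choose "Save As", and select UTF-8 as the encoding in the dialog. Files re-saved this way then need to be opened with the matching encoding.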
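The manual re-save can also be scripted. The sketch below assumes you already know each file's current encoding; `convert_to_utf8` and its `source_encoding` parameter are hypothetical names, and the file is overwritten in place, so keep a backup.

```python
from pathlib import Path

def convert_to_utf8(fullname, source_encoding):
    """Rewrite the file in place as UTF-8, decoding it strictly first."""
    path = Path(fullname)
    # Strict decoding raises UnicodeDecodeError if source_encoding is wrong,
    # in which case the file is left untouched.
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
```

After conversion, the files have to be read with `encoding='utf-8'`; opening them with `encoding='gbk'` would fail or produce garbled text again.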