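Help needed, as in the title.
I searched Baidu and found various ways to detect a file's encoding, but nearly all of them read the first three bytes of the file (the byte order mark, BOM) to decide. My CSV file is exported through a stream, so it has no such marker.
The most annoying part: I export the file as UTF-8, but once it has been edited in Excel the encoding becomes GBK. If it is edited with the editor that ships with the Mac, the encoding stays UTF-8. Because of this, the encoding of the file I read is not fixed, so I need to detect it again each time. Any guidance would be greatly appreciated.
The method is as follows: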
public ApiResult<String> uploadFileEmployees(@PathParam("corpUserId") String corpUserId,
@FormDataParam("operator") String operator,
@FormDataParam("salaryFile") FormDataBodyPart salaryFile) {
ApiResult<String> result = new ApiResult<>();
InputStream inputStream = null;
InputStreamReader isr = null;
OutputStreamWriter os = null;
BufferedReader bufr = null;
BufferedWriter bufw = null;
try {
if (StringUtils.isBlank(corpUserId)) {
logger.error("Upload and create employee accounts: corpUserId is blank {}", corpUserId);
return result.addError(new ApiError("CORPUSERID_NOT_FOUND", "corpUserId", corpUserId));
}
String fileName = salaryFile.getContentDisposition().getFileName();
if (StringUtils.isBlank(fileName)) {
logger.error("Upload and create employee accounts: file is empty {}", corpUserId);
return result.addError(new ApiError("FILEINPUT_NOT_FOUND", "fileName", ""));
}
// Receive the file and save it temporarily
// Create nested directories to avoid file name clashes
String root = FILE_BASE_PATH + File.separator + "uploadFileEmployees" ;
File file = new File(root);
// Create the folder if it does not exist
if (!file.exists() && !file.isDirectory()) {
file.mkdirs();
}
String tmpFilePath = root + File.separator + fileName;
inputStream = salaryFile.getEntityAs(InputStream.class);
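// Read the first two bytes and compare them with known byte order marks (BOM)
int p = (inputStream.read() << 8) + inputStream.read();
String code = null;
switch (p) {
case 0xefbb:
// First two bytes of the UTF-8 BOM (EF BB BF)
code = "UTF-8";
break;
case 0xfffe:
// UTF-16 little-endian BOM (FF FE)
code = "Unicode";
break;
case 0xfeff:
// UTF-16 big-endian BOM (FE FF)
code = "UTF-16BE";
break;
default:
// No BOM found: GBK is assumed, which is wrong for UTF-8 files written without a BOM
code = "GBK";
}
System.out.println(code);
// Request the stream again so that decoding starts from the first byte
inputStream = salaryFile.getEntityAs(InputStream.class);
isr = new InputStreamReader(inputStream,code);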
bufr = new BufferedReader(isr);
os = new OutputStreamWriter(new FileOutputStream(tmpFilePath),"UTF-8");
bufw = new BufferedWriter(os);
String s;
while((s=bufr.readLine())!=null){
bufw.write(s);
bufw.newLine();
}
bufw.flush();
os.flush();
logger.info("{} uploaded for corp user {} the employee sheet {}, saved to {}",
operator,corpUserId,fileName, tmpFilePath);
// Read and parse the uploaded employee CSV file, then add the employees in batch
List<Pair<CorpEmployee, String>> rejected = employmentBridge.batchAddEmployees(corpUserId, tmpFilePath);
if (rejected != null && !rejected.isEmpty()) {
// Some employee records were rejected
for (Pair<CorpEmployee, String> item : rejected) {
result.addError(item.getRight(), "EMPLOYEE_REJECTED", item.getLeft().toJSON());
}
}
return result;
} catch (EJBException ex) {
Throwable t = ExceptionUtils.unrollEJBException(ex);
logger.error("Upload and create employee accounts: exception encountered {}", corpUserId, t);
return result.addError(t.getMessage());
} catch (Exception e) {
logger.error("error", e);
return result.addError(new ApiError(e.getMessage(), "fileName", null));
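}finally{
try {
// The writer and reader are still null if the method returned before opening them
if (bufw != null) {
bufw.close(); // also closes the wrapped OutputStreamWriter
}
if (bufr != null) {
bufr.close();
}
} catch (IOException ex) {
java.util.logging.Logger.getLogger(SalaryPayResource.class.getName()).log(Level.SEVERE, null, ex);
}
}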
}
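For reference, the BOM check in the method above could be sketched as follows. This is only an illustration under assumptions: `BomSniffer`, `Result` and `sniff` are hypothetical names that do not appear in the original code. It checks all three bytes of the UTF-8 BOM and pushes unread bytes back, so the same stream can be decoded without requesting it a second time. A `null` charset means there is no BOM, which is exactly the case of the streamed CSV described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original method.
final class BomSniffer {

    /** Charset named by the BOM (null when there is none) plus a stream positioned after the BOM. */
    static final class Result {
        final Charset charset;
        final InputStream stream;

        Result(Charset charset, InputStream stream) {
            this.charset = charset;
            this.stream = stream;
        }
    }

    static Result sniff(InputStream in) throws IOException {
        // A 3-byte pushback buffer is enough for the longest BOM checked here.
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream is shorter than three bytes
            }
            n += r;
        }
        if (n == 3 && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF) {
            // UTF-8 BOM: all three bytes are consumed, nothing to push back.
            return new Result(StandardCharsets.UTF_8, pin);
        }
        if (n >= 2 && head[0] == (byte) 0xFF && head[1] == (byte) 0xFE) {
            pin.unread(head, 2, n - 2); // give back the byte read past the 2-byte BOM
            return new Result(StandardCharsets.UTF_16LE, pin);
        }
        if (n >= 2 && head[0] == (byte) 0xFE && head[1] == (byte) 0xFF) {
            pin.unread(head, 2, n - 2);
            return new Result(StandardCharsets.UTF_16BE, pin);
        }
        // No BOM: push everything back so the caller sees the stream from its first byte.
        if (n > 0) {
            pin.unread(head, 0, n);
        }
        return new Result(null, pin);
    }
}
```

A caller would wrap `result.stream` in an `InputStreamReader` with `result.charset` when it is not `null`, and fall back to some other detection otherwise.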
------Solution----------------------
A few days ago I wrote something that analyzes a text file's encoding so it can be transcoded correctly.
// The detector classes come from cpdetector: import info.monitorenter.cpdetector.io.*;
CodepageDetectorProxy detector = CodepageDetectorProxy.getInstance();
detector.add(UnicodeDetector.getInstance());
detector.add(JChardetFacade.getInstance());
detector.add(ASCIIDetector.getInstance());
// url holds the path of the file to inspect
File f = new File(url);
Charset charset = detector.detectCodepage(f.toURI().toURL());
// Check whether the file is UTF-8 encoded; anything else is read as GBK
if(charset != null && "UTF-8".equals(charset.name())){
br = new BufferedReader(new InputStreamReader(new FileInputStream(url),"UTF-8"));
} else {
br = new BufferedReader(new InputStreamReader(new FileInputStream(url),"GBK"));
}
It can recognize quite a few encodings; the OP can print the result and see.
The required jars are cpdetector_1.0.10 and chardet (jchardet-1.1).
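If adding jars is not an option, and the only two candidates are UTF-8 and GBK as in the question, a JDK-only sketch could try a strict UTF-8 decode and fall back to GBK when it fails. This is a different technique from cpdetector, and it rests on assumptions: `Utf8OrGbk`, `detect` and `transcodeToUtf8` are hypothetical names, the JDK in use must ship the GBK charset, and the file must be small enough to hold in memory. Pure ASCII content is reported as UTF-8, which is harmless, and in rare cases GBK bytes can happen to form valid UTF-8, so treat the result as a heuristic.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original answer.
final class Utf8OrGbk {

    /** Returns UTF-8 if the bytes decode strictly as UTF-8, otherwise GBK. */
    static Charset detect(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data)); // throws on the first invalid sequence
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return Charset.forName("GBK"); // assumption: GBK is the only alternative
        }
    }

    /** Reads the whole stream once, detects its encoding and writes it to target as UTF-8. */
    static Charset transcodeToUtf8(InputStream in, Path target) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        byte[] data = buffer.toByteArray();
        Charset source = detect(data);
        String text = new String(data, source);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
        return source;
    }
}
```

Because the upload is read into a byte array first, the stream is consumed only once, and the original line endings are kept as they were.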