问题描述:使用poi将.doc转换成html时,发现其将修订前的内容也显示出来了,(另外html的标题无法控制,有时候会出现乱码)
操作如下
1编辑测试文档:修订状态(附件中可以下载该doc文档)
2期望显示效果:最终状态
3转换成html的效果:在修订中删除的内容会显示出来(html的title显示内容不是我想要的)
实现代码
public class Word2Html {
public static void main(String argv[]) {
try {
//word 路径 html输出路径
convert2Html("D:/doctohtml/1.doc","D:/doctohtml/1.html");
} catch (Exception e) {
e.printStackTrace();
}
}
public static void writeFile(String content, String path) {
FileOutputStream fos = null;
BufferedWriter bw = null;
try {
File file = new File(path);
fos = new FileOutputStream(file);
bw = new BufferedWriter(new OutputStreamWriter(fos,"utf-8"));
bw.write(content);
} catch (FileNotFoundException fnfe) {
fnfe.printStackTrace();
} catch (IOException ioe) {
ioe.printStackTrace();
} finally {
try {
if (bw != null)
bw.close();
if (fos != null)
fos.close();
} catch (IOException ie) {
}
}
}
public static void convert2Html(String fileName, String outPutFile)
throws TransformerException, IOException,
ParserConfigurationException {
HWPFDocument wordDocument = new HWPFDocument(new FileInputStream(fileName));//WordToHtmlUtils.loadDoc(new FileInputStream(inputFile));
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
DocumentBuilderFactory.newInstance().newDocumentBuilder()
.newDocument());
wordToHtmlConverter.setPicturesManager( new PicturesManager()
{
public String savePicture( byte[] content,
PictureType pictureType, String suggestedName,
float widthInches, float heightInches )
{
//html 中 图片标签中 显示的图片路路径 <img src="d:/test/0.jpg"/>
return "d:/doctohtml/"+suggestedName;
}
} );
wordToHtmlConverter.processDocument(wordDocument);
//save pictures
List pics=wordDocument.getPicturesTable().getAllPictures();
if(pics!=null){
for(int i=0;i<pics.size();i++){
Picture pic = (Picture)pics.get(i);
System.out.println();
try {
//word中图片的存储路径
pic.writeImageContent(new FileOutputStream("D:/doctohtml/"
+ pic.suggestFullFileName()));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
Document htmlDocument = wordToHtmlConverter.getDocument();
ByteArrayOutputStream out = new ByteArrayOutputStream();
DOMSource domSource = new DOMSource(htmlDocument);
StreamResult streamResult = new StreamResult(out);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer serializer = tf.newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);
out.close();
writeFile(new String(out.toByteArray()), outPutFile);
}
}
doc文件的下载链接(在最下面):http://www.iteye.com/problems/113815
------解决思路----------------------
跟踪源码看看。