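Converting .doc to .html in Java using Apache POI
I want to convert a .doc document that contains some images. How can I convert it to *.html so that the images stay in the same positions? How can I store these images in a separate folder named image and use that folder as the image source?
My code: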
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.w3c.dom.Document;

public class TestWordToHtmlConverter {
    private final File docFile;

    public TestWordToHtmlConverter(File docFile) {
        this.docFile = docFile;
    }

    // Converts docFile to HTML and writes the result to htmlFile as UTF-8.
    public void convert(File htmlFile) {
        // try-with-resources closes the input stream even if conversion fails
        try (FileInputStream finStream = new FileInputStream(docFile)) {
            HWPFDocument doc = new HWPFDocument(finStream);
            Document newDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(newDocument);
            wordToHtmlConverter.processDocument(doc);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            transformer.setOutputProperty(OutputKeys.METHOD, "html");

            // Serialize the DOM straight to the output file
            try (Writer out = new BufferedWriter(
                    new OutputStreamWriter(new FileOutputStream(htmlFile), StandardCharsets.UTF_8))) {
                transformer.transform(new DOMSource(wordToHtmlConverter.getDocument()), new StreamResult(out));
            }

            /* Optional Swing preview (needs javax.swing.JEditorPane, JScrollPane and JFrame):
            JEditorPane editorPane = new JEditorPane();
            editorPane.setContentType("text/html");
            editorPane.setEditable(false);
            editorPane.setPage(htmlFile.toURI().toURL());
            JScrollPane scrollPane = new JScrollPane(editorPane);
            JFrame f = new JFrame("Display Html File");
            f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            f.getContentPane().add(scrollPane);
            f.setSize(512, 342);
            f.setVisible(true);*/
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        TestWordToHtmlConverter converter = new TestWordToHtmlConverter(new File("docx/sample.doc"));
        converter.convert(new File("html/sample.html"));
    }
}
This implementation does not create image files or link to them. You can change that by overriding the AbstractWordConverter.processImage(Element, boolean, Picture) method.
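Whichever hook you use, the override needs to do two things: write the picture bytes into the image folder and hand back a relative URL for the img tag. Here is a minimal sketch of that file-handling part only, using nothing but the JDK. The class name ImageFolderWriter and the method save are made up for illustration and are not part of Apache POI.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a POI class): stores picture bytes in an "image"
// folder next to the generated HTML and returns the relative URL to link to.
final class ImageFolderWriter {
    static final String IMAGE_DIR = "image";

    private ImageFolderWriter() {
    }

    // htmlDir is the folder that holds the .html file; suggestedName is the
    // file name proposed for the picture; content is the raw picture data.
    static String save(Path htmlDir, String suggestedName, byte[] content) throws IOException {
        Path imageDir = htmlDir.resolve(IMAGE_DIR);
        Files.createDirectories(imageDir); // no-op if the folder already exists

        // Keep only the last path element so the file always lands inside imageDir.
        String fileName = Paths.get(suggestedName).getFileName().toString();
        Files.write(imageDir.resolve(fileName), content);

        // Relative URL, so the HTML and the image folder can be moved together.
        return IMAGE_DIR + "/" + fileName;
    }
}
```

If your POI version ships org.apache.poi.hwpf.converter.PicturesManager, an alternative to overriding processImage might be to call wordToHtmlConverter.setPicturesManager(...) before processDocument and delegate from its savePicture callback to a helper like this, returning the URL it produces. I have not checked that against every POI release, so treat it as an assumption to verify against the Javadoc of the version you use.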
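Thanks... I have the solution now. – sudhakar810
You're welcome. –
I now have the solution for the images and it works fine. But there is a problem with bullets and numbering: paragraphs that contain lists are displayed correctly, but the numbering is missing. – sudhakar810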