2011-11-23

Using WordToHtmlConverter in Apache POI

I want to use the WordToHtmlConverter class to convert a Word document to HTML, but the documentation is unclear.

WordToHtmlConverter has a constructor that takes an org.w3c.dom.Document, but I don't think that is the Word document.

Does anyone have an example program that shows how to load a Word document and convert it to HTML?

This has already been asked. Please check http://stackoverflow.com/questions/227236/convert-word-doc-to-html-programmatically-in-java

Answer

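Right now your best option is probably to look at the unit tests, for example TestWordToHtmlConverter. They show how to do it.

In general, you pass in an empty XML document (an org.w3c.dom.Document) for the converter to populate, let WordToHtmlConverter generate the HTML from the Word document into it, and then transform that XML document into the output you want (indentation, newlines, and so on).

Your code will want to look something like this: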

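    import java.io.*;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.w3c.dom.Document;

    // Load the Word document. "input.doc" is a placeholder path, not part of the original answer.
    HWPFDocument hwpfDocument = new HWPFDocument(new FileInputStream("input.doc"));

    // Create an empty DOM document for the converter to populate.
    Document newDocument = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(newDocument);

    // Walk the Word document and build the HTML DOM.
    wordToHtmlConverter.processDocument(hwpfDocument);

    // Serialize the DOM as indented UTF-8 HTML.
    StringWriter stringWriter = new StringWriter();
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
    transformer.setOutputProperty(OutputKeys.METHOD, "html");
    transformer.transform(
            new DOMSource(wordToHtmlConverter.getDocument()),
            new StreamResult(stringWriter));

    String html = stringWriter.toString();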
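
The serialization step can be tried on its own, without POI on the classpath. The following is a minimal sketch using only the JDK: a small hand-built DOM stands in for the document that WordToHtmlConverter would populate. The class name DomToHtmlSketch and the methods serialize and sampleDocument are made up for illustration and do not come from POI or from the answer above.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToHtmlSketch {

    // Serialize any DOM document as indented UTF-8 HTML text,
    // using the same output properties as the snippet above.
    public static String serialize(Document document) throws Exception {
        StringWriter writer = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.transform(new DOMSource(document), new StreamResult(writer));
        return writer.toString();
    }

    // Hand-built stand-in (an assumption for this sketch) for the DOM
    // that WordToHtmlConverter would fill in from a Word file.
    public static Document sampleDocument() throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = document.createElement("html");
        Element body = document.createElement("body");
        Element paragraph = document.createElement("p");
        paragraph.setTextContent("Hello from a DOM");
        body.appendChild(paragraph);
        html.appendChild(body);
        document.appendChild(html);
        return document;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleDocument()));
    }
}
```

Keeping serialization in its own method means the same routine can be handed wordToHtmlConverter.getDocument() once the conversion has run, and it makes clear that the org.w3c.dom.Document passed to the constructor is the output container, not the Word file.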