2013-01-22 52 views
0

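Unable to convert docx to HTML using Java

I need to convert the contents of a .docx file to HTML text so that it can be displayed in a web UI.

I am using Apache POI's XWPFDocument class, but I have not been able to get any result; I only get an empty string. My code is based on this sample.

Here is my code: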

public JSONObject uploadDocxFile(MultipartFile multipartFile) throws Exception {
    InputStream inputStream = multipartFile.getInputStream();
    XWPFDocument wordDocument = new XWPFDocument(inputStream);

    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
            DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
    org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DOMSource domSource = new DOMSource(htmlDocument);
    StringWriter stringWriter = new StringWriter();

    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(domSource, new StreamResult(stringWriter));
    out.close();

    String result = new String(out.toByteArray());
    String htmlText = result;

    JSONObject jsonObject = new JSONObject();
    jsonObject.put("content", htmlText);
    jsonObject.put("success", true);
    return jsonObject;
}
+0

Possible duplicate of [Converting a .docx to html using Apache POI and getting no text](http://stackoverflow.com/questions/13103421/converting-a-docx-to-html-using-apache-poi-and-getting-no-text)

+0

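There is no proper answer there. The owner of that question opened it for the same reason I opened this one, but he added that he has no problem getting the text. – talha06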

Answers

0

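I did this with docx4j and it seems to work. If you are using Maven, just add the dependency (but use version 3.0.0), then use one of the docx4j sample programs, named ConvertOutHtml.java. Change the file path in ConvertOutHtml.java to point at your file and you should be fine.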

1

Even if it is too late, I think the previous code can be modified this way (it works with Word 97 files):

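import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

private static void convertWordDoc2HTML(File file)
        throws ParserConfigurationException, TransformerException, IOException {
    // Change the type from XWPFDocument to HWPFDocument (binary Word 97 format)
    HWPFDocument hwpfDocument;
    // Let an IOException propagate instead of continuing with a null document
    try (InputStream fis = new FileInputStream(file)) {
        hwpfDocument = new HWPFDocument(new POIFSFileSystem(fis));
    }

    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
            DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
    org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
    // Add the processDocument call so the converter actually fills the DOM
    wordToHtmlConverter.processDocument(hwpfDocument);

    // Serialize into the same buffer that is read back below
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Transformer serializer = TransformerFactory.newInstance().newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));

    // Decode with the charset that was requested from the serializer
    String htmlText = new String(out.toByteArray(), StandardCharsets.UTF_8);
    System.out.println(htmlText);
}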

I hope this modification is useful.
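Apart from the processDocument call, the code in this answer also passes the output buffer itself to StreamResult, whereas the question's code transforms into a StringWriter and then reads a ByteArrayOutputStream that nothing ever wrote to. The sketch below isolates only that serialization step using JDK classes; the class name DomSerializationSketch and the hand-built sample DOM are illustrative assumptions standing in for whatever a converter would produce, not part of POI.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
class DomSerializationSketch {

    // Builds a tiny DOM by hand; a real converter would populate this instead.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent("Hello");
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the DOM and returns the text from the writer that received it.
    static String toHtml(Document doc) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter writer = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A buffer that is never passed to the transformer stays empty.
        ByteArrayOutputStream neverWritten = new ByteArrayOutputStream();
        System.out.println("unused buffer size: " + neverWritten.size());
        System.out.println(toHtml(sampleDocument()));
    }
}
```

Whichever sink is used, the point is that the object handed to StreamResult is the one to read the result from.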

0

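Your code produces empty HTML output because you never process any document in the converter.

In any case, if it is a docx you should convert it to HTML with XHTMLConverter rather than WordToHtmlConverter. See this answer.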
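Since the right converter depends on whether the upload is a binary .doc or a .docx, one way to check is to look at the first bytes of the file rather than trusting the extension. The following is a minimal JDK-only sketch of that idea; the class WordFormatSniffer and its Format enum are hypothetical names, and the check only identifies the container type (OLE2 or ZIP), not that the content is really a Word document.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class WordFormatSniffer {

    enum Format { OLE2, ZIP, UNKNOWN }

    // OLE2 compound file signature, used by binary .doc (and other legacy Office files).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature, used by .docx (and every other ZIP archive).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a file by the leading bytes passed in.
    static Format sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.ZIP;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] docxLikeHeader = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00 };
        // ZIP would suggest the docx route; OLE2 would suggest HWPFDocument with WordToHtmlConverter.
        System.out.println(sniff(docxLikeHeader));
    }
}
```

In the upload method, the header bytes could be taken from the start of the uploaded stream before choosing which conversion path to run.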