2013-06-13 80 views
0

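I want to read formatted text as HTML text (<html><b>boldvalue</b><img src="link"></html>), and I want to fetch the images through the links in the image tags. I am using POI. Does POI have any option for getting the data in HTML form like this? How can I use POI to read formatted text from an MS Word (.doc) file as HTML text?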

+1

http://stackoverflow.com/questions/7868713/convert-word-to-html-with-apache-poi - duplicate – Jayan

+0

Then how can I get the image from the image tag? – user25226

+0

The image tag comes out as a comment line, and the CSS comes out as a class, but I want the CSS inside the tag itself. How do I get this? – user25226

Answers

1

Try this

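import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.converter.WordToHtmlUtils;
import org.w3c.dom.Document;

// Load the .doc file; try-with-resources closes the input stream.
HWPFDocumentCore wordDocument;
try (InputStream in = new FileInputStream("D:\\temp\\seo\\1.doc")) {
    wordDocument = WordToHtmlUtils.loadDoc(in);
}

// Convert the Word document into an HTML DOM tree.
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
        DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
wordToHtmlConverter.processDocument(wordDocument);
Document htmlDocument = wordToHtmlConverter.getDocument();

// Serialize the DOM as indented UTF-8 HTML.
ByteArrayOutputStream out = new ByteArrayOutputStream();
Transformer serializer = TransformerFactory.newInstance().newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));

// Decode with the same charset the serializer wrote.
String result = new String(out.toByteArray(), StandardCharsets.UTF_8);
System.out.println(result);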
+2

Is this copied verbatim from the duplicate? – Jayan

+0

It returns the HTML, but the styles come out as classes. How can I get the styles inside the tags themselves instead? – user25226
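
One possible way to get inline styles is to post-process the DOM returned by getDocument() before serializing it. The sketch below uses only the JDK's org.w3c.dom API and is not a POI feature. It assumes the converter writes its stylesheet as simple class rules of the form .name{...} inside <style> elements, which may not hold for every POI version. CssInliner, parseClassRules and inline are hypothetical names made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Apache POI.
class CssInliner {
    // Matches simple class rules such as ".s1{font-weight:bold;}" (assumed format).
    private static final Pattern RULE =
            Pattern.compile("\\.([\\w-]+)\\s*\\{([^}]*)\\}");

    /** Parses simple ".name{...}" rules into a class-name -> declarations map. */
    static Map<String, String> parseClassRules(String css) {
        Map<String, String> rules = new HashMap<>();
        Matcher m = RULE.matcher(css);
        while (m.find()) {
            rules.put(m.group(1), m.group(2).trim());
        }
        return rules;
    }

    /** Copies class-based declarations from <style> elements into style attributes. */
    static void inline(Document doc) {
        Map<String, String> rules = new HashMap<>();
        NodeList styles = doc.getElementsByTagName("style");
        for (int i = 0; i < styles.getLength(); i++) {
            rules.putAll(parseClassRules(styles.item(i).getTextContent()));
        }
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            StringBuilder style = new StringBuilder(el.getAttribute("style"));
            for (String cls : el.getAttribute("class").trim().split("\\s+")) {
                String decl = rules.get(cls);
                if (decl == null) {
                    continue; // no rule for this class (or no class at all)
                }
                // Separate declarations with ';' when the previous one lacks it.
                if (style.length() > 0 && style.charAt(style.length() - 1) != ';') {
                    style.append(';');
                }
                style.append(decl);
            }
            if (style.length() > 0) {
                el.setAttribute("style", style.toString());
            }
        }
    }
}
```

Calling CssInliner.inline(htmlDocument) right after getDocument() and before serializer.transform(...) should leave each element with a style attribute holding the declarations of its classes. The class attributes and the <style> element are left in place, so nothing is lost if a rule does not match the assumed pattern.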

+0

How do I get the image tags and the images? – user25226
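
For the images, WordToHtmlConverter accepts a PicturesManager (org.apache.poi.hwpf.converter.PicturesManager) through setPicturesManager. As far as I know, when none is set the converter only leaves an HTML comment where the picture would be, which matches the comment line described above. The value returned from savePicture is used as the src of the generated img tag. Below is a minimal sketch of the saving part using only the JDK, with the POI hook-up shown in comments. WordImageSaver and the "images" directory are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical helper, not part of Apache POI.
class WordImageSaver {
    /**
     * Writes one picture's bytes into dir (creating it if needed) and
     * returns the relative path to use as the img src value.
     */
    static String save(File dir, byte[] content, String suggestedName) {
        try {
            Files.createDirectories(dir.toPath());
            Files.write(new File(dir, suggestedName).toPath(), content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dir.getName() + "/" + suggestedName;
    }

    // Hook-up sketch (needs Apache POI on the classpath; PictureType is
    // org.apache.poi.hwpf.usermodel.PictureType). Register it before
    // calling wordToHtmlConverter.processDocument(wordDocument):
    //
    // wordToHtmlConverter.setPicturesManager(new PicturesManager() {
    //     public String savePicture(byte[] content, PictureType pictureType,
    //             String suggestedName, float widthInches, float heightInches) {
    //         return WordImageSaver.save(new File("images"), content, suggestedName);
    //     }
    // });
}
```

With this in place each embedded picture should be written under the chosen directory, and the img tags in the resulting HTML should point at those files, so the images can then be read back from the src values.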