2012-10-11 100 views

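Processing docx files with Apache POI

I am trying to retrieve a docx file from a database and process it by inspecting its contents. I believe my code retrieves the file I want, but it seems I do not fully understand Apache POI. The stack trace shows an error saying that I am calling the wrong part of POI. Any ideas?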

Here is how I load the file:

public void loadFile(String FileName)
{
    InputStream is = null;
    try
    {
        //Connecting to MYSQL Database
        Class.forName(driver).newInstance();
        con = DriverManager.getConnection(url+dbName,userName,password);

        Statement stmt = (Statement) con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT FILE FROM doccompfiles WHERE FileName = '"+ FileName +"'");

        while(rs.next())
        {
            is = rs.getBinaryStream("FILE");
        }

        HWPFDocument doc = new HWPFDocument(is);
        WordExtractor we = new WordExtractor(doc);

        String[] paragraphs = we.getParagraphText();
        JOptionPane.showMessageDialog(null, "Number of Paragraphs" + paragraphs.length);
        con.close();
    }
    catch(Exception ex)
    {
        ex.printStackTrace();
    }
}

Stack trace:

org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF) 
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131) 
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104) 
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138) 
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:106) 
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174) 
at documentComparisor.Database.loadFile(Database.java:156) 
at documentComparisor.Home$5.actionPerformed(Home.java:195) 
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source) 
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source) 
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source) 
at javax.swing.DefaultButtonModel.setPressed(Unknown Source) 
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source) 
at java.awt.Component.processMouseEvent(Unknown Source) 
at javax.swing.JComponent.processMouseEvent(Unknown Source) 
at java.awt.Component.processEvent(Unknown Source) 
at java.awt.Container.processEvent(Unknown Source) 
at java.awt.Component.dispatchEventImpl(Unknown Source) 
at java.awt.Container.dispatchEventImpl(Unknown Source) 
at java.awt.Component.dispatchEvent(Unknown Source) 
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source) 
at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source) 
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source) 
at java.awt.Container.dispatchEventImpl(Unknown Source) 
at java.awt.Window.dispatchEventImpl(Unknown Source) 
at java.awt.Component.dispatchEvent(Unknown Source) 
at java.awt.EventQueue.dispatchEventImpl(Unknown Source) 
at java.awt.EventQueue.access$000(Unknown Source) 
at java.awt.EventQueue$3.run(Unknown Source) 
at java.awt.EventQueue$3.run(Unknown Source) 
at java.security.AccessController.doPrivileged(Native Method) 
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) 
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) 
at java.awt.EventQueue$4.run(Unknown Source) 
at java.awt.EventQueue$4.run(Unknown Source) 
at java.security.AccessController.doPrivileged(Native Method) 
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) 
at java.awt.EventQueue.dispatchEvent(Unknown Source) 
at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source) 
at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source) 
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source) 
at java.awt.EventDispatchThread.pumpEvents(Unknown Source) 
at java.awt.EventDispatchThread.pumpEvents(Unknown Source) 
at java.awt.EventDispatchThread.run(Unknown Source) 

This is the most helpful exception I have ever seen.

Answer


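As you probably know by now, MS Office documents come in two different formats: the older binary format (e.g. ".doc" or ".xls") and the XML-based format used by newer versions (e.g. ".docx" or ".xlsx").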
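
To make the difference concrete: a file in the XML-based format is a ZIP archive whose entries are mostly XML parts, which is why a parser for the old binary format cannot read it. The sketch below uses only the Java standard library, not POI. The class name DocxPeek and the helpers samplePackage and sampleEntries are made up for this illustration, and the in-memory package is a stand-in that merely mimics the entry names of a .docx; it is not a valid Word document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxPeek {

    /** Lists the entry names of a ZIP-based package such as a .docx file. */
    public static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds a tiny stand-in package in memory (hypothetical, not a real Word file). */
    public static byte[] samplePackage() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            for (String name : new String[] {"[Content_Types].xml", "word/document.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return buf.toByteArray();
    }

    /** Convenience helper: lists the entries of the stand-in package. */
    public static List<String> sampleEntries() throws IOException {
        return listEntries(new ByteArrayInputStream(samplePackage()));
    }

    public static void main(String[] args) throws IOException {
        // With a real file, pass a FileInputStream (or the database stream) instead.
        System.out.println(sampleEntries());
    }
}
```

Running listEntries against a real .docx should show entries such as word/document.xml alongside other parts; the exact set depends on the document.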

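Different parts of Apache POI handle the different formats. The names of the key classes for the old MS Office formats usually start with "H", while the names of the classes for the XML-based formats start with "X".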
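
If you do not know in advance which format a stored file has, the first few bytes can tell you which family of classes to try. The following is a minimal sketch using only the standard library. FormatSniffer and its Kind enum are names invented for this example. The check relies on the well-known signatures of OLE2 compound files and ZIP archives, and a ZIP signature only suggests an Office XML file, since any ZIP archive starts the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class FormatSniffer {

    /** OLE2 suggests the "H" classes, ZIP suggests the "X" classes. */
    public enum Kind { OLE2, ZIP, UNKNOWN }

    // Signature of an OLE2 compound file (old .doc / .xls).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Local file header signature of a ZIP archive (.docx / .xlsx are ZIP packages).
    private static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4};

    /** Classifies data by its leading bytes. */
    public static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    /** Peeks at a stream without consuming it; the stream must support mark/reset. */
    public static Kind sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        in.mark(8);
        byte[] head = new byte[8];
        int n = in.readNBytes(head, 0, head.length); // requires Java 9 or later
        in.reset();
        return sniff(Arrays.copyOf(head, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {'P', 'K', 3, 4, 20, 0});
        System.out.println(sniff(in)); // a ZIP header, so the "X" classes are the ones to try
    }
}
```

Because the stream version resets after peeking, the same stream can then be handed to the matching POI class. Newer POI releases include their own format detection, which is preferable where available.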

So in your example, to handle the new format, you should use XWPFDocument instead of HWPFDocument:

XWPFDocument doc = new XWPFDocument(is); 

Thanks for the detailed comparison of the two. I finally understand how they differ. – ljpv14


I'm glad it helped.


Is there a way in Apache POI to convert HWPF to XWPF?
