2017-02-22 98 views
1

Reading a huge Excel file (500K rows) in Java

I am trying to read a big XLSX file. The Excel file has about 500K rows, and I need to read column 2.

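import java.util.Iterator;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// row_num and ROW_ESCAPE are not shown here; they are assumed to be declared elsewhere in Main
OPCPackage pkg = OPCPackage.open("File path");
XSSFWorkbook myWorkBook = new XSSFWorkbook(pkg); // builds the whole workbook in memory
Sheet sheet = myWorkBook.getSheetAt(2);
Iterator<Row> rowIterator = sheet.iterator();
while (rowIterator.hasNext())
{
    Row row = rowIterator.next();
    if (row_num > ROW_ESCAPE) // skip the leading rows
    {
        Cell cell = row.getCell(2);
        if (!cell.getStringCellValue().toString().trim().isEmpty())
        {
            System.out.println(cell.getStringCellValue().toString());
        }
        System.out.println("hi" + row_num);
    }
    row_num++;
}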

It prints up to row 39723 and then throws the following exception:

Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space 
at java.util.regex.Matcher.<init>(Matcher.java:225) 
at java.util.regex.Pattern.matcher(Pattern.java:1093) 
at org.apache.poi.xssf.usermodel.XSSFRichTextString.utfDecode(XSSFRichTextString.java:482) 
at org.apache.poi.xssf.usermodel.XSSFRichTextString.getString(XSSFRichTextString.java:297) 
at org.apache.poi.xssf.usermodel.XSSFCell.getStringCellValue(XSSFCell.java:262) 
at Main.get_titles(Main.java:484) 
at Main.analyze_Importsheet(Main.java:461) 
at Main.but_sel_imp_sheetActionPerformed(Main.java:220) 
at Main.access$000(Main.java:40) 
at Main$1.actionPerformed(Main.java:85) 
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022) 
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348) 
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402) 
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259) 
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:252) 
at java.awt.Component.processMouseEvent(Component.java:6533) 
at javax.swing.JComponent.processMouseEvent(JComponent.java:3324) 
at java.awt.Component.processEvent(Component.java:6298) 
at java.awt.Container.processEvent(Container.java:2236) 
at java.awt.Component.dispatchEventImpl(Component.java:4889) 
at java.awt.Container.dispatchEventImpl(Container.java:2294) 
at java.awt.Component.dispatchEvent(Component.java:4711) 
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888) 
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525) 
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466) 
at java.awt.Container.dispatchEventImpl(Container.java:2280) 
at java.awt.Window.dispatchEventImpl(Window.java:2746) 
at java.awt.Component.dispatchEvent(Component.java:4711) 
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758) 
at java.awt.EventQueue.access$500(EventQueue.java:97) 
at java.awt.EventQueue$3.run(EventQueue.java:709) 
at java.awt.EventQueue$3.run(EventQueue.java:703) 

Main.java:484 is the line if (!cell.getStringCellValue().toString().trim().isEmpty()). If I remove that line and only print the row number, it works fine. I need help getting the string value of column 2.

Answers

0

Increasing the JVM heap size may fix your OutOfMemoryError. For how to increase the JVM heap size, see this Stack Overflow post.
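Before raising the limit further, it can help to confirm that the -Xmx value actually reached the JVM that runs the import. The sketch below only prints the maximum heap the JVM reports; the class name HeapCheck is hypothetical and not part of the original program.

```java
// Minimal sketch: report the heap ceiling the running JVM was started with.
// HeapCheck is a hypothetical helper name, not part of the asker's code.
final class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it with the same flags as the real application, for example java -Xmx1G HeapCheck.java on Java 11 or later, should report a value close to the requested size. The exact number can be somewhat lower depending on the garbage collector in use.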

+0

I should have mentioned: I am already running it with java -Xmx1G -jar Importsheet_Breaker.jar

0

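The simplest approach (without changing your reading logic) is to increase the heap size.

If that does not work for you, use streaming. In fact, there is a handy library already available: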

https://github.com/monitorjbl/excel-streaming-reader
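To illustrate what streaming means here, the following is a rough sketch that uses only the JDK. It is not the API of the library linked above. An .xlsx file is a ZIP archive of XML parts, so one sheet can be read cell by cell with StAX instead of building the whole workbook in memory. Several things are assumptions: the class name XlsxColumnStreamer, the entry name xl/worksheets/sheet3.xml (the real mapping from sheet index to entry is recorded in xl/workbook.xml and its relationships part), the presence of the r attribute on every cell, and column letter "C" standing in for getCell(2). Rich-text runs and phonetic text are not handled specially, and the shared-strings table is still loaded fully.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams one column of one sheet from an .xlsx archive.
final class XlsxColumnStreamer {

    /** Loads xl/sharedStrings.xml, one string per <si>, concatenating its <t> children. */
    static List<String> readSharedStrings(ZipFile zip) throws IOException, XMLStreamException {
        List<String> strings = new ArrayList<>();
        ZipEntry entry = zip.getEntry("xl/sharedStrings.xml");
        if (entry == null) {
            return strings; // the workbook stores no shared strings
        }
        try (InputStream in = zip.getInputStream(entry)) {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
            StringBuilder current = null;
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("si".equals(xml.getLocalName())) {
                        current = new StringBuilder();
                    } else if ("t".equals(xml.getLocalName()) && current != null) {
                        current.append(xml.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "si".equals(xml.getLocalName()) && current != null) {
                    strings.add(current.toString());
                    current = null;
                }
            }
            xml.close();
        }
        return strings;
    }

    /** Streams the sheet entry and hands every non-blank value of the given column to sink. */
    static void streamColumn(ZipFile zip, String sheetEntry, String columnLetters,
                             List<String> shared, Consumer<String> sink)
            throws IOException, XMLStreamException {
        ZipEntry entry = zip.getEntry(sheetEntry);
        if (entry == null) {
            throw new IOException("No such entry: " + sheetEntry);
        }
        try (InputStream in = zip.getInputStream(entry)) {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
            boolean inTargetCell = false;
            String cellType = null;
            while (xml.hasNext()) {
                if (xml.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = xml.getLocalName();
                if ("c".equals(name)) {
                    // A cell reference such as "C17": strip the digits to get the column.
                    String ref = xml.getAttributeValue(null, "r");
                    inTargetCell = ref != null
                            && ref.replaceAll("[0-9]", "").equals(columnLetters);
                    cellType = xml.getAttributeValue(null, "t");
                } else if (inTargetCell && ("v".equals(name) || "t".equals(name))) {
                    String raw = xml.getElementText();
                    // Type "s" means the value is an index into the shared-strings table.
                    String value = "s".equals(cellType)
                            ? shared.get(Integer.parseInt(raw.trim()))
                            : raw;
                    if (!value.trim().isEmpty()) {
                        sink.accept(value);
                    }
                    inTargetCell = false;
                }
            }
            xml.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical defaults; pass your own file path and sheet entry name.
        String path = args.length > 0 ? args[0] : "workbook.xlsx";
        String sheetEntry = args.length > 1 ? args[1] : "xl/worksheets/sheet3.xml";
        try (ZipFile zip = new ZipFile(path)) {
            List<String> shared = readSharedStrings(zip);
            streamColumn(zip, sheetEntry, "C", shared, System.out::println);
        }
    }
}
```

Only one cell's text is held at a time while the sheet is scanned, so memory use depends mainly on the size of the shared-strings table rather than on the number of rows. Because the sketch opens the sheet part by its entry name, it does not consult the visibility flag stored in xl/workbook.xml.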

+0

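My Excel workbook has some hidden sheets, and with the streaming reader I cannot read those sheets. XSSFWorkbook oldWorkbook; OPCPackage pkg; pkg = OPCPackage.open(myImport.get_path()); oldWorkbook = (XSSFWorkbook) WorkbookFactory.create(pkg); Yesterday the above code was working, but surprisingly it stopped working today and now throws a heap size error.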