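Heap size issue - using Java

I am working on memory management in my application. The code below does two things:

It parses a file containing "N" data records.

For each record in the file, two web service calls are made.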

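import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public static List<String> parseFile(String fileName) {
    // Every ID found in the file is accumulated in this list.
    List<String> idList = new ArrayList<String>();
    try {
        BufferedReader cfgFile = new BufferedReader(new FileReader(new File(fileName)));
        String line = null;
        cfgFile.readLine(); // discard the first line (presumably a header)
        while ((line = cfgFile.readLine()) != null) {
            if (!line.trim().equals("")) {
                String[] fields = line.split("\\|");
                idList.add(fields[0]); // keep only the first pipe-delimited field
            }
        }
        cfgFile.close();
    } catch (IOException e) {
        System.out.println(e + " Unexpected File IO Error.");
    }
    return idList;
}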

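When I try to parse a file with 1 million records, the Java process fails after processing a certain amount of data, and I get a java.lang.OutOfMemoryError: Java heap space error. I could partially figure out that the Java process stops because of the huge amount of data provided. Please advise how I should handle data of this size.

Edit: I think this part of the code, new BufferedReader(new FileReader(new File(fileName)));, parses the whole file and is affected by the file size.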


2012-09-28 Arun

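Your problem is that you are accumulating all the data in the list. The best way to solve this is to do it in a streaming fashion. That means not accumulating all the IDs in the list, but instead calling the web service for each line, or accumulating a smaller buffer and then making the call.

Opening the file and creating the BufferedReader has no impact on memory consumption, because the bytes in the file are read line by line (more or less). The problem occurs at this point in the code: idList.add(fields[0]);. Since you accumulate all the file data in the list, the list grows as large as the file.

Your code should do something like this: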

while ((line = cfgFile.readLine()) != null) {
    if (!line.trim().equals("")) {
        String[] fields = line.split("\\|");
        callToRemoteWebService(fields[0]);
    }
}
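The other variant mentioned above, accumulating a smaller buffer and then making the call, could look roughly like the following. This is only a sketch under assumptions: the class BatchedIdReader, the method processInBatches, the batchSize parameter, and the sink callback (which stands in for the web service call) are hypothetical names that do not appear in the question, and the lambda syntax assumes Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedIdReader {

    /**
     * Reads pipe-delimited lines, skips the first line, and hands the IDs to
     * the sink in batches, so at most batchSize IDs are buffered at a time.
     * Returns the total number of IDs handed to the sink.
     */
    public static int processInBatches(Reader source, int batchSize,
                                       Consumer<List<String>> sink) throws IOException {
        int total = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader in = new BufferedReader(source)) {
            in.readLine(); // discard the first line, as the original code does
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // ignore blank lines
                }
                batch.add(line.split("\\|")[0]);
                total++;
                if (batch.size() == batchSize) {
                    sink.accept(batch); // hypothetical stand-in for the web service call
                    batch = new ArrayList<>(batchSize); // start a fresh buffer
                }
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the last, partially filled batch
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String data = "id|name\n1|a\n2|b\n\n3|c\n";
        int n = processInBatches(new StringReader(data), 2,
                ids -> System.out.println("sending batch " + ids));
        System.out.println("processed " + n + " ids");
    }
}
```

Memory use is then bounded by the batch size rather than by the file size, provided the sink does not keep references to the batches it receives.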


2012-09-28 14:26:47 Elmer

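When you want to work with large data, you have 2 choices:

1. Use a heap large enough to fit all the data. This will "work" for a while, but if your data size is unbounded, it will eventually fail.
2. Process the data incrementally. Only part of the data (of bounded size) is held in memory at any time. This is the ideal solution, because it scales to any amount of data.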


2012-09-28 14:26:32 jtahlborn

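IMHO, I don't see the first option as a solution, because I cannot increase my disk size. I am currently doing what the second option describes. Thanks for your reply. :) – Arun

@Arun - yes, I was trying to make it clear that option 1 is not really a solution. That said, I'm not sure what disk size has to do with anything...? – jtahlborn

Sorry, I meant the size of the Java heap on my prod box, for which I don't have permissions... – Arun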

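Use the -Xms and -Xmx options to increase the size of your Java heap. If they are not set explicitly, the JVM sets the heap size to ergonomic defaults, which are not sufficient in your case. Read this paper to learn more about tuning memory in the JVM: http://www.oracle.com/technetwork/java/javase/tech/memorymanagement-whitepaper-1-150020.pdf

Edit: An alternative way of doing this is a producer-consumer approach with parallel processing. The general idea is to create one producer thread that reads the file and queues tasks for processing, and n consumer threads that consume them. A very rough idea (for illustration purposes only) is the following: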

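import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.SynchronousQueue;

// fileName and process(String[]) are assumed to be defined by the surrounding code

// blocking queue holding the tasks to be executed
final SynchronousQueue<Callable<Void>> queue = new SynchronousQueue<Callable<Void>>();

// reads the file and submits tasks for processing
final Runnable producer = new Runnable() {
    public void run() {
        BufferedReader in = null;
        try {
            in = new BufferedReader(new FileReader(fileName));
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    // must be final to be visible inside the anonymous class
                    final String[] fields = line.split("\\|");
                    // this will block if there is no consumer thread available to process it...
                    queue.put(new Callable<Void>() {
                        public Void call() {
                            process(fields);
                            return null;
                        }
                    });
                }
            }
        } catch (IOException e) {
            System.out.println(e + " Unexpected File IO Error.");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // close the buffered reader
            if (in != null) {
                try {
                    in.close();
                } catch (IOException ignored) {
                    // nothing useful can be done if close() fails
                }
            }
        }
    }
};

// Consumes the tasks submitted by the producer. Consumers can be pooled
// for parallel processing.
final Runnable consumer = new Runnable() {
    public void run() {
        try {
            while (true) {
                // this method blocks if there are no items left for processing in the queue...
                Callable<Void> task = queue.take();
                task.call();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (Exception e) {
            // Callable.call() is declared to throw Exception
            System.out.println(e + " Task failed; consumer stopping.");
        }
    }
};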

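Of course, you have to write the code that manages the lifecycle of the consumer and producer threads. The right way to do this is to implement it using an Executor.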
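One possible shape of the Executor-based variant is sketched below. It is not the answerer's code, and it swaps the SynchronousQueue for a bounded ArrayBlockingQueue inside a ThreadPoolExecutor. The class BoundedExecutorDemo, the method processAll, and the worker callback (standing in for the web service calls) are hypothetical names, and the lambdas assume Java 8 or later. With CallerRunsPolicy, a full queue makes the reading thread run the task itself, which throttles reading so that pending tasks cannot pile up without limit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class BoundedExecutorDemo {

    /** Reads IDs line by line and processes them on a fixed pool with a bounded queue. */
    public static void processAll(Reader source, int workers, int queueCapacity,
                                  Consumer<String> worker)
            throws IOException, InterruptedException {
        ExecutorService pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),  // at most queueCapacity waiting tasks
                new ThreadPoolExecutor.CallerRunsPolicy());       // full queue: the reader runs the task
        try (BufferedReader in = new BufferedReader(source)) {
            in.readLine(); // discard the first line, as the original code does
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // ignore blank lines
                }
                final String id = line.split("\\|")[0];
                pool.execute(() -> worker.accept(id)); // hypothetical stand-in for the web service calls
            }
        } finally {
            pool.shutdown(); // accept no new tasks; already queued tasks still run
        }
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
    }

    public static void main(String[] args) throws Exception {
        String data = "id|name\n1|a\n2|b\n\n3|c\n";
        processAll(new StringReader(data), 2, 10,
                id -> System.out.println(Thread.currentThread().getName() + " handled " + id));
    }
}
```

The executor owns the thread lifecycle here: shutdown() followed by awaitTermination() replaces the hand-written producer and consumer management.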


2012-09-28 14:36:14

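Thanks for sharing your thoughts. But I am not happy with the way my code is written. Can I handle the memory management at the code level? Also, FYI, I do not have the rights to change the heap memory size settings on my production box. – Arun

Indeed, your code could use some refactoring to improve its memory utilization. The idea is that, instead of keeping all the items in a list (and thus using more memory), you could consider using the executor framework to process each item asynchronously. I will update my answer to show how to do this. –

Is there a way to find out the heap size... I would like to monitor the heap size after changing my code. – Arun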
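One way to check the heap from inside the program is java.lang.Runtime, as in the minimal sketch below. The class HeapReport and its methods are hypothetical names, and the figures are approximate snapshots that change as the garbage collector runs. External tools that ship with the JDK, such as jconsole, can show the same information over time.

```java
public class HeapReport {

    /** Bytes of heap currently in use (committed heap minus free space in it). */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** One-line summary: used heap, committed heap, and the limit the JVM will try to use. */
    public static String describe() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return String.format("used=%d MB, committed=%d MB, max=%d MB",
                usedHeapBytes() / mb, rt.totalMemory() / mb, rt.maxMemory() / mb);
    }

    public static void main(String[] args) {
        // Call describe() before and after processing to compare heap usage.
        System.out.println(describe());
    }
}
```

Logging describe() every few thousand processed lines should show whether usage stays flat (streaming) or keeps climbing (accumulating).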
