Efficient way to read a matrix of doubles

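What is a fast way to read a matrix consisting entirely of doubles (there are no NA or missing elements in this matrix)? Most entries are non-zero doubles, and perhaps 30% are zero. The dimensions are roughly 1 million rows by 100 columns.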

The function I am currently using is shown below. However, it is very slow for matrices larger than 1 gigabyte.

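How can I do this faster? Would any of the following help?
- Instead of saving as CSV and reading that, save the data in a binary format or some other format.
- Transpose the matrix in the data file and read it column by column, rather than row by row as the function below does.
- Somehow serialize the matrix as a Java object so it can be read back in.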

// Requires: java.io.BufferedReader, java.io.FileReader, java.io.IOException, java.util.Vector
private static Vector<Vector<Double>> readTXTFile(String csvFileName, int skipRows) throws IOException {
    String line = null;
    BufferedReader stream = null;
    Vector<Vector<Double>> csvData = new Vector<Vector<Double>>();

    try {
        stream = new BufferedReader(new FileReader(csvFileName));
        int count = 0;
        while ((line = stream.readLine()) != null) {
            count += 1;
            // Skip the first skipRows lines (e.g. a header).
            if (count <= skipRows) {
                continue;
            }
            String[] splitted = line.split(",");
            // One boxed Double per cell and one Vector per row.
            Vector<Double> dataLine = new Vector<Double>(splitted.length);
            for (String data : splitted) {
                dataLine.add(Double.valueOf(data));
            }

            csvData.add(dataLine);
        }
    } finally {
        if (stream != null)
            stream.close();
    }

    return csvData;
}


2014-01-14 user2763361

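I changed your code to get rid of all the Vector and Double object creation in favor of a fixed-size matrix (this assumes you know, or can compute in advance, the number of rows and columns in the file).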
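If the dimensions are not known up front, one way to compute them is a cheap first pass over the file that only counts lines and fields. The following is a minimal sketch under that assumption; the class name `CsvDimensions` and the method `countDimensions` are hypothetical, and it takes the column count from the first data line without checking that later lines agree.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

final class CsvDimensions {
    /** Returns {dataRows, columns} for a comma-separated file, ignoring the first skipRows lines. */
    static int[] countDimensions(String csvFileName, int skipRows) throws IOException {
        int rows = 0;
        int columns = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(csvFileName))) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (lineNumber <= skipRows) {
                    continue; // header line, not part of the matrix
                }
                if (rows == 0) {
                    // Column count comes from the first data line; -1 keeps trailing empty fields.
                    columns = line.split(",", -1).length;
                }
                rows++;
            }
        }
        return new int[] {rows, columns};
    }
}
```

This reads the file twice overall (once to count, once to parse), so it trades extra I/O for being able to allocate the array exactly once.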

I threw a 500,000-row file at the modified function and saw about a 25% improvement.

private static double[][] readTXTFile(String csvFileName, int skipRows) throws IOException {
    BufferedReader stream = null;
    // Number of data rows and columns; hardcoded here, replace with the real dimensions.
    int totalRows = 500000, totalColumns = 6;
    double[][] matrix = new double[totalRows][totalColumns];

    try {
        stream = new BufferedReader(new FileReader(csvFileName));
        // Skip exactly skipRows header lines so they do not take up matrix rows.
        for (int i = 0; i < skipRows; i++) {
            stream.readLine();
        }
        for (int currentRow = 0; currentRow < totalRows; currentRow++) {
            String line = stream.readLine();
            if (line == null) {
                break; // file has fewer data rows than totalRows
            }
            String[] splitted = line.split(",");
            for (int currentColumn = 0; currentColumn < totalColumns; currentColumn++) {
                // parseDouble returns a primitive, so no Double object is created.
                matrix[currentRow][currentColumn] = Double.parseDouble(splitted[currentColumn]);
            }
        }
    } finally {
        if (stream != null) {
            stream.close();
        }
    }
    return matrix;
}
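The binary-format idea raised in the question can be sketched as follows. This is only an illustration, not a benchmarked solution: the class `BinaryMatrixIO` and its file layout (two ints for the dimensions, then the values in row-major order) are assumptions made for this example. It uses the standard `DataOutputStream` and `DataInputStream` classes, so reading involves no string splitting or decimal parsing, and the dimensions are stored in the file rather than hardcoded.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

final class BinaryMatrixIO {
    /** Writes a rectangular matrix as: int rows, int columns, then the doubles in row-major order. */
    static void write(String fileName, double[][] matrix) throws IOException {
        int rows = matrix.length;
        int columns = rows == 0 ? 0 : matrix[0].length;
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(fileName), 1 << 16))) {
            out.writeInt(rows);
            out.writeInt(columns);
            for (double[] row : matrix) {
                for (int c = 0; c < columns; c++) {
                    out.writeDouble(row[c]);
                }
            }
        }
    }

    /** Reads a matrix written by write(); throws EOFException if the file is truncated. */
    static double[][] read(String fileName) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(fileName), 1 << 16))) {
            int rows = in.readInt();
            int columns = in.readInt();
            double[][] matrix = new double[rows][columns];
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < columns; c++) {
                    matrix[r][c] = in.readDouble();
                }
            }
            return matrix;
        }
    }
}
```

The CSV would be converted once with `BinaryMatrixIO.write(...)` and every later run would call `BinaryMatrixIO.read(...)` instead of parsing text. Whether this is fast enough for a matrix of a million rows by 100 columns would need to be measured on the actual data.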


2014-01-14 04:47:15
