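Optimizing CSV parsing to be faster
I'm working on a "program" that reads data from 2 large CSV files (line by line), compares array elements from the two files, and, when it finds a match, writes the data I need into a third file. The only problem is that it is very slow: it reads 1-2 lines per second, which is far too slow considering I have millions of records. Any ideas on how to make it faster? Here is my code: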
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;

public class ReadWriteCsv {
    public static void main(String[] args) throws IOException {
        FileInputStream inputStream = null;
        FileInputStream inputStream2 = null;
        Scanner sc = null;
        Scanner sc2 = null;
        String csvSeparator = ",";
        String line;
        String line2;
        String path = "D:/test1.csv";
        String path2 = "D:/test2.csv";
        String path3 = "D:/newResults.csv";
        String[] columns;
        String[] columns2;
        boolean matchFound = false;
        int count = 0;
        StringBuilder builder = new StringBuilder();
        FileWriter writer = new FileWriter(path3);
        try {
            // specifies where to take file 1 from
            inputStream = new FileInputStream(path);
            // creating a scanner for file 1
            sc = new Scanner(inputStream, "UTF-8");
            // while there is another line available do:
            while (sc.hasNextLine()) {
                count++;
                // storing the current line in the temporary variable "line"
                line = sc.nextLine();
                System.out.println("Number of lines read so far: " + count);
                // defines columns[] as the line split by the separator
                columns = line.split(csvSeparator);
                // file 2 is reopened and rescanned from the top for every line of file 1
                inputStream2 = new FileInputStream(path2);
                sc2 = new Scanner(inputStream2, "UTF-8");
                // reads file 2 until a match is found or the file ends
                while (!matchFound && sc2.hasNextLine()) {
                    line2 = sc2.nextLine();
                    columns2 = line2.split(csvSeparator);
                    if (columns[3].equals(columns2[1])) {
                        matchFound = true;
                        builder.append(columns[3]).append(csvSeparator);
                        builder.append(columns[1]).append(csvSeparator);
                        builder.append(columns2[2]).append(csvSeparator);
                        builder.append(columns2[3]).append("\n");
                        writer.write(builder.toString());
                    }
                }
                builder.setLength(0);
                sc2.close(); // also closes inputStream2
                matchFound = false;
            }
            if (sc.ioException() != null) {
                throw sc.ioException();
            }
        } finally {
            // then I close my inputStreams, scanners and writer
            if (sc != null) {
                sc.close(); // also closes inputStream
            }
            if (sc2 != null) {
                sc2.close(); // no effect if already closed
            }
            writer.close();
        }
    }
}
It looks like you're re-reading the entire second file for every line of the first. *Of course* that's going to be slow for large files. – azurefrog
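To put azurefrog's point in numbers, suppose file 1 has $n$ lines and file 2 has $m$ lines (symbols introduced here for illustration only). The inner loop reopens file 2 for each line of file 1 and stops only at the first match, so in the worst case (no match) it reads all $m$ lines of file 2 every time:

```latex
\begin{aligned}
\text{original loop:}\quad & n + n\,m \ \text{line reads (at most)}\\
\text{each file read once:}\quad & n + m \ \text{line reads}\\
n = m = 10^{6}:\quad & n + n\,m \approx 10^{12} \quad\text{vs.}\quad n + m = 2\times 10^{6}
\end{aligned}
```

If file 2 has on the order of a million lines, a rate of 1-2 lines of file 1 per second would be consistent with each of them triggering a near-full pass over file 2.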
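Can you fit both files in memory? If so, just read them and load the data into an in-memory data structure (arrays, lists, etc.). IO operations are very expensive compared to in-memory operations. – Yuri
@azurefrog How would I do that? I'm new to programming, sorry. – Noobinator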
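One way to act on both comments, sketched under the assumption that file 2 fits in memory (Yuri's condition): read file 2 once into a `HashMap` keyed on its column 1 (the column the original code compares against `columns[3]`), then stream file 1 once and do a single hash lookup per line. That replaces a full rescan of file 2 per line of file 1 with one pass over each file. The class `CsvHashJoin` and its methods are hypothetical; the paths, column positions and output layout are copied from the question. Further assumptions: plain comma-separated rows with no quoted fields containing commas (the same assumption `split(",")` already makes), and rows too short to supply the needed columns are skipped instead of throwing. It also swaps `Scanner`/unbuffered `FileWriter` for `BufferedReader`/`BufferedWriter` and drops the per-line console output, which can add noticeable overhead on its own.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class (not from the question): a hash join of two CSV files.
public class CsvHashJoin {

    // Reads file 2 once and indexes its rows by column 1.
    // putIfAbsent keeps the first row per key, mirroring the original
    // inner loop, which stops at the first match in file 2.
    static Map<String, String[]> indexByColumn1(BufferedReader file2) throws IOException {
        Map<String, String[]> index = new HashMap<>();
        String line;
        while ((line = file2.readLine()) != null) {
            String[] cols = line.split(",");
            if (cols.length > 3) { // columns 1, 2 and 3 are needed later
                index.putIfAbsent(cols[1], cols);
            }
        }
        return index;
    }

    // Streams file 1 once; each row costs one hash lookup instead of a
    // rescan of file 2. Returns the number of matched rows written.
    static int join(BufferedReader file1, Map<String, String[]> index, Writer out) throws IOException {
        int matches = 0;
        String line;
        while ((line = file1.readLine()) != null) {
            String[] cols = line.split(",");
            if (cols.length <= 3) {
                continue; // row has no column 3 to match on
            }
            String[] match = index.get(cols[3]);
            if (match != null) {
                // Same output layout as the question: col3, col1 of file 1, then col2, col3 of file 2
                out.write(cols[3] + "," + cols[1] + "," + match[2] + "," + match[3] + "\n");
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Pass 1: read file 2 once and build the in-memory index.
        Map<String, String[]> index;
        try (BufferedReader r2 = Files.newBufferedReader(Paths.get("D:/test2.csv"), StandardCharsets.UTF_8)) {
            index = indexByColumn1(r2);
        }
        // Pass 2: read file 1 once and write matches through a buffered writer.
        try (BufferedReader r1 = Files.newBufferedReader(Paths.get("D:/test1.csv"), StandardCharsets.UTF_8);
             BufferedWriter w = Files.newBufferedWriter(Paths.get("D:/newResults.csv"), StandardCharsets.UTF_8)) {
            int matches = join(r1, index, w);
            System.out.println("Matches written: " + matches);
        }
    }
}
```

The memory cost is one `String[]` per distinct key in file 2; if that does not fit, this sketch does not apply as-is.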