Java: Read the last n lines of a HUGE file

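I want to read the last n lines of a very large file in Java without reading the whole file into any buffer or memory area.

I looked around the JDK APIs and Apache Commons I/O and could not find anything suited to this purpose.

I was thinking of the way tail or less does it in UNIX. I don't think they load the entire file and then show the last few lines. There should be a similar way to do the same in Java.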

2010-11-08 Gaurav Verma

See also: [Java: Quickly read the last line of a text file?](http://stackoverflow.com/questions/686231) – hippietrail 2012-11-05 18:22:31

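If you use a RandomAccessFile, you can use length and seek to get to a specific point near the end of the file and then read forward from there.

If you find there weren't enough lines, back up from that point and try again. Once you know where the Nth-last line begins, you can seek to it and simply read and print.

You can make an initial best guess based on the properties of your data. For example, if it's a text file, the lines probably won't average more than 132 characters, so to get the last five lines, start 660 characters before the end. Then, if you were wrong, try again at 1320. You can even use what you learned from the last 660 characters to adjust the guess: if those 660 characters were only three lines, the next try could be 660 / 3 * 5, plus maybe a bit extra just in case.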
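A minimal sketch of this guess-and-retry idea follows. The class name `TailByGuess`, the method `lastLines`, and the `guessedLineLength` parameter are hypothetical names chosen for illustration. The sketch assumes the file is UTF-8 (or ASCII) text and that the last n lines fit comfortably in memory, and it simply doubles the window on each retry rather than refining the estimate.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TailByGuess {
    // Hypothetical helper: returns the last n lines of the file at `path`.
    // Assumes UTF-8/ASCII text and that the tail fits in memory.
    public static List<String> lastLines(String path, int n, int guessedLineLength) throws IOException {
        if (n <= 0) {
            return new ArrayList<>();
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            // Initial guess: n lines of the guessed average length.
            long window = Math.max(1L, (long) n * guessedLineLength);
            while (true) {
                long start = Math.max(0L, length - window);
                byte[] buf = new byte[Math.toIntExact(length - start)];
                raf.seek(start);
                raf.readFully(buf);
                List<String> lines = splitLines(new String(buf, StandardCharsets.UTF_8));
                if (start > 0 && !lines.isEmpty()) {
                    lines.remove(0); // the first line in the window is probably partial
                }
                if (lines.size() >= n || start == 0) {
                    int from = Math.max(0, lines.size() - n);
                    return new ArrayList<>(lines.subList(from, lines.size()));
                }
                window *= 2; // not enough lines: back up further and retry
            }
        }
    }

    // Splits on LF, dropping one trailing line terminator and any CR before LF.
    private static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        if (text.isEmpty()) {
            return lines;
        }
        if (text.endsWith("\n")) {
            text = text.substring(0, text.length() - 1);
        }
        for (String line : text.split("\n", -1)) {
            lines.add(line.endsWith("\r") ? line.substring(0, line.length() - 1) : line);
        }
        return lines;
    }
}
```

For example, `TailByGuess.lastLines("app.log", 5, 132)` would start 660 bytes before the end, matching the numbers above; `app.log` is only a placeholder file name.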

2010-11-08 06:23:47 paxdiablo

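A RandomAccessFile allows seeking (http://download.oracle.com/javase/1.4.2/docs/api/java/io/RandomAccessFile.html). The File.length method returns the size of the file. The problem is determining the number of lines. For that, you can seek to the end of the file and read backwards until you have hit the right number of lines.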
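One way to sketch the backward read is to scan fixed-size chunks from the end and count line-feed bytes. The class `BackwardScan` and the method `startOfLastLines` are hypothetical names, and the sketch assumes an encoding in which the byte 0x0A always means LF (ASCII, ISO-8859-1, UTF-8).

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class BackwardScan {
    private static final int CHUNK_SIZE = 8192;

    // Hypothetical helper: byte offset at which the last n lines begin.
    // Returns 0 if the file has fewer than n lines.
    public static long startOfLastLines(RandomAccessFile raf, int n) throws IOException {
        long length = raf.length();
        if (n <= 0 || length == 0) {
            return length; // nothing to read
        }
        byte[] chunk = new byte[CHUNK_SIZE];
        long pos = length;
        int newlines = 0;
        while (pos > 0) {
            // Step back one chunk (or whatever is left) and read it.
            int size = (int) Math.min(chunk.length, pos);
            pos -= size;
            raf.seek(pos);
            raf.readFully(chunk, 0, size);
            for (int i = size - 1; i >= 0; i--) {
                if (chunk[i] != '\n') {
                    continue;
                }
                // A line feed as the very last byte only terminates the final line.
                boolean isFinalByte = (pos + i == length - 1);
                if (!isFinalByte && ++newlines == n) {
                    return pos + i + 1; // first byte after the nth line feed from the end
                }
            }
        }
        return 0;
    }
}
```

To print the tail, seek to the returned offset and read forward to the end of the file.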

2010-11-08 06:24:20

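RandomAccessFile is a good place to start, as the other answers describe. There is one important caveat, though.

If your file is not encoded with a one-byte-per-character encoding, the readLine() method is not going to work for you. And readUTF() won't work in any circumstances. (It reads a string prefixed with a two-byte length, in modified UTF-8 ...)

Instead, you need to make sure that you look for end-of-line markers in a way that respects the encoding's character boundaries. For fixed-length encodings (e.g. flavors of UTF-16 or UTF-32), you need to extract characters starting from byte positions that are divisible by the character size in bytes. For variable-length encodings (e.g. UTF-8), you need to search for a byte that must be the first byte of a character.

In the case of UTF-8, the first byte of a character will be 0xxxxxxx or 110xxxxx or 1110xxxx or 11110xxx. Anything else is either a continuation (second, third, or fourth) byte or an illegal UTF-8 sequence. See The Unicode Standard, Version 5.2, Chapter 3.9, Table 3-7. As the comment discussion points out, this means that any 0x0A and 0x0D bytes in a properly encoded UTF-8 stream represent an LF or CR character. Counting bytes is therefore a valid implementation strategy (for UTF-8).

Having identified a proper character boundary, you can simply call new String(...), passing the byte array, offset, count, and encoding, and then call String.lastIndexOf(...) repeatedly to count end-of-lines.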
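As a rough illustration for the UTF-8 case, the sketch below skips continuation bytes to reach a character boundary, decodes from there with new String(...), and counts line feeds with repeated lastIndexOf calls. The class `Utf8Boundary` and its method names are hypothetical, and the buffer is assumed to already hold the bytes read from near the end of the file.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Boundary {
    // A UTF-8 continuation byte has the bit pattern 10xxxxxx.
    public static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // Hypothetical helper: first offset >= from that is not a continuation byte,
    // i.e. the start of a character in well-formed UTF-8.
    public static int alignToCharStart(byte[] buf, int from) {
        int i = from;
        while (i < buf.length && isContinuation(buf[i])) {
            i++;
        }
        return i;
    }

    // Decodes from the aligned offset and counts LF characters by walking
    // backwards through the string with lastIndexOf.
    public static int countLineFeeds(byte[] buf, int from) {
        int start = alignToCharStart(buf, from);
        String text = new String(buf, start, buf.length - start, StandardCharsets.UTF_8);
        int count = 0;
        for (int idx = text.lastIndexOf('\n'); idx >= 0; idx = text.lastIndexOf('\n', idx - 1)) {
            count++;
        }
        return count;
    }
}
```

Since LF is a single byte in UTF-8, counting 0x0A bytes directly gives the same number without decoding; the decoding route is shown only because it mirrors the steps described above.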

2010-11-08 06:44:11

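+1 for mentioning the caveat. I think that for UTF-8 the problem can be made simpler by scanning for '\n'... at least that is what Jon Skeet seems to imply in his answer to a [related question](http://stackoverflow.com//686231/quick-read-the-line-of-a-text-file)... It seems '\n' can only occur as a valid character in UTF-8 and never in the 'extra bytes'... – 2014-08-07 21:53:14

Yes, for UTF-8 it is simple. UTF-8 encodes characters either as a single byte (all ASCII characters) or as multiple bytes (all other Unicode characters). Fortunately for us, newline is an ASCII character, and in UTF-8 no multi-byte character contains bytes that are also valid ASCII characters. That is to say, if you scan an array of bytes for the ASCII newline and find it, you know it is a newline and not part of some other multi-byte character. I wrote a [blog post](http://stijndewitt.wordpress.com/2014/08/09/max-bytes-in-a-utf-8-char/) that has a nice table illustrating this. – 2014-08-10 12:29:36

The problems are 1) character encodings in which the byte '0x0a' is not a newline (e.g. UTF-16), and 2) the fact that there are other Unicode line separator code points; e.g. '0x2028', '0x2029' and '0x0085' – 2014-08-10 12:46:02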

This is the best way I've found to do it. Simple, pretty fast, and memory efficient.

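import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;

public static void tail(File src, OutputStream out, int maxLines) throws IOException {
    if (maxLines <= 0) {
        return;
    }
    String[] lines = new String[maxLines]; // circular buffer of the most recent lines
    long total = 0;                        // number of lines read so far
    // The whole file is still read once; only memory use is bounded by maxLines.
    try (BufferedReader reader = new BufferedReader(new FileReader(src))) {
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            lines[(int) (total % maxLines)] = line;
            total++;
        }
    }

    // Counting lines (rather than comparing indices) keeps this loop finite
    // for empty files and for files shorter than maxLines.
    OutputStreamWriter writer = new OutputStreamWriter(out);
    for (long i = Math.max(0, total - maxLines); i < total; i++) {
        writer.write(lines[(int) (i % maxLines)]);
        writer.write("\n");
    }

    writer.flush();
}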

2011-08-30 21:44:44 ra9r

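Since this reads through the whole file, it won't scale well to larger files. – ChristopheD 2013-04-01 21:47:04

Also, this function goes into an infinite loop for empty files. – shak 2013-12-30 09:11:22

Why would it loop on an empty file? – 2016-11-26 01:04:20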

CircularFifoBuffer from Apache Commons. This comes from an answer to a similar question, How to read last 5 lines of a .txt file into java.

Note that in Apache Commons Collections 4 this class seems to have been renamed to CircularFifoQueue.

2013-02-19 04:57:31 ruth542

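I checked out the class you mentioned, and although it can indeed be used to keep track of the last 5 lines in a file, I think the challenge here is not keeping track of the lines but finding the point in the file where to start reading, and how to get to that point. – 2014-08-07 19:30:22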

I found RandomAccessFile and the other buffered reader classes too slow for me. Nothing can be faster than tail -<#lines>. So this was the best solution for me.

public String getLastNLogLines(File file, int nLines) {
    StringBuilder s = new StringBuilder();
    try {
        // Pass the arguments separately so that paths containing spaces are not split.
        Process p = new ProcessBuilder("tail", "-n", String.valueOf(nLines), file.getPath()).start();
        try (java.io.BufferedReader input = new java.io.BufferedReader(
                new java.io.InputStreamReader(p.getInputStream()))) {
            String line;
            // Read the next line into the variable line, then check for
            // end of stream, which readLine() signals by returning null.
            while ((line = input.readLine()) != null) {
                s.append(line).append('\n');
            }
        }
    } catch (java.io.IOException e) {
        e.printStackTrace();
    }
    return s.toString();
}

2013-09-18 13:25:54 Luca

Depending on how much memory you have, exec'ing out to "tail" may itself be a very expensive proposition. It is also Unix specific. – Gray 2013-11-04 20:12:36

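I found the simplest way to do it is to use ReversedLinesFileReader from the Apache commons-io API. It gives you the lines from the bottom of the file to the top, and you can set n_lines to the number of lines you want.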

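import java.io.File;
import org.apache.commons.io.input.ReversedLinesFileReader;

File file = new File("D:\\file_name.xml");
int n_lines = 10;
int counter = 0;
// try-with-resources closes the reader when done.
try (ReversedLinesFileReader object = new ReversedLinesFileReader(file)) {
    String line;
    // Capture each readLine() result once: every call advances the cursor,
    // and null means the start of the file has been reached.
    while (counter < n_lines && (line = object.readLine()) != null) {
        System.out.println(line);
        counter++;
    }
}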

2014-09-02 10:18:18

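Note: every time you call readLine(), the cursor advances. So this code would actually skip every other line, because the result of the readLine() call in the while statement is not captured. – aapierce 2015-12-23 22:42:47

I just wonder whether this approach is efficient? – Forrest 2017-04-10 07:49:19

This code is a little faulty because readLine() is called twice, as aapierce mentioned. Still, it points to ReversedLinesFileReader. – vinksharma 2017-05-23 21:11:10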

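int n_lines = 1000;
StringBuilder result = new StringBuilder();
// ReversedLinesFileReader returns lines starting from the end of the file.
try (ReversedLinesFileReader object = new ReversedLinesFileReader(new File(path))) {
    for (int i = 0; i < n_lines; i++) {
        String line = object.readLine();
        if (line == null) // reached the start of the file
            break;
        // Prepend so that the result is in file order, one line per row.
        result.insert(0, line + "\n");
    }
}
return result.toString();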

2017-10-05 08:06:55

I had a similar problem, but I didn't understand the other solutions.

I used this. I hope it is simple code.

// String filePathName = (directory and file name).
File f = new File(filePathName);
long fileLength = f.length(); // Size of the file in bytes.
long fileLength_toRead = 0;
if (fileLength > 2000) {
    // My file content is a table; I know one row is about 100 bytes/characters.
    // I start reading 1000 bytes before the end of the file.
    // If you don't know the line length, use @paxdiablo's advice.
    fileLength_toRead = fileLength - 1000;
}
try (RandomAccessFile raf = new RandomAccessFile(filePathName, "r")) { // try-with-resources opens and closes the file.
    raf.seek(fileLength_toRead); // Reading begins at this byte.
    if (fileLength_toRead > 0) {
        raf.readLine(); // The first line read is usually partial, so discard it.
    }
    String rowInFile = raf.readLine();
    while (rowInFile != null) {
        // Here the lines read (rowInFile) can be added to a String[] array or an ArrayList<String>.
        // Later I can work with the rows from the array - the last row is sometimes empty, etc.
        rowInFile = raf.readLine();
    }
}
catch (IOException e) {
    e.printStackTrace(); // Report the failure instead of silently ignoring it.
}

2017-12-13 11:51:42 pocket