2010-04-25 76 views
3

How to open a large text file in C#

I have a text file that contains about 100,000 articles. The file is structured like this:

 
.Document ID 42944-YEAR:5 
.Date 03\08\11 
.Cat political 
Article Content 1 

.Document ID 42945-YEAR:5 
.Date 03\08\11 
.Cat political 
Article Content 2 

I want to open this file in C# and process it line by line. I tried this code:

String[] FileLines = File.ReadAllText(
        TB_SourceFile.Text).Split(Environment.NewLine.ToCharArray()); 

But it fails with:

Exception of type 'System.OutOfMemoryException' was thrown.

The question is how to open this file and read it line by line.

  • File size: 564 MB (591,886,626 bytes)
  • File encoding: UTF-8
  • The file contains Unicode characters.

Answers

8

You can open the file and read it as a stream, rather than loading all of its contents into memory at once.

From MSDN:

using System;
using System.IO;

class Test
{
    public static void Main()
    {
        try
        {
            // Create an instance of StreamReader to read from a file.
            // The using statement also closes the StreamReader.
            using (StreamReader sr = new StreamReader("TestFile.txt"))
            {
                String line;
                // Read and display lines from the file until the end of
                // the file is reached.
                while ((line = sr.ReadLine()) != null)
                {
                    Console.WriteLine(line);
                }
            }
        }
        catch (Exception e)
        {
            // Let the user know what went wrong.
            Console.WriteLine("The file could not be read:");
            Console.WriteLine(e.Message);
        }
    }
}
10

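Your file is too large to be read into memory all at once, which is what File.ReadAllText tries to do. You should read the file line by line instead.

Adapted from MSDN: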

string line; 
// Read the file and display it line by line. 
using (StreamReader file = new StreamReader(@"c:\yourfile.txt")) 
{ 
    while ((line = file.ReadLine()) != null) 
    {
        Console.WriteLine(line);
        // do your processing on each line here
    } 
} 

This way, only the current line (plus the reader's small internal buffer) is held in memory at any one time.
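Applied to the file layout shown in the question, the same line-by-line loop can group lines into one article record at a time. This is only a minimal sketch: the `Article` and `ArticleReader` names are hypothetical, and it assumes that every article starts with a `.Document ID` line and that no content line begins with `.Date` or `.Cat`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical record type; fields mirror the ".Document ID", ".Date"
// and ".Cat" header lines from the sample in the question.
class Article
{
    public string Id = "";
    public string Date = "";
    public string Category = "";
    public StringBuilder Content = new StringBuilder();
}

static class ArticleReader
{
    // Lazily yields one Article at a time, so only the article currently
    // being assembled is held in memory.
    public static IEnumerable<Article> Read(TextReader reader)
    {
        Article current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith(".Document ID", StringComparison.Ordinal))
            {
                // A new header closes the previous article, if there was one.
                if (current != null) yield return current;
                current = new Article { Id = line.Substring(".Document ID".Length).Trim() };
            }
            else if (current == null)
            {
                continue; // ignore anything before the first header
            }
            else if (line.StartsWith(".Date", StringComparison.Ordinal))
            {
                current.Date = line.Substring(".Date".Length).Trim();
            }
            else if (line.StartsWith(".Cat", StringComparison.Ordinal))
            {
                current.Category = line.Substring(".Cat".Length).Trim();
            }
            else if (line.Trim().Length > 0)
            {
                current.Content.AppendLine(line.Trim());
            }
        }
        // The last article has no following header to close it.
        if (current != null) yield return current;
    }
}

static class Program
{
    static void Main(string[] args)
    {
        // args[0] is a placeholder for the path to the articles file.
        using (var reader = new StreamReader(args[0], Encoding.UTF8))
        {
            foreach (var article in ArticleReader.Read(reader))
            {
                Console.WriteLine("{0} [{1}] {2} chars",
                    article.Id, article.Category, article.Content.Length);
            }
        }
    }
}
```

Taking a `TextReader` rather than a file path keeps the parser easy to try out on a short `StringReader` sample before pointing it at the 564 MB file.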

2

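Something like this:

using (var reader = File.OpenText(@"path to file"))
{
    string fileLine;
    // ReadLine returns null at the end of the file, so an empty file
    // is handled without processing a null line.
    while ((fileLine = reader.ReadLine()) != null)
    {
        // process fileLine here
    }
}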
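On .NET Framework 4 or later, `File.ReadLines` is another way to get the same streaming behaviour, since it enumerates the file lazily instead of building an array the way `File.ReadAllLines` does. The sketch below is an illustration rather than part of the answers above: the `CategoryCounter` name is hypothetical, and it assumes categories appear on lines starting with `.Cat ` as in the question's sample.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class CategoryCounter
{
    // Counts ".Cat" header lines per category from any sequence of lines.
    public static Dictionary<string, int> Count(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (var line in lines)
        {
            if (!line.StartsWith(".Cat ", StringComparison.Ordinal)) continue;

            var category = line.Substring(".Cat ".Length).Trim();
            int seen;
            counts.TryGetValue(category, out seen); // seen stays 0 if absent
            counts[category] = seen + 1;
        }
        return counts;
    }

    static void Main(string[] args)
    {
        // args[0] is a placeholder for the path to the articles file.
        // File.ReadLines yields lines one at a time as the loop consumes them.
        foreach (var pair in Count(File.ReadLines(args[0], Encoding.UTF8)))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```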