Reading a large text file up to a certain string

First of all: sorry for my poor English, I am German.

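I have a large text file whose data is separated by a string (not by a single character), like this:

first data[STRING-SEPERATOR]second data[STRING-SEPERATOR] ...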

I do not want to load the whole file into memory because of its size (~250 MB). If I read the whole file with System.IO.File.ReadAllText, I get an OutOfMemoryException.

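So I want to read the file up to the first occurrence of [STRING-SEPERATOR], and then continue with the next string. It is like "taking" the first data out of the file, processing it, and then continuing with the second data, which is now the first data of the file.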

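System.IO.StreamReader.ReadLine() does not help me, because the content of the file is a single line.

Do you know how to read a file up to a certain string in .NET?

I hope you have some ideas. Thank you.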

2014-05-13 user2190035

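Is '[STRING-SEPERATOR]' a single character or a string of characters? – Enigmativity

It is a string of characters. – user2190035

How long can '[STRING-SEPERATOR]' be, and how many times can [STRING-SEPERATOR] occur in a row? – Enigmativity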

Thank you for your replies. Here is the function I wrote in VB.NET:

Public Function ReadUntil(Stream As System.IO.FileStream, UntilText As String) As String
    ' Search window; emptied whenever it cannot contain the start of the separator.
    Dim builder As New System.Text.StringBuilder()
    ' Everything read during this call.
    Dim returnTextBuilder As New System.Text.StringBuilder()
    ' Small chunks keep the rewind below short. Use at least one byte,
    ' otherwise a one-character separator would give an empty buffer.
    Dim size As Integer = Math.Max(1, UntilText.Length \ 2)
    Dim buffer(size - 1) As Byte
    Dim currentRead As Integer = -1

    Do Until currentRead = 0
        currentRead = Stream.Read(buffer, 0, buffer.Length)
        Dim chars As String = System.Text.Encoding.Default.GetString(buffer, 0, currentRead)

        builder.Append(chars)
        returnTextBuilder.Append(chars)

        Dim collected As String = builder.ToString()

        If collected.IndexOf(UntilText, StringComparison.Ordinal) >= 0 Then
            Dim returnText As String = returnTextBuilder.ToString()
            Dim indexOfSep As Integer = returnText.IndexOf(UntilText, StringComparison.Ordinal)
            Dim cutLength As Integer = returnText.Length - indexOfSep

            ' Rewind past anything that was read beyond the separator.
            ' This assumes a single-byte encoding (one character = one byte).
            If cutLength > UntilText.Length Then
                Stream.Position -= (cutLength - UntilText.Length)
            End If

            ' Return the text in front of the separator.
            Return returnText.Substring(0, indexOfSep)
        ElseIf collected.IndexOf(UntilText(0)) < 0 Then
            builder.Length = 0
        End If
    Loop

    ' End of file: return what is left (empty once everything has been consumed).
    Return returnTextBuilder.ToString()
End Function

2014-05-14 07:12:44 user2190035

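As described in this question, a text file can also be read character by character. To search for a particular string, you need some hand-written logic that finds the desired string in the character-wise input; this can be done with a state machine.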
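A minimal sketch of that idea, assuming the separator is non-empty and the text is read through a TextReader (for example a StreamReader, which buffers internally). The class and method names SeparatorReader and ReadRecords are made up for illustration. The state machine used here is the Knuth-Morris-Pratt matcher, which is one possible choice, not necessarily the one meant above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class SeparatorReader
{
    // Hypothetical helper: lazily yields the text between occurrences of separator.
    public static IEnumerable<string> ReadRecords(TextReader reader, string separator)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", nameof(separator));

        // KMP failure table: fail[i] is the length of the longest proper prefix
        // of separator[0..i] that is also a suffix of it.
        int[] fail = new int[separator.Length];
        for (int i = 1, k = 0; i < separator.Length; i++)
        {
            while (k > 0 && separator[i] != separator[k]) k = fail[k - 1];
            if (separator[i] == separator[k]) k++;
            fail[i] = k;
        }

        var current = new StringBuilder();
        int state = 0; // number of separator characters matched so far
        int next;
        while ((next = reader.Read()) >= 0)
        {
            char c = (char)next;
            current.Append(c);

            // Advance the state machine by one character.
            while (state > 0 && c != separator[state]) state = fail[state - 1];
            if (c == separator[state]) state++;

            if (state == separator.Length)
            {
                // Drop the separator from the end and hand out the record.
                current.Length -= separator.Length;
                yield return current.ToString();
                current.Clear();
                state = 0;
            }
        }

        // Whatever follows the last separator, if anything.
        if (current.Length > 0) yield return current.ToString();
    }
}
```

It could be used as `foreach (var record in SeparatorReader.ReadRecords(new StreamReader(path), "[STRING-SEPERATOR]")) { ... }`, where `path` is a placeholder for the file path. Only the record currently being built is held in memory.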

2014-05-13 07:26:43 Codor

StreamReader.Read has some overloads that might help you. Try this:

const int count = 200; // or whatever number you think is better
char[] buffer = new char[count];
using (var sr = new System.IO.StreamReader("Path here"))
{
    int read;
    // Read returns the number of characters actually read, or 0 at the end of the file.
    while ((read = sr.Read(buffer, 0, count)) > 0)
    {
        /*
        Check whether the first `read` characters of buffer contain your string separator,
        or at least some part of it.
        If they end with only a part of it, check the next chunk to make sure it is a real separator.
        Do your stuff, then continue one character after the last separator.
        */
    }
}
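The checks described in that comment could be filled in roughly as follows. This is only a sketch: the names ChunkedSplitter and Split and the default buffer size are assumptions, and the unreturned text is kept in a plain string, which is simple but copies it on every chunk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkedSplitter
{
    // Hypothetical helper; bufferSize is an arbitrary choice.
    public static IEnumerable<string> Split(TextReader reader, string separator, int bufferSize = 4096)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", nameof(separator));

        char[] buffer = new char[bufferSize];
        string pending = string.Empty; // text read so far that has not been returned yet
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // A separator may straddle two chunks, so resume the search
            // slightly before the old end of the pending text.
            int searchFrom = Math.Max(0, pending.Length - (separator.Length - 1));
            pending += new string(buffer, 0, read);

            int found;
            while ((found = pending.IndexOf(separator, searchFrom, StringComparison.Ordinal)) >= 0)
            {
                yield return pending.Substring(0, found);
                pending = pending.Substring(found + separator.Length);
                searchFrom = 0;
            }
        }

        // Text after the last separator, if anything.
        if (pending.Length > 0) yield return pending;
    }
}
```

Passing a StreamReader opened on the file keeps memory use proportional to the longest single record plus one buffer, rather than to the whole file.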

2014-05-13 07:37:50

This should help you.

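// Needs: using System.Collections.Generic; using System.IO; using System.Linq; using System.Text;
// MessageBox additionally needs System.Windows.Forms.

// Reads the file in chunks of chunkSize bytes and yields each chunk as text.
// Assumes Encoding.Default is a single-byte encoding; with a multi-byte
// encoding a character could be split across two chunks.
private IEnumerable<string> ReadCharsByChunks(int chunkSize, string filePath)
{
    using (FileStream fs = new FileStream(filePath, FileMode.Open))
    {
        byte[] buffer = new byte[chunkSize];
        int currentRead;
        while ((currentRead = fs.Read(buffer, 0, chunkSize)) > 0)
        {
            yield return Encoding.Default.GetString(buffer, 0, currentRead);
        }
    }
}

private void SearchWord(string searchWord)
{
    StringBuilder builder = new StringBuilder();
    foreach (var chars in ReadCharsByChunks(2, "sample.txt")) // Can be any number
    {
        builder.Append(chars);

        var existing = builder.ToString();
        int foundIndex = -1;
        if ((foundIndex = existing.IndexOf(searchWord)) >= 0)
        {
            // Found
            MessageBox.Show("Found");

            // Drop everything up to and including the match.
            builder.Remove(0, foundIndex + searchWord.Length);
        }
        else if (!existing.Contains(searchWord.First()))
        {
            // No possible start of the search word in the buffer, so discard it.
            builder.Clear();
        }
    }
}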

2014-05-13 08:19:12
