2014-05-13 102 views
2

Read a large text file up to a certain string

First of all: sorry for my poor English, I'm German.

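I have a large text file whose records are separated by a string (not by a single character), like this:

first data[STRING-SEPARATOR]second data[STRING-SEPARATOR] ...

I don't want to load the whole file into memory because of its size (~250 MB). If I read the whole file with System.IO.File.ReadAllText, I get an OutOfMemoryException.

So I want to read the file up to the first occurrence of [STRING-SEPARATOR] and then continue with the next string. It is like "taking" the first data out of the file, processing it, and then continuing with the second data, which is now the first data in the file.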

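System.IO.StreamReader.ReadLine() doesn't help me, because the content of the file is a single line.

Do you know how to read a file up to a certain string in .NET?

I hope someone has some ideas. Thank you.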

+0

Is '[STRING-SEPARATOR]' a single character or a string of characters? – Enigmativity

+0

It is a string of characters. – user2190035

+0

How many times in a row can '[STRING-SEPARATOR]' occur? – Enigmativity

Answers

0

Thank you for your replies. Here is the function I wrote in VB.NET:

Public Function ReadUntil(Stream As System.IO.FileStream, UntilText As String) As String
    ' Search window: cleared whenever it cannot contain the start of the separator.
    Dim builder As New System.Text.StringBuilder()
    ' Everything read from the stream during this call.
    Dim returnTextBuilder As New System.Text.StringBuilder()
    ' Read in small chunks, but always at least one byte
    ' (an empty buffer would make Read return 0 and end the loop at once).
    Dim size As Integer = Math.Max(UntilText.Length \ 2, 1) - 1
    Dim buffer(size) As Byte
    Dim currentRead As Integer = -1

    Do Until currentRead = 0
        currentRead = Stream.Read(buffer, 0, buffer.Length)
        ' Note: decoding raw bytes chunk by chunk is only safe for single-byte encodings.
        Dim chars As String = System.Text.Encoding.Default.GetString(buffer, 0, currentRead)

        builder.Append(chars)
        returnTextBuilder.Append(chars)

        Dim collected As String = builder.ToString()

        If collected.IndexOf(UntilText) >= 0 Then
            Dim returnText As String = returnTextBuilder.ToString()
            Dim indexOfSep As Integer = returnText.IndexOf(UntilText)
            Dim cutLength As Integer = returnText.Length - indexOfSep

            ' Keep only the text in front of the separator.
            returnText = returnText.Remove(indexOfSep, cutLength)

            ' Rewind so the next call starts right after the separator
            ' (assumes one byte per character).
            If cutLength > UntilText.Length Then
                Stream.Position = Stream.Position - (cutLength - UntilText.Length)
            End If

            Return returnText
        ElseIf collected.IndexOf(UntilText(0)) < 0 Then
            ' No character that could start the separator: reset the search window.
            builder.Length = 0
        End If
    Loop

    ' End of file: return whatever follows the last separator (empty if nothing is left).
    Return returnTextBuilder.ToString()
End Function
0

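As described in this question, a text file can also be read character by character. To search for a certain string, you then have to implement the matching logic yourself so that it works on character-wise input; this can be done with a state machine.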
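
A minimal sketch of that idea in C#, not taken from the linked question: the class name SeparatedReader and the method ReadRecords are made up for illustration, and the state machine used here is the Knuth-Morris-Pratt matcher, which is one possible choice. The state is the number of separator characters matched so far, so a separator that overlaps with itself (such as "aab" in "aaab") is still detected.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SeparatedReader // hypothetical helper, not part of .NET
{
    // Yields the pieces of text between occurrences of separator, reading one
    // character at a time so only the current piece is held in memory.
    public static IEnumerable<string> ReadRecords(TextReader reader, string separator)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", nameof(separator));

        // KMP failure table: fail[i] is the length of the longest proper prefix
        // of separator[0..i] that is also a suffix of it.
        int[] fail = new int[separator.Length];
        for (int i = 1, k = 0; i < separator.Length; i++)
        {
            while (k > 0 && separator[i] != separator[k]) k = fail[k - 1];
            if (separator[i] == separator[k]) k++;
            fail[i] = k;
        }

        var current = new StringBuilder();
        int state = 0; // number of separator characters matched so far
        int next;
        while ((next = reader.Read()) >= 0)
        {
            char c = (char)next;
            current.Append(c);

            // On a mismatch, fall back to the longest shorter partial match.
            while (state > 0 && c != separator[state]) state = fail[state - 1];
            if (c == separator[state]) state++;

            if (state == separator.Length)
            {
                // Full separator seen: emit everything in front of it.
                current.Length -= separator.Length;
                yield return current.ToString();
                current.Clear();
                state = 0;
            }
        }

        // Text after the last separator, if any.
        if (current.Length > 0)
            yield return current.ToString();
    }
}
```

Passing a StreamReader opened on the file as the reader argument keeps the per-character Read calls cheap, because StreamReader buffers internally. The records can then be consumed in a foreach loop, one at a time.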

0

StreamReader.Read has some overloads that might help you. Try this:

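const string separator = "[STRING-SEPARATOR]"; // placeholder for your real separator
const int count = 200; // or whatever number you think is better
char[] buffer = new char[count];
var pending = new System.Text.StringBuilder(); // text read but not yet handed out
using (var sr = new System.IO.StreamReader("Path here"))
{
    int read;
    // The second argument is the offset into buffer, so it stays 0.
    while ((read = sr.Read(buffer, 0, count)) > 0)
    {
        pending.Append(buffer, 0, read);
        // A separator may be split across two reads, so search the accumulated text.
        string text = pending.ToString();
        int start = 0, pos;
        while ((pos = text.IndexOf(separator, start, System.StringComparison.Ordinal)) >= 0)
        {
            string record = text.Substring(start, pos - start);
            // do your stuff with record
            start = pos + separator.Length; // one character after the last separator
        }
        pending.Clear();
        pending.Append(text, start, text.Length - start); // keep the unfinished tail
    }
    // pending now holds whatever followed the last separator
}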
1

This should help you.

// Requires: using System.Collections.Generic; using System.IO; using System.Text;
private IEnumerable<string> ReadCharsByChunks(int chunkSize, string filePath)
{
    // Decode through a StreamReader so that a multi-byte character
    // is never split between two chunks.
    using (var reader = new StreamReader(filePath, Encoding.Default))
    {
        char[] buffer = new char[chunkSize];
        int currentRead;
        while ((currentRead = reader.Read(buffer, 0, chunkSize)) > 0)
        {
            yield return new string(buffer, 0, currentRead);
        }
    }
}

private void SearchWord(string searchWord)
{
    StringBuilder builder = new StringBuilder();
    foreach (var chars in ReadCharsByChunks(2, "sample.txt")) // chunk size can be any positive number
    {
        builder.Append(chars);

        var existing = builder.ToString();
        int foundIndex = existing.IndexOf(searchWord);
        if (foundIndex >= 0)
        {
            // Found: report it and drop everything up to the end of the match.
            MessageBox.Show("Found");

            builder.Remove(0, foundIndex + searchWord.Length);
        }
        else if (existing.IndexOf(searchWord[0]) < 0)
        {
            // No character that could start a match: nothing worth keeping.
            builder.Clear();
        }
    }
}