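VB: reading/writing a 5 GB text file
I have an application that reads a 5 GB text file line by line and converts comma-separated, double-quoted strings to a pipe-delimited format, i.e. "Smith,John","Snow,John" -> Smith,John|Snow,John
I've included my code below. My question is: is there a more efficient way to process a large file?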
    Dim fName As String = "C:\LargeFile.csv"
    Dim wrtFile As String = "C:\ProcessedFile.txt"
    Dim strRead As New System.IO.StreamReader(fName)
    Dim strWrite As New System.IO.StreamWriter(wrtFile)
    Dim line As String = ""
    Do While strRead.Peek <> -1
        line = strRead.ReadLine
        ' Match only the commas that sit outside double-quoted fields
        ' (Regex requires Imports System.Text.RegularExpressions)
        Dim pattern As String = "(,)(?=(?:[^""]|""[^""]*"")*$)"
        Dim replacement As String = "|"
        Dim regEx As New Regex(pattern)
        Dim newLine As String = regEx.Replace(line, replacement)
        ' Strip the remaining double quotes
        newLine = newLine.Replace(Chr(34), "")
        strWrite.WriteLine(newLine)
    Loop
    strRead.Close()
    strWrite.Close()
Updated code
    Dim fName As String = "C:\LargeFile.csv"
    Dim wrtFile As String = "C:\ProcessedFile.txt"
    Dim strRead As New System.IO.StreamReader(fName)
    Dim strWrite As New System.IO.StreamWriter(wrtFile)
    Dim line As String = ""
    Do While strRead.Peek <> -1
        line = strRead.ReadLine
        ' Replace each "," field separator (quote, comma, quote) with a pipe
        line = line.Replace(Chr(34) + Chr(44) + Chr(34), "|")
        ' Strip the remaining double quotes
        line = line.Replace(Chr(34), "")
        strWrite.WriteLine(line)
    Loop
    strRead.Close()
    strWrite.Close()
Have you looked into making it multithreaded? – Werdna
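You could use a StringBuilder as a buffer to hold a few hundred lines of output at a time. Or look into using a [BufferedStream](https://msdn.microsoft.com/en-us/library/system.io.bufferedstream%28v=vs.110%29.aspx). If this is a regular task, you could even try using physically separate disk drives for the input and the output. Also, although I would expect the compiler to move the regex creation outside the loop, you could do that yourself too. – Andrew Morton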
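The StringBuilder idea from the comment above might look roughly like the sketch below. The names `ConvertLine`, `ConvertFile` and `batchSize` are made up for illustration and the batch size of 500 is an arbitrary guess. Since StreamWriter already buffers internally, whether batching helps at all is something to measure on the real file.

```vbnet
Imports System.IO
Imports System.Text

Module BufferedConvert
    ' Hypothetical helper: same transformation as the updated code in the question.
    Function ConvertLine(line As String) As String
        Return line.Replace(""",""", "|").Replace("""", "")
    End Function

    ' Copies inputPath to outputPath, converting each line and writing the
    ' output in batches of batchSize lines collected in a StringBuilder.
    Sub ConvertFile(inputPath As String, outputPath As String,
                    Optional batchSize As Integer = 500)
        Dim buffer As New StringBuilder()
        Dim pending As Integer = 0

        ' Using blocks close both files even if an exception is thrown.
        Using reader As New StreamReader(inputPath)
            Using writer As New StreamWriter(outputPath)
                Dim line As String = reader.ReadLine()
                Do While line IsNot Nothing
                    buffer.AppendLine(ConvertLine(line))
                    pending += 1
                    If pending >= batchSize Then
                        writer.Write(buffer.ToString())
                        buffer.Clear()
                        pending = 0
                    End If
                    line = reader.ReadLine()
                Loop
                ' Write whatever is left in the last, partial batch.
                writer.Write(buffer.ToString())
            End Using
        End Using
    End Sub

    Sub Main()
        ' Paths taken from the question.
        ConvertFile("C:\LargeFile.csv", "C:\ProcessedFile.txt")
    End Sub
End Module
```

Reading with `ReadLine()` until it returns `Nothing` also avoids the separate `Peek` call on every iteration.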
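In addition to @AndrewMorton's suggestion to move the regex creation outside the loop, you can also precompile it rather than using interpreted mode, i.e. `Dim regEx As New Regex(pattern, RegexOptions.Compiled)`. See: [Regular Expression Performance](https://blogs.msdn.microsoft.com/bclteam/2004/11/12/regular-expression-performance-david-gutierrez/) – TnTinMn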
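Combining the two suggestions (create the Regex once, and pass `RegexOptions.Compiled`) could be sketched as follows. The module and member names are illustrative only; the pattern is the one from the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CompiledRegexConvert
    ' Built once and reused for every line, instead of once per loop iteration.
    Private ReadOnly CommaOutsideQuotes As New Regex(
        "(,)(?=(?:[^""]|""[^""]*"")*$)", RegexOptions.Compiled)

    ' Replaces commas outside quoted fields with pipes, then strips the quotes.
    Function ConvertLine(line As String) As String
        Return CommaOutsideQuotes.Replace(line, "|").Replace("""", "")
    End Function

    Sub Main()
        ' Prints: Smith,John|Snow,John
        Console.WriteLine(ConvertLine("""Smith,John"",""Snow,John"""))
    End Sub
End Module
```

`RegexOptions.Compiled` trades a slower one-time construction for faster matching, which tends to suit a loop over millions of lines. The actual gain should be measured against the plain `String.Replace` version in the updated code.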