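Out of memory when reading a large file

I am building a tool that analyzes the data quality of a file, so I need to read every line of the file and analyze each one. I also need to keep all of the file's lines in memory, because the user will be able to drill down into specific sections. All of this works fine for files containing thousands of lines. However, when I try a CSV file with more than 4 million lines, I get an out-of-memory exception. I thought C# could handle millions of records in memory, but apparently it cannot. So I am a bit stuck and don't know what to do. Maybe my code is not as efficient as it could be, so it would be great if you could suggest a way to improve it. Just keep in mind that I need all lines of the file in memory, because depending on the user's actions I need to access specific lines to display them to the user.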
Here is the code that reads each line:
    using (FileStream fs = File.Open(this.dlgInput.FileName.ToString(), FileMode.Open, FileAccess.Read, FileShare.Read))
    using (BufferedStream bs = new BufferedStream(fs))
    using (System.IO.StreamReader sr = new StreamReader(this.dlgInput.FileName.ToString(), Encoding.Default, false, 8192))
    {
        string line;
        if (this.chkSkipHeader.Checked)
        {
            sr.ReadLine();
        }
        progressBar1.Visible = true;
        int nbOfLines = File.ReadLines(this.dlgInput.FileName.ToString()).Count();
        progressBar1.Maximum = nbOfLines;
        this.lines = new string[nbOfLines][];
        this.patternedLines = new string[nbOfLines][];
        for (int i = 0; i < nbOfLines; i++)
        {
            this.lines[i] = new string[this.dgvFields.Rows.Count];
            this.patternedLines[i] = new string[this.dgvFields.Rows.Count];
        }
        // Read and display lines from the file until the end of
        // the file is reached.
        while ((line = sr.ReadLine()) != null)
        {
            this.recordCount += 1;
            char[] c = new char[1] { ',' };
            System.Text.RegularExpressions.Regex CSVParser = new System.Text.RegularExpressions.Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
            String[] fields = CSVParser.Split(line);
            ParseLine(fields);
            this.lines[recordCount - 1] = fields;
            progressBar1.PerformStep();
        }
    }
Below is the ParseLine function that it calls. It also stores some analysis results in arrays that need to stay in memory:
    private void ParseLine(String[] fields2)
    {
        for (int j = 0; j <= fields2.Length - 1; j++)
        {
            if ((int)this.dgvFields.Rows[j].Cells["colSelected"].Value == 1)
            {
                /* ************************************************
                   Save Number of Counts by Value
                   ************************************************ */
                if (this.values[j].ContainsKey(fields2[j]))
                {
                    //values[0] = Dictionary<"TEST", 1> (fields2[0 which is source code] = count])
                    this.values[j][fields2[j]] += 1;
                }
                else
                {
                    this.values[j].Add(fields2[j], 1);
                }
                /* ************************************************
                   Save Pattern Values/Counts
                   ************************************************ */
                string tmp = System.Text.RegularExpressions.Regex.Replace(fields2[j], "\\p{Lu}", "X");
                tmp = System.Text.RegularExpressions.Regex.Replace(tmp, "\\p{Ll}", "x");
                tmp = System.Text.RegularExpressions.Regex.Replace(tmp, "[0-9]", "0");
                if (this.patterns[j].ContainsKey(tmp))
                {
                    this.patterns[j][tmp] += 1;
                }
                else
                {
                    this.patterns[j].Add(tmp, 1);
                }
                this.patternedLines[this.recordCount - 1][j] = tmp;
                /* ************************************************
                   Count Blanks/Alpha/Numeric/Phone/Other
                   ************************************************ */
                if (String.IsNullOrWhiteSpace(fields2[j]))
                {
                    this.blanks[j] += 1;
                }
                else if (System.Text.RegularExpressions.Regex.IsMatch(fields2[j], "^[0-9]+$"))
                {
                    this.numeric[j] += 1;
                }
                else if (System.Text.RegularExpressions.Regex.IsMatch(fields2[j].ToUpper().Replace("EXTENSION", "").Replace("EXT", "").Replace("X", ""), "^[0-9()\\- ]+$"))
                {
                    this.phone[j] += 1;
                }
                else if (System.Text.RegularExpressions.Regex.IsMatch(fields2[j], "^[a-zA-Z ]+$"))
                {
                    this.alpha[j] += 1;
                }
                else
                {
                    this.other[j] += 1;
                }
                if (this.recordCount == 1)
                {
                    this.high[j] = fields2[j];
                    this.low[j] = fields2[j];
                }
                else
                {
                    if (fields2[j].CompareTo(this.high[j]) > 0)
                    {
                        this.high[j] = fields2[j];
                    }
                    if (fields2[j].CompareTo(this.low[j]) < 0)
                    {
                        this.low[j] = fields2[j];
                    }
                }
            }
        }
    }
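ParseLine stores a freshly built pattern string for every cell, even though many cells in a column are likely to share the same pattern (for example "Xxxxx" or "000"). One possible way to cut down on duplicate string objects, sketched here as an assumption rather than as part of the original code, is a small pool that hands back the first instance seen for each distinct value. `StringPool` and `Intern` are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not in the original post): returns one shared
// instance per distinct string value, so equal strings are stored once.
public sealed class StringPool
{
    private readonly Dictionary<string, string> pool = new Dictionary<string, string>();

    // Number of distinct values currently held by the pool.
    public int Count { get { return pool.Count; } }

    public string Intern(string s)
    {
        if (s == null) return null;

        string existing;
        if (pool.TryGetValue(s, out existing))
        {
            return existing; // reuse the instance stored earlier
        }

        pool.Add(s, s);      // first time this value is seen
        return s;
    }
}
```

Under this sketch, the value assigned to `patternedLines[...][j]` would be `pool.Intern(tmp)` instead of `tmp`. The saving depends entirely on how repetitive the data is; columns with mostly unique values gain little.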
Update: new code
    int nbOfLines = File.ReadLines(this.dlgInput.FileName.ToString()).Count();
    //Read file
    using (System.IO.StreamReader sr = new StreamReader(this.dlgInput.FileName.ToString(), Encoding.Default, false, 8192))
    {
        string line;
        if (this.chkSkipHeader.Checked)
        { sr.ReadLine(); }
        progressBar1.Visible = true;
        progressBar1.Maximum = nbOfLines;
        this.lines = new string[nbOfLines][];
        this.patternedLines = new string[nbOfLines][];
        for (int i = 0; i < nbOfLines; i++)
        {
            this.lines[i] = new string[this.dgvFields.Rows.Count];
            this.patternedLines[i] = new string[this.dgvFields.Rows.Count];
        }
        // Read and display lines from the file until the end of
        // the file is reached.
        while ((line = sr.ReadLine()) != null)
        {
            this.recordCount += 1;
            char[] c = new char[1] { ',' };
            System.Text.RegularExpressions.Regex CSVParser = new System.Text.RegularExpressions.Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
            String[] fields = CSVParser.Split(line);
            ParseLine(fields);
            this.lines[recordCount - 1] = fields;
            progressBar1.PerformStep();
        }
    }
Please format your code properly. – byxor
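C# cannot create memory out of nothing. If you have more data than fits in your system memory and/or virtual memory, you are stuck. Either change how the code works to reduce its memory load, or get more memory. –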
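One way to read "reduce the memory load" while still allowing access to any specific line is to keep only the byte offset of each line in memory and re-read the text from disk on demand. The following is a minimal sketch under stated assumptions, not the asker's code: `LineIndex` and `GetLine` are hypothetical names, the file is assumed not to change while indexed, the encoding is assumed to represent a line feed as the single byte 0x0A (true for UTF-8 and single-byte code pages, not for UTF-16), and a byte order mark is not stripped. Like `ReadLine` in the question, it does not handle quoted fields that contain line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: stores one long per line instead of the line's text.
public sealed class LineIndex
{
    private readonly string path;
    private readonly Encoding encoding;
    private readonly List<long> starts = new List<long>(); // byte offset where each line begins
    private readonly long fileLength;

    public LineIndex(string path, Encoding encoding)
    {
        this.path = path;
        this.encoding = encoding;

        using (FileStream fs = File.OpenRead(path))
        {
            fileLength = fs.Length;
            if (fileLength > 0) starts.Add(0);

            byte[] buffer = new byte[64 * 1024];
            long position = 0; // file offset of buffer[0]
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    // A new line starts right after each '\n', unless the
                    // '\n' is the very last byte of the file.
                    if (buffer[i] == (byte)'\n' && position + i + 1 < fileLength)
                    {
                        starts.Add(position + i + 1);
                    }
                }
                position += read;
            }
        }
    }

    public int Count { get { return starts.Count; } }

    // Re-reads a single line from disk; index 0 is the first line (the header, if any).
    public string GetLine(int index)
    {
        long start = starts[index];
        long end = index + 1 < starts.Count ? starts[index + 1] : fileLength;
        byte[] bytes = new byte[end - start];

        using (FileStream fs = File.OpenRead(path))
        {
            fs.Seek(start, SeekOrigin.Begin);
            int total = 0;
            while (total < bytes.Length)
            {
                int n = fs.Read(bytes, total, bytes.Length - total);
                if (n == 0) break; // unexpected end of file
                total += n;
            }
        }

        return encoding.GetString(bytes).TrimEnd('\r', '\n');
    }
}
```

With this approach, 4 million lines cost roughly 8 bytes each for the offsets, and the drill-down view would call `GetLine` only for the rows it actually displays. The trade-off is a disk read per displayed line and the need to parse the fields again at that point.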
I have 4 cores and 16 GB of RAM. Isn't that enough for a file with 4 million lines? –
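Whether 16 GB is enough depends on how many strings the code keeps alive, not only on the line count. The sketch below is a back-of-the-envelope estimate for the two jagged arrays in the question (`lines` and `patternedLines`). The per-object overheads are rough assumptions for a 64-bit .NET process and vary by runtime version; the only fixed fact used is that .NET strings hold 2 bytes per character. `MemoryEstimate` and `EstimateBytes` are hypothetical names, and the dictionaries in ParseLine are not counted.

```csharp
using System;

// Hypothetical estimator; every constant below is an assumed approximation.
public static class MemoryEstimate
{
    private const long StringOverheadBytes = 26; // assumed header + length + terminator
    private const long ArrayOverheadBytes = 24;  // assumed array header
    private const long ReferenceBytes = 8;       // one reference in a 64-bit process

    public static long EstimateBytes(long rows, int columns, double avgFieldChars)
    {
        // Each field is a separate string object with 2 bytes per character.
        long perString = StringOverheadBytes + (long)Math.Ceiling(2 * avgFieldChars);

        // Each row is its own string[] holding one reference per column.
        long perRow = ArrayOverheadBytes + columns * (ReferenceBytes + perString);

        // One outer array of row references, plus all the rows.
        long perJaggedArray = ArrayOverheadBytes + rows * ReferenceBytes + rows * perRow;

        // The question keeps two such structures: lines and patternedLines.
        return 2 * perJaggedArray;
    }
}
```

For example, with hypothetical inputs of 4,000,000 rows, 20 columns, and an average of 10 characters per field, `EstimateBytes(4000000, 20, 10)` returns 8,896,000,048 bytes, or roughly 8.9 GB under these assumptions. The post does not state the real column count or field lengths, so the actual figure may differ a lot. Separately, `Environment.Is64BitProcess` reports whether the program is running as a 64-bit process; a 32-bit process cannot use anywhere near 16 GB no matter how much RAM is installed.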