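Efficient way to read and split a file

I have several .txt files, each about 2 GB. I need to split them so that every time the `%%XGF NEW_SET` marker appears, a new file is created and stored. I think this marker appears roughly every 40-50 lines, and each line is 4-20 characters long. So I need to cut each large file into thousands of small files and then process those. I came up with the following sample code: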
```csharp
using System.Configuration; // ConfigurationManager (System.Configuration assembly/package)
using System.IO;
using System.Text;

DirectoryInfo di = new DirectoryInfo(ConfigurationManager.AppSettings["BilixFilesDir"]);
var files = di.GetFiles();
int count = 0;
bool hasObject = false;
StringBuilder sb = new StringBuilder();
string line = "";
foreach (var file in files)
{
    using (StreamReader sr = new StreamReader(file.FullName, Encoding.GetEncoding(1250)))
    {
        while ((line = sr.ReadLine()) != null)
        {
            // A marker line starts a new chunk
            if (line.Contains("%%XGF NEW_SET"))
            {
                // If a chunk is already open, write it out before starting the next one
                if (hasObject)
                {
                    File.WriteAllText(Path.Combine(ConfigurationManager.AppSettings["OutputFilesDir"], $"{count++}-{file.Name}"), sb.ToString());
                    sb.Length = 0;
                    sb.Capacity = 0;
                }
                // From here on, lines belong to a chunk
                hasObject = true;
            }
            // Ordinary line: append it only if we are inside a chunk
            else if (hasObject)
            {
                sb.AppendLine(line);
            }
        }
        // End of this input file: write out the last chunk
        if (hasObject)
        {
            File.WriteAllText(Path.Combine(ConfigurationManager.AppSettings["OutputFilesDir"], $"{count++}-{file.Name}"), sb.ToString());
            sb.Length = 0;
            sb.Capacity = 0;
            // Reset so the next input file does not emit an extra chunk before its first marker
            hasObject = false;
        }
    }
}
```
That is roughly what I have, but I need it to be efficient. Any ideas on how I could improve this solution? Thanks.
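A minimal sketch of one common alternative, assuming the goal is to avoid building each chunk in a `StringBuilder` and then copying it into a string: stream each line straight to a `StreamWriter` that is opened at every marker line, so only one line is held in memory at a time. `SetSplitter` and `Split` are hypothetical names, and the 64 KB buffer sizes are an arbitrary assumption rather than tuned values. Like the loop above, it drops the marker line itself and skips anything before the first marker.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: splits one input file into chunks at every marker line,
// writing lines directly to disk instead of accumulating them in a StringBuilder.
public static class SetSplitter
{
    // Returns the number of output files written for this input file.
    public static int Split(string inputPath, string outputDir, string marker, Encoding encoding)
    {
        string baseName = Path.GetFileName(inputPath);
        int count = 0;
        StreamWriter writer = null; // the chunk currently being written, if any
        try
        {
            // 64 KB read buffer (assumed value; StreamReader's default is smaller)
            using (var reader = new StreamReader(inputPath, encoding, false, 1 << 16))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    if (line.Contains(marker))
                    {
                        // Marker line: close the current chunk (if any) and open the next one.
                        writer?.Dispose();
                        string outPath = Path.Combine(outputDir, count + "-" + baseName);
                        writer = new StreamWriter(outPath, false, encoding, 1 << 16);
                        count++;
                    }
                    else if (writer != null)
                    {
                        // Ordinary line inside a chunk; lines before the first marker are skipped.
                        writer.WriteLine(line);
                    }
                }
            }
        }
        finally
        {
            // Flush and close the last chunk, even if reading failed part-way.
            writer?.Dispose();
        }
        return count;
    }
}
```

Inside the original `foreach`, this would be called as `SetSplitter.Split(file.FullName, outputDir, "%%XGF NEW_SET", Encoding.GetEncoding(1250))`. On .NET Core / .NET 5+, code page 1250 is only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` from the System.Text.Encoding.CodePages package. If the marker line carries other data, write `line` to the newly opened writer as well instead of discarding it.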
Is `%%XGF NEW_SET` the only thing on the split line? If not, you are losing other information, because you are throwing that line away. – 2011-02-11 14:47:58