Merge two text files, removing duplicates
I have 2 text files as follows (the large numbers such as 1466786391 are unique timestamps):
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 20.917/70.216/147.258 ms
1466786342
PING 10.0.0.6 (10.0.0.6): 56 data bytes
....
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 50 packets received, 0% packet loss
round-trip min/avg/max = 29.535/65.768/126.983 ms
1466786391
And this:
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 20.917/70.216/147.258 ms
1466786342
PING 10.0.0.6 (10.0.0.6): 56 data bytes
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 50 packets received, 0% packet loss
round-trip min/avg/max = 29.535/65.768/126.983 ms
1466786391
PING 10.0.0.6 (10.0.0.6): 56 data bytes
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 44 packets received, 12% packet loss
round-trip min/avg/max = 30.238/62.772/102.959 ms
1466786442
PING 10.0.0.6 (10.0.0.6): 56 data bytes
....
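So the first file ends with a timestamp, and the second file contains the same block of data somewhere in the middle, followed by more data. Everything in the second file before that specific timestamp is exactly the same as in the first file.
So I want the output to look like this: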
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 20.917/70.216/147.258 ms
1466786342
PING 10.0.0.6 (10.0.0.6): 56 data bytes
....
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 50 packets received, 0% packet loss
round-trip min/avg/max = 29.535/65.768/126.983 ms
1466786391
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 44 packets received, 12% packet loss
round-trip min/avg/max = 30.238/62.772/102.959 ms
1466786442
PING 10.0.0.6 (10.0.0.6): 56 data bytes
....
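That is, concatenate the two files and create a third one with the duplicates from the second file removed (the text blocks that are already present in the first file). Here is my code: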
public static void UnionFiles()
{
    string folderPath = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location), "http");
    string outputFilePath = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location), "http\\union.dat");
    var union = Enumerable.Empty<string>();
    foreach (string filePath in Directory
        .EnumerateFiles(folderPath, "*.txt")
        .OrderBy(x => Path.GetFileNameWithoutExtension(x)))
    {
        union = union.Union(File.ReadAllLines(filePath));
    }
    File.WriteAllLines(outputFilePath, union);
}
This is the wrong output I get (the file structure is destroyed):
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 20.917/70.216/147.258 ms
1466786342
PING 10.0.0.6 (10.0.0.6): 56 data bytes
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 50 packets received, 0% packet loss
round-trip min/avg/max = 29.535/65.768/126.983 ms
1466786391
round-trip min/avg/max = 30.238/62.772/102.959 ms
1466786442
round-trip min/avg/max = 5.475/40.986/96.964 ms
1466786492
round-trip min/avg/max = 5.276/61.309/112.530 ms
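EDIT: This code was written to handle multiple files, but I would be happy if even just 2 could be merged correctly.
However, it does not remove the duplicate text blocks. Instead it removes several useful lines and makes the output completely useless. I'm stuck.
How can I achieve this? Thanks.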
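`union = union.Union(File.ReadAllLines(filePath));` Shouldn't this create a boolean union and thereby remove the duplicate blocks? –
Yes, it should. I assume a format (UTF8?) or whitespace problem? – Ouarzy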
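For what it's worth, the broken output can be reproduced without any files, so encoding may not be the cause. Below is a minimal sketch (the sample lines are shortened from the question, and `LineUnionDemo` / `UnionLines` are names made up for illustration). `Enumerable.Union` computes the set union of the elements of the two sequences, and here each element is a single line rather than a block, so any line that has already been seen is dropped regardless of which block it belongs to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineUnionDemo
{
    // The same line-level set union that the loop in the question performs.
    public static List<string> UnionLines(IEnumerable<string> first, IEnumerable<string> second)
    {
        return first.Union(second).ToList();
    }

    public static void Main()
    {
        var file1 = new[]
        {
            "--- 10.0.0.6 ping statistics ---",
            "50 packets transmitted, 50 packets received, 0% packet loss",
            "1466786391",
        };
        var file2 = new[]
        {
            "--- 10.0.0.6 ping statistics ---",
            "50 packets transmitted, 50 packets received, 0% packet loss",
            "1466786391",
            "--- 10.0.0.6 ping statistics ---", // header of a NEW block, identical text
            "50 packets transmitted, 44 packets received, 12% packet loss",
            "1466786442",
        };

        // The header of the new block is not printed a second time,
        // because an equal string was already yielded from file1.
        foreach (string line in UnionLines(file1, file2))
            Console.WriteLine(line);
    }
}
```

The statistics header is printed only once, which matches the damaged structure shown in the question: only lines whose text has never occurred before (round-trip values, timestamps) survive from the later blocks.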
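You need to actually _parse_ the files and extract the individual blocks for comparison, as Ouarzy suggested. Everything else will lead to ugly, unmaintainable hacks. –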
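The block-parsing idea from the comment above could look roughly like the sketch below. It is not a definitive implementation and rests on assumptions that go beyond the question: a line consisting only of digits is treated as a timestamp that closes a block, timestamps are unique (as the question states), and lines after the last timestamp are kept only from the last file. `BlockMerger`, `Merge`, and `IsTimestamp` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlockMerger
{
    // Assumption: a non-empty line made up only of ASCII digits is a timestamp.
    private static bool IsTimestamp(string line)
    {
        return line.Length > 0 && line.All(c => c >= '0' && c <= '9');
    }

    // Merges the files in order. A block is every line up to and including
    // a timestamp; a block is kept only the first time its timestamp is seen.
    public static List<string> Merge(IEnumerable<IEnumerable<string>> files)
    {
        var seenTimestamps = new HashSet<string>();
        var output = new List<string>();
        var tail = new List<string>();

        foreach (IEnumerable<string> lines in files)
        {
            var current = new List<string>();
            foreach (string raw in lines)
            {
                string line = raw.TrimEnd(); // tolerate trailing whitespace or a stray '\r'
                current.Add(line);
                if (IsTimestamp(line))
                {
                    if (seenTimestamps.Add(line)) // true only for a new timestamp
                        output.AddRange(current);
                    current = new List<string>();
                }
            }
            tail = current; // unterminated lines; only the last file's are kept
        }

        output.AddRange(tail);
        return output;
    }

    public static void Main()
    {
        var file1 = new[]
        {
            "--- 10.0.0.6 ping statistics ---", "2% packet loss", "1466786342",
            "PING 10.0.0.6", "--- 10.0.0.6 ping statistics ---", "0% packet loss", "1466786391",
        };
        var file2 = file1.Concat(new[]
        {
            "PING 10.0.0.6", "--- 10.0.0.6 ping statistics ---", "12% packet loss", "1466786442",
            "PING 10.0.0.6", "....",
        }).ToArray();

        foreach (string line in Merge(new IEnumerable<string>[] { file1, file2 }))
            Console.WriteLine(line);
    }
}
```

To apply this to the folder from the question, the result of `File.ReadAllLines(filePath)` for each file, in the same sorted order, could be passed to `Merge` and the returned list written with `File.WriteAllLines`. Note that `TrimEnd` also removes trailing whitespace from the lines that are written out, and that keying blocks on the timestamp alone means two blocks with the same timestamp but different content would be treated as duplicates.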