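I would use regular expressions (hopefully I won't collect as many downvotes as last time ;)). I took advantage of backreferences, which basically let you reuse a previously captured group. Different lines of the same file can even use different separators, as long as each line sticks to a single one (I don't know whether that is useful).
So, this is how I built the regex: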
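string csvItem = @"[""']?\w+[""']?";
string separator = @"\s*[,\.;-]\s*";
// The first pair is not repeated: with "(...)+" around it, \w+ could backtrack and
// split one item in two, so a line mixing separators would still match.
string pattern = string.Format(@"^{0}(?<sep>{1}){0}(\k<sep>{0})*$",
    csvItem, separator);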
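csvItem is an item (column) of the CSV. It may contain lowercase or uppercase letters, digits and underscores, and can optionally be wrapped in " or '.
separator separates the items. It is one of the characters , . ; - surrounded by zero or more whitespace characters.
The pattern says that a valid line consists of at least two csvItems separated by separator. Note the backreference \k<sep>: it forces the following separators to repeat the text captured by the sep group.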
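To make the effect of the backreference concrete, here is a minimal sketch that only answers "does this line use one separator throughout?". The SeparatorCheck class and the UsesOneSeparator method are names invented for this example. The sketch writes the first item pair without a + quantifier, because with a repeated pair \w+ can backtrack and split one item into two, which would let a line such as ab; cd, ef; gh through.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeparatorCheck
{
    // Same building blocks as in the answer.
    const string CsvItem = @"[""']?\w+[""']?";
    const string Separator = @"\s*[,\.;-]\s*";

    // item, captured separator, item, then any number of (same separator, item).
    static readonly Regex LinePattern = new Regex(
        string.Format(@"^{0}(?<sep>{1}){0}(?:\k<sep>{0})*$", CsvItem, Separator));

    // Hypothetical helper: true when the whole line uses a single separator.
    public static bool UsesOneSeparator(string line)
    {
        return LinePattern.IsMatch(line);
    }
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine(SeparatorCheck.UsesOneSeparator("a; b; c"));        // True
        Console.WriteLine(SeparatorCheck.UsesOneSeparator("a; b, c"));        // False
        Console.WriteLine(SeparatorCheck.UsesOneSeparator("ab; cd, ef; gh")); // False
    }
}
```

The second and third lines are rejected because \k<sep> has to repeat "; " literally, so a later ", " cannot stand in for it.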
This is the content of the test file:
field1; field2; field3; field4
field1; field2; field3; field4

feld1, field2, field3, field4
feld1, field2, field3, field4

"field1", "field2", "field3", "field4"
"field1", "field2", "field3", "field4"
And the sample console project:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;

namespace csvParser {
    class Program {
        static void Main(string[] args) {
            // Build the regex once; it does not change from line to line.
            string csvItem = @"[""']?\w+[""']?";
            string separator = @"\s*[,\.;-]\s*";
            // The first pair is not repeated: with "(...)+" around it, \w+ could
            // backtrack and split one item in two, letting mixed separators through.
            string pattern = string.Format(@"^{0}(?<sep>{1}){0}(\k<sep>{0})*$", csvItem, separator);
            var rex = new Regex(pattern, RegexOptions.Singleline);

            var lines = File.ReadAllLines(@"e:\prova.csv");
            for (int i = 0; i < lines.Length; i++) {
                var match = rex.Match(lines[i]);
                // Regex.Match never returns null; a failed match is reported through Success.
                if (!match.Success) {
                    Console.WriteLine("No match on line {0}", i);
                    continue;
                }

                // The named group holds the separator this line uses.
                string sep = match.Groups["sep"].Value;
                Console.WriteLine("--- Line #{0} ---------------", i);
                Console.WriteLine("Line is '{0}'", lines[i]);
                Console.WriteLine("Separator is '{0}'", sep);
                Console.WriteLine("Items are:");
                foreach (string item in lines[i].Split(sep))
                    Console.WriteLine("\t'{0}'", item);
                Console.WriteLine();
            }
            Console.ReadKey();
        }
    }

    public static partial class Extension {
        // Splits on a whole-string separator and drops empty entries.
        public static string[] Split(this string str, string sep) {
            return str.Split(new string[] { sep }, StringSplitOptions.RemoveEmptyEntries);
        }
    }
}
And finally the output:
--- Line #0 ---------------
Line is 'field1; field2; field3; field4'
Separator is '; '
Items are:
'field1'
'field2'
'field3'
'field4'
--- Line #1 ---------------
Line is 'field1; field2; field3; field4'
Separator is '; '
Items are:
'field1'
'field2'
'field3'
'field4'
--- Line #2 ---------------
Line is ''
Separator is ''
Items are:
--- Line #3 ---------------
Line is 'feld1, field2, field3, field4'
Separator is ', '
Items are:
'feld1'
'field2'
'field3'
'field4'
--- Line #4 ---------------
Line is 'feld1, field2, field3, field4'
Separator is ', '
Items are:
'feld1'
'field2'
'field3'
'field4'
--- Line #5 ---------------
Line is ''
Separator is ''
Items are:
--- Line #6 ---------------
Line is '"field1", "field2", "field3", "field4"'
Separator is ', '
Items are:
'"field1"'
'"field2"'
'"field3"'
'"field4"'
--- Line #7 ---------------
Line is '"field1", "field2", "field3", "field4"'
Separator is ', '
Items are:
'"field1"'
'"field2"'
'"field3"'
'"field4"'
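Unfortunately, the blank lines (#2 and #5) were reported as matches in this run. The regex is not to blame: it needs at least two items, so it cannot match an empty line. That run used a match == null test, which never fires because Regex.Match always returns a Match object. With a check on match.Success those lines print "No match" instead.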
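One way to sidestep the blank-line problem, and the quotes that stay attached to the items in the output above, is to read the items straight from the match instead of splitting the line. The following is a rough sketch rather than the program above: the CsvLine class, the TryParse method and the item group are names made up for this example, and the pattern is a variant without the repetition on the first pair. It relies on .NET keeping every capture of a repeated named group in Group.Captures.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CsvLine
{
    // Assumed variant of the item pattern: the bare text goes into a group named "item".
    const string Item = @"[""']?(?<item>\w+)[""']?";
    const string Separator = @"\s*[,.;-]\s*";

    static readonly Regex LineRegex = new Regex(
        "^" + Item + "(?<sep>" + Separator + ")" + Item + @"(?:\k<sep>" + Item + ")*$");

    // Hypothetical helper: returns false for lines that do not match (blank lines included).
    public static bool TryParse(string line, out string separator, out string[] items)
    {
        Match match = LineRegex.Match(line);
        if (!match.Success)
        {
            separator = string.Empty;
            items = new string[0];
            return false;
        }

        separator = match.Groups["sep"].Value;
        // Captures lists every value the "item" group took, in left-to-right order.
        items = match.Groups["item"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        string sep;
        string[] items;
        if (CsvLine.TryParse("\"field1\", \"field2\", \"field3\"", out sep, out items))
            Console.WriteLine("Separator '{0}', items: {1}", sep, string.Join(" | ", items));
        Console.WriteLine(CsvLine.TryParse("", out sep, out items)); // False
    }
}
```

Because the quotes sit outside the item group, the returned items come back without them, and no separate Split step is needed.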
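Thanks, this is a f*cking awesome approach! – Ruslan 2012-07-18 15:59:14
However, your approach needs a predefined list of possible separators. I would like a method that works out the most likely separator for a given file. – Ruslan 2012-07-18 16:07:23
@Ruslan: Hmm, I think that is hard to do. You should at least know what kind of separators you are looking for, or which characters they contain. When the csv is formatted with double spaces and spaces, – BlackBear 2012-07-18 16:35:07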