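C# template parsing and text file matching

I need some ideas on how to solve this problem. I have a template file that describes the lines in a text file. For example:

Template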

[%f1%]|[%f2%]|[%f3%]"[%f4%]"[%f5%]"[%f6%]

Text file

1234|1234567|123"12345"12"123456

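Now I need to read the fields from the text file. In the template file, fields are described with [%some name%]. The template file also defines the field delimiters; in this example they are | and ". Field lengths can vary from file to file, but the delimiters stay the same. What is the best way to read the template and then read the text file according to it?

Edit: the text file has multiple lines, like this: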

1234|1234567|123"12345"12"123456"\r\n 
1234|field|123"12345"12"asdasd"\r\n 
123sd|1234567|123"asdsadf"12"123456"\r\n 
45gg|somedata|123"12345"12"somefield"\r\n

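EDIT2: OK, let's make it harder. Some fields can contain binary data, and I know the start and end positions of the binary data fields. I should be able to mark those fields in the template so that the parser knows the field is binary. How can this be solved?

Source

2011-06-25 hs2d

Are the field values only decimal digits? –

@HalfTrackMindMan: No, the field values can be anything, sometimes even binary. – hs2d

I would create a regular expression based on the template and then use it to parse the text file: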

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Parser
{
    // Matches one "[%name%]" field plus the literal delimiter text after it, if any.
    private static readonly Regex TemplateRegex =
        new Regex(@"\[%(?<field>[^\]]+)%\](?<delim>[^\[]+)?");

    private readonly List<string> m_fields = new List<string>();
    private readonly Regex m_textRegex;

    public Parser(string template)
    {
        // Turn the template into a regex with one capturing group per field.
        var textRegexString = "^" + TemplateRegex.Replace(template, Evaluator) + "$";
        m_textRegex = new Regex(textRegexString);
    }

    private string Evaluator(Match match)
    {
        // add field name to collection and create regex for the field
        var fieldName = match.Groups["field"].Value;
        m_fields.Add(fieldName);
        string result = "(.*?)";

        // add delimiter to the regex, if it exists
        // TODO: check, that only last field doesn't have delimiter
        var delimGroup = match.Groups["delim"];
        if (delimGroup.Success)
            result += Regex.Escape(delimGroup.Value);

        return result;
    }

    public IDictionary<string, string> Parse(string text)
    {
        var match = m_textRegex.Match(text);
        if (!match.Success)
            throw new FormatException("Line does not match the template.");

        var groups = match.Groups;
        var result = new Dictionary<string, string>(m_fields.Count);

        // Group 0 is the whole match, so field i is in group i + 1.
        for (int i = 0; i < m_fields.Count; i++)
            result.Add(m_fields[i], groups[i + 1].Value);

        return result;
    }
}
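
To make the generated pattern concrete: reading the Parser code, the sample template should expand to roughly ^(.*?)\|(.*?)\|(.*?)"(.*?)"(.*?)"(.*?)$. The sketch below applies that hand-written pattern to the sample line. It is only an illustration under that assumption; the names GeneratedPatternDemo and Split are hypothetical and not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class GeneratedPatternDemo
{
    // Hand-written equivalent of what the template
    // [%f1%]|[%f2%]|[%f3%]"[%f4%]"[%f5%]"[%f6%] is assumed to expand to.
    // In a verbatim string, "" stands for a single double quote.
    private static readonly Regex LineRegex =
        new Regex(@"^(.*?)\|(.*?)\|(.*?)""(.*?)""(.*?)""(.*?)$");

    // Returns the field values in template order.
    public static List<string> Split(string line)
    {
        Match match = LineRegex.Match(line);
        if (!match.Success)
            throw new FormatException("Line does not match the pattern.");

        var values = new List<string>();
        // Group 0 is the whole match; the fields start at group 1.
        for (int i = 1; i < match.Groups.Count; i++)
            values.Add(match.Groups[i].Value);
        return values;
    }
}
```

Calling GeneratedPatternDemo.Split on the sample line 1234|1234567|123"12345"12"123456 should give the six field values in order. Note that the lines in the question's edit end with an extra ", which this template does not describe, so that quote would probably end up inside the last field unless the template gets a trailing delimiter.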

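Source

2011-06-25 19:24:59 svick

That's a very good idea, but I'm not sure whether it will fit my case. – hs2d

@hs2d, why not? – svick

@svick, I mean I have to try adapting it first, and then I can tell whether it fits. But I think it's the best idea so far. – hs2d

I would do this in a few lines of code. Loop over your template line, taking the text inside each "[" marker as a variable name and all other text as a terminator. Read all text up to the terminator, assign it to the variable name, and repeat.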
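
The loop described above could be sketched roughly as follows. This is a minimal illustration rather than the answerer's code: the names TemplateScanner and Parse are hypothetical, and it assumes that fields are written as [%name%], that every field except the last is followed by literal delimiter text, and that the delimiter text never occurs inside a field value.

```csharp
using System;
using System.Collections.Generic;

static class TemplateScanner
{
    // Walks the template and the line in parallel. Text between "[%" and "%]" is a
    // field name; the literal text up to the next "[%" is that field's terminator.
    public static List<KeyValuePair<string, string>> Parse(string template, string line)
    {
        var fields = new List<KeyValuePair<string, string>>();

        int t = template.IndexOf("[%", StringComparison.Ordinal);
        if (t < 0)
            throw new FormatException("Template contains no fields.");

        // Literal text before the first field must appear at the start of the line.
        string prefix = template.Substring(0, t);
        if (!line.StartsWith(prefix, StringComparison.Ordinal))
            throw new FormatException("Line does not start with the template prefix.");
        int p = prefix.Length; // current position in the line

        while (t < template.Length)
        {
            int close = template.IndexOf("%]", t + 2, StringComparison.Ordinal);
            if (close < 0)
                throw new FormatException("Unterminated field in template.");
            string name = template.Substring(t + 2, close - t - 2);

            int next = template.IndexOf("[%", close + 2, StringComparison.Ordinal);
            if (next < 0)
                next = template.Length;
            string terminator = template.Substring(close + 2, next - close - 2);

            int end;
            if (terminator.Length == 0)
            {
                if (next < template.Length)
                    throw new FormatException("Adjacent fields need a delimiter.");
                end = line.Length; // last field runs to the end of the line
            }
            else
            {
                end = line.IndexOf(terminator, p, StringComparison.Ordinal);
                if (end < 0)
                    throw new FormatException("Delimiter not found in line.");
            }

            fields.Add(new KeyValuePair<string, string>(name, line.Substring(p, end - p)));
            p = end + terminator.Length;
            t = next;
        }
        return fields;
    }
}
```

Returning a list of name/value pairs instead of a dictionary keeps the fields in template order, which matters if the order has to be known after parsing.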

來源

2011-06-25 18:06:18

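You can parse the template with a regular expression. An expression like this matches each field definition and separator:

Match m = Regex.Match(template, @"^(\[%(?<name>.+?)%\](?<separator>.)?)+$");

The match will contain two named groups (name and separator), each of which will contain multiple captures, one for each time the group matched in the input string. In your example, the separator group will have one fewer capture than the name group.

You can then loop over the captures and use the results to extract the fields from the input string and store the values, like this: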
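
To see what those captures look like, here is a small sketch that runs the same pattern against a template and lists the captures of one named group. The class and method names (TemplateCaptures, Captures) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TemplateCaptures
{
    // Same pattern as above: a repeated group per field, each followed by an
    // optional single-character separator.
    private static readonly Regex TemplateRegex =
        new Regex(@"^(\[%(?<name>.+?)%\](?<separator>.)?)+$");

    // Returns the captured values of the given named group, in match order.
    public static List<string> Captures(string template, string groupName)
    {
        Match m = TemplateRegex.Match(template);
        if (!m.Success)
            throw new FormatException("Template does not match the expected shape.");

        var values = new List<string>();
        foreach (Capture c in m.Groups[groupName].Captures)
            values.Add(c.Value);
        return values;
    }
}
```

For the sample template, Captures(template, "name") should list the six field names f1 to f6, and Captures(template, "separator") the five separators between them.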

if (m.Success)
{
    Group name = m.Groups["name"];
    Group separator = m.Groups["separator"];
    int index = 0;
    Dictionary<string, string> fields = new Dictionary<string, string>();
    for (int x = 0; x < name.Captures.Count; ++x)
    {
        // The last field has no separator, so it runs to the end of the input.
        int separatorIndex = input.Length;
        if (x < separator.Captures.Count)
            separatorIndex = input.IndexOf(separator.Captures[x].Value, index);
        fields.Add(name.Captures[x].Value, input.Substring(index, separatorIndex - index));
        index = separatorIndex + 1; // separators are single characters here
    }
    // Do something with results.
}

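Obviously, in a real program you would have to account for invalid input and so on, which I haven't done here.

Source

2011-06-25 18:07:05 Sven

I'm not good with regular expressions, but have you tested whether that regex works? – hs2d

Yes, I tested the code, and it works with your sample data. – Sven

1- For sscanf(line, format, __arglist), check the API usage here

2- Use string splitting, like this: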

public IEnumerable<int> GetDataFromLines(string[] lines)
{
    // handle the output data
    List<int> data = new List<int>();

    // the delimiters used in the sample lines
    string[] separators = new string[] { "|", "\"" };

    foreach (string line in lines)
    {
        string[] results = line.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        foreach (string result in results)
        {
            // note: int.Parse throws a FormatException for non-numeric fields
            data.Add(int.Parse(result));
        }
    }

    return data;
}

Tested with the line:

string line = "1234|1234567|123\"12345\"12\"123456"; 
string[] lines = new string[] { line }; 

GetDataFromLines(lines); 

//output list items are: 
1234 
1234567 
123 
12345 
12 
123456

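Source

2011-06-25 18:16:26

@hs2d: Have you tried it with a string like the one in the example above? –

This won't work for me, because after parsing the text file I need to know the order of the fields and the delimiters between them. – hs2d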
