C#: Parsing a text file

I have a text file whose contents look like this:

idiom: meaning 
description. 
o example1. 
o example2. 

idiom: meaning 
description. 
o example1. 
o example2. 

. 
. 
.

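As you can see, the file consists of paragraphs like the ones above, and each paragraph contains data that I want to extract (note that the examples start with o). For example, I have these classes: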

public class Idiom 
{ 
    public string Name { get; set; } // the idiom text; a property cannot be named Idiom inside class Idiom
    public string Meaning { get; set; } 
    public string Description { get; set; } 
    public IList<IdiomExample> IdiomExamples { get; set; } 
} 

public class IdiomExample 
{ 
    public string Item { get; set; } 
}

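Is there any way to extract these fields from the file? Any ideas?

Edit
The file can contain anything, such as idioms, verbs, and so on; the layout above is just my sample template. Here is an example: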

little by little: gradually, slowly (also: step by step) 
o Karen's health seems to be improving little by little. 
o If you study regularly each day, step by step your vocabulary will increase. 
to tire out: to make very weary due to difficult conditions or hard effort (also: to wear out) (S) 
o The hot weather tired out the runners in the marathon. 
o Does studying for final exams wear you out? It makes me feel worn out!

Thanks in advance.

Source

2014-05-05 Sirwan Afifi

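Yes, there is a way. Have you tried anything?

Not really; I'm a bit confused about how to come up with the regex pattern.

Is the meaning always a single line?

Something like this should work. I haven't tested it, but with a little debugging I expect it will.

I know you tagged the question with regex, but this is another way to extract the lines.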

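// Requires: using System.Collections.Generic; using System.IO;
using (var textReader = new StreamReader("idioms.txt"))
{
    var idioms = new List<Idiom>();
    string line = textReader.ReadLine();
    while (line != null)
    {
        if (!line.StartsWith("idiom: "))
        {
            // Skip blank or unrelated lines.
            line = textReader.ReadLine();
            continue;
        }

        var idiom = new Idiom
        {
            // Everything after the "idiom: " prefix is stored as the meaning.
            Meaning = line.Substring("idiom: ".Length),
            Description = textReader.ReadLine(),
            // The list must exist before Add is called on it.
            IdiomExamples = new List<IdiomExample>()
        };

        // Collect the example lines. The first line that is not an example
        // stays in `line`, so the outer loop examines it next instead of losing it.
        while ((line = textReader.ReadLine()) != null && line.StartsWith("o "))
        {
            // Substring removes only the leading marker; Replace would also
            // strip every "o " inside the sentence.
            idiom.IdiomExamples.Add(new IdiomExample { Item = line.Substring(2) });
        }
        idioms.Add(idiom);
    }

    // idioms ready
}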

Source

2014-05-05 06:29:00

Something along these lines (untested, just a suggestion):

Regex r = new Regex(@"Idiom:([^\n]+)\n([^o]+)(o([^o]+)o)*");

Source

2014-05-05 06:25:48

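My text file does not contain the word 'Idiom'; that was only an example. It can be anything.

Do you expect us to guess? Judging by your reputation, you should know how to ask a question. Please edit your question to show **relevant** input.

Here is my regular expression for your problem: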

(?<section>(?<idiom>^.+?):(?<meaning>.+)[\n](?<description>.*?)(?<examples>(?<example>o.+[\s\r\n])+))

I tested it a little, but I think you will have to fix a few small issues. In general it works well.

Set these options for the expression:

RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace | RegexOptions.CultureInvariant
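To show how a repeated named group can be read back in C#, here is a minimal sketch. The pattern is a simplified variant written for the sample in the question's edit, not the expression above, and the names `ParsedEntry` and `RegexEntryParser` are made up for illustration. It assumes every entry has a header line containing a colon, at most one description line, and at least one example line starting with "o ".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical holder type for this sketch (not from the question).
public sealed class ParsedEntry
{
    public string Name { get; set; } = "";
    public string Meaning { get; set; } = "";
    public string Description { get; set; } = "";
    public List<string> Examples { get; set; } = new List<string>();
}

public static class RegexEntryParser
{
    // Header line, optional description line, then one or more "o " lines.
    // With ExplicitCapture, only the named groups capture.
    private static readonly Regex EntryPattern = new Regex(
        @"^(?<name>[^:\n]+):[ ]*(?<meaning>[^\n]*)\n" +
        @"((?!o[ ])(?<description>[^\n]+)\n)?" +
        @"(o[ ](?<example>[^\n]*)(\n|\z))+",
        RegexOptions.Multiline | RegexOptions.ExplicitCapture | RegexOptions.CultureInvariant);

    public static List<ParsedEntry> Parse(string text)
    {
        var entries = new List<ParsedEntry>();
        // Normalize Windows line endings so the pattern only deals with \n.
        foreach (Match m in EntryPattern.Matches(text.Replace("\r\n", "\n")))
        {
            entries.Add(new ParsedEntry
            {
                Name = m.Groups["name"].Value.Trim(),
                Meaning = m.Groups["meaning"].Value.Trim(),
                Description = m.Groups["description"].Value.Trim(),
                // A group inside a repetition keeps every capture in .Captures.
                Examples = m.Groups["example"].Captures
                    .Cast<Capture>()
                    .Select(c => c.Value.Trim())
                    .ToList()
            });
        }
        return entries;
    }
}
```

Calling `RegexEntryParser.Parse(File.ReadAllText("idioms.txt"))` would return one `ParsedEntry` per paragraph; the file name is only a placeholder. An entry without any example lines is not matched correctly by this pattern, so treat it as a starting point.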

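OK, you have three ways to work with your file. The first is to use a regular expression: it is the fastest to develop and the slowest to run. The second is to parse the text as strings, using LINQ or whatever you like. To me this approach is error-prone, hard to extend, and so on, but it performs better, which can be critical if you process very large files. The third is to use a formal grammar with terminals, or something similar. I have never implemented anything like that, but I know it is fast at run time and hard to develop and maintain. So I suggest you start with a regular expression and migrate to another approach if performance becomes your bottleneck.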
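For comparison, the second approach (plain string handling, one line at a time) might look roughly like the sketch below. The type names `Entry` and `LineParser` are invented for this example. It assumes that a line starting with "o " is an example, that a non-example line directly after a header is the description, and that any other line containing a colon starts a new entry.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical record type for this sketch (not from the question).
public sealed class Entry
{
    public string Name { get; set; } = "";
    public string Meaning { get; set; } = "";
    public string Description { get; set; } = "";
    public List<string> Examples { get; } = new List<string>();
}

public static class LineParser
{
    public static List<Entry> Parse(TextReader reader)
    {
        var entries = new List<Entry>();
        Entry current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0) continue; // blank separator lines

            if (line.StartsWith("o ", StringComparison.Ordinal))
            {
                // Example line: attach it to the most recent entry, if any.
                if (current != null) current.Examples.Add(line.Substring(2).Trim());
                continue;
            }

            // A non-example line right after a header is taken as the description.
            if (current != null && current.Examples.Count == 0 && current.Description.Length == 0)
            {
                current.Description = line;
                continue;
            }

            // Otherwise the line is a header: split it at the first colon.
            int colon = line.IndexOf(':');
            if (colon < 0) continue; // not a header (for example the "." filler lines)

            current = new Entry
            {
                Name = line.Substring(0, colon).Trim(),
                Meaning = line.Substring(colon + 1).Trim()
            };
            entries.Add(current);
        }
        return entries;
    }
}
```

Because it reads from a `TextReader`, the same method works with `new StreamReader(path)` for a file or `new StringReader(text)` for a string, and it never holds more than one line of input at a time.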

Hope this helps!

Source

2014-05-05 07:01:09

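Your example has no description, but this regex accepts an optional one. It gives you an idea of how to parse your input rather than the complete C# code.

See this demo: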

(?smx) 
^ 
([^:\n]+):\s*([^\n]+) 
\n([^o].*?\n|) 
(^o.*?) 
(?=\Z|^[^o:\n]+:)

After matching:

Group #1 is the idiom
Group #2 is the meaning
Group #3 holds the description, if present
Group #4 holds all the examples

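This regex does not split the examples block into individual examples; that is the next job. You may also want to trim some of the line breaks.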
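That follow-up step could be sketched as below, assuming the block captured in group #4 is passed in as a single string and every example sits on its own line starting with "o ". `ExampleSplitter` is a name made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExampleSplitter
{
    // Turns a block such as "o first\no second\n" into ["first", "second"].
    public static List<string> Split(string examplesBlock)
    {
        return examplesBlock
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(l => l.Trim())                                    // also drops a trailing \r
            .Where(l => l.StartsWith("o ", StringComparison.Ordinal)) // keep example lines only
            .Select(l => l.Substring(2).Trim())                       // remove just the leading marker
            .ToList();
    }
}
```

Lines in the block that do not start with "o " are dropped here; if an example can wrap onto a second line, that line would need to be appended to the previous item instead.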

Source

2014-05-05 07:23:41

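Thanks a lot, but it seems to work for only one paragraph. http://regex101.com/r/iK6zP3#pcre

@SirwanAfifi I changed it to handle multiple paragraphs. Try it in your C# code; it does not work in regex101 but it works in my tool. regex101 sometimes has bugs.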
