Separate lists of words by part of speech (nouns, verbs, adjectives, etc.)


I want to parse a line and extract the words found in the WordNet database, but I don't know how. For example, the index.adj file contains the following lines:

abactinal a 1 1 ! 1 0 01665972 
abandoned a 2 1 & 2 1 01313004 01317231 
abashed a 1 1 & 1 1 00531628 
abasic a 1 2 \ + 1 0 02598608 
abatable a 1 2 & + 1 0 02288022 
abatic a 1 2 \ + 1 0 02598608 
abaxial a 1 2 ! ; 1 0 00002312 
abbatial a 1 2 \ + 1 0 02598768 
abbreviated a 2 1 & 2 1 01436432 01442597 
abdicable a 1 2 & + 1 0 02528048 
abdominal a 1 2 \ + 1 1 02934594 
abdominous a 1 2 & + 1 0 00986457

I am using .NET and C#, and I tried:

Regex regex = new Regex(@"/^(\S+?)[\s%]/"); 
Match match = regex.Match(line);

I am using the dictionary database to build a data-mining tool.


2015-05-14 jobinelv

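What exactly do you want to match in that string? The regex you have is a JavaScript-style regex (with /.../ delimiters) and will not work as expected in C#. If you intend to match words, I would use the '@"\b\p{L}+\b"' regex with 'Regex.Matches' to return the collection of words in the string.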
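A minimal sketch of the comment's suggestion, assuming C# 9 top-level statements; the helper name `LetterWords` is hypothetical. It shows what '\b\p{L}+\b' actually returns for one index line: only runs of letters, so the numeric fields and symbols drop out, but the one-letter POS code is matched too.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: return every run of letters bounded by word boundaries,
// using the pattern suggested in the comment.
string[] LetterWords(string line) =>
    Regex.Matches(line, @"\b\p{L}+\b").Cast<Match>().Select(m => m.Value).ToArray();

// Digits and symbols are skipped, but the POS code "a" is returned as well.
Console.WriteLine(string.Join(" | ", LetterWords("abactinal a 1 1 ! 1 0 01665972"))); // abactinal | a
```

Because `_` counts as a word character for `\b` in .NET, a lemma that contains an underscore yields no match at all with this pattern, so it does not cover the case raised in the reply below.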

Sorry, I posted the wrong text from the file. Can you help with a regex for the lines I have now added? Some words also contain _. – jobinelv
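If a regex is still wanted, the pattern has to be written without the JavaScript /.../ delimiters. A minimal sketch, assuming the goal is to capture the first two whitespace-separated fields of an index line (lemma and POS code); `\S+` matches any run of non-whitespace, so lemmas containing `_` stay whole. The helper `ParseIndexLine` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored at the start of the line: group 1 = lemma, group 2 = POS code.
var indexLine = new Regex(@"^(\S+)\s+(\S+)");

// Hypothetical helper: returns null when the line does not start with a non-space token
// (for example, blank lines or lines that begin with whitespace).
(string Lemma, string Pos)? ParseIndexLine(string line)
{
    Match match = indexLine.Match(line);
    if (!match.Success)
        return null;
    return (match.Groups[1].Value, match.Groups[2].Value);
}

var parsed = ParseIndexLine("abandoned a 2 1 & 2 1 01313004 01317231");
Console.WriteLine(parsed?.Lemma); // abandoned
Console.WriteLine(parsed?.Pos);   // a
```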

This looks like a space-separated list to me. Why do you need a regex?

Since this input is simple whitespace-separated text, you do not need a regex for this task. Use this code:

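using System;
using System.Collections.Generic;

var txt5 = "abactinal a 1 1 ! 1 0 01665972\r\nabandoned a 2 1 & 2 1 01313004 01317231\r\nabandon v 2 1 & 2 1 01313004 01317231 ";
// Each entry pairs a part-of-speech label with a lemma.
var dic = new List<KeyValuePair<string, string>>();
var lines = txt5.Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
foreach (var line in lines)
{
    // Split on any whitespace: cells[0] is the lemma, cells[1] is the POS code.
    var cells = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    if (cells.Length < 2)
        continue;
    switch (cells[1])
    {
        case "a":
            dic.Add(new KeyValuePair<string, string>("adjective", cells[0]));
            break;
        case "v":
            dic.Add(new KeyValuePair<string, string>("verb", cells[0]));
            break;
        // Add more cases ("n" for noun, "r" for adverb) to cover all POS values.
        default:
            break;
    }
}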

You can adapt it and build on it from there.
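Building on that answer, a minimal sketch that produces one list of lemmas per part of speech, which is what the question title asks for. It assumes the POS codes used in WordNet index files (n, v, a, r) and skips lines that start with whitespace, on the assumption that the index files open with indented license lines; the helper `GroupByPos` and the `posNames` table are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Map WordNet index POS codes to readable labels.
var posNames = new Dictionary<string, string>
{
    ["n"] = "noun",
    ["v"] = "verb",
    ["a"] = "adjective",
    ["r"] = "adverb",
};

// Hypothetical helper: build one list of lemmas per part of speech.
Dictionary<string, List<string>> GroupByPos(IEnumerable<string> lines)
{
    var result = new Dictionary<string, List<string>>();
    foreach (var line in lines)
    {
        // Skip blank lines and header lines that begin with whitespace.
        if (line.Length == 0 || char.IsWhiteSpace(line[0]))
            continue;
        var cells = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (cells.Length < 2 || !posNames.TryGetValue(cells[1], out var pos))
            continue;
        if (!result.TryGetValue(pos, out var words))
        {
            words = new List<string>();
            result[pos] = words;
        }
        words.Add(cells[0]);
    }
    return result;
}

var sample = new[]
{
    "abactinal a 1 1 ! 1 0 01665972",
    "abandoned a 2 1 & 2 1 01313004 01317231",
    "abandon v 2 1 & 2 1 01313004 01317231",
};
foreach (var pair in GroupByPos(sample))
    Console.WriteLine($"{pair.Key}: {string.Join(", ", pair.Value)}");
```

To run it over a real file, the sample array could be replaced with `System.IO.File.ReadLines("index.adj")` (the path is an assumption), which streams the file line by line instead of loading it all at once.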



2015-05-14 12:58:44
