Navigating Stanford CoreNLP parse results

The Stanford CoreNLP parser generates the following output for the sentence:

"He didn't get a reply" 

(ROOT 
(S 
(NP (PRP He)) 
(VP (VBD did) (RB n’t) 
(VP (VB get) 
(NP (DT a) (NN reply)))) 
(. .)))

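I need a way to navigate it easily, i.e. to extract the tags and find children and parents. At the moment I am doing it by hand (counting parentheses). I would like to know whether there is a Python library that can do the parenthesis counting for me or, even better, something like Beautiful Soup or Scrapy that would let me work with objects.

If there is no such tool, what is the best way to traverse the sentence and get all the tags? I guess I need to create some kind of tag object that holds a list of child tag objects.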

Source

2017-06-21 user1700890

This looks like Lisp. Writing a Lisp program to traverse it and extract what you want should be easy enough.
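The same idea does not need Lisp. Here is a minimal sketch in plain Python, using only the standard library, that does the parenthesis counting with a stack. The names `parse_brackets` and `labels` are made up for this illustration, and it assumes that words and tags never contain whitespace or parentheses themselves.

```python
import re

def parse_brackets(text):
    """Parse a bracketed parse string into nested Python lists."""
    # Tokens are "(", ")", or any run of characters that is neither
    # whitespace nor a parenthesis.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    stack = [[]]                        # stack[0] collects the top-level trees
    for tok in tokens:
        if tok == "(":
            stack.append([])            # open a new subtree
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = stack.pop()          # close the current subtree
            stack[-1].append(done)      # attach it to its parent
        else:
            stack[-1].append(tok)       # a tag or a word
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]

def labels(tree):
    """Yield every tag (the first item of each list), depth first."""
    yield tree[0]
    for child in tree[1:]:
        if isinstance(child, list):
            yield from labels(child)

parse = ("(ROOT (S (NP (PRP He)) (VP (VBD did) (RB n't) "
         "(VP (VB get) (NP (DT a) (NN reply)))) (. .)))")
tree = parse_brackets(parse)[0]
print(list(labels(tree)))
```

Each list starts with its tag, followed by either child lists or the word itself, so `tree[1][1]` is the subject noun phrase of the sentence.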

You can also convert it to a nested list in Python and process it there:

from pyparsing import OneOrMore, nestedExpr

# The bracketed parse from the question, on a single line
nlpdata = '(ROOT (S (NP (PRP He)) (VP (VBD did) (RB n\'t) (VP (VB get) (NP (DT a) (NN reply)))) (. .)))'
# nestedExpr() matches balanced parentheses by default
data = OneOrMore(nestedExpr()).parseString(nlpdata)
print(data)
# [['ROOT', ['S', ['NP', ['PRP', 'He']], ['VP', ['VBD', 'did'], ['RB', "n't"], ['VP', ['VB', 'get'], ['NP', ['DT', 'a'], ['NN', 'reply']]]], ['.', '.']]]]

Note that I had to escape the quote in "n't".
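The question also asked for tag objects that hold a list of child tags. A nested list like the one above can be wrapped that way. This is only a sketch: `Node`, `build`, `walk` and `path` are hypothetical names, and the nested list is typed in by hand here so the example runs without pyparsing (when starting from pyparsing, `data.asList()[0]` should give the same plain list).

```python
class Node:
    """One tag in the parse tree, linked to its parent and children."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        self.word = None      # set only on leaf tags such as (PRP He)

    def walk(self):
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def path(self):
        """Return the labels from the root down to this node."""
        node, found = self, []
        while node is not None:
            found.append(node.label)
            node = node.parent
        return found[::-1]


def build(nested, parent=None):
    """Turn ['TAG', item, ...] into a Node; string items are words."""
    node = Node(nested[0], parent)
    for item in nested[1:]:
        if isinstance(item, list):
            node.children.append(build(item, node))
        else:
            node.word = item
    return node


# Same structure as the printed result above, without the outer wrapper.
nested = ['ROOT', ['S', ['NP', ['PRP', 'He']],
                   ['VP', ['VBD', 'did'], ['RB', "n't"],
                    ['VP', ['VB', 'get'],
                     ['NP', ['DT', 'a'], ['NN', 'reply']]]],
                   ['.', '.']]]
root = build(nested)

for node in root.walk():
    if node.word is not None:
        print(node.word, node.label, "child of", node.parent.label)

reply = [n for n in root.walk() if n.word == "reply"][0]
print(reply.path())
```

Keeping a `parent` reference on every node is what makes "find the parent" a single attribute lookup instead of another search through the tree.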

Source

2017-06-21 02:21:00 mikep

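My way of navigating the output is not to try to parse the string, but to build an object and deserialize into it. You can then work with that object natively.

The output shown in the question is generated by an option on the pipeline called "prettyPrint". I changed it to "jsonPrint" to get JSON output instead. I can then take that output and generate a class from it (Visual Studio can generate a class from JSON with the Paste Special option, or there are online resources such as http://json2csharp.com/). The generated classes look like this: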

using System.Collections.Generic;

public class BasicDependency
{
    public string dep { get; set; }
    public int governor { get; set; }
    public string governorGloss { get; set; }
    public int dependent { get; set; }
    public string dependentGloss { get; set; }
}

public class EnhancedDependency
{
    public string dep { get; set; }
    public int governor { get; set; }
    public string governorGloss { get; set; }
    public int dependent { get; set; }
    public string dependentGloss { get; set; }
}

public class EnhancedPlusPlusDependency
{
    public string dep { get; set; }
    public int governor { get; set; }
    public string governorGloss { get; set; }
    public int dependent { get; set; }
    public string dependentGloss { get; set; }
}

public class Token
{
    public int index { get; set; }
    public string word { get; set; }
    public string originalText { get; set; }
    public string lemma { get; set; }
    public int characterOffsetBegin { get; set; }
    public int characterOffsetEnd { get; set; }
    public string pos { get; set; }
    public string ner { get; set; }
    public string speaker { get; set; }
    public string before { get; set; }
    public string after { get; set; }
    public string normalizedNER { get; set; }
}

public class Sentence
{
    public int index { get; set; }
    public string parse { get; set; }
    public List<BasicDependency> basicDependencies { get; set; }
    public List<EnhancedDependency> enhancedDependencies { get; set; }
    public List<EnhancedPlusPlusDependency> enhancedPlusPlusDependencies { get; set; }
    public List<Token> tokens { get; set; }
}

public class RootObject
{
    public List<Sentence> sentences { get; set; }
}

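*Note: unfortunately this technique does not work well for the coref annotation. The JSON does not convert to a class correctly. I am working on that now. This model was built from output produced with the annotators "tokenize, ssplit, pos, lemma, ner, parse".

My code, only slightly changed from the sample code, looks like this (note "pipeline.jsonPrint"):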

// Namespaces used: System, System.IO, java.io, java.util, edu.stanford.nlp.pipeline
public static string LanguageAnalysis(string sourceText)
{
    string json = "";
    // Path to the folder with models extracted from stanford-corenlp-3.7.0-models.jar
    var jarRoot = @"..\..\..\..\packages\Stanford.NLP.CoreNLP.3.7.0.1\";

    // Annotation pipeline configuration
    var props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
    props.setProperty("ner.useSUTime", "0");

    // We should change current directory, so StanfordCoreNLP could find all the model files automatically
    var curDir = Environment.CurrentDirectory;
    Directory.SetCurrentDirectory(jarRoot);
    var pipeline = new StanfordCoreNLP(props);
    Directory.SetCurrentDirectory(curDir);

    // Annotation
    var annotation = new Annotation(sourceText);
    pipeline.annotate(annotation);

    // Result - JSON Print
    using (var stream = new ByteArrayOutputStream())
    {
        pipeline.jsonPrint(annotation, new PrintWriter(stream));
        json = stream.toString();
        stream.close();
    }

    return json;
}

This seems to deserialize nicely with code like this:

using Newtonsoft.Json; 
string sourceText = "My text document to parse."; 
string json = Analysis.LanguageAnalysis(sourceText); 
RootObject document = JsonConvert.DeserializeObject<RootObject>(json);
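For anyone working in Python, as in the question, the same JSON can be read with the standard `json` module instead of generated classes. The sketch below is an assumption-laden illustration: the `sample` string is written by hand to mirror a few of the fields in the classes above and is not real CoreNLP output.

```python
import json

# Hand-written sample shaped like the classes above; NOT real CoreNLP output.
sample = json.dumps({
    "sentences": [{
        "index": 0,
        "parse": "(ROOT\n  (S\n    (NP (PRP He))\n    (VP (VBD did) (RB n't)\n"
                 "      (VP (VB get)\n        (NP (DT a) (NN reply))))\n    (. .)))",
        "tokens": [
            {"index": 1, "word": "He", "pos": "PRP"},
            {"index": 2, "word": "did", "pos": "VBD"},
        ],
    }]
})

document = json.loads(sample)
sentence = document["sentences"][0]

# The parse tree arrives as one string; collapse its line breaks to spaces.
flat_parse = " ".join(sentence["parse"].split())

print(type(sentence["parse"]).__name__)
print(flat_parse)
print([(t["word"], t["pos"]) for t in sentence["tokens"]])
```

The tokens come back as ordinary dictionaries, but the `parse` field is still a bracketed string, so the tree itself still needs a bracket parser.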

Source

2017-06-22 05:51:49

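Now that I am working with the resulting object, I realize that my answer does not actually answer the question! The parser output is still provided as a string in the same format, in a field called "parse". I am now joining @user1700890 in looking for a way to parse it!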

Looking further, I found this question, which seems to be the same and has an answer that uses PHP: "PHP and NLP: Nested parenthesis (parser output) to array?" (https://stackoverflow.com/questions/7917161)
