How to keep punctuation in the Stanford dependency parser

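I am using Stanford CoreNLP (the 01.2016 version) and I would like to keep punctuation in the dependencies. I found some ways to do this when running it from the command line, but I didn't find anything about doing it from Java code that extracts the dependencies.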

This is my current code. It works, but it does not include punctuation:

import java.util.Collection;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.CoreMap;

Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.put("parse.model", modelPath);
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);

LexicalizedParser lp = LexicalizedParser.loadModel(modelPath + lexparserNameEn,
    "-maxLength", "200", "-retainTmpSubcategories");
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();

List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    List<CoreLabel> words = sentence.get(CoreAnnotations.TokensAnnotation.class);
    Tree parse = lp.apply(words);
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection<TypedDependency> td = gs.typedDependencies();
    parsedText += td.toString() + "\n";
}

Any kind of dependencies is fine for me (basic, typed, collapsed, etc.); I just want punctuation to be included.

Thanks in advance,


2016-05-10 user1419243

You are doing quite a bit of extra work here, because you run the parser once through CoreNLP and then a second time by calling lp.apply(words).
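If you would rather keep the separate LexicalizedParser call from the question, the punctuation disappears because tlp.grammaticalStructureFactory() with no arguments uses a punctuation-rejecting filter by default. A minimal sketch of that route passes an accept-everything filter (Filters.acceptFilter()) instead. It assumes a CoreNLP release in which grammaticalStructureFactory takes a java.util.function.Predicate<String>, and that the default English PCFG model (englishPCFG.ser.gz) is on the classpath in place of the question's modelPath; the class and method names are hypothetical.

```java
import java.util.Collection;

import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.PennTreebankLanguagePack;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.util.Filters;

// Hypothetical helper class, for illustration only.
public class StandaloneParserWithPunct {

    // Assumption: the default English PCFG model from the CoreNLP models jar is on the classpath.
    private static final String MODEL = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";

    /** Parses one sentence and returns typed dependencies that keep punctuation. */
    public static Collection<TypedDependency> typedDependencies(String sentence) {
        LexicalizedParser lp = LexicalizedParser.loadModel(MODEL);
        // parse(String) tokenizes the sentence with the parser's default tokenizer.
        Tree parse = lp.parse(sentence);

        TreebankLanguagePack tlp = new PennTreebankLanguagePack();
        // The no-argument factory rejects punctuation words; an accept-all filter keeps them,
        // so they show up as "punct" dependencies.
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory(Filters.acceptFilter());
        GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
        return gs.typedDependencies();
    }

    public static void main(String[] args) {
        System.out.println(typedDependencies("The dog barks."));
    }
}
```

This still parses each sentence with a parser you load yourself, so it is only worth it if you need that parser's flags; otherwise the pipeline option is less work.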

The simplest way to get a dependency tree/graph that includes punctuation is to use the CoreNLP option parse.keepPunct, as follows:

import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.setProperty("parse.model", modelPath);
props.setProperty("parse.keepPunct", "true");

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);

List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // Pick whichever representation you want
    SemanticGraph basicDeps = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
    SemanticGraph collapsed = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
    SemanticGraph ccProcessed = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
}

The sentence annotation object stores the dependency tree/graph as a SemanticGraph. If you want a list of TypedDependency objects, call its typedDependencies() method. For example:

// Inside the loop above; typedDependencies() is declared to return a Collection
Collection<TypedDependency> dependencies = basicDeps.typedDependencies();
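To get output in the same shape as the question's parsedText (one list of typed dependencies per line), the pieces above can be combined roughly as follows. This is a sketch that assumes the default English models are on the classpath instead of the question's modelPath/modelPath1 settings, and the class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.util.CoreMap;

// Hypothetical helper class, for illustration only.
public class ParsedTextWithPunct {

    /** Builds one line of typed dependencies per sentence, with punctuation kept. */
    public static String parsedText(String text) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
        // String value, not a boolean: Properties.getProperty only returns String values.
        props.setProperty("parse.keepPunct", "true");
        // Assumption: default English models on the classpath (no pos.model / parse.model).
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation(text);
        pipeline.annotate(document);

        StringBuilder parsedText = new StringBuilder();
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            SemanticGraph basicDeps =
                sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
            // Convert the graph back into TypedDependency objects, like gs.typedDependencies() in the question.
            Collection<TypedDependency> td = basicDeps.typedDependencies();
            parsedText.append(td).append('\n');
        }
        return parsedText.toString();
    }

    public static void main(String[] args) {
        System.out.print(parsedText("The dog barks."));
    }
}
```

A StringBuilder is used instead of repeated string concatenation, since the loop may run over many sentences; the parser itself runs only once, inside the pipeline.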


2016-05-11 00:50:58

Since setProperty only accepts (String, String), the last true must be "true". – peer
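The reason this matters: Properties.put(Object, Object) will happily accept a Boolean, but getProperty returns null for any value that is not a String. Assuming CoreNLP reads its options through getProperty, an option set with put("parse.keepPunct", true) would be silently ignored. A small plain-Java check (hypothetical class name):

```java
import java.util.Properties;

// Hypothetical demo class, for illustration only.
public class PropertiesGotcha {

    /** Returns what getProperty sees after put(..., Boolean) and after setProperty(..., String). */
    public static String[] lookups() {
        Properties props = new Properties();

        // put(Object, Object) compiles with a Boolean value...
        props.put("parse.keepPunct", true);
        // ...but getProperty only returns String values, so this yields null.
        String viaPut = props.getProperty("parse.keepPunct");

        // setProperty(String, String) replaces the entry with a String, which getProperty returns.
        props.setProperty("parse.keepPunct", "true");
        String viaSetProperty = props.getProperty("parse.keepPunct");

        return new String[] {viaPut, viaSetProperty};
    }

    public static void main(String[] args) {
        String[] r = lookups();
        System.out.println("put(..., true)           -> " + r[0]); // prints null
        System.out.println("setProperty(..., \"true\") -> " + r[1]); // prints true
    }
}
```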
