2017-04-25

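From dependencies to phrases

I am trying to extract the phrases that contain particular dependencies. For instance, I want the noun phrase that contains the subject, the noun phrase that acts as an appositive, and so on. For example: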

                  Sentence: John Smith and Robert Alan Jones ate the warm pizza and cold salad by the car for an hour.
           Phrasal Subject: John Smith and Robert Alan Jones
                  Negation:
                     Verbs: ate
     Phrasal Direct Object: the warm pizza and cold salad
   Phrasal Indirect Object:
                 Root Noun:
         Phrasal Root Noun:
        Phrasal Appositive:
Phrasal Subject Complement:
 Phrasal Object Complement:
Phrasal Clausal Complement:
          Adjective Phrase: warm
          Adverbial Phrase:
     Prepositional Phrases: [by the car, for an hour]

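Again, I am using the dependency parser. I have written some code that recursively navigates the TypedDependency collection, but it feels clunky. Is there a built-in way to get phrases and multi-word combinations (MWE, POSS, etc.) back from the dependencies? Jeff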
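For readers unfamiliar with the approach, the kind of recursive navigation described above can be sketched roughly as follows. This is not the asker's code and it does not use the CoreNLP API: the `Edge` class, the `collect` and `phrase` helpers, and the hand-written edges are all hypothetical stand-ins, chosen only to show how the words under a dependency head can be gathered into a phrase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a dependency parse; not the CoreNLP TypedDependency API.
class DependencyPhraseSketch {

    /** One dependency edge; governor and dependent are 1-based token positions. */
    static final class Edge {
        final int governor;
        final int dependent;
        final String relation;

        Edge(int governor, int dependent, String relation) {
            this.governor = governor;
            this.dependent = dependent;
            this.relation = relation;
        }
    }

    /** Adds head and, recursively, every token reachable from it to the sorted set. */
    static void collect(int head, List<Edge> edges, TreeSet<Integer> covered) {
        if (!covered.add(head)) {
            return; // already visited; guards against cycles
        }
        for (Edge e : edges) {
            if (e.governor == head) {
                collect(e.dependent, edges, covered);
            }
        }
    }

    /** Returns the words of the subtree rooted at head, in sentence order. */
    static String phrase(int head, List<Edge> edges, String[] tokens) {
        TreeSet<Integer> covered = new TreeSet<>();
        collect(head, edges, covered);
        List<String> words = new ArrayList<>();
        for (int position : covered) {
            words.add(tokens[position - 1]);
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        // Hand-written tokens and edges for illustration only (assumed, not parser output).
        String[] tokens = {"John", "Smith", "ate", "the", "warm", "pizza"};
        List<Edge> edges = Arrays.asList(
            new Edge(3, 2, "nsubj"),
            new Edge(2, 1, "compound"),
            new Edge(3, 6, "dobj"),
            new Edge(6, 4, "det"),
            new Edge(6, 5, "amod"));
        for (Edge e : edges) {
            if (e.relation.equals("nsubj") || e.relation.equals("dobj")) {
                System.out.println(e.relation + " --> " + phrase(e.dependent, edges, tokens));
            }
        }
    }
}
```

Collecting positions in a sorted set keeps the words in sentence order even when dependents sit on both sides of the head, which is the part that tends to get clunky when done by hand.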

Answer

I think the OpenIE system would be useful for getting triples like these.

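Here is a basic example I wrote up; there may well be better ways to do this. The containingNounPhrase method could be a useful addition to Tree. I will probably also add something along these lines in Stanford CoreNLP 3.8.0.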

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

import java.util.*;

public class PhraseDependencyExample {

    // Returns the largest NP subtree containing the given leaf,
    // or null if the leaf is not inside any NP.
    public static Tree containingNounPhrase(Tree tree, Tree leaf) {
        Tree currTree = leaf;
        Tree largestNPTree = null;
        // walk from the leaf up to the root, remembering the last NP seen
        while (currTree != null) {
            if (currTree.label().value().equals("NP")) {
                largestNPTree = currTree;
            }
            currTree = currTree.parent(tree);
        }
        return largestNPTree;
    }

    public static void main(String[] args) {
        // set up pipeline properties
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
        // use faster shift reduce parser
        props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
        props.setProperty("parse.maxlen", "100");
        // set up Stanford CoreNLP pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // build annotation for the example sentence
        Annotation annotation = new Annotation("John Smith and Robert Alan Jones ate the warm pizza and cold salad.");
        // annotate the sentence
        pipeline.annotate(annotation);
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            System.err.println("---");
            Tree sentenceConstituencyParse = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            System.err.println(sentenceConstituencyParse);
            SemanticGraph sg = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
            for (IndexedWord iw : sg.vertexListSorted()) {
                // only look at past-tense verbs
                if ("VBD".equals(iw.tag())) {
                    System.err.println("---");
                    System.err.println("verb: " + iw.word());
                    for (SemanticGraphEdge sge : sg.outgoingEdgeList(iw)) {
                        String relation = sge.getRelation().getShortName();
                        if (relation.equals("dobj") || relation.equals("nsubj")) {
                            // dependency indices are 1-based, the leaf list is 0-based
                            int tokenIndex = sge.getDependent().backingLabel().index() - 1;
                            Tree leaf = sentenceConstituencyParse.getLeaves().get(tokenIndex);
                            Tree np = containingNounPhrase(sentenceConstituencyParse, leaf);
                            // fall back to the single word when no enclosing NP is found
                            String fullPhrase = (np != null ? np : leaf).yieldWords().toString();
                            System.err.println("\t" + sge.getRelation() + " --> " + fullPhrase);
                        }
                    }
                }
            }
        }
    }
}
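  1. This code demonstrates how to go between the leaves of the constituency tree and the vertices of the dependency parse graph.

  2. It is set up to get the largest noun phrase containing the word, but you could change it to get the smallest one, etc.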
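As a rough illustration of point 2, the sketch below shows the same upward walk with a switch between the smallest and the largest enclosing NP. It deliberately avoids the CoreNLP classes so that it runs on its own: `Node`, `yield` and `enclosingNP` are hypothetical names, and the small tree is built by hand rather than produced by a parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal constituency node; it only mimics the parent walk
// and is not the CoreNLP Tree class.
class EnclosingNounPhraseSketch {

    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String label, Node... kids) {
            this.label = label;
            for (Node kid : kids) {
                kid.parent = this;
                children.add(kid);
            }
        }

        /** Concatenates the leaf labels under this node, left to right. */
        String yield() {
            if (children.isEmpty()) {
                return label;
            }
            List<String> parts = new ArrayList<>();
            for (Node child : children) {
                parts.add(child.yield());
            }
            return String.join(" ", parts);
        }
    }

    /** Walks from the leaf to the root; returns the first NP if smallest, else the last NP seen. */
    static Node enclosingNP(Node leaf, boolean smallest) {
        Node found = null;
        for (Node curr = leaf; curr != null; curr = curr.parent) {
            if (curr.label.equals("NP")) {
                found = curr;
                if (smallest) {
                    break; // stop at the innermost NP
                }
            }
        }
        return found; // null when the leaf has no NP ancestor
    }

    public static void main(String[] args) {
        // Hand-built tree for "the warm pizza and cold salad" (assumed bracketing).
        Node pizza = new Node("pizza");
        Node inner = new Node("NP",
            new Node("DT", new Node("the")),
            new Node("JJ", new Node("warm")),
            new Node("NN", pizza));
        Node outer = new Node("NP",
            inner,
            new Node("CC", new Node("and")),
            new Node("NP",
                new Node("JJ", new Node("cold")),
                new Node("NN", new Node("salad"))));
        System.out.println(enclosingNP(pizza, true).yield());
        System.out.println(enclosingNP(pizza, false).yield());
    }
}
```

In the CoreNLP version the equivalent change would presumably be to return as soon as the first NP is found inside the while loop, instead of continuing up to the root.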