2013-01-17

How do I get phrase tags in Stanford CoreNLP?

If I want the phrase tag corresponding to each word, how can I get it?

For example, in this sentence:

My dog also likes eating sausage.

I can get a parse tree from Stanford NLP such as

(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) (VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .))) 

In the situation above, I want the phrase tag corresponding to each word, like

(My - NP), (dog - NP), (also - ADVP), (likes - VP), ... 

Is there a simple way to extract the phrase tag for each word?

Please help me.

Answer

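import java.util.List;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;

// I guess this is how you get your parse tree
// (sentAnno is assumed to be the CoreMap of one annotated sentence).
Tree tree = sentAnno.get(TreeAnnotation.class);

// The children of a Tree are an array of trees.
Tree[] children = tree.children();

// Check the label of each subtree to see whether it is what you want (a phrase).
// Note: this only inspects the direct children of tree, not deeper levels.
for (Tree child : children) {
    if (child.value().equals("NP")) { // set your rule for defining a phrase here
        List<Tree> leaves = child.getLeaves(); // leaves correspond to the tokens
        for (Tree leaf : leaves) {
            List<Word> words = leaf.yieldWords();
            for (Word word : words) {
                System.out.print(String.format("(%s - NP),", word.word()));
            }
        }
    }
}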

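This code is not fully tested, but I think it does roughly what you need. More importantly, I have not written anything about visiting the subtrees recursively, but I believe you should be able to do that yourself.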
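
For the recursive part, here is a minimal sketch of one way it could work. It deliberately avoids the CoreNLP classes so that it runs on its own: `PhraseTags`, `Node`, `parse`, and `tagWords` are hypothetical names made up for this illustration, and the tree is read from the bracketed string in the question. The assumption is that the "phrase tag" of a word is the label of the node directly above the word's POS tag. CoreNLP's own `Tree` has `children()` and `isPreTerminal()`, so the same walk should carry over, though that variant is not tested here.

```java
import java.util.ArrayList;
import java.util.List;

class PhraseTags {
    /** Minimal stand-in for a parse tree node (hypothetical, not a CoreNLP class). */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        boolean isLeaf() { return children.isEmpty(); }
        // A preterminal is a POS node whose only child is a word, e.g. (NN dog).
        boolean isPreTerminal() { return children.size() == 1 && children.get(0).isLeaf(); }
    }

    /** Parses a Penn-style bracketed string such as "(NP (NN dog))". */
    static Node parse(String s) {
        String[] tokens = s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return parseNode(tokens, new int[] {0});
    }

    private static Node parseNode(String[] t, int[] pos) {
        if (!t[pos[0]].equals("(")) {
            return new Node(t[pos[0]++]); // bare token: a word (leaf)
        }
        pos[0]++;                          // skip "("
        Node node = new Node(t[pos[0]++]); // the label follows "("
        while (!t[pos[0]].equals(")")) {
            node.children.add(parseNode(t, pos));
        }
        pos[0]++;                          // skip ")"
        return node;
    }

    /** Returns "(word - TAG)" for every word, where TAG is the parent of the word's POS node. */
    static List<String> tagWords(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        for (Node child : node.children) {
            if (child.isPreTerminal()) {
                // node is the phrase directly above this POS tag
                out.add("(" + child.children.get(0).label + " - " + node.label + ")");
            } else {
                collect(child, out); // descend into nested phrases
            }
        }
    }

    public static void main(String[] args) {
        Node root = parse("(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) "
                + "(VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))");
        System.out.println(String.join(", ", tagWords(root)));
    }
}
```

For the sentence in the question this pairs `My` and `dog` with `NP`, `also` with `ADVP`, and `likes` with `VP`. Note that `eating` and `sausage` come out as `NP` (their nearest phrase), and the final period is paired with `S`. If you want a different rule, for example the outermost phrase under `S`, the place to change is the label chosen inside `collect`.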