2016-08-02

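Finding the grandparent node of a node with nltk

I'm using the nltk tree package in Python 2.7 and want to extract every production rule from a tree, annotated with its grandparent node. I have the following tree: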

t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]), Tree('VP', [Tree('V', ['chased']), Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])]) 

and the tree's productions:

t.productions()
    [S -> NP VP, NP -> D N, D -> 'the', N -> 'dog', VP -> V NP, V -> 'chased', NP -> D N, D -> 'the', N -> 'cat'] 

            S
      ______|______
     |             VP
     |         ____|____
    NP        |        NP
  ___|___     |      ___|___
 D       N    V     D       N
 |       |    |     |       |
the     dog chased the     cat

What I want is output of this form:

[S -> NP VP, S^NP -> D N, NP^D -> 'the', NP^N -> 'dog'.......] 

I have looked at the ParentedTree class, but I don't understand how to use it to solve my problem.
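For reference, the ParentedTree class mentioned above can also produce this output, because every subtree of a ParentedTree can look up its parent with parent(). This is a minimal sketch, assuming NLTK 3.x and Python 3; the helper name parent_annotated_productions is hypothetical and not part of NLTK.

```python
from nltk.tree import Tree, ParentedTree
from nltk.grammar import Production, Nonterminal


def parent_annotated_productions(tree):
    """Hypothetical helper (not part of NLTK): return the productions of
    `tree`, prefixing each left-hand side with its parent's label."""
    # convert() copies the tree into a ParentedTree, so every subtree
    # can report its parent via parent(); the root's parent is None
    ptree = ParentedTree.convert(tree)
    prods = []
    # subtrees() walks the tree in pre-order, the same order that
    # Tree.productions() uses
    for sub in ptree.subtrees():
        par = sub.parent()
        lhs = sub.label() if par is None else par.label() + "^" + sub.label()
        # Nonterminal for each child subtree, the raw string for each leaf
        rhs = [Nonterminal(c.label()) if isinstance(c, Tree) else c for c in sub]
        prods.append(Production(Nonterminal(lhs), rhs))
    return prods


t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]),
               Tree('VP', [Tree('V', ['chased']),
                           Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])])
print(parent_annotated_productions(t))
```

On the example tree this should yield the same nine rules as the desired S^NP form, with the root S left unannotated because it has no parent.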

Answer

You need to modify/override the productions method.

Code:

from nltk.tree import Tree
from nltk.grammar import Production, Nonterminal


def child_names(t):
    # Equivalent to nltk.tree's private _child_names helper: a Nonterminal
    # for each child subtree, the raw token string for each leaf
    return [Nonterminal(c.label()) if isinstance(c, Tree) else c for c in t]


def productions(t, parent):
    if not isinstance(t.label(), str):
        raise TypeError('Productions can only be generated from trees having node labels that are strings')

    # Prefix the left-hand side with the parent label: NP under S becomes S^NP
    prods = [Production(Nonterminal(parent + "^" + t.label()), child_names(t))]
    for child in t:
        if isinstance(child, Tree):
            prods += productions(child, t.label())
    return prods


t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]), Tree('VP', [Tree('V', ['chased']), Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])])

# To add 'Start' as the parent of 'S'
# prods = productions(t, "Start")

# To leave the root 'S' without a parent prefix
prods = [Production(Nonterminal(t.label()), child_names(t))]
for child in t:
    if isinstance(child, Tree):
        prods += productions(child, t.label())

print(prods)

Output:

[S -> NP VP, S^NP -> D N, NP^D -> 'the', 
    NP^N -> 'dog', S^VP -> V NP, VP^V -> 'chased', 
    VP^NP -> D N, NP^D -> 'the', NP^N -> 'cat'] 

For details, check the productions method in nltk.tree here.
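If you prefer to literally override the method rather than call a free function, one option is a Tree subclass whose productions() takes an optional parent label, which also removes the special case for the root. This is a sketch assuming NLTK 3.x and Python 3; the class name ParentAnnotatedTree is hypothetical. Tree.convert() is used to rebuild the whole tree with the subclass so that the recursive calls reach the overridden method.

```python
from nltk.tree import Tree
from nltk.grammar import Production, Nonterminal


class ParentAnnotatedTree(Tree):
    """Hypothetical Tree subclass: productions() prefixes each left-hand
    side with the parent's label (e.g. S^NP)."""

    def productions(self, parent=None):
        if not isinstance(self.label(), str):
            raise TypeError("Productions can only be generated from trees "
                            "having node labels that are strings")
        # The root (parent=None) keeps its plain label
        lhs = self.label() if parent is None else parent + "^" + self.label()
        rhs = [Nonterminal(c.label()) if isinstance(c, Tree) else c for c in self]
        prods = [Production(Nonterminal(lhs), rhs)]
        for child in self:
            if isinstance(child, Tree):
                # After convert(), every child subtree is a ParentAnnotatedTree
                prods += child.productions(self.label())
        return prods


plain = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]),
                   Tree('VP', [Tree('V', ['chased']),
                               Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])])
# convert() rebuilds the tree recursively using ParentAnnotatedTree
t = ParentAnnotatedTree.convert(plain)
print(t.productions())
```

Calling t.productions("Start") instead should give the "Start" variant from the answer, with Start^S as the first left-hand side.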