2015-02-23 127 views

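Finding a path in nltk.tree.Tree

I am using nltk.tree.Tree to read constituency-based parse trees. I need to find the path of nodes I have to traverse to get from one specific word in the tree to another.

A simple example:

This is the parse tree of the sentence "saw the dog":

(VP (VERB saw) (NP (DET the) (NOUN dog)))

If I want the path between the words "the" and "dog", it would be: .

I don't even know how to start: how do I find the values of the leaves? How can I find the parent of a leaf/node?

Thanks.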

Answers

Here is the code:

from nltk.tree import ParentedTree

def get_lca_length(location1, location2):
    # length of the common prefix of the two tree positions
    i = 0
    while i < len(location1) and i < len(location2) and location1[i] == location2[i]:
        i += 1
    return i

def get_labels_from_lca(ptree, lca_len, location):
    # labels from the lowest common ancestor (lca) down to the preterminal
    labels = []
    for i in range(lca_len, len(location)):
        labels.append(ptree[location[:i]].label())
    return labels

def findPath(ptree, text1, text2):
    leaf_values = ptree.leaves()
    # index() returns the first occurrence of each word
    leaf_index1 = leaf_values.index(text1)
    leaf_index2 = leaf_values.index(text2)

    location1 = ptree.leaf_treeposition(leaf_index1)
    location2 = ptree.leaf_treeposition(leaf_index2)

    # find the depth of the lowest common ancestor (lca)
    lca_len = get_lca_length(location1, location2)

    # find the path from node1 to the lca
    labels1 = get_labels_from_lca(ptree, lca_len, location1)
    # ignore the first element (the lca), because it is counted in the second part of the path
    result = labels1[1:]
    # reverse, because we want to go from the node up to the lca
    result = result[::-1]

    # add the path from the lca down to node2
    result = result + get_labels_from_lca(ptree, lca_len, location2)
    return result

ptree = ParentedTree.fromstring("(VP (VERB saw) (NP (DET the) (NOUN dog)))")
ptree.pprint()
print(findPath(ptree, 'the', "dog"))  # ['DET', 'NP', 'NOUN']
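
The question also asks how to reach a node's parent. As a possible alternative to the position-based functions above, the sketch below walks the parent links that ParentedTree maintains, using parent(). The helper names ancestors and path_via_parents are made up for illustration. The sketch assumes NLTK 3 is installed and that each word occurs only once in the sentence.

```python
from nltk.tree import ParentedTree

ptree = ParentedTree.fromstring("(VP (VERB saw) (NP (DET the) (NOUN dog)))")

def ancestors(ptree, word):
    """Hypothetical helper: subtrees from the word's preterminal up to the root."""
    leaf_index = ptree.leaves().index(word)       # first occurrence of the word
    leaf_pos = ptree.leaf_treeposition(leaf_index)
    node = ptree[leaf_pos[:-1]]                   # preterminal directly above the leaf
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent()                      # parent() is None at the root
    return chain

def path_via_parents(ptree, word1, word2):
    """Hypothetical helper: labels on the path from word1 to word2."""
    up = ancestors(ptree, word1)
    down = ancestors(ptree, word2)
    # Compare by identity: Tree equality compares content, not position.
    index_in_down = {id(node): j for j, node in enumerate(down)}
    for i, node in enumerate(up):
        if id(node) in index_in_down:
            j = index_in_down[id(node)]
            climb = [n.label() for n in up[:i]]
            descend = [n.label() for n in reversed(down[:j + 1])]
            return climb + descend
    return []

print(path_via_parents(ptree, "the", "dog"))
```

Identity comparison is a deliberate choice here: two different subtrees with the same label and words would compare equal with ==, which could pick the wrong common ancestor.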

findPath is based on the list representation of the tree; see here. Also check similar questions.
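
To make the list-representation idea concrete without NLTK, here is a minimal pure-Python sketch. A tree position is a tuple of child indices, and the lowest common ancestor of two leaves sits at the longest common prefix of their positions. TREE, leaf_positions, node_at, common_prefix_len and label_path are hypothetical stand-ins for illustration, not NLTK names. The sketch assumes each word appears once in the tree.

```python
# Hypothetical stand-in for a parse tree: (label, [children]); a leaf is a plain str.
TREE = ("VP", [("VERB", ["saw"]),
               ("NP", [("DET", ["the"]), ("NOUN", ["dog"])])])

def leaf_positions(node, prefix=()):
    """Yield (leaf, position) pairs; a position is a tuple of child indices."""
    if isinstance(node, str):
        yield node, prefix
        return
    _label, children = node
    for i, child in enumerate(children):
        yield from leaf_positions(child, prefix + (i,))

def node_at(node, position):
    """Follow the child indices in `position`, starting from `node`."""
    for i in position:
        node = node[1][i]
    return node

def common_prefix_len(a, b):
    """Number of leading indices shared by two positions (depth of the LCA)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def label_path(tree, word1, word2):
    positions = dict(leaf_positions(tree))  # assumes each word is unique
    p1, p2 = positions[word1], positions[word2]
    k = common_prefix_len(p1, p2)
    # Climb from word1's preterminal up to, but not including, the LCA.
    up = [node_at(tree, p1[:i])[0] for i in range(len(p1) - 1, k, -1)]
    # Descend from the LCA to word2's preterminal.
    down = [node_at(tree, p2[:i])[0] for i in range(k, len(p2))]
    return up + down

print(label_path(TREE, "the", "dog"))  # ['DET', 'NP', 'NOUN']
```

Here "the" sits at position (1, 0, 0) and "dog" at (1, 1, 0). They share the prefix (1,), which is the NP node, so the path climbs from DET to NP and descends to NOUN.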