Get the depth of a word from an nltk tree

I'm working on an NLP project, and I want to filter out words based on their position in the dependency tree.

To draw the tree, I'm using the code from this post:

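from nltk import Tree

def to_nltk_tree(node):
    # A token with dependents becomes a subtree labelled with its text;
    # a token without dependents becomes a plain string leaf.
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_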

For an example sentence:

"A group of people around the world are suddenly linked mentally"

I get this tree:

From this tree, what I want to get is a list of tuples pairing each word in the tree with its depth:

[(linked,1),(are,2),(suddenly,2),(mentally,2),(group,2),(A,3),(of,3),(people,4)....]

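In this case I'm not interested in the words that have no children: [are, suddenly, mentally, A, the]. So far, all I've been able to do is get the list of words that have children, using this code: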

def get_words(root, words):
    # Recursively collect every token below `root` that has children.
    children = list(root.children)
    for child in children:
        if list(child.children):
            words.append(child)
            get_words(child, words)
    return list(set(words))

# doc: the spaCy Doc for the example sentence
for sent in doc.sents:
    to_nltk_tree(sent.root).pretty_print()
s_root = list(doc.sents)[0].root
words = [s_root]
words = get_words(s_root, words)
words

[around, linked, world, of, people, group]

From here, how can I get the desired tuples of words and their respective depths?


2016-11-20 Luis Ramon Ramirez Rodriguez

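Are you sure this is an nltk Tree in your code? nltk's Tree class has no children attribute. With an nltk Tree, you can do what you want by using "treepositions", which are paths down the tree. Each path is a tuple of branch choices: the index of the child taken at each level. The treeposition of "people" is (0, 2, 1, 0), and as you can see, the depth of a node is just the length of its treeposition.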

First, I get the paths of the leaves so that I can exclude them:

import nltk

t = nltk.Tree.fromstring("""(linked (are suddenly mentally
            (group A (of (people (around (world the)))))))""")
n_leaves = len(t.leaves())
# Treepositions of all leaves (plain strings), used to skip them below
leavepos = set(t.leaf_treeposition(n) for n in range(n_leaves))

Now it's easy to list the non-terminal nodes and their depths:

>>> for pos in t.treepositions():
...     if pos not in leavepos:
...         print(t[pos].label(), len(pos))
...
linked 0
are 1
group 2
of 3
people 4
around 5
world 6
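If you want the result in exactly the shape asked for in the question, a list of (word, depth) tuples, the same treepositions idea can be wrapped in a small function. This is a sketch, not part of the original answer: the name `nonterminal_depths` and its `start` parameter are hypothetical, and instead of precomputing the leaf positions it assumes that every non-leaf node is an `nltk.Tree` while leaves are plain strings, which holds for trees built with `Tree.fromstring` or with `to_nltk_tree` from the question.

```python
from typing import List, Tuple

from nltk import Tree


def nonterminal_depths(t: Tree, start: int = 1) -> List[Tuple[str, int]]:
    """Return (label, depth) for every non-leaf node of `t`, in preorder.

    Hypothetical helper: leaves in an nltk Tree are plain strings, so
    keeping only positions whose node is a Tree drops the childless words.
    depth = len(treeposition) + start.
    """
    return [
        (t[pos].label(), len(pos) + start)
        for pos in t.treepositions()  # preorder is the default order
        if isinstance(t[pos], Tree)
    ]


t = Tree.fromstring("""(linked (are suddenly mentally
            (group A (of (people (around (world the)))))))""")
print(nonterminal_depths(t))
# [('linked', 1), ('are', 2), ('group', 3), ('of', 4), ('people', 5), ('around', 6), ('world', 7)]
```

With the default `start=1` the root gets depth 1, as in the question's desired output; `start=0` gives the same numbers as the loop above.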

By the way, NLTK trees have their own display methods. Try print(t), or t.draw(), which draws the tree in a pop-up window.


2016-11-22 01:21:35 alexis

I'm using nltk to draw the dependency tree from spaCy; that's why it has the "children" attribute. http://stackoverflow.com/questions/36610179/how-to-get-the-dependency-tree-with-spacy
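Since the tree comes from spaCy, the depths can also be computed on the spaCy tokens directly, without converting to an nltk tree first, by extending the recursion in `get_words` with a depth counter. This is a minimal sketch under stated assumptions: `word_depths` is a hypothetical name, each token is assumed to expose `.orth_` and `.children` as spaCy's `Token` does, the root gets depth 1 to match the question, and tokens without children are skipped. The pipeline name in the commented usage is also an assumption, and the exact depths depend on the parse spaCy produces.

```python
from typing import Any, List, Optional, Tuple


def word_depths(token: Any, depth: int = 1,
                out: Optional[List[Tuple[str, int]]] = None) -> List[Tuple[str, int]]:
    """Collect (word, depth) for `token` and every descendant that has children.

    Assumes spaCy-like tokens: `.orth_` is the text and `.children`
    iterates over the syntactic dependents. Childless tokens are skipped.
    """
    if out is None:
        out = []
    children = list(token.children)
    if children:
        out.append((token.orth_, depth))
        for child in children:
            word_depths(child, depth + 1, out)
    return out


# Usage with spaCy (assumption: the "en_core_web_sm" pipeline is installed):
# import spacy
# nlp = spacy.load("en_core_web_sm")
# doc = nlp("A group of people around the world are suddenly linked mentally")
# for sent in doc.sents:
#     print(word_depths(sent.root))
```

Passing `out` explicitly through the recursion avoids the mutable-default-argument pitfall and keeps the words in the order the tree is walked, unlike `list(set(words))`.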
