Converting an NLTK subtree to a list in Python (RSS feed chunking)

Using the code below I am chunking an RSS feed that has already been tokenized and tagged. Printing subtree.leaves() gives:

[('Prime', 'NNP'), ('Minister', 'NNP'), ('Stephen', 'NNP'), ('Harper', 'NNP')]
[('US', 'NNP'), ('President', 'NNP'), ('Barack', 'NNP'), ('Obama', 'NNP')]
[('what\', 'NNP')]
[('Keystone', 'NNP'), ('XL', 'NNP')]
[('CBC', 'NNP'), ('News', 'NNP')]

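This looks like a Python list, but I don't know how to access it directly or iterate over it. I think it is subtree output.

I want to turn this subtree into a list that I can manipulate. Is there an easy way to do that? This is the first time I have run into trees in Python and I am lost. I want to end up with this list: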

docs = ["Prime Minister Stephen Harper", "US President Barack Obama", "what\", "Keystone XL", "CBC News"]

Is there a simple way to do this?

Thanks, as always, for the help!

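import nltk

# Chunk runs of one or more proper nouns (NNP) into a 'Proper' phrase
grammar = r""" Proper: {<NNP>+} """

cp = nltk.RegexpParser(grammar)
result = cp.parse(posDocuments)  # posDocuments: the POS-tagged tokens
nounPhraseDocs.append(result)

# t.node is the pre-3.0 NLTK attribute; NLTK 3 uses t.label() instead
for subtree in result.subtrees(filter=lambda t: t.node == 'Proper'):
    # print the noun phrase as a list of part-of-speech tagged words
    print(subtree.leaves())
print(" ")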

2013-10-11 English Grad

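docs = []

# t.node is the older NLTK attribute; on NLTK 3 use t.label() == 'Proper'
for subtree in result.subtrees(filter=lambda t: t.node == 'Proper'):
    # join the words of each chunk, dropping the POS tags
    docs.append(" ".join([a for (a,b) in subtree.leaves()]))

print(docs)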

This should do the trick.
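To see why the join works: leaves() returns an ordinary list of (word, tag) tuples, so it can be indexed and iterated like any other list. Here is a small sketch in plain Python, with two leaf lists typed in by hand based on the output in the question (the name leaf_lists is made up for this illustration, it is not part of NLTK):

```python
# Hand-copied stand-ins for what subtree.leaves() returns for two chunks.
leaf_lists = [
    [("Keystone", "NNP"), ("XL", "NNP")],
    [("CBC", "NNP"), ("News", "NNP")],
]

docs = []
for leaves in leaf_lists:
    # Keep only the word from each (word, tag) pair, dropping the POS tag.
    words = [word for (word, tag) in leaves]
    docs.append(" ".join(words))

print(docs)                  # ['Keystone XL', 'CBC News']
print(leaf_lists[0][1][0])   # XL  (first chunk, second token, the word)
```

Each entry of docs is a plain string, so the result can be sliced, sorted, or searched like any other list of strings.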

2013-10-16 11:25:39

node has since been replaced by label(). So, modifying Viktor's answer:

docs = [] 

for subtree in result.subtrees(filter=lambda t: t.label() == 'Proper'): 
    docs.append(" ".join([a for (a,b) in subtree.leaves()]))

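This gives you a list of only those tokens that are part of a Proper chunk. If you remove the filter argument from the subtrees() method, it yields every subtree, including the root, so you get the tokens under each parent node of the tree rather than only the Proper chunks.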
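For anyone who wants to try this end to end on NLTK 3, here is a rough sketch of the filtered and unfiltered cases side by side. The tagged sentence and the helper chunk_strings are invented for this example (they stand in for the asker's posDocuments), and the grammar is the one from the question. No corpus downloads should be needed, since the tokens are already tagged.

```python
import nltk

# Hypothetical tagged input standing in for posDocuments.
tagged = [
    ("Prime", "NNP"), ("Minister", "NNP"), ("Stephen", "NNP"), ("Harper", "NNP"),
    ("met", "VBD"),
    ("US", "NNP"), ("President", "NNP"), ("Barack", "NNP"), ("Obama", "NNP"),
]

cp = nltk.RegexpParser(r"Proper: {<NNP>+}")
result = cp.parse(tagged)  # a Tree whose root is labelled 'S'

def chunk_strings(tree, label=None):
    """Join the words of each subtree; keep only `label` subtrees if given."""
    keep = None if label is None else (lambda t: t.label() == label)
    return [" ".join(word for word, tag in st.leaves())
            for st in tree.subtrees(filter=keep)]

proper_only = chunk_strings(result, "Proper")
all_subtrees = chunk_strings(result)

print(proper_only)
# ['Prime Minister Stephen Harper', 'US President Barack Obama']
print(all_subtrees[0])
# Prime Minister Stephen Harper met US President Barack Obama
```

Without the filter the first entry comes from the root, which covers the whole sentence, followed by one entry per Proper chunk.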

2017-01-16 09:56:46 Deathstroke
