NLTK clause and phrase breakdown

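Is there a way to get NLTK to return text fully marked up with all Treebank clause and Treebank phrase boundaries (or an equivalent; it does not have to be Treebank)? I need to be able to return both clauses and phrases, separately. The only thing I have found is in chapter 7 of the NLTK book by Bird/Klein/Loper, where it says you can't process noun phrases and verb phrases at the same time, but I want to do far more than that! I think the Stanford POS parser does this, but the client wants to use NLTK only. Thanks.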

Source

2012-08-15 AL Durstenfeld

Have you looked at chapter 8? This sounds like what you are after:

>>> # requires the Treebank sample corpus: nltk.download('treebank')
>>> from nltk.corpus import treebank
>>> t = treebank.parsed_sents('wsj_0001.mrg')[0]
>>> print(t)
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR
        (IN as)
        (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
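Since the question asks for clauses and phrases separately, one way to get them out of a parsed tree like the one above is to walk its subtrees and sort them by label. The following is only a minimal sketch: the `CLAUSE_LABELS` set and the helper names `base_label` and `clauses_and_phrases` are assumptions made for illustration, not part of NLTK, and the tree is rebuilt from its bracketed string so that no corpus download is needed.

```python
from nltk.tree import Tree

# The bracketed parse printed above, rebuilt from a string.
PARSE = """
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
"""

# Assumption: these Penn Treebank labels are treated as clause-level;
# every other non-preterminal node is treated as a phrase.
CLAUSE_LABELS = {"S", "SBAR", "SBARQ", "SINV", "SQ"}


def base_label(tree):
    """Strip function tags from a label, e.g. 'NP-SBJ' becomes 'NP'."""
    return tree.label().split("-")[0]


def clauses_and_phrases(tree):
    """Return two lists of (label, text) pairs: clauses and phrases."""
    clauses, phrases = [], []
    for sub in tree.subtrees():
        if sub.height() == 2:
            continue  # preterminal such as (NNP Pierre): a POS tag, not a phrase
        item = (sub.label(), " ".join(sub.leaves()))
        if base_label(sub) in CLAUSE_LABELS:
            clauses.append(item)
        else:
            phrases.append(item)
    return clauses, phrases


clauses, phrases = clauses_and_phrases(Tree.fromstring(PARSE))

print("Clauses:")
for label, text in clauses:
    print(f"  {label}: {text}")
print("Phrases:")
for label, text in phrases:
    print(f"  {label}: {text}")
```

The same function should work unchanged on trees returned by `treebank.parsed_sents(...)`, since those are `Tree` objects as well.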

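That is in addition to the chunking resources you have already found. But if you mean that you want to parse text that you supply yourself, there are similar options: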

>>> import nltk
>>> # grammar1 must already be defined (a hand-written CFG)
>>> sr_parse = nltk.ShiftReduceParser(grammar1)
>>> sent = 'Mary saw a dog'.split()
>>> for tree in sr_parse.parse(sent):
...     print(tree)
(S (NP Mary) (VP (V saw) (NP (Det a) (N dog))))

However, this relies on grammar1 having been written by hand beforehand. Chunking is easier than parsing.
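To make the contrast concrete, here is a rough sketch of both routes. The `grammar1` below is a hypothetical, cut-down stand-in that covers only the example sentence; it is not the grammar from the book. The chunk grammar follows the cascaded pattern shown in chapter 7 of the NLTK book, and the sentence is tagged by hand (an assumption, to avoid needing a tagger model), which also shows NP, PP, VP and clause chunks coming out of a single pass.

```python
import nltk

# Route 1: full parsing. Every rule and every word has to be written by hand.
# Hypothetical minimal grammar, just large enough for "Mary saw a dog".
grammar1 = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP
    NP -> 'Mary' | Det N
    V -> 'saw'
    Det -> 'a'
    N -> 'dog'
""")

sr_parse = nltk.ShiftReduceParser(grammar1)
# parse() returns an iterator over trees in NLTK 3, so collect it into a list.
parses = list(sr_parse.parse("Mary saw a dog".split()))
for tree in parses:
    print(tree)

# Route 2: chunking. The rules are written over POS tags rather than words,
# so there is no lexicon to maintain. The stages are applied in order:
# noun phrases, then prepositional phrases, then verb phrases, then clauses.
chunk_grammar = r"""
  NP: {<DT|JJ|NN.*>+}
  PP: {<IN><NP>}
  VP: {<VB.*><NP|PP|CLAUSE>+$}
  CLAUSE: {<NP><VP>}
"""
chunker = nltk.RegexpParser(chunk_grammar)

# Hand-tagged input (assumption: in practice a POS tagger would supply this).
tagged = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
          ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
chunked = chunker.parse(tagged)
print(chunked)
```

The chunker leaves `saw/VBD` outside any VP in this single pass; `RegexpParser` accepts a `loop` argument that reruns the stages, which may pick up such cases.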

Source

2012-08-15 02:05:14 verbsintransit
