Parsing multiple sentences with MaltParser using NLTK

There have been many MaltParser and/or NLTK related questions:
- Malt Parser throwing class not found exception
- How to use malt parser in python nltk
- MaltParser Not Working in Python NLTK
- NLTK MaltParser won't parse
- Dependency parser using NLTK and MaltParser
- Dependency Parsing using MaltParser and NLTK
- Parsing with MaltParser engmalt
- Parse raw text with MaltParser in Java
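Now there is a more stable version of the MaltParser API in NLTK: https://github.com/nltk/nltk/pull/944, but there are issues when it comes to parsing multiple sentences at the same time.
Parsing one sentence at a time seems fine: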
>>> from nltk.parse.malt import MaltParser
>>> _path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
>>> _path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> print(mp.parse_one(sent).tree())
(pajamas (shot I) an elephant in my)
But parsing a list of sentences doesn't return a DependencyGraph object:
>>> _path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
>>> _path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> print(mp.parse_one(sent).tree())
(pajamas (shot I) an elephant in my)
>>> print(next(mp.parse_sents([sent,sent2])))
<listiterator object at 0x7f0a2e4d3d90>
>>> print(next(next(mp.parse_sents([sent,sent2]))))
[{u'address': 0,
u'ctag': u'TOP',
u'deps': [2],
u'feats': None,
u'lemma': None,
u'rel': u'TOP',
u'tag': u'TOP',
u'word': None},
{u'address': 1,
u'ctag': u'NN',
u'deps': [],
u'feats': u'_',
u'head': 2,
u'lemma': u'_',
u'rel': u'nn',
u'tag': u'NN',
u'word': u'I'},
{u'address': 2,
u'ctag': u'NN',
u'deps': [1, 11],
u'feats': u'_',
u'head': 0,
u'lemma': u'_',
u'rel': u'null',
u'tag': u'NN',
u'word': u'shot'},
{u'address': 3,
u'ctag': u'AT',
u'deps': [],
u'feats': u'_',
u'head': 11,
u'lemma': u'_',
u'rel': u'nn',
u'tag': u'AT',
u'word': u'an'},
{u'address': 4,
u'ctag': u'NN',
u'deps': [],
u'feats': u'_',
u'head': 11,
u'lemma': u'_',
u'rel': u'nn',
u'tag': u'NN',
u'word': u'elephant'},
{u'address': 5,
u'ctag': u'NN',
u'deps': [],
u'feats': u'_',
u'head': 11,
u'lemma': u'_',
u'rel': u'nn',
u'tag': u'NN',
u'word': u'in'},
{u'address': 6,
u'ctag': u'NN',
u'deps': [],
u'feats': u'_',
u'head': 11,
u'lemma': u'_',
u'rel': u'nn',
u'tag': u'NN',
u'word': u'my'},
{u'address': 7,
u'ctag': u'NNS',
u'deps': [],
u'feats': u'_',
u'head': 11,
u'lemma': u'_',
u'rel': u'nn',
u'tag': u'NNS',
u'word': u'pajamas'},
{u'address': 8,
u'ctag': u'NN',
u'deps': [],
u'feats': u'_',
u'head': 11,
u'lemma': u'_',
u'rel': u'nn',
u'tag': u'NN',
u'word': u'Time'},
{u'address': 9,
u'ctag': u'NNS',
u'deps': [],
u'feats': u'_',
u'head': 11,
u'lemma': u'_',
u'rel': u'nn',
u'tag': u'NNS',
u'word': u'flies'},
{u'address': 10,
u'ctag': u'NN',
u'deps': [],
u'feats': u'_',
u'head': 11,
u'lemma': u'_',
u'rel': u'nn',
u'tag': u'NN',
u'word': u'like'},
{u'address': 11,
u'ctag': u'NN',
u'deps': [3, 4, 5, 6, 7, 8, 9, 10],
u'feats': u'_',
u'head': 2,
u'lemma': u'_',
u'rel': u'dep',
u'tag': u'NN',
u'word': u'banana'}]
Why is it that parse_sents() doesn't return an iterable of parse_one() results?
I could, however, just be lazy and do:
>>> _path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
>>> _path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent1 = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> sentences = [sent1, sent2]
>>> for sent in sentences:
...     print(mp.parse_one(sent).tree())
But this is not the solution I'm looking for. My question is: why doesn't parse_sents() return an iterable of parse_one() results, and how could it be fixed in the NLTK code?
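For context, the need for next(next(...)) above suggests that parse_sents() is meant to yield one inner iterator per input sentence, with each inner iterator yielding the parses for that sentence. Here is a minimal sketch of consuming that shape. Note that fake_parse_sents is a hypothetical stand-in I made up so the snippet runs without MaltParser installed; it is not an NLTK function, and its "parses" are plain strings rather than DependencyGraph objects.

```python
def fake_parse_sents(sentences):
    """Hypothetical stand-in (NOT part of NLTK): mimics an iterator that
    holds one inner iterator of parses per input sentence."""
    # Each "parse" is just a string here; a real parser would yield
    # DependencyGraph objects instead.
    return (iter(["<parse of: %s>" % " ".join(sent)]) for sent in sentences)


sent1 = 'I shot an elephant in my pajamas'.split()
sent2 = 'Time flies like banana'.split()

# Take the first parse of every sentence instead of calling next(next(...)),
# which only ever reaches the first parse of the first sentence.
first_parses = [next(parses) for parses in fake_parse_sents([sent1, sent2])]

for parse in first_parses:
    print(parse)
```

If the real parse_sents() behaves like this stand-in, the same list comprehension should give one parse per sentence, which is what I expected in the first place.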
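After @NikitaAstrakhantsev answered, I tried it and it now outputs a parse tree, but it seems to get confused and merges both sentences into one before parsing.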
# Initialize a MaltParser object with a pre-trained model.
mp = MaltParser(path_to_maltparser=path_to_maltparser, model=path_to_model)
sent = 'I shot an elephant in my pajamas'.split()
sent2 = 'Time flies like banana'.split()
# Parse a single sentence.
print(mp.parse_one(sent).tree())
print(next(next(mp.parse_sents([sent,sent2]))).tree())
[out]:
(pajamas (shot I) an elephant in my)
(shot I (banana an elephant in my pajamas Time flies like))
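From the code, it seems to be doing something weird: https://github.com/nltk/nltk/blob/develop/nltk/parse/api.py#L45
Why is the parser abstract class in NLTK squashing two sentences into one before parsing? Am I calling parse_sents() incorrectly? If so, what is the correct way to call parse_sents()?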
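One way to confirm the merge is to compare the number of token nodes in the returned graph with the number of tokens in the input sentence. The sketch below does that on an abbreviated copy of the node dump shown earlier (only the address and word keys are kept). looks_merged is a hypothetical helper name, not something from NLTK.

```python
def looks_merged(nodes, sentence):
    """Hypothetical check: True if the parse does not have exactly one
    token node per input token. Address 0 is the artificial root node."""
    token_nodes = [node for node in nodes if node["address"] != 0]
    return len(token_nodes) != len(sentence)


sent = 'I shot an elephant in my pajamas'.split()

# Abbreviated version of the dump above: the root plus eleven token nodes.
words = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas',
         'Time', 'flies', 'like', 'banana']
nodes = [{"address": 0, "word": None}]
nodes += [{"address": i, "word": w} for i, w in enumerate(words, start=1)]

merged = looks_merged(nodes, sent)
print(merged)
```

The first sentence has 7 tokens but the graph has 11 token nodes, so the check flags it as merged.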
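Thanks! Now it outputs a tree, but it's the wrong tree, see the updated question. – alvas
I updated my answer, but there is no solution, just remarks. –
Found the bug! I am sooo blind: https://github.com/alvations/nltk/blob/patch-1/nltk/parse/malt.py#L56 The "yield '\n\n'" is at the wrong indentation! gosh... – alvas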
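To illustrate the kind of indentation mistake described in the last comment, here is a minimal sketch. It is not the actual code from malt.py: sentence_to_lines, to_conll_buggy, to_conll_fixed and count_sentences are hypothetical names, and the three-column line format is simplified. It only assumes that sentences are written out one token per line and separated by a blank line, which is how the two sentences could end up being read as one.

```python
def sentence_to_lines(tagged_sentence):
    """Hypothetical helper: one tab-separated line per (word, tag) token."""
    for i, (word, tag) in enumerate(tagged_sentence, start=1):
        yield "%d\t%s\t%s\n" % (i, word, tag)


def to_conll_buggy(tagged_sentences):
    for sentence in tagged_sentences:
        for line in sentence_to_lines(sentence):
            yield line
    # Wrong indentation: the separator is emitted once, after ALL sentences.
    yield "\n\n"


def to_conll_fixed(tagged_sentences):
    for sentence in tagged_sentences:
        for line in sentence_to_lines(sentence):
            yield line
        # Right indentation: the separator is emitted after EACH sentence.
        yield "\n\n"


def count_sentences(conll_text):
    """Count the non-empty blocks separated by blank lines."""
    return len([block for block in conll_text.split("\n\n") if block.strip()])


sents = [[("I", "NN"), ("shot", "NN")], [("Time", "NN"), ("flies", "NNS")]]
buggy = "".join(to_conll_buggy(sents))
fixed = "".join(to_conll_fixed(sents))

print(count_sentences(buggy))  # the two sentences collapse into one block
print(count_sentences(fixed))  # one block per sentence
```

With the separator outside the outer loop, anything reading the text sees a single long sentence, which would match the merged tree in the question.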