2014-01-07 67 views
4

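Python and NLTK: how to analyze sentence grammar?

I have this code, which should show the syntactic structure of a sentence according to the grammar I defined. However, it returns an empty []. What am I missing or doing wrong?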

import nltk 

grammar = nltk.parse_cfg(""" 
S -> NP VP 
PP -> P NP 
NP -> Det N | Det N PP 
VP -> V NP | VP PP 
N -> 'Kim' | 'Dana' | 'everyone' 
V -> 'arrived' | 'left' |'cheered' 
P -> 'or' | 'and' 
""") 

def main(): 
    sent = "Kim arrived or Dana left and everyone cheered".split() 
    parser = nltk.ChartParser(grammar) 
    trees = parser.nbest_parse(sent) 
    for tree in trees: 
        print tree 

if __name__ == '__main__': 
    main() 

Answers

7

Let's do some reverse engineering:

>>> import nltk 
>>> grammar = nltk.parse_cfg(""" 
... NP -> Det N | Det N PP 
... N -> 'Kim' | 'Dana' | 'everyone' 
... """) 
>>> sent = "Kim".split() 
>>> parser = nltk.ChartParser(grammar) 
>>> print parser.nbest_parse(sent) 
[] 

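It seems the rules cannot even recognize the first word as an NP. Both NP productions start with Det, and Det is never defined, so no NP can ever be built. Let's try injecting NP -> N: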

>>> import nltk 
>>> grammar = nltk.parse_cfg(""" 
... NP -> Det N | Det N PP | N 
... N -> 'Kim' | 'Dana' | 'everyone' 
... """) 
>>> sent = "Kim".split() 
>>> parser = nltk.ChartParser(grammar) 
>>> print parser.nbest_parse(sent) 
[Tree('NP', [Tree('N', ['Kim'])])] 
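The missing Det can also be found mechanically. The following is a minimal sketch in plain Python, not an NLTK feature: `undefined_symbols` is a hypothetical helper, and it assumes one rule per line and quoted terminals without spaces, as in the grammar from the question.

```python
def undefined_symbols(grammar_text):
    """Return nonterminals used on a right-hand side but never defined."""
    defined, used = set(), set()
    for line in grammar_text.strip().splitlines():
        lhs, rhs = line.split("->")
        defined.add(lhs.strip())
        for alternative in rhs.split("|"):
            for symbol in alternative.split():
                # Quoted symbols are terminals (words); skip them.
                if not symbol.startswith(("'", '"')):
                    used.add(symbol)
    return sorted(used - defined)


# The grammar exactly as posted in the question.
question_grammar = """
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP
VP -> V NP | VP PP
N -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' |'cheered'
P -> 'or' | 'and'
"""

print(undefined_symbols(question_grammar))  # ['Det']
```

Any symbol this reports can never be expanded, so every production that mentions it is dead.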

So now it works. Let's continue with Kim arrived or Dana and so on:

>>> import nltk 
>>> grammar = nltk.parse_cfg(""" 
... S -> NP VP 
... PP -> P NP 
... NP -> Det N | Det N PP | N 
... VP -> V NP | VP PP 
... N -> 'Kim' | 'Dana' | 'everyone' 
... V -> 'arrived' | 'left' |'cheered' 
... P -> 'or' | 'and' 
... """) 
>>> sent = "Kim arrived".split() 
>>> parser = nltk.ChartParser(grammar) 
>>> print parser.nbest_parse(sent) 
[] 
>>> 
>>> sent = "Kim arrived or".split() 
>>> parser = nltk.ChartParser(grammar) 
>>> print parser.nbest_parse(sent) 
[] 

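There seems to be no way to build a VP here, with or without a P. A V must either be followed by an NP (VP -> V NP), or it must already be part of a VP higher up in the tree before it can take a PP (VP -> VP PP), and a bare verb such as arrived never becomes a VP on its own. So let's relax the rules and say VP -> V PP instead of VP -> VP PP: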

>>> import nltk 
>>> grammar = nltk.parse_cfg(""" 
... S -> NP VP 
... PP -> P NP 
... NP -> Det N | Det N PP | N 
... VP -> V NP | V PP 
... N -> 'Kim' | 'Dana' | 'everyone' 
... V -> 'arrived' | 'left' |'cheered' 
... P -> 'or' | 'and' 
... """) 
>>> sent = "Kim arrived or Dana".split() 
>>> parser = nltk.ChartParser(grammar) 
>>> print parser.nbest_parse(sent) 
[Tree('S', [Tree('NP', [Tree('N', ['Kim'])]), Tree('VP', [Tree('V', ['arrived']), Tree('PP', [Tree('P', ['or']), Tree('NP', [Tree('N', ['Dana'])])])])])] 

OK, we are getting closer, but the next word breaks the CFG rules again:

>>> import nltk 
>>> grammar = nltk.parse_cfg(""" 
... S -> NP VP 
... PP -> P NP 
... NP -> Det N | Det N PP | N 
... VP -> V NP | V PP 
... N -> 'Kim' | 'Dana' | 'everyone' 
... V -> 'arrived' | 'left' |'cheered' 
... P -> 'or' | 'and' 
... """) 
>>> sent = "Kim arrived or Dana left".split() 
>>> parser = nltk.ChartParser(grammar) 
>>> print parser.nbest_parse(sent) 
[] 
>>> sent = "Kim arrived or Dana left and".split() 
>>> parser = nltk.ChartParser(grammar) 
>>> print parser.nbest_parse(sent) 
[] 
>>> 
>>> sent = "Kim arrived or Dana left and everyone".split() 
>>> parser = nltk.ChartParser(grammar) 
>>> print parser.nbest_parse(sent) 
[] 
>>> 
>>> sent = "Kim arrived or Dana left and everyone cheered".split() 
>>> parser = nltk.ChartParser(grammar) 
>>> print parser.nbest_parse(sent) 
[] 
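The prefix-by-prefix probing above can be automated. Here is a minimal sketch that uses a small hand-written recognizer instead of NLTK, so `RULES`, `LEXICON` and `make_recognizer` are illustrative names and not part of any library. It assumes the grammar has no empty productions and no unary cycles.

```python
from functools import lru_cache

# The patched grammar from the session above; Det is still undefined.
RULES = {
    "S": [("NP", "VP")],
    "PP": [("P", "NP")],
    "NP": [("Det", "N"), ("Det", "N", "PP"), ("N",)],
    "VP": [("V", "NP"), ("V", "PP")],
}
LEXICON = {
    "Kim": {"N"}, "Dana": {"N"}, "everyone": {"N"},
    "arrived": {"V"}, "left": {"V"}, "cheered": {"V"},
    "or": {"P"}, "and": {"P"},
}


def make_recognizer(rules, lexicon, tokens):
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derives(symbol, start, end):
        """True if symbol can derive tokens[start:end]."""
        if end - start == 1 and symbol in lexicon.get(tokens[start], ()):
            return True
        return any(matches(rhs, start, end) for rhs in rules.get(symbol, ()))

    @lru_cache(maxsize=None)
    def matches(rhs, start, end):
        """True if the symbol sequence rhs can derive tokens[start:end]."""
        if len(rhs) == 1:
            return derives(rhs[0], start, end)
        # Every symbol must cover at least one token.
        return any(
            derives(rhs[0], start, mid) and matches(rhs[1:], mid, end)
            for mid in range(start + 1, end - len(rhs) + 2)
        )

    return derives


sentence = "Kim arrived or Dana left and everyone cheered".split()
results = []
for n in range(1, len(sentence) + 1):
    prefix = sentence[:n]
    ok = make_recognizer(RULES, LEXICON, prefix)("S", 0, n)
    results.append(ok)
    print("{:5}  {}".format(str(ok), " ".join(prefix)))
```

Only the four-word prefix Kim arrived or Dana is accepted as an S, which matches the sessions above.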

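I hope the examples above show that patching the rules one word at a time, trying to absorb linguistic phenomena from left to right, is hard.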

Instead of working from left to right to get

[[[[[[[[Kim] arrived] or] Dana] left] and] everyone] cheered] 

why not write more linguistically sound rules that produce one of these structures:

  1. [[[Kim arrived] or [Dana left]] and [everyone cheered]]
  2. [[Kim arrived] or [[Dana left] and [everyone cheered]]]
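These two structures are exactly the binary bracketings of three clauses joined by two conjunctions. A small sketch that enumerates them (`bracketings` is a hypothetical helper written for this illustration; it assumes clauses and conjunctions alternate in the input list):

```python
def bracketings(items):
    """Yield every binary bracketing of items as a bracketed string."""
    if len(items) == 1:
        yield items[0]
        return
    # Odd positions hold the conjunctions; split at each one in turn.
    for i in range(1, len(items), 2):
        for left in bracketings(items[:i]):
            for right in bracketings(items[i + 1:]):
                yield "[{} {} {}]".format(left, items[i], right)


clauses = ["[Kim arrived]", "or", "[Dana left]", "and", "[everyone cheered]"]
structures = list(bracketings(clauses))
for structure in structures:
    print(structure)
```

A grammar that treats each clause as a unit and lets conjunctions join units should therefore return two trees for the full sentence.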

Try this:

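import nltk 
# Note: this is the NLTK 2 API. NLTK 3 renamed nltk.parse_cfg to
# nltk.CFG.fromstring and nbest_parse to parse (which returns an iterator).
grammar = nltk.parse_cfg(""" 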
S -> CP | VP 
CP -> VP C VP | CP C VP | VP C CP 
VP -> NP V 
NP -> 'Kim' | 'Dana' | 'everyone' 
V -> 'arrived' | 'left' |'cheered' 
C -> 'or' | 'and' 
""") 

print "======= Kim arrived =========" 
sent = "Kim arrived".split() 
parser = nltk.ChartParser(grammar) 
for t in parser.nbest_parse(sent): 
    print t 

print "\n======= Kim arrived or Dana left =========" 
sent = "Kim arrived or Dana left".split() 
parser = nltk.ChartParser(grammar) 
for t in parser.nbest_parse(sent): 
    print t 

print "\n=== Kim arrived or Dana left and everyone cheered ====" 
sent = "Kim arrived or Dana left and everyone cheered".split() 
parser = nltk.ChartParser(grammar) 
for t in parser.nbest_parse(sent): 
    print t 

[out]

======= Kim arrived ========= 
(S (VP (NP Kim) (V arrived))) 

======= Kim arrived or Dana left ========= 
(S (CP (VP (NP Kim) (V arrived)) (C or) (VP (NP Dana) (V left)))) 

=== Kim arrived or Dana left and everyone cheered ==== 
(S 
  (CP 
    (CP (VP (NP Kim) (V arrived)) (C or) (VP (NP Dana) (V left))) 
    (C and) 
    (VP (NP everyone) (V cheered)))) 
(S 
  (CP 
    (VP (NP Kim) (V arrived)) 
    (C or) 
    (CP 
      (VP (NP Dana) (V left)) 
      (C and) 
      (VP (NP everyone) (V cheered))))) 

The solution above shows that your CFG rules need to be robust enough to capture not only the full sentence but also parts of it.
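To double-check the tree counts without NLTK, here is a rough sketch that counts derivations for the clause-based grammar with a hand-written dynamic program. It is not how NLTK's chart parser works; `RULES`, `LEXICON` and `count_parses` are names made up for this example, and it assumes no empty productions and no unary cycles.

```python
from functools import lru_cache

# The clause-based grammar from the solution above.
RULES = {
    "S": [("CP",), ("VP",)],
    "CP": [("VP", "C", "VP"), ("CP", "C", "VP"), ("VP", "C", "CP")],
    "VP": [("NP", "V")],
}
LEXICON = {
    "Kim": "NP", "Dana": "NP", "everyone": "NP",
    "arrived": "V", "left": "V", "cheered": "V",
    "or": "C", "and": "C",
}


def count_parses(tokens, start_symbol="S"):
    """Count the distinct parse trees of tokens rooted in start_symbol."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of trees with root symbol covering tokens[i:j].
        total = 0
        if j - i == 1 and LEXICON.get(tokens[i]) == symbol:
            total += 1
        for rhs in RULES.get(symbol, ()):
            total += count_seq(rhs, i, j)
        return total

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence rhs covers tokens[i:j].
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        return sum(
            count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return count(start_symbol, 0, len(tokens))


for text in ["Kim arrived",
             "Dana left",
             "Kim arrived or Dana left",
             "Kim arrived or Dana left and everyone cheered"]:
    print(count_parses(text.split()), text)
```

Each clause parses on its own, the two-clause sentence has one tree, and the full sentence has two, in line with the output shown above.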

+0

One of the best, most complete answers I have read – jbnunn

5

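You have not defined Det in your grammar, but by your rules every NP (and therefore every S) must contain one. Once Det is defined, a sentence that uses determiners parses: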

>>> grammar = nltk.parse_cfg(""" 
... S -> NP VP 
... NP -> Det N | Det N PP 
... VP -> V NP | VP PP 
... Det -> 'a' | 'the' 
... N -> 'Kim' | 'Dana' | 'everyone' 
... V -> 'arrived' | 'left' |'cheered' 
... """) 
>>> 
>>> parser = nltk.ChartParser(grammar) 
>>> parser.nbest_parse('the Kim left a Dana'.split()) 
[Tree('S', [Tree('NP', [Tree('Det', ['the']), Tree('N', ['Kim'])]), Tree('VP', [Tree('V', ['left']), Tree('NP', [Tree('Det', ['a']), Tree('N', ['Dana'])])])])]