2014-09-21 77 views
0

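Python list comprehension: "too many values to unpack"

Sorry for asking, but the error "too many values to unpack" is driving me crazy. This is the code: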

FREQ = 3 
fourgrams="" 
n = 4 
tokens = token_text(text) # is a function that tokenize 
fourgrams = ngrams(tokens, n) 
final_list = [(item,v) for item,v in nltk.FreqDist(fourgrams) if v > FREQ] 
print final_list 

Where is the error? Thanks a lot.

+2

Please post the full traceback; it will tell you exactly where the exception is raised. – 2014-09-21 08:54:07

Answers

2

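FreqDist is a dictionary-like object. Iterating over it yields keys, not key-value pairs. Here each key is a 4-gram tuple, so unpacking it into `item, v` fails with "too many values to unpack". If you want to iterate over key-value pairs, use FreqDist.items (or FreqDist.iteritems in Python 2):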

final_list = [(item,v) for item,v in nltk.FreqDist(fourgrams).items() if v > FREQ] 
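
To make the difference concrete, here is a minimal sketch that reproduces the error without NLTK. It assumes `collections.Counter` as a stand-in for `FreqDist` (both iterate like a dict), and `make_ngrams` is a hypothetical helper written only for this illustration, not part of NLTK. It is written for Python 3, where the error message also names the expected number of values.

```python
from collections import Counter


def make_ngrams(tokens, n):
    """Hypothetical helper: return the n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "a b c d a b c d a b c d e".split()
# Counter stands in for nltk.FreqDist here (an assumption for illustration).
counts = Counter(make_ngrams(tokens, 4))

# Iterating the mapping directly yields keys only. Each key is a 4-tuple,
# so unpacking it into two names raises ValueError.
try:
    [(item, v) for item, v in counts]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# .items() yields (key, value) pairs, so two-name unpacking works.
FREQ = 2
frequent = [(item, v) for item, v in counts.items() if v > FREQ]

print(error_message)
print(frequent)
```

With four-element keys, the direct iteration tries to bind four values to two names, which is exactly the situation in the question.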
+0

It works! Thank you – RoverDar 2014-09-21 08:58:01

+0

@RoverDar, you're welcome. By the way, as Burhan Khalid commented, it would be good to include the full traceback in the question. – falsetru 2014-09-21 09:00:07

1

Take a look at this:

from collections import Counter 

from nltk.corpus import brown 
from nltk.util import ngrams 

# Let's take the first 10000 words from the brown corpus 
text = brown.words()[:10000] 
# Extract the ngrams 
bigrams = ngrams(text, 2) 
# Alternatively, instead of a FreqDist, you can simply use collections.Counter 
freqdist = Counter(bigrams) 
print len(freqdist) 
# Gets the top 5 ngrams 
top5 = freqdist.most_common(5) 
print top5 
# Limits v > 10 
freqdist = {k:v for k,v in freqdist.iteritems() if v > 10} 
print len(freqdist) 

[out]:

7615 
[(('of', 'the'), 95), (('.', 'The'), 76), (('in', 'the'), 59), (("''", '.'), 40), ((',', 'the'), 36)] 
34
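
The snippet above is Python 2 (`print` statement, `iteritems`). As a rough sketch of the same Counter-based approach in Python 3, assuming a small inline token list in place of the Brown corpus and a hypothetical `bigrams` helper in place of `nltk.util.ngrams`:

```python
from collections import Counter


def bigrams(tokens):
    """Hypothetical helper: pair each token with its successor."""
    return list(zip(tokens, tokens[1:]))


# Toy data used only for illustration; not taken from the Brown corpus.
tokens = "the cat sat on the mat and the cat sat on the rug".split()

freqdist = Counter(bigrams(tokens))
print(len(freqdist))

# most_common(n) returns the n most frequent items directly.
top2 = freqdist.most_common(2)
print(top2)

# In Python 3, dict-like objects expose .items() rather than .iteritems().
frequent = {k: v for k, v in freqdist.items() if v > 1}
print(len(frequent))
```

The filtering step is the same dictionary comprehension as before; only the method name and the `print` calls change.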