Using collections.Counter to count elements in sublists

I have a list of tokenized sentences (YouTube comments):

sample_tok = [['How', 'does', 'it', 'call', 'them', '?', '\xef\xbb\xbf'], 
       ['Thats', 'smart\xef\xbb\xbf'], 
       ... # and sooo on..... 
       ['1:45', ':', 'O', '\xef\xbb\xbf']]

Now I want a dictionary of the words and the number of times each one is mentioned.

from collections import Counter 

d = Counter() 
for sent in [sample_tok]: 
    for words in sent: 
     d = Counter(words)

Unfortunately, this only counts the last sublist...

[(':', 1), ('1:45', 1), ('\xef\xbb\xbf', 1), ('O', 1)]

Is there a way to make it count all of the tokenized sentences?

2014-10-20 Marshall

You are replacing your counter, not updating it. Each pass through the loop creates a new Counter() instance and discards the previous one.
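
To see the difference concretely, here is a minimal sketch on a small made-up list (`sample` is hypothetical data, not the comments from the question). Rebinding the name keeps only the counts of the last sublist, while a single Counter that gets updated accumulates across all of them.

```python
from collections import Counter

# Hypothetical stand-in for the tokenized comments.
sample = [['a', 'b', 'a'], ['b', 'c']]

# Rebinding: each pass replaces the previous Counter with a brand-new one.
replaced = Counter()
for words in sample:
    replaced = Counter(words)

# Updating: the same Counter accumulates counts from every sublist.
accumulated = Counter()
for words in sample:
    accumulated.update(words)

print(replaced)     # only the last sublist: 'b' and 'c' once each
print(accumulated)  # 'a' twice, 'b' twice, 'c' once
```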

Pass every word to your Counter() in a nested generator expression:

d = Counter(word for sublist in sample_tok for word in sublist)

Or, if you need to process each sublist in some way first, use Counter.update():

d = Counter()
for words in sample_tok:  # one sublist per tokenized sentence
    d.update(words)       # add this sentence's words to the running totals
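
As an illustration of the "process each sublist first" case, here is a rough sketch. The `normalize` helper and the `sample` data are hypothetical, and lowercasing plus dropping punctuation-only tokens is just one possible preprocessing step, not something the question asks for.

```python
from collections import Counter

# Hypothetical sample; the real data would be the tokenized comments.
sample = [['How', 'does', 'it', 'call', 'them', '?'], ['How', 'smart', '!']]

def normalize(words):
    """Hypothetical per-sentence step: lowercase, drop punctuation-only tokens."""
    return [w.lower() for w in words if any(ch.isalnum() for ch in w)]

d = Counter()
for words in sample:
    d.update(normalize(words))  # count the cleaned tokens of this sentence

print(d.most_common(1))  # [('how', 2)]
```

Because update() accepts any iterable, the helper could just as well return a generator instead of a list.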

2014-10-20 11:38:37

You can use the update method of the Counter instance. It counts the values passed to it and adds them to the counter.

d = Counter()
for words in sample_tok:  # one sublist per tokenized sentence
    d.update(words)

Or you can add a new counter to the old one:

d = Counter()
for words in sample_tok:  # one sublist per tokenized sentence
    d += Counter(words)   # builds a temporary Counter, then adds it in place
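
For plain word counts the two variants give the same totals. One difference that may matter elsewhere, sketched below on made-up counts: in-place addition with `+=` discards entries whose count is zero or negative, whereas `update()` keeps them. The starting value `{'x': -1}` is contrived purely to make the difference visible.

```python
from collections import Counter

words = ['a', 'b']  # hypothetical tokens

u = Counter({'x': -1})
u.update(words)      # 'x' stays in the counter with its negative count

p = Counter({'x': -1})
p += Counter(words)  # in-place add keeps only positive counts, so 'x' is dropped

print(u)
print(p)
```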

2014-10-20 11:38:20 parchment
