Accessing the elements of n-grams

I take a string, tokenize it, and want to look at the most common bigrams. Here is what I have:

import collections

import nltk
from nltk import ngrams

someString = "this is some text. this is some more test. this is even more text."
tokens = nltk.word_tokenize(someString)
# Lowercase and drop single-character tokens such as punctuation.
tokens = [token.lower() for token in tokens if len(token) > 1]

# Build the bigrams and count how often each one occurs.
bigram = ngrams(tokens, 2)
aCounter = collections.Counter(bigram)

If I do:

print(aCounter)

then it outputs the bigrams in sorted order. However, this:

for element in aCounter: 
    print(element)

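The loop above prints the elements, but not their counts, and not in order of count. I want to write a for loop that prints the X most common bigrams in the text.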

I'm basically trying to learn Python and nltk at the same time, so that may be why I'm struggling here (I assume this is a trivial thing).

Source

2016-09-22 basil

You are probably looking for something that already exists, namely the most_common method on Counter. From the documentation:

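Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered arbitrarily: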

You can call it with a value n to get the n most common (value, count) pairs. For example:

from collections import Counter 

# initialize with silly value. 
c = Counter('aabbbccccdddeeeeefffffffghhhhiiiiiii') 

# Print 4 most common values and their respective count. 
for val, count in c.most_common(4): 
    print("Value {0} -> Count {1}".format(val, count))

This prints (which of the tied elements appear, and in what order, may differ across Python versions):

Value f -> Count 7 
Value i -> Count 7 
Value e -> Count 5 
Value h -> Count 4
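
Applied to the bigrams from the question, the same idea might look like the sketch below. To keep it self-contained it does not use nltk: the whitespace-based tokenizer and the zip-based bigram construction are stand-ins I am assuming here (for n=2, zip(tokens, tokens[1:]) produces the same tuples as ngrams(tokens, 2)), and top_x is a hypothetical name for the X in the question.

```python
from collections import Counter

text = "this is some text. this is some more test. this is even more text."

# Stand-in tokenizer (assumption, not nltk.word_tokenize): split on
# whitespace, strip periods, lowercase, and drop single-character tokens.
tokens = [word.strip(".").lower() for word in text.split()]
tokens = [token for token in tokens if len(token) > 1]

# Pair each token with its successor to form the bigrams, then count them.
bigram_counts = Counter(zip(tokens, tokens[1:]))

top_x = 3  # hypothetical value for X
top_bigrams = bigram_counts.most_common(top_x)

# most_common returns (element, count) pairs, so both can be unpacked
# directly in the for loop.
for bigram, count in top_bigrams:
    print("{0} -> {1}".format(" ".join(bigram), count))
```

Iterating over the Counter itself yields only the keys, which is why the loop in the question shows no counts; most_common (or items()) yields the pairs.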

Source

2016-09-22 23:34:18
