How to get the 10 most frequent strings in a list in Python

I have a list of 93 different strings. I need to find the 10 most frequent strings, and they must be returned in order from most frequent to least frequent.

mylist = ['"and', '"beware', '`twas', 'all', 'all', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig'] 
# this is just a sample of the actual list.

I don't have the latest version of Python, and I can't use Counter.


2012-04-11 Keely Aranyos

You can use Counter from the collections module to do this.

from collections import Counter 
c = Counter(mylist)

Then c.most_common(10) returns:

[('and', 13), 
('all', 2), 
('as', 2), 
('borogoves', 2), 
('boy', 1), 
('blade', 1), 
('bandersnatch', 1), 
('beware', 1), 
('bite', 1), 
('arms', 1)]
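The question asks for the strings themselves, ordered from most to least frequent, so the counts can be dropped with a list comprehension. A small sketch, assuming Python 2.7 or later (where Counter exists). Note that most_common does not put equal counts in alphabetical order: in Python 3.7+ they come in first-encountered order, and in older versions the order is arbitrary.

```python
from collections import Counter

# Sample list from the question ('and' appears 13 times)
mylist = (['"and', '"beware', '`twas', 'all', 'all'] + ['and'] * 13 +
          ['arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware',
           'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig'])

c = Counter(mylist)
# most_common(10) yields (word, count) pairs, most frequent first;
# keep only the words, which preserves that order.
top_words = [word for word, count in c.most_common(10)]
print(top_words)
```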


2012-04-11 04:04:23

Accept this answer! – 2012-04-11 04:05:46

That's it. No more beating around the bush. – 2012-04-11 04:22:36

I don't have the latest version of Python, and I can't use Counter – 2012-04-11 04:41:45

Edit: rewritten without Counter, as the revised question requires.

Changed to use heapq.nlargest, the way Counter does (suggested by @Duncan).

>>> from collections import defaultdict 
>>> from operator import itemgetter 
>>> from heapq import nlargest 
>>> mylist = ['"and', '"beware', '`twas', 'all', 'all', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig'] 
>>> c = defaultdict(int) 
>>> for item in mylist:
...     c[item] += 1
...
>>> [word for word,freq in nlargest(10,c.iteritems(),key=itemgetter(1))] 
['and', 'all', 'as', 'borogoves', 'boy', 'blade', 'bandersnatch', 'beware', 'bite', 'arms']
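The same defaultdict approach can also break ties alphabetically, which the nlargest call above does not do. Because a string can't be negated inside an nlargest key, one option is heapq.nsmallest with the key (-count, word). A sketch written for Python 3 (on Python 2, swap .items() for .iteritems()):

```python
import heapq
from collections import defaultdict

# Sample list from the question ('and' appears 13 times)
mylist = (['"and', '"beware', '`twas', 'all', 'all'] + ['and'] * 13 +
          ['arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware',
           'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig'])

# Tally each string; missing keys start at 0 thanks to defaultdict(int).
counts = defaultdict(int)
for item in mylist:
    counts[item] += 1

# The 10 "smallest" items under the key (-count, word) are the 10 most
# frequent words, with equal counts ordered alphabetically.
top_10 = [word for word, freq in
          heapq.nsmallest(10, counts.items(), key=lambda kv: (-kv[1], kv[0]))]
print(top_10)
```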


2012-04-11 04:05:43 jamylak

I don't have the latest version of Python, and I can't use Counter – 2012-04-11 04:45:29

Do you have `defaultdict`? Try `from collections import defaultdict`; if that works, I can write a quick solution. – jamylak 2012-04-11 04:48:02

Yes, I do – 2012-04-11 04:49:14

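David's suggested answer is the best one. If your version of Python doesn't include Counter in the collections module (it was introduced in Python 2.7), you can use this implementation of the Counter class to do the same thing. I suspect it will be slower than the module's version, but it does the same thing.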
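The linked recipe is not reproduced here. As a rough idea of what such a backport involves, here is a minimal sketch of a hypothetical `SimpleCounter` class (the name is made up for illustration). It covers only counting from an iterable and most_common, and it avoids anything newer than plain dicts and heapq:

```python
import heapq
from operator import itemgetter


class SimpleCounter(dict):
    """Hypothetical minimal stand-in for collections.Counter (not the linked recipe).

    Only counting from an iterable and most_common() are sketched here.
    """

    def __init__(self, iterable=()):
        dict.__init__(self)
        for elem in iterable:
            # dict.get returns 0 for elements not seen yet
            self[elem] = self.get(elem, 0) + 1

    def most_common(self, n=None):
        """Return (element, count) pairs from most to least common."""
        if n is None:
            return sorted(self.items(), key=itemgetter(1), reverse=True)
        return heapq.nlargest(n, self.items(), key=itemgetter(1))


# Sample list from the question ('and' appears 13 times)
mylist = (['"and', '"beware', '`twas', 'all', 'all'] + ['and'] * 13 +
          ['arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware',
           'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig'])

c = SimpleCounter(mylist)
print(c.most_common(4))
```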


2012-04-11 04:54:27

Counter wasn't added in Python 2.4; it was added in 2.7. The documentation says so: http://docs.python.org/library/collections.html#collections.Counter – 2012-04-11 04:59:20

Yes, I've updated my answer to reflect the correct version. The solution provided does work before 2.7, though. – 2012-04-11 05:00:38

Cool - that snippet was written by Raymond Hettinger (the author of a lot of this machinery) and is very much like the 2.7 source code. Nice find. :) – 2012-04-11 05:07:00

David's solution is the best.

But, probably more for fun than anything else, here is a solution that doesn't import any modules:

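dicto = {}

# Count each string; the first occurrence raises KeyError and starts the count at 1
for ele in mylist:
    try:
        dicto[ele] += 1
    except KeyError:
        dicto[ele] = 1

# Sort (word, count) pairs by count, highest first, and keep the first 10
top_10 = sorted(dicto.iteritems(), key=lambda k: k[1], reverse=True)[:10]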

Result:

>>> top_10 
[('and', 13), ('all', 2), ('as', 2), ('borogoves', 2), ('boy', 1), ('blade', 1), ('bandersnatch', 1), ('beware', 1), ('bite', 1), ('arms', 1)]
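The sort above leaves words with equal counts in arbitrary order. Still without imports, one way to get them alphabetical is a two-pass sort: Python's sort is stable (reverse=True included), so sorting by word first and then by count keeps tied words in alphabetical order. A sketch for Python 3 that also swaps the try/except for dict.get:

```python
# Sample list from the question ('and' appears 13 times)
mylist = (['"and', '"beware', '`twas', 'all', 'all'] + ['and'] * 13 +
          ['arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware',
           'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig'])

# Count occurrences without importing anything; get() returns 0 for new words.
dicto = {}
for ele in mylist:
    dicto[ele] = dicto.get(ele, 0) + 1

# Pass 1: alphabetical order by word. Pass 2: stable sort by count, descending,
# so words with equal counts keep their alphabetical order from pass 1.
by_word = sorted(dicto.items())
top_10 = sorted(by_word, key=lambda k: k[1], reverse=True)[:10]
print(top_10)
```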

Edit:

Answer to the follow-up question:

new_dicto = {}

# Invert dicto: map each count to the list of words that occur that many times
for word, count in dicto.iteritems():
    try:
        new_dicto[count].append(word)
    except KeyError:
        new_dicto[count] = [word]

# Highest count first; the words sharing a count are sorted alphabetically
alph_sorted = sorted([(count, sorted(words)) for count, words in new_dicto.iteritems()], reverse=True)

Result:

>>> alph_sorted 
[(13, ['and']), (2, ['all', 'as', 'borogoves']), (1, ['"and', '"beware', '`twas', 'arms', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'boy', 'brillig'])]

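Within each count the words are sorted alphabetically. If you notice that some words have extra quotation marks attached, that is why they come first: " and ` sort before lowercase letters.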
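The inversion step (count to list of words) can be written more compactly with dict.setdefault. A self-contained sketch for Python 3 that rebuilds the counts first and produces the same alph_sorted structure as the answer's result:

```python
# Sample list from the question ('and' appears 13 times)
mylist = (['"and', '"beware', '`twas', 'all', 'all'] + ['and'] * 13 +
          ['arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware',
           'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig'])

dicto = {}
for ele in mylist:
    dicto[ele] = dicto.get(ele, 0) + 1

# Invert the mapping: count -> list of words with that count.
# setdefault creates the empty list the first time a count is seen.
by_count = {}
for word, count in dicto.items():
    by_count.setdefault(count, []).append(word)

# Highest count first; each word list sorted alphabetically.
alph_sorted = [(count, sorted(by_count[count])) for count in sorted(by_count, reverse=True)]
print(alph_sorted)
```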

Edit:

In answer to another follow-up question:

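top_10 = []

# Walk the groups from highest count to lowest, collecting words in order
for count, words in alph_sorted:
    for word in words:
        top_10.append(word)
        if len(top_10) == 10:
            break
    # break only exits the inner loop, so stop the outer loop here as well
    if len(top_10) == 10:
        break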

Result:

>>> top_10 
['and', 'all', 'as', 'borogoves', '"and', '"beware', '`twas', 'arms', 'awhile', 'back']
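The nested loop needs a second break to stop once 10 words are collected. An alternative that sidesteps that entirely is to flatten the groups with itertools.chain and cut the stream off with itertools.islice. A sketch for Python 3, using the alph_sorted value from the result above:

```python
from itertools import chain, islice

# alph_sorted: (count, alphabetically sorted words), highest count first
alph_sorted = [
    (13, ['and']),
    (2, ['all', 'as', 'borogoves']),
    (1, ['"and', '"beware', '`twas', 'arms', 'awhile', 'back', 'bandersnatch',
         'beamish', 'beware', 'bird', 'bite', 'blade', 'boy', 'brillig']),
]

# chain.from_iterable walks the word lists in order; islice stops after 10 words,
# so there is no nested loop to break out of.
top_10 = list(islice(chain.from_iterable(words for _, words in alph_sorted), 10))
print(top_10)
```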


2012-04-11 05:10:29 Akavall

So how would you get the words, and for words that have the same count, how would you sort them alphabetically? – 2012-04-11 05:21:45

How do I get the top 10 from alph_sorted? – 2012-04-12 03:25:01

@KeelyAranyos I edited my post to answer your second question; I hope it gives you what you're looking for. – Akavall 2012-04-15 04:20:23

If your Python version doesn't support Counter, you can do it the way Counter is implemented:

>>> import operator,collections,heapq 
>>> counter = collections.defaultdict(int) 
>>> for elem in mylist:
...     counter[elem] += 1
...
>>> heapq.nlargest(10, counter.iteritems(), key=operator.itemgetter(1))
[('and', 13), ('all', 2), ('as', 2), ('borogoves', 2), ('boy', 1), ('blade', 1), ('bandersnatch', 1), ('beware', 1), ('bite', 1), ('arms', 1)]

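If you look at the Counter class, it builds a dictionary counting how many times each element of the iterable occurs, then passes the items to heapq.nlargest with the dictionary value (the count) as the key, and returns the n items with the largest counts.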
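To see why heapq.nlargest is a reasonable choice here: it maintains a small heap of the best n items instead of fully sorting the input, yet the heapq documentation describes its result as equivalent to sorted(iterable, key=key, reverse=True)[:n]. A sketch for Python 3 that compares the two on the question's sample:

```python
import heapq
from collections import defaultdict
from operator import itemgetter

# Sample list from the question ('and' appears 13 times)
mylist = (['"and', '"beware', '`twas', 'all', 'all'] + ['and'] * 13 +
          ['arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware',
           'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig'])

counter = defaultdict(int)
for elem in mylist:
    counter[elem] += 1

# Heap-based selection of the 10 largest counts versus a full descending sort.
via_heap = heapq.nlargest(10, counter.items(), key=itemgetter(1))
via_sort = sorted(counter.items(), key=itemgetter(1), reverse=True)[:10]
print(via_heap == via_sort)
```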


2012-04-11 05:36:08 Abhijit
