列表中最常見的Python元素

105

什麼是在Python列表中查找最常見元素的有效方法？列表中最常見的Python元素

我的列表項可能不可排列，因此不能使用字典。同樣在繪製的情況下，應返回索引最低的項目。例如：

>>> most_common(['duck', 'duck', 'goose']) 
'duck' 
>>> most_common(['goose', 'duck', 'duck', 'goose']) 
'goose'

來源

2009-10-05 hoju

如果列表中的項目不可散列，那麼如何確定它們何時「平等」？在確定非可哈希項目的等式時效率的損失可能會否定任何效率，你希望用一個好的算法獲得:) – 2009-10-05 07:05:23

我認爲他意味着這些項目可以是可變的，因此不可能成爲哈希映射中的鍵。 – fortran 2009-10-05 07:35:17

是的，這就是我的意思 - 有時它會包含列表 – hoju 2009-10-05 12:02:31

隨着提議，我很驚訝，沒有人提出這麼多的解決方案，我會考慮一個明顯的例子（用於非可排除但可比的元素） - [itertools.groupby] [1]。 itertools提供快速，可重用的功能，並讓您將一些棘手的邏輯委託給經過良好測試的標準庫組件。考慮例如：

import itertools 
import operator 

def most_common(L): 
    # get an iterable of (item, iterable) pairs 
    SL = sorted((x, i) for i, x in enumerate(L)) 
    # print 'SL:', SL 
    groups = itertools.groupby(SL, key=operator.itemgetter(0)) 
    # auxiliary function to get "quality" for an item 
    def _auxfun(g): 
    item, iterable = g 
    count = 0 
    min_index = len(L) 
    for _, where in iterable: 
     count += 1 
     min_index = min(min_index, where) 
    # print 'item %r, count %r, minind %r' % (item, count, min_index) 
    return count, -min_index 
    # pick the highest-count/earliest item 
    return max(groups, key=_auxfun)[0]

這當然可以寫得更簡潔，但我的目標是最大限度地清晰。兩個print聲明可以不註釋以更好地看到機器的行動;例如，與打印未註釋：

print most_common(['goose', 'duck', 'duck', 'goose'])

發射：

SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)] 
item 'duck', count 2, minind 1 
item 'goose', count 2, minind 0 
goose

正如所看到的，SL是對列表，每個對一個項，隨後由該項目的原始列表索引（要實現的關鍵條件是，如果具有相同最高計數的「最常見」項目> 1，則結果必須是最早出現的項目）。

groupby組由唯一的項目（通過operator.itemgetter）。輔助功能，max計算期間稱爲每分組一次，接收並解包在內部的基團 - 兩個項目(item, iterable)其中可迭代的項目也有兩個項的元組，(item, original index) [[的SL]的項目]的元組。

然後輔助功能使用一個循環，以確定該組的可迭代，和最小原始索引條目的兩個計數;它會將它們作爲組合「質量關鍵點」返回，並且最小索引符號發生變化，因此max操作會將原來列表中較早出現的那些項目視爲「更好」。

此代碼可能是更簡單的，如果它在時間和空間，如擔心一個小減少約大O問題...：

def most_common(L): 
    groups = itertools.groupby(sorted(L)) 
    def _auxfun((item, iterable)): 
    return len(list(iterable)), -L.index(item) 
    return max(groups, key=_auxfun)[0]

相同的基本想法，只是表示更簡單和緊湊...但是，唉，一個額外的O（N）輔助空間（用於將組的可迭代列表包含到列表中）和O（N平方）時間（以獲得每個項目的L.index）。雖然過早的優化是一切罪惡的根源，編程，故意挑選一個O（N的平方）當O（N日誌N）的方法之一是使用只是去太多對可擴展性的糧食 - ！）

最後，對於那些喜歡「清晰度和性能」的人來說，獎金1班輪版本適當地修改了名字:-)。

from itertools import groupby as g 
def most_common_oneliner(L): 
    return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]

來源

2009-10-05 15:16:29

如果你的列表有不同的類型，這會在Python3上中斷。 – AlexLordThorsen 2016-02-24 22:47:19

'groupby'需要先排序（O（NlogN））;使用具有'most_common（）'的'Counter（）'可以勝過它，因爲它使用heapq來查找最高頻率的項目（只有1項，即O（N）時間）。由於Counter（）現在被大量優化（計數發生在C循環中），即使對於小列表，它也可以輕鬆擊敗該解決方案。它將大量清單從水中吹出。 – 2017-10-14 21:26:07

只有「最低指數」的關係要求才能成爲解決這個問題的有效解決方案。對於更一般的情況，你絕對應該使用Counter方法。 – 2017-10-14 22:11:01

如果他們不哈希的，可以對它們進行排序，並做一個遍歷結果計數的項目（同一項目將彼此相鄰）。但是使它們可以被哈希和使用字典可能會更快。

def most_common(lst): 
    cur_length = 0 
    max_length = 0 
    cur_i = 0 
    max_i = 0 
    cur_item = None 
    max_item = None 
    for i, item in sorted(enumerate(lst), key=lambda x: x[1]): 
     if cur_item is None or cur_item != item: 
      if cur_length > max_length or (cur_length == max_length and cur_i < max_i): 
       max_length = cur_length 
       max_i = cur_i 
       max_item = cur_item 
      cur_length = 1 
      cur_i = i 
      cur_item = item 
     else: 
      cur_length += 1 
    if cur_length > max_length or (cur_length == max_length and cur_i < max_i): 
     return cur_item 
    return max_item

來源

2009-10-05 06:39:56

這是一個更簡單的方法http://ideone.com/Nq81vf，比較亞歷克斯的'Counter（）'解決方案 – Miguel 2017-01-27 14:49:38

排序列表的副本，發現最長的運行。您可以在使用每個元素的索引對其進行排序之前修飾列表，然後選擇以平局爲單位從最低索引開始的運行。

來源

2009-10-05 06:40:21 Boojum

這些項目可能無法比較。 – 2013-06-29 12:06:32

這是明顯慢溶液（爲O（n^2）），如果沒有排序，也不散列是可行的，但相等的比較（==）可用：

def most_common(items): 
    if not items: 
    raise ValueError 
    fitems = [] 
    best_idx = 0 
    for item in items: 
    item_missing = True 
    i = 0 
    for fitem in fitems: 
     if fitem[0] == item: 
     fitem[1] += 1 
     d = fitem[1] - fitems[best_idx][1] 
     if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]): 
      best_idx = i 
     item_missing = False 
     break 
     i += 1 
    if item_missing: 
     fitems.append([item, 1, i]) 
    return items[best_idx]

但讓您的項目可哈希或排序（正如其他答案所建議的）如果列表（n）的長度很大，幾乎總是能夠更快找到最常見的元素。 O（n）平均具有散列，O（n * log（n））最差排序。

來源

2009-10-05 06:46:53 pts

對於downvoter：這個答案有什麼問題？當排序和散列都不可行時，其他答案是否提供瞭解決方案？ – pts 2018-03-09 23:33:16

-1

這裏：

def most_common(l): 
    max = 0 
    maxitem = None 
    for x in set(l): 
     count = l.count(x) 
     if count > max: 
      max = count 
      maxitem = x 
    return maxitem

我有一個模糊的感覺存在於某處，這將使你的每個元素的計數標準庫的方法，但我不能找到它。

來源

2009-10-05 06:56:18

'最大'是一種方法。你會改變變量的名字嗎？ – 2009-10-05 07:04:27

請注意，set（）也需要可哈希的項目，否則在這種情況下解決方案將無法工作。 – 2009-10-05 07:04:44

等一下，我錯過了那個不可排除的部分。但是如果物體具有平等性，應該很容易使它們變得易碎。 – 2009-10-05 08:40:33

-1

>>> li = ['goose', 'duck', 'duck'] 

>>> def foo(li): 
     st = set(li) 
     mx = -1 
     for each in st: 
      temp = li.count(each): 
      if mx < temp: 
       mx = temp 
       h = each 
     return h 

>>> foo(li) 
'duck'

來源

2009-10-05 06:56:50

當n很大並且唯一元素的數量也很大時，這具有可怕的性能特徵：O（n）用於轉換爲集合並且O（m * n）= O（n^2）用於計數（其中m是唯一身份的數量）。排序和步行分別爲O（n log n）和0（n）步行。 – jmucchiello 2009-10-05 07:12:28

是的，你是對的。現在我知道這是一個可怕的解決方案，爲什麼。感謝評論！ :-) – 2009-10-05 07:22:15

一個班輪：

def most_common (lst): 
    return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]

來源

2009-10-05 07:04:24 willurd

327

一個更簡單的一行：

def most_common(lst): 
    return max(set(lst), key=lst.count)

來源

2009-10-05 07:14:52 newacct

+15

執行委員會表示，* [..]如果提取的是索引最低的項目，則應退還。*該代碼一般不符合該要求。 – Stephan202 2009-10-05 07:45:14

另外，OP聲明元素必須是可散列的：集合必須包含可哈希的對象。 – EOL 2009-10-05 09:16:42

另外，這種方法在算法上很慢（對於'set（lst）'中的每個元素，必須再次檢查整個列表）...對於大多數用途來說可能足夠快，但是... – EOL 2009-10-05 09:17:27

# use Decorate, Sort, Undecorate to solve the problem 

def most_common(iterable): 
    # Make a list with tuples: (item, index) 
    # The index will be used later to break ties for most common item. 
    lst = [(x, i) for i, x in enumerate(iterable)] 
    lst.sort() 

    # lst_final will also be a list of tuples: (count, index, item) 
    # Sorting on this list will find us the most common item, and the index 
    # will break ties so the one listed first wins. Count is negative so 
    # largest count will have lowest value and sort first. 
    lst_final = [] 

    # Get an iterator for our new list... 
    itr = iter(lst) 

    # ...and pop the first tuple off. Setup current state vars for loop. 
    count = 1 
    tup = next(itr) 
    x_cur, i_cur = tup 

    # Loop over sorted list of tuples, counting occurrences of item. 
    for tup in itr: 
     # Same item again? 
     if x_cur == tup[0]: 
      # Yes, same item; increment count 
      count += 1 
     else: 
      # No, new item, so write previous current item to lst_final... 
      t = (-count, i_cur, x_cur) 
      lst_final.append(t) 
      # ...and reset current state vars for loop. 
      x_cur, i_cur = tup 
      count = 1 

    # Write final item after loop ends 
    t = (-count, i_cur, x_cur) 
    lst_final.append(t) 

    lst_final.sort() 
    answer = lst_final[0][2] 

    return answer 

print most_common(['x', 'e', 'a', 'e', 'a', 'e', 'e']) # prints 'e' 
print most_common(['goose', 'duck', 'duck', 'goose']) # prints 'goose'

來源

2009-10-05 08:02:50 steveha

這是一個爲O（n）的解決方案。

mydict = {} 
cnt, itm = 0, '' 
for item in reversed(lst): 
    mydict[item] = mydict.get(item, 0) + 1 
    if mydict[item] >= cnt : 
     cnt, itm = mydict[item], item 

print itm

（逆轉的使用，以確保它返回最低的索引項）

來源

2009-10-05 10:29:02 ThisIsMeMoony

您可能不需要這個了，但這是我爲類似問題所做的。（它看起來長於這是因爲評論。）

itemList = ['hi', 'hi', 'hello', 'bye'] 

counter = {} 
maxItemCount = 0 
for item in itemList: 
    try: 
     # Referencing this will cause a KeyError exception 
     # if it doesn't already exist 
     counter[item] 
     # ... meaning if we get this far it didn't happen so 
     # we'll increment 
     counter[item] += 1 
    except KeyError: 
     # If we got a KeyError we need to create the 
     # dictionary key 
     counter[item] = 1 

    # Keep overwriting maxItemCount with the latest number, 
    # if it's higher than the existing itemCount 
    if counter[item] > maxItemCount: 
     maxItemCount = counter[item] 
     mostPopularItem = item 

print mostPopularItem

來源

2010-04-14 00:35:16

你可以使用計數器[item] = counter.get（item，0）+ 1來代替try/except部分 – XueYu 2016-09-28 00:02:57

105

借用here，這可以使用Python 2.7使用：比Alex的解決方案快

from collections import Counter 

def Most_Common(lst): 
    data = Counter(lst) 
    return data.most_common(1)[0][0]

作品約4-6倍，比newacct提出的單線快50倍。

要檢索第一次出現在列表中關係的情況下，元素：

def most_common(lst): 
    data = Counter(lst) 
    return max(lst, key=data.get)

來源

2014-01-01 20:10:48 Alex

這可能對某些人有用，但是......不幸的是Counter是一個字典子類，OP說他不能不使用字典（因爲項目可能不可散列）。 – Danimal 2014-09-08 15:32:41

喜歡這個。上面的@newacct單線程可能很簡單，但它運行在O（n^2）;也就是說，其中n是列表的長度。這個解決方案是O（n）。 – BoltzmannBrain 2015-05-22 16:50:42

就像簡單和速度一樣......對OP來說可能並不理想。但很適合我！ – Thom 2015-10-20 12:50:34

要在統計數據模式是已知的，當然Python有一個內置函數來完成這一功能是什麼爲您提供：

>>> from statistics import mode 
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6]) 
3

來源

2016-04-07 13:43:14

這並不滿足什麼返回時，有不止一個最常見的值OP的要求 - 一個statistics.StatisticsError提高 – 2016-04-07 14:06:31

哎呀，閱讀時錯過的要求。儘管如此，我仍然認爲這個答案是有價值的，因爲沒有人在這個問題上提出過這個答案，對於那些限制要求最少的人來說這是一個很好的解決方案。這是「列表python中最常見的項目」的最佳結果之一 – 2016-04-07 17:15:52

在這種情況下，請使用pandas DataFrames中的mode函數。 – Elmex80s 2017-03-13 22:34:11

-3

def popular(L): 
C={} 
for a in L: 
    C[a]=L.count(a) 
for b in C.keys(): 
    if C[b]==max(C.values()): 
     return b 
L=[2,3,5,3,6,3,6,3,6,3,7,467,4,7,4] 
print popular(L)

來源

2016-07-18 17:15:15 Pronoy

-2

def most_common(lst): 
    if max([lst.count(i)for i in lst]) == 1: 
     return False 
    else: 
     return max(set(lst), key=lst.count)

來源

2017-02-03 15:48:08 Ecanales

請提供一些關於您的代碼的信息，只是發佈代碼不是一個完整的答案 – jhhoff02 2017-02-03 16:09:16

有沒有人有理由在其他15個答案中使用它？ – cpburnz 2017-02-03 20:43:51

-1

我需要做這在最近的節目。我承認，我無法理解亞歷克斯的答案，所以這就是我最終的結果。

def mostPopular(l): 
    mpEl=None 
    mpIndex=0 
    mpCount=0 
    curEl=None 
    curCount=0 
    for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True): 
     curCount=curCount+1 if el==curEl else 1 
     curEl=el 
     if curCount>mpCount \ 
     or (curCount==mpCount and i<mpIndex): 
      mpEl=curEl 
      mpIndex=i 
      mpCount=curCount 
    return mpEl, mpCount, mpIndex

我計時靠在Alex的解決方案，它是短名單快約10％-15％，但一旦你去了100元以上（經測試可達200000）是慢約20％。

來源

2017-03-13 17:22:30 user3596355

列表中最常見的Python元素

回答

相關問題