Python - Count occurrences of certain ranges in a list

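So basically I want to count how many times floats in a given list fall within certain ranges. For example: the user enters a list of grades (all scores are out of 100), and they are grouped in tens. How many times do scores in 0-10, 10-20, 20-30, etc. occur? Like a test score distribution. I know I can use the count function, but since I'm not looking for specific numbers I'm running into trouble. Is there a way to combine count and range? Thanks for your help.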

Source

2012-03-03 user1246457

decs = [int(x/10) for x in scores]

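This maps scores from 0-9 → 0, 10-19 → 1, and so on. Then just count the occurrences of 0, 1, 2, 3, etc. (with something like collections.Counter), and map back to ranges from there.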
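A minimal sketch of the idea above, written for Python 3 (an assumption; the rest of this thread uses Python 2). The sample scores and the helper name `count_by_decade` are made up for illustration.

```python
from collections import Counter

def count_by_decade(scores, width=10):
    """Return {(low, high): count} for each occupied bin of the given width."""
    # int(x // width) maps 0-9.99 -> 0, 10-19.99 -> 1, and so on
    buckets = Counter(int(x // width) for x in scores)
    # map each bucket index back to the range it stands for
    return {(b * width, (b + 1) * width): n for b, n in sorted(buckets.items())}

scores = [82, 85.5, 90, 91, 70, 87, 45]  # hypothetical sample data
for (low, high), n in count_by_decade(scores).items():
    print('{}-{}: {}'.format(low, high, n))
```

Only occupied bins appear in the result; empty ranges can be filled in with `buckets[b]`, since a Counter returns 0 for missing keys.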

Source

2012-03-03 06:07:32 Amber

Or '[x // 10 for x in scores]' – 2012-03-03 07:50:09

Technically there's a slight difference - 'x // 10' will produce a 'float' result when x is a float, whereas 'int(x/10)' will produce an 'int'. – Amber 2012-03-03 07:51:39

Sure, @RaymondHettinger - but I didn't mention anything about 'list.count', so I'm not sure why you're bringing that up. – Amber 2012-03-03 09:05:15

>>> def count_scores(scores, low, high): 
...     return len([x for x in scores if x >= low and x <= high])
... 
>>> import random 
>>> scores = [random.randint(0,100) for _ in xrange(20)] 
>>> print scores 
[80, 92, 96, 33, 45, 62, 76, 74, 90, 80, 82, 99, 72, 60, 61, 29, 13, 2, 63, 87] 
>>> d = {'scores_{}-{}'.format(x, x + 10): count_scores(scores, x, x + 10) for x in xrange(0, 100, 10)} 
>>> for k,v in sorted(d.iteritems()): 
...     print '{}: {}'.format(k, v)
... 
scores_0-10: 1 
scores_10-20: 1 
scores_20-30: 1 
scores_30-40: 1 
scores_40-50: 1 
scores_50-60: 1 
scores_60-70: 4 
scores_70-80: 5 
scores_80-90: 5 
scores_90-100: 4
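
Note that because both ends of each interval are inclusive, a score that sits on a boundary (60, 80 or 90 in the run above) is counted in two bins, so the ten counts add up to 24 for 20 scores. A possible variant with half-open intervals, sketched for Python 3 (an assumption; `count_half_open` is a hypothetical name), counts each score exactly once:

```python
def count_half_open(scores, low, high):
    """Count the scores with low <= x < high."""
    return sum(1 for x in scores if low <= x < high)

# the same 20 scores as in the session above
scores = [80, 92, 96, 33, 45, 62, 76, 74, 90, 80,
          82, 99, 72, 60, 61, 29, 13, 2, 63, 87]
counts = [count_half_open(scores, lo, lo + 10) for lo in range(0, 100, 10)]
for lo, n in zip(range(0, 100, 10), counts):
    print('scores_{}-{}: {}'.format(lo, lo + 10, n))
```

With these bins a score of exactly 100 is not counted, so the last bin would need to be closed or an extra bin added if full marks are possible.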

Source

2012-03-03 06:08:08 wim

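This counts the elements in a single interval. The OP is interested in multiple intervals. A list, dict, or Counter could be used to accumulate all the counts in one pass. – 2012-03-03 08:17:00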

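This approach extends easily to multiple intervals (see edit). Accumulating the counts in a single pass is premature optimization, and it isn't necessary for an application like counting the grades in a class. – wim 2012-03-03 08:51:50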

This method uses bisect, which can be more efficient, but it requires you to sort the scores first.

from bisect import bisect 
import random 

scores = [random.randint(0,100) for _ in xrange(100)] 
bins = [20, 40, 60, 80, 100] 

scores.sort() 
counts = [] 
last = 0 
for range_max in bins: 
    i = bisect(scores, range_max, last) 
    counts.append(i - last) 
    last = i

I don't expect you to install numpy just for this, but if you already have numpy, you can use numpy.histogram.

UPDATE

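First, using bisect is more flexible. Using [i//n for i in scores] requires all the bins to be the same size. Using bisect allows the bins to have arbitrary limits. Also, i//n means the ranges are [lo, hi). With bisect the ranges are (lo, hi], but you can use bisect_left if you want [lo, hi).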
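
To illustrate the boundary behaviour described above, here is a small sketch (Python 3 assumed; the scores, the uneven bin limits and the helper name `bin_counts` are invented for the example). With `bisect` (an alias for `bisect_right`) a score equal to a limit falls in the bin that ends there; with `bisect_left` it falls in the next bin.

```python
from bisect import bisect_left, bisect_right

def bin_counts(scores, limits, locate=bisect_right):
    """Count scores between consecutive upper limits (limits must be ascending)."""
    ordered = sorted(scores)
    counts, last = [], 0
    for upper in limits:
        i = locate(ordered, upper, last)  # search only past the previous cut
        counts.append(i - last)
        last = i
    return counts

scores = [5, 50, 50, 61, 75, 90, 100]  # hypothetical sample data
limits = [50, 75, 90, 100]             # bins of arbitrary, uneven width

print(bin_counts(scores, limits))               # bins are (lo, hi]
print(bin_counts(scores, limits, bisect_left))  # bins are [lo, hi)
```

In the `bisect_left` case the top score of 100 is left uncounted unless a further limit above 100 is added.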

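Second, bisect is faster; see the timings below. I've replaced scores.sort() with the slower sorted(scores), because sorting is the slowest step and I didn't want to bias the timings with a pre-sorted array. The OP says his/her array is already sorted, though, so bisect may make even more sense in that case.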

setup=""" 
from bisect import bisect_left 
import random 
from collections import Counter 

def histogram(iterable, low, high, bins): 
    step = (high - low)/bins 
    dist = Counter(((x - low + 0.) // step for x in iterable)) 
    return [dist[b] for b in xrange(bins)] 

def histogram_bisect(scores, groups): 
    scores = sorted(scores) 
    counts = [] 
    last = 0 
    for range_max in groups: 
        i = bisect_left(scores, range_max, last)
        counts.append(i - last)
        last = i
    return counts 

def histogram_simple(scores, bin_size): 
    scores = [i//bin_size for i in scores] 
    return [scores.count(i) for i in range(max(scores)+1)] 

scores = [random.randint(0,100) for _ in xrange(100)] 
bins = range(10, 101, 10) 
""" 
from timeit import repeat 
t = repeat('C = histogram(scores, 0, 100, 10)', setup=setup, number=10000) 
print min(t) 
#.95 
t = repeat('C = histogram_bisect(scores, bins)', setup=setup, number=10000) 
print min(t) 
#.22 
t = repeat('histogram_simple(scores, 10)', setup=setup, number=10000) 
print min(t) 
#.36

Source

2012-03-03 07:48:26

I agree with @amber. Using *bisect* here is a waste, since you can use simple division for evenly spaced bins. – 2012-03-03 08:06:07

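I think @RaymondHettinger and I were thinking of a different use of 'bisect' than what you're doing here (i.e. using bisect to find which bin each individual score goes into, which would be a waste). For a large number of scores, you're right that bisect has potential. – Amber 2012-03-03 20:19:12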

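To group the data, divide it by the interval width. To count the number in each group, consider using collections.Counter. Here is a worked-out example with documentation and a test: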

from collections import Counter 

def histogram(iterable, low, high, bins): 
    '''Count elements from the iterable into evenly spaced bins 

    >>> scores = [82, 85, 90, 91, 70, 87, 45]
    >>> histogram(scores, 0, 100, 10)
    [0, 0, 0, 0, 1, 0, 0, 1, 3, 2]

    ''' 
    step = (high - low + 0.0)/bins 
    dist = Counter((float(x) - low) // step for x in iterable) 
    return [dist[b] for b in range(bins)] 

if __name__ == '__main__': 
    import doctest 
    print doctest.testmod()

Source

2012-03-03 07:59:33

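It looks to me like you downvoted all four answers written before yours. If that's the case, then I'd say that although your answer is probably the best one, maybe not all the others deserve that downvote, since after all they might be useful (even if none of them is the best). – jcollado 2012-03-03 08:45:59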

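Thanks for the reply. I tried to implement your solution and ran into problems with the math operators. For example, the step and dist variables won't work on strings. Sorry if I sound like a total noob (that's because I am), but is there a way to force the scores into a list? What I'm entering is 11,48,13,9,4. Is that a string by default? – user1246457 2012-03-03 19:35:55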

Sorry for the multiple comments. This is what I'm using: 'from collections import Counter def gradeDistribution(examScores, low, high, bin): step = (high - int(low) + 0.0) / bin dist = Counter((x - int(low)) // step for x in examScores) return [dist[b] for b in range(bins)] examScores = [raw_input("Please enter the scores")] gradeDistribution(examScores, 0, 10)' The error message I get is: TypeError: unsupported operand type(s) for -: 'str' and 'int' Thanks for any insight anyone can provide. – user1246457 2012-03-03 19:55:12
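
The TypeError arises because raw_input returns a single string, so the list holds one str rather than numbers. One possible way to handle it, sketched for Python 3 where `input` replaces `raw_input` (the helper name `parse_scores` is hypothetical), is to split the text on commas and convert each piece with float before passing the list on:

```python
def parse_scores(text):
    """Turn a string such as '11,48,13,9,4' into a list of floats."""
    # skip empty pieces so a trailing comma does not raise ValueError
    return [float(piece) for piece in text.split(',') if piece.strip()]

exam_scores = parse_scores('11,48,13,9,4')
print(exam_scores)
# in an interactive script:
# exam_scores = parse_scores(input('Please enter the scores: '))
```

The resulting list of floats can then be handed to a histogram function like the one in the answer above.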

If you're fine with using the external library NumPy, then you just need to call numpy.histogram():

>>> import numpy
>>> data = [82, 85, 90, 91, 70, 87, 45]
>>> counts, bins = numpy.histogram(data, bins=10, range=(0, 100)) 
>>> counts 
array([0, 0, 0, 0, 1, 0, 0, 1, 3, 2]) 
>>> bins 
array([ 0., 10., 20., 30., 40., 50., 60., 70., 80., 
     90., 100.])

Source

2012-03-03 14:46:19
