2015-04-30
0

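Python: computing the histogram of a set of data

The Python function below is meant to compute a histogram of a data set using equal-width bins. The result I expect is

[1, 6, 4, 6]

but when I run the code, I get

[7, 12, 17, 17]

which is incorrect. Does anyone know how to fix it?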

# Computes the histogram of a set of data
def histogram(data, num_bins):

    # Find what range the data spans, and use it to calculate the bin size.
    span = max(data) - min(data)
    bin_size = span/num_bins

    # Calculate the thresholds for each bin.
    thresholds = [0] * num_bins
    for i in range(num_bins):
        thresholds[i] += bin_size * (i+1)

    # Compute the histogram
    counts = [0] * num_bins
    for datum in data:
        # Increment the count of the bin that the datum falls in
        for bin_index, threshold in enumerate(thresholds):
            if datum <= threshold:
                counts[bin_index] += 1
    return counts

# Some random data 
data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9] 
print("Correct result:\t" + str([1, 6, 4, 6])) 
print("Your result:\t" + str(histogram(data, num_bins=4))) 
+0

What do you think makes it incorrect? – miradulo

+0

Your code is not valid Python. Please [edit] it and fix the formatting. –

+0

@Tichodroma: Thanks for the edit. – user21

Answers

3

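You only have two logic errors:

(1) The threshold calculation: the thresholds must start from min(data), not from 0.

(2) The missing break: once the bin for a datum is found, stop searching, otherwise every later bin is incremented as well.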

def histogram(data, num_bins):
    span = max(data) - min(data)
    bin_size = float(span)/num_bins
    thresholds = [0] * num_bins

    for i in range(num_bins):
        # Changed: thresholds are offset from min(data) instead of 0
        thresholds[i] = min(data) + bin_size * (i+1)

    counts = [0] * num_bins
    for datum in data:
        for bin_index, threshold in enumerate(thresholds):
            if datum <= threshold:
                counts[bin_index] += 1
                # Added: stop at the first bin that contains the datum
                break
    return counts

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9] 
print("Correct result:\t" + str([1, 6, 4, 6])) 
print("Your result:\t" + str(histogram(data, num_bins=4))) 
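
As an aside, the inner loop over the thresholds can be avoided by computing the bin index arithmetically. This is only a minimal sketch under a few assumptions: the name histogram_by_index is made up for illustration, and bins are treated as half-open [lo, hi) with the maximum folded into the last bin, which is a slightly different edge convention from the loop above (the two agree on this data set because no value sits on an inner edge).

```python
def histogram_by_index(data, num_bins):
    """Count values per equal-width bin by computing each bin index directly.

    Hypothetical helper for illustration; bins are [lo, hi), and the
    maximum value is folded into the last bin.
    """
    lo, hi = min(data), max(data)
    bin_size = (hi - lo) / float(num_bins)
    counts = [0] * num_bins
    for datum in data:
        if bin_size == 0:
            # All values are identical: put everything in the first bin.
            index = 0
        else:
            # Offset from the minimum, measured in bin widths.
            index = int((datum - lo) / bin_size)
        # The maximum lands on the upper edge, so clamp it into the last bin.
        counts[min(index, num_bins - 1)] += 1
    return counts


data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print(histogram_by_index(data, 4))  # [1, 6, 4, 6]
```

This does one division per datum instead of scanning the thresholds, at the cost of having to think about which side of an edge a value belongs to.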
5

If you just want the histogram, use NumPy:

import numpy as np 
np.histogram([-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9],4) 
0

Check the threshold definition and the if statement. This works:
def histogram(data, num_bins):

    # Find what range the data spans, and use it to calculate the bin size.
    span = max(data) - min(data)
    bin_size = span/float(num_bins)

    # Calculate the bin edges, starting at min(data).
    # num_bins bins need num_bins+1 edges.
    thresholds = [min(data) + bin_size * i for i in range(num_bins+1)]
    thresholds[-1] = max(data)  # guard against floating-point rounding

    print(thresholds)
    # Compute the histogram
    counts = [0 for i in range(num_bins)]
    for datum in data:
        # Increment the count of the first bin whose edges enclose the datum
        for bin_index in range(num_bins):
            if thresholds[bin_index] <= datum <= thresholds[bin_index+1]:
                counts[bin_index] += 1
                break
    return counts
+0

It reports a compilation error: http://ideone.com/tlyZ4B – user21

1

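First, if you just want a histogram of the data, NumPy provides that. But you asked how to do it yourself. Your code suggests that you lost track of what you were trying to do, so break your function into smaller functions. For example, to compute the thresholds, write a function thresholds(xmin, xmax, nbins), or better, use numpy.linspace. Doing so would probably draw your attention to the actual problem, namely that you are incrementing relative to 0 instead of min(data), and with luck it might also remind you not to count on exact floating-point accumulation. So you might end up with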

def thresholds(xmin, xmax, nbins): 
    span = (xmax - xmin)/float(nbins) 
    thresholds = [xmin + (i+1)*span for i in range(nbins)] 
    thresholds[-1] = xmax 
    return thresholds 
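
The last threshold is pinned to xmax because floating-point arithmetic is not exact: xmin + nbins*span is not guaranteed to equal xmax, and if it came out slightly low the largest datum would not fall in any bin. A small sketch of the idea follows; it repeats the function above so that it runs on its own, and the local name bounds is my own choice.

```python
def thresholds(xmin, xmax, nbins):
    """Upper edges of nbins equal-width bins covering [xmin, xmax]."""
    span = (xmax - xmin) / float(nbins)
    bounds = [xmin + (i + 1) * span for i in range(nbins)]
    # Assign the last edge instead of trusting the computed value.
    bounds[-1] = xmax
    return bounds


# Classic illustration that floating-point sums are not exact.
print(0.1 + 0.2 == 0.3)  # False

bounds = thresholds(-3.2, 9, 4)
# True by construction: the last edge is assigned, not computed.
print(bounds[-1] == 9)
```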

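Next, you need to get the bin counts. Again, NumPy could do this for you. Compared with your code, the important thing is not to increment more than one bin per datum. In the end, you might get something like this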

def counts(data, bounds):
    counts = [0] * len(bounds)
    for datum in data:
        # Index of the first bound that is >= datum
        bin_index = min(i for i, bound in enumerate(bounds) if bound >= datum)
        counts[bin_index] += 1
    return counts

Now you are ready:

def histogram02(data, num_bins): 
    xmin = min(data) 
    xmax = max(data) 
    th = thresholds(xmin, xmax, num_bins) 
    return counts(data, th)
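
For what it's worth, the search in counts for the first bound that is >= datum is what bisect.bisect_left from the standard library does on a sorted list. Here is a sketch under that assumption: the names counts_bisect and histogram_bisect are made up for illustration, and bounds must be sorted in ascending order with bounds[-1] >= max(data), which the thresholds function guarantees.

```python
from bisect import bisect_left


def thresholds(xmin, xmax, nbins):
    """Upper edges of nbins equal-width bins covering [xmin, xmax]."""
    span = (xmax - xmin) / float(nbins)
    bounds = [xmin + (i + 1) * span for i in range(nbins)]
    bounds[-1] = xmax  # pin the last edge so max(data) always has a bin
    return bounds


def counts_bisect(data, bounds):
    """Count data per bin; bounds must be sorted ascending (hypothetical helper)."""
    result = [0] * len(bounds)
    for datum in data:
        # bisect_left returns the index of the first bound >= datum.
        result[bisect_left(bounds, datum)] += 1
    return result


def histogram_bisect(data, num_bins):
    return counts_bisect(data, thresholds(min(data), max(data), num_bins))


data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print(histogram_bisect(data, 4))  # [1, 6, 4, 6]
```

The binary search costs O(log n) per datum in the number of bins, rather than a linear scan, which only matters once there are many bins.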