Python - Computing the histogram of a set of data

The Python function below is meant to compute a histogram of the data using bins of equal size. I want to get the correct result

[1, 6, 4, 6]

but after running the code I get

[7, 12, 17, 17]

which is incorrect. Does anyone know how to fix it?

# Computes the histogram of a set of data
def histogram(data, num_bins):

    # Find what range the data spans, and use it to calculate the bin size.
    span = max(data) - min(data)
    bin_size = span/num_bins

    # Calculate the thresholds for each bin.
    thresholds = [0] * num_bins
    for i in range(num_bins):
        thresholds[i] += bin_size * (i+1)

    # Compute the histogram
    counts = [0] * num_bins
    for datum in data:
        # Increment the count of the bin that the datum falls in
        for bin_index, threshold in enumerate(thresholds):
            if datum <= threshold:
                counts[bin_index] += 1
    return counts

# Some random data
data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print("Correct result:\t" + str([1, 6, 4, 6]))
print("Your result:\t" + str(histogram(data, num_bins=4)))


2015-04-30 user21

What do you think makes it incorrect? – miradulo

Your code is not valid Python. Please [edit] it and fix the indentation. –

@Tichodroma: Thanks for the edit. – user21

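You have only two logic errors:

(1) the thresholds are calculated from 0 instead of from min(data);

(2) there is no break once the datum's bin has been found.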
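You can reproduce the question's [7, 12, 17, 17] directly from these two errors. The sketch below keeps the question's 0-based thresholds but counts each datum only once, then takes running totals. The variable names zero_based and per_bin are illustrative. The running totals match the wrong output, which shows that the missing break is what makes the counts cumulative.

```python
from itertools import accumulate

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
num_bins = 4
bin_size = (max(data) - min(data)) / float(num_bins)

# Error (1): thresholds measured from 0 instead of from min(data).
zero_based = [bin_size * (i + 1) for i in range(num_bins)]

# Count each datum once, in the first bin whose threshold it does not exceed.
per_bin = [0] * num_bins
for datum in data:
    for bin_index, threshold in enumerate(zero_based):
        if datum <= threshold:
            per_bin[bin_index] += 1
            break

print(per_bin)                    # [7, 5, 5, 0]
# Error (2): without the break, each datum also increments every later bin,
# which is the same as taking running totals of per_bin.
print(list(accumulate(per_bin)))  # [7, 12, 17, 17]
```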

def histogram(data, num_bins):
    span = max(data) - min(data)
    bin_size = float(span)/num_bins
    thresholds = [0] * num_bins

    for i in range(num_bins):
        # I changed the threshold calculation
        thresholds[i] = min(data) + bin_size * (i+1)

    counts = [0] * num_bins
    for datum in data:
        for bin_index, threshold in enumerate(thresholds):
            if datum <= threshold:
                counts[bin_index] += 1
                # I added a break
                break
    return counts

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print("Correct result:\t" + str([1, 6, 4, 6]))
print("Your result:\t" + str(histogram(data, num_bins=4)))
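The inner loop with break stops at the first threshold that is >= datum. The standard-library function bisect.bisect_left performs the same search on a sorted list, using binary search. Below is one possible variant under these assumptions. The name histogram_bisect is hypothetical. Pinning the last threshold to max(data) is an added safeguard against floating-point rounding and is not part of the answer.

```python
import bisect


def histogram_bisect(data, num_bins):
    lo, hi = min(data), max(data)
    bin_size = (hi - lo) / float(num_bins)
    # Upper threshold of each bin, measured from min(data).
    thresholds = [lo + bin_size * (i + 1) for i in range(num_bins)]
    # Pin the last threshold to max(data) so rounding can never leave
    # the largest datum outside every bin.
    thresholds[-1] = hi
    counts = [0] * num_bins
    for datum in data:
        # bisect_left returns the first index whose threshold is >= datum,
        # i.e. the same bin the linear scan with break would stop at.
        counts[bisect.bisect_left(thresholds, datum)] += 1
    return counts


data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print(histogram_bisect(data, 4))  # [1, 6, 4, 6]
```

bisect relies on thresholds being sorted in ascending order. That always holds here, because bin_size is never negative.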


2015-04-30 13:11:24

If you just want a histogram, use numpy:

import numpy as np 
np.histogram([-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9],4)
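Note that np.histogram returns a pair (counts, bin_edges), not just the counts, so you usually unpack it. With 4 bins the counts match the expected result.

numpy's bins are half-open, [a, b), except the last one, which is closed. The loop-based answers instead put a datum that lies exactly on an interior threshold into the lower bin. No value in this data set falls on an interior edge, so both conventions agree here.

```python
import numpy as np

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
counts, bin_edges = np.histogram(data, bins=4)
print(counts.tolist())  # [1, 6, 4, 6]
print(bin_edges)        # 5 edges, from min(data) to max(data)
```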


2015-04-30 13:11:20

Check the threshold definition and the if statement. This works:

def histogram(data, num_bins):

    # Find what range the data spans, and use it to calculate the bin size.
    span = max(data) - min(data)
    bin_size = span/float(num_bins)

    # Calculate num_bins+1 bin edges, starting at min(data) rather than 0.
    thresholds = [min(data) + bin_size * i for i in range(num_bins+1)]
    # Pin the last edge to max(data) so rounding cannot exclude the maximum.
    thresholds[-1] = max(data)

    print(thresholds)
    # Compute the histogram
    counts = [0 for i in range(num_bins)]
    for datum in data:
        # Increment the count of the bin that the datum falls in
        for bin_index in range(num_bins):
            if thresholds[bin_index] <= datum <= thresholds[bin_index+1]:
                counts[bin_index] += 1
                break
    return counts


2015-04-30 13:13:57 Gioelelm

It gives a compile error: http://ideone.com/tlyZ4B – user21

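First, if you just want a histogram of the data, numpy provides this. But you asked how to do it yourself. Your code suggests you lost track of what you were trying to do, so break your function into smaller functions. For example, to compute the thresholds, write a function thresholds(xmin, xmax, nbins), or better yet, use numpy.linspace. Doing so should draw your attention to the problem: you are incrementing relative to 0 (instead of min(data)). If you are lucky, it may also remind you not to count on exact floating-point accumulation. So you might end up with: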

def thresholds(xmin, xmax, nbins): 
    span = (xmax - xmin)/float(nbins) 
    thresholds = [xmin + (i+1)*span for i in range(nbins)] 
    thresholds[-1] = xmax 
    return thresholds
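The paragraph above suggests numpy.linspace as the better route to the thresholds. One possible sketch follows; the name thresholds_np is hypothetical.

linspace produces nbins + 1 evenly spaced edges from xmin to xmax. Dropping the first edge leaves the same upper bounds that thresholds() returns. numpy computes each edge from the start and the step rather than by repeated addition, and it sets the final edge exactly to xmax. So the manual thresholds[-1] = xmax fix is not needed here.

```python
import numpy as np


def thresholds_np(xmin, xmax, nbins):
    # nbins + 1 evenly spaced edges; the last one is exactly xmax.
    edges = np.linspace(xmin, xmax, nbins + 1)
    # Drop the first edge (xmin) to keep only each bin's upper bound.
    return edges[1:]


th = thresholds_np(-3.2, 9, 4)
print(th)  # approximately [-0.15, 2.9, 5.95, 9.0]
```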

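Next, you need the bin counts. Again, numpy could do this for you. Compared with your code, the important point is to increment only one bin per datum. In the end, you might get something like this: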

def counts(data, bounds):
    counts = [0] * len(bounds)
    for datum in data:
        bin_index = min(i for i, bound in enumerate(bounds) if bound >= datum)
        counts[bin_index] += 1
    return counts

Now you are ready to put it together:

def histogram02(data, num_bins): 
    xmin = min(data) 
    xmax = max(data) 
    th = thresholds(xmin, xmax, num_bins) 
    return counts(data, th)
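The counts step can also be vectorized with numpy, assuming the bounds are sorted in ascending order, as thresholds() returns them. The name counts_np is hypothetical.

np.searchsorted(bounds, data, side="left") gives, for every datum, the first index i with bounds[i] >= datum. That is the same index the min(...) generator in counts() picks. np.bincount then tallies those indices.

```python
import numpy as np


def counts_np(data, bounds):
    # Index of the first bound >= each datum (same rule as counts()).
    idx = np.searchsorted(bounds, data, side="left")
    # Tally the indices; minlength keeps trailing empty bins in the output.
    return np.bincount(idx, minlength=len(bounds)).tolist()


data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
bounds = np.linspace(min(data), max(data), 5)[1:]
print(counts_np(data, bounds))  # [1, 6, 4, 6]
```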


2015-04-30 14:02:14 Alan
