2013-03-20 24 views
0

matplotlib histogram with data processing and error bars

I have a dataset that is a Python list of tuples, like this:

dataSet = [(6.1248199999999997, 27), (6.4400500000000003, 4), (5.9150600000000004, 1), (5.5388400000000004, 38), (5.82559, 1), (7.6892199999999997, 2), (6.9047799999999997, 1), (6.3516300000000001, 76), (6.5168699999999999, 1), (7.4382099999999998, 1), (5.4493299999999998, 1), (5.6254099999999996, 1), (6.3227700000000002, 1), (5.3321899999999998, 11), (6.7402300000000004, 4), (7.6701499999999996, 1), (5.4589400000000001, 3), (6.3089700000000004, 1), (6.5926099999999996, 2), (6.0003000000000002, 5), (5.9845800000000002, 1), (6.4967499999999996, 2), (6.51227, 6), (7.0302600000000002, 1), (5.7271200000000002, 49), (7.5311300000000001, 7), (5.9495800000000001, 2), (5.1487299999999996, 18), (5.7637099999999997, 6), (5.5144500000000001, 44), (6.7988499999999998, 1), (5.2578399999999998, 1)] 

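The first element of each tuple is an energy, and the second is a count of how many sensors were affected.

I want to create a histogram to study the relationship between the number of affected sensors and the energy. I am quite new to matplotlib (and Python), but this is what I have done so far: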

import math 
import matplotlib.pyplot as plt 

dataSet = [(6.1248199999999997, 27), (6.4400500000000003, 4), (5.9150600000000004, 1), (5.5388400000000004, 38), (5.82559, 1), (7.6892199999999997, 2), (6.9047799999999997, 1), (6.3516300000000001, 76), (6.5168699999999999, 1), (7.4382099999999998, 1), (5.4493299999999998, 1), (5.6254099999999996, 1), (6.3227700000000002, 1), (5.3321899999999998, 11), (6.7402300000000004, 4), (7.6701499999999996, 1), (5.4589400000000001, 3), (6.3089700000000004, 1), (6.5926099999999996, 2), (6.0003000000000002, 5), (5.9845800000000002, 1), (6.4967499999999996, 2), (6.51227, 6), (7.0302600000000002, 1), (5.7271200000000002, 49), (7.5311300000000001, 7), (5.9495800000000001, 2), (5.1487299999999996, 18), (5.7637099999999997, 6), (5.5144500000000001, 44), (6.7988499999999998, 1), (5.2578399999999998, 1)] 

binWidth = .2 
binnedDataSet = [] 
#create another list and append the "binning-value" 
for item in dataSet: 
    binnedDataSet.append((item[0], item[1], math.floor(item[0]/binWidth)*binWidth)) 

energies, sensorHits, binnedEnergy = [[q[i] for q in binnedDataSet] for i in (0,1,2)] 
plt.plot(binnedEnergy, sensorHits, 'ro') 
plt.show() 

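This works so far (although it does not even look like a histogram ;-) but OK). Now I want to calculate the mean value for each bin and attach some error bars.

What is the way to do this? I looked at the matplotlib histogram examples, but they all count one-dimensional data, so you get a frequency spectrum... which is not really what I want.

Answers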

1

I am a bit confused about exactly what you are trying to do, but I think this will (to first order) do what I think you want:

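# Continues from the question's script: math, plt and dataSet are assumed to be defined already.
bin_width = .2
bottom = 5.0
top = 8.0

n_bins = int(math.ceil((top - bottom) / bin_width))
binned_data = [0.0] * n_bins   # sum of sensor hits per bin
binned_count = [0] * n_bins    # number of entries per bin
for E, cnt in dataSet:
    # E == top would index one past the last bin, so the upper bound is exclusive
    if E < bottom or E >= top:
        print('out of range')
        continue
    bin_id = int(math.floor(n_bins * (E - bottom) / (top - bottom)))
    binned_data[bin_id] += cnt
    binned_count[bin_id] += 1

# mean number of hits per bin; empty bins are plotted as 0
binned_averaged_data = [C_sum / hits if hits > 0 else 0 for C_sum, hits in zip(binned_data, binned_count)]

# left edge of each bin
bin_edges = [bottom + j * bin_width for j in range(n_bins)]

# align='edge' makes each bar start at its left edge (matplotlib >= 2.0 centers bars by default)
plt.bar(bin_edges, binned_averaged_data, width=bin_width, align='edge')
plt.show()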

I would also suggest looking into numpy; it would make this much simpler to write.
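As a rough illustration of the numpy suggestion, here is a minimal sketch that computes the per-bin mean together with an error estimate. The question does not say which error is wanted, so the standard error of the mean (sample standard deviation divided by sqrt(n)) is an assumption here. The function names `binned_mean_and_error` and `plot_binned` are hypothetical helpers, not part of numpy or matplotlib.

```python
import numpy as np


def binned_mean_and_error(energies, hits, bottom=5.0, top=8.0, bin_width=0.2):
    """Return (edges, mean, sem, n) for equal-width energy bins.

    sem is the standard error of the mean (an assumed choice of error bar).
    It is NaN for bins with fewer than two entries, where no spread can be
    estimated. mean is NaN for empty bins.
    """
    energies = np.asarray(energies, dtype=float)
    hits = np.asarray(hits, dtype=float)
    n_bins = int(round((top - bottom) / bin_width))
    edges = np.linspace(bottom, top, n_bins + 1)

    # Entry count, sum and sum of squares per bin via (weighted) histograms.
    # Energies outside [bottom, top] are ignored by np.histogram.
    n, _ = np.histogram(energies, bins=edges)
    s, _ = np.histogram(energies, bins=edges, weights=hits)
    s2, _ = np.histogram(energies, bins=edges, weights=hits ** 2)

    mean = np.full(n_bins, np.nan)
    sem = np.full(n_bins, np.nan)
    filled = n > 0
    mean[filled] = s[filled] / n[filled]

    multi = n > 1
    # Unbiased sample variance: (sum(x^2) - n * mean^2) / (n - 1)
    var = (s2[multi] - n[multi] * mean[multi] ** 2) / (n[multi] - 1)
    sem[multi] = np.sqrt(np.maximum(var, 0.0) / n[multi])
    return edges, mean, sem, n


def plot_binned(edges, mean, sem):
    """Draw the per-bin means as bars with error bars at the bin centers."""
    import matplotlib.pyplot as plt  # imported here so the math above needs only numpy

    centers = 0.5 * (edges[:-1] + edges[1:])
    ok = ~np.isnan(mean)  # skip empty bins
    plt.bar(edges[:-1][ok], mean[ok], width=np.diff(edges)[ok],
            align="edge", alpha=0.5)
    # Single-entry bins have no error estimate; they get a zero-length bar.
    plt.errorbar(centers[ok], mean[ok], yerr=np.nan_to_num(sem[ok]),
                 fmt="k.", capsize=3)
    plt.xlabel("energy")
    plt.ylabel("mean sensor hits per bin")
    plt.show()
```

With the `dataSet` from the question, this could be used as `energies, hits = zip(*dataSet)`, then `edges, mean, sem, n = binned_mean_and_error(energies, hits)` and `plot_binned(edges, mean, sem)`. Many bins in that dataset hold only one or two entries, so the error bars will be missing or very rough there.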

+0

Thanks, this is the right direction! – tamasgal 2013-03-23 12:25:35