Mean values depending on binning with respect to a second variable

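I am working with python/numpy. As input data I have a large number of value pairs (x,y). I basically want to plot <y>(x), i.e. the mean value of y for a given data bin in x. At the moment I use a plain for loop to achieve this, which is terribly slow.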

import numpy

# create example data
x = numpy.random.rand(1000)
y = numpy.random.rand(1000)
# set resolution
xbins = 100
# find x bins (only xedges is used below)
H, xedges, yedges = numpy.histogram2d(x, y, bins=(xbins, xbins))
# calculate mean and std of y for each x bin
mean = numpy.zeros(xbins)
std = numpy.zeros(xbins)
for i in numpy.arange(xbins):
    # boolean mask selecting the points whose x falls into bin i
    in_bin = numpy.logical_and(x >= xedges[i], x < xedges[i+1])
    mean[i] = numpy.mean(y[in_bin])
    std[i] = numpy.std(y[in_bin])

Is there a vectorized way of writing this?

2013-03-18 Jakob S.

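You are overcomplicating things. All you need to know, for every bin in x, is n, sy and sy2: the number of y values in that x bin, the sum of those y values, and the sum of their squares. You can get those as: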

>>> import numpy as np
>>> n, _ = np.histogram(x, bins=xbins)
>>> sy, _ = np.histogram(x, bins=xbins, weights=y)
>>> sy2, _ = np.histogram(x, bins=xbins, weights=y*y)

From these:

>>> mean = sy/n 
>>> std = np.sqrt(sy2/n - mean*mean)
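The division above yields NaN (with a runtime warning) for any bin that contains no points, and rounding can make `sy2/n - mean*mean` come out very slightly negative. A minimal sketch of the same three-histogram idea with those two cases handled follows; the helper name `binned_mean_std` and the seeded example data are assumptions for illustration, not part of the answer.

```python
import numpy as np

def binned_mean_std(x, y, bins):
    """Per-bin mean and population std of y, binned by x (hypothetical helper)."""
    n, edges = np.histogram(x, bins=bins)               # points per x bin
    sy, _ = np.histogram(x, bins=bins, weights=y)       # sum of y per bin
    sy2, _ = np.histogram(x, bins=bins, weights=y * y)  # sum of y**2 per bin

    # Empty bins stay NaN instead of triggering a division by zero.
    mean = np.full(n.shape, np.nan)
    var = np.full(n.shape, np.nan)
    filled = n > 0
    mean[filled] = sy[filled] / n[filled]
    var[filled] = sy2[filled] / n[filled] - mean[filled] ** 2

    # Rounding can push a tiny variance just below zero; clip before the sqrt.
    std = np.sqrt(np.clip(var, 0.0, None))
    return mean, std, edges

# Example data, same shape as in the question (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
x = rng.random(1000)
y = rng.random(1000)
mean, std, edges = binned_mean_std(x, y, bins=100)
```

As in the answer, the result is the population standard deviation (what `numpy.std` returns by default), since it is computed as the mean of the squares minus the square of the mean.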

2013-03-18 13:33:37 Jaime

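Wow, I hadn't thought of interpreting 'y' as the "weights" for 'x'... great! – 2013-03-18 13:48:15

@JakobS. Nobody does... until they see it done for the first time! – Jaime 2013-03-18 13:53:37

This is really cool indeed. – HyperCube 2013-03-18 13:57:29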

If you can use pandas:

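import numpy as np
import pandas as pd
xedges = np.linspace(x.min(), x.max(), xbins+1)
# widen the outer edges slightly so x.min() and x.max() fall inside the bins
xedges[0] -= 0.00001
xedges[-1] += 0.000001
c = pd.cut(x, xedges)
# c.codes is the integer bin index of each x (named c.labels in old pandas)
g = pd.Series(y).groupby(c.codes)
mean2 = g.mean()
# ddof=0 gives the population std, matching numpy.std
std2 = g.std(ddof=0)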
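If pandas is not available, the same "label each point with its bin, then aggregate per label" idea can be sketched in plain NumPy with `np.digitize` and `np.bincount`. This is a different technique from the pandas code above, shown only for comparison; the variable names and the seeded example data are assumptions.

```python
import numpy as np

# Example data as in the question (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
x = rng.random(1000)
y = rng.random(1000)
xbins = 100

edges = np.linspace(x.min(), x.max(), xbins + 1)
# Bin index of each x. Only the interior edges are passed, so the indices
# run from 0 to xbins-1 and x.max() lands in the last bin.
idx = np.digitize(x, edges[1:-1])

# Per-bin count, sum of y, and sum of y**2.
n = np.bincount(idx, minlength=xbins)
sy = np.bincount(idx, weights=y, minlength=xbins)
sy2 = np.bincount(idx, weights=y * y, minlength=xbins)

# Empty bins give 0/0 = NaN; silence the warning for that case only.
with np.errstate(divide="ignore", invalid="ignore"):
    mean = sy / n
    std = np.sqrt(np.clip(sy2 / n - mean * mean, 0.0, None))
```

Unlike the groupby result, which only lists bins that contain data, these arrays always have `xbins` entries, with NaN marking empty bins.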

2013-03-18 13:50:36 HYRY
