Binning data in python with scipy/numpy

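Is there a more efficient way to take the average of an array over prespecified bins? For example, I have an array of numbers and an array of bin start and end positions, and I want the mean of the values within each bin only. I have code that does this below, but I'm wondering how it can be cut down and improved. Thanks.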

import numpy as np

def get_bin_mean(a, b_start, b_end):
    # Indices of values at or above the bin start
    ind_upper = np.nonzero(a >= b_start)[0]
    a_upper = a[ind_upper]
    # Of those, keep the values below the bin end: b_start <= value < b_end
    a_range = a_upper[np.nonzero(a_upper < b_end)[0]]
    mean_val = np.mean(a_range)
    return mean_val


data = np.random.rand(100)
bins = np.linspace(0, 1, 10)
binned_data = []

for n in range(0, len(bins) - 1):
    b_start = bins[n]
    b_end = bins[n + 1]
    binned_data.append(get_bin_mean(data, b_start, b_end))

print(binned_data)
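For comparison, here is a tightened version of the question's own loop (a sketch, not taken from any answer): a single boolean mask replaces the two `nonzero()` passes, and `zip` pairs each edge with the next one. The seeded generator `np.random.default_rng(0)` is my assumption, used only to make the output reproducible.

```python
import numpy as np

def get_bin_mean(a, b_start, b_end):
    # One boolean mask selects b_start <= a < b_end, replacing the two nonzero() passes
    return a[(a >= b_start) & (a < b_end)].mean()

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
data = rng.random(100)
bins = np.linspace(0, 1, 10)

# zip pairs each left edge with the next edge, so no manual index bookkeeping
binned_data = [get_bin_mean(data, lo, hi) for lo, hi in zip(bins[:-1], bins[1:])]
print(binned_data)
```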


2011-05-28 user248237dfsf


It's probably faster and easier to use numpy.digitize():

import numpy 
data = numpy.random.random(100) 
bins = numpy.linspace(0, 1, 10) 
digitized = numpy.digitize(data, bins) 
bin_means = [data[digitized == i].mean() for i in range(1, len(bins))]
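To see why the comprehension iterates over `range(1, len(bins))`, here is a small sketch of what `numpy.digitize` returns for a few hand-picked values (illustrative only, not from the original post). With increasing edges, label `i` means `bins[i-1] <= x < bins[i]`, so labels 0 and `len(bins)` mark values outside the edges.

```python
import numpy as np

# Three edges define two bins: [0.0, 0.5) and [0.5, 1.0)
bins = np.array([0.0, 0.5, 1.0])
values = np.array([-0.1, 0.0, 0.2, 0.5, 0.9, 1.0])

# digitize returns i such that bins[i-1] <= x < bins[i]
idx = np.digitize(values, bins)
print(idx)  # [0 1 1 2 2 3]

# Only labels 1 .. len(bins)-1 correspond to real bins
bin_means = [values[idx == i].mean() for i in range(1, len(bins))]
print(bin_means)
```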

An alternative is to use numpy.histogram():

bin_means = (numpy.histogram(data, bins, weights=data)[0] /
             numpy.histogram(data, bins)[0])
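Because this approach divides a per-bin sum by a per-bin count, an empty bin gives 0/0, which NumPy turns into `nan` with a `RuntimeWarning`. A minimal sketch that computes each histogram once and divides only where the count is non-zero (the seed and sizes are arbitrary assumptions). Note also that `numpy.histogram` closes its last bin on the right, [8/9, 1], while `digitize` above treats every bin as half-open; with `random()` values in [0, 1) the two agree.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
data = rng.random(100)
bins = np.linspace(0, 1, 10)

# Per-bin sum of the values and per-bin number of values
sums, edges = np.histogram(data, bins, weights=data)
counts, _ = np.histogram(data, bins)

# Divide only where a bin is non-empty; empty bins stay NaN instead of warning
bin_means = np.full(len(counts), np.nan)
np.divide(sums, counts, out=bin_means, where=counts > 0)
print(bin_means)
```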

Try for yourself which one is faster... :)


2011-05-28 17:53:58

I don't see a difference. Which one is faster? – user248237dfsf 2011-05-28 22:24:11

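@user: I don't know which one is faster for your data and parameters. Both methods should be faster than yours, and I'd expect the 'histogram()' method to be faster for a large number of bins. But you'll have to profile it yourself; I can't do that for you. – 2011-05-28 22:32:33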
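To actually profile the two methods as suggested, something like the following `timeit` sketch would work. The data size (100,000 values) and bin count (100) are arbitrary assumptions, and the timings depend on your machine and NumPy version, so none are shown here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
data = rng.random(100_000)
bins = np.linspace(0, 1, 101)  # 101 edges -> 100 bins

def means_digitize(data, bins):
    digitized = np.digitize(data, bins)
    return np.array([data[digitized == i].mean() for i in range(1, len(bins))])

def means_histogram(data, bins):
    return np.histogram(data, bins, weights=data)[0] / np.histogram(data, bins)[0]

# Make sure both give the same answer before comparing speed
assert np.allclose(means_digitize(data, bins), means_histogram(data, bins))

for fn in (means_digitize, means_histogram):
    t = timeit.timeit(lambda: fn(data, bins), number=10)
    print(f"{fn.__name__}: {t / 10 * 1e3:.2f} ms per call")
```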

Not sure why this thread got necroed, but here is a 2014-approved answer, which should be far faster:

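import numpy as np

data = np.random.rand(100)
bins = 10
# Split the array *by position* into `bins` consecutive chunks
slices = np.linspace(0, len(data), bins + 1, endpoint=True).astype(int)
counts = np.diff(slices)

# Sum each chunk with one ufunc call, then divide by the chunk length
mean = np.add.reduceat(data, slices[:-1]) / counts
print(mean)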
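To make explicit what this computes: `np.add.reduceat(data, idx)` sums `data[idx[k]:idx[k+1]]` for each start index, so the result is the mean of consecutive chunks by position, not of the values falling in each value range. A small hand-checkable sketch (the `arange` input is chosen only so the sums are easy to verify):

```python
import numpy as np

data = np.arange(12, dtype=float)  # 0.0 .. 11.0, easy to check by hand
n_chunks = 4
slices = np.linspace(0, len(data), n_chunks + 1).astype(int)  # [0, 3, 6, 9, 12]
counts = np.diff(slices)                                       # [3, 3, 3, 3]

# reduceat sums data[slices[k]:slices[k+1]] for each start index
sums = np.add.reduceat(data, slices[:-1])
mean = sums / counts
print(mean)  # [ 1.  4.  7. 10.]

# Same thing via reshape, valid only when the length divides evenly
assert np.allclose(mean, data.reshape(n_chunks, -1).mean(axis=1))
```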


2014-02-11 20:17:50

You are answering a different question. For example, your 'mean[0] = np.mean(data[0:10])', whereas the correct answer should be 'np.mean(data[data < 10])'. – 2015-09-03 12:06:05
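One way to keep the slicing idea while binning by value is to sort the data first, so that each value bin becomes a contiguous slice. The sketch below is my own variant, not from the thread. It finds the slice boundaries with `np.searchsorted` and uses prefix sums rather than `np.add.reduceat`, because `reduceat` returns a single element instead of 0 for an empty slice and rejects a start index equal to the array length. Bins are half-open, [edges[k], edges[k+1]), matching the question's loop; the seed is an assumption.

```python
import numpy as np

def sorted_bin_means(data, edges):
    """Mean of data within each half-open bin [edges[k], edges[k+1])."""
    s = np.sort(data)
    # Position of the first sorted value >= each edge
    idx = np.searchsorted(s, edges, side="left")
    counts = np.diff(idx)
    # Prefix sums turn every bin sum into a difference of two entries,
    # which also handles empty bins (count 0, sum 0)
    csum = np.concatenate(([0.0], np.cumsum(s)))
    sums = csum[idx[1:]] - csum[idx[:-1]]
    means = np.full(len(counts), np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
data = rng.random(100)
edges = np.linspace(0, 1, 10)
print(sorted_bin_means(data, edges))
```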

The SciPy (>= 0.11) function scipy.stats.binned_statistic specifically addresses the above question.

For the same example as in the previous answers, the SciPy solution would be:

import numpy as np 
from scipy.stats import binned_statistic 

data = np.random.rand(100) 
bin_means = binned_statistic(data, data, bins=10, range=(0, 1))[0]
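The returned object has named fields, and `bins` also accepts explicit edges. Note that `bins=10, range=(0, 1)` above gives 10 bins, whereas `linspace(0, 1, 10)` in the earlier answers gives 10 edges and therefore 9 bins. A sketch using those same 9-bin edges (the seeded data is an assumption):

```python
import numpy as np
from scipy.stats import binned_statistic

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
data = rng.random(100)
edges = np.linspace(0, 1, 10)  # same 9 bins as the digitize/histogram answers

res = binned_statistic(data, data, statistic="mean", bins=edges)
print(res.statistic)      # mean per bin (NaN for empty bins)
print(res.bin_edges)      # the edges actually used
print(res.binnumber[:5])  # 1-based bin label of each sample

# statistic="count" gives the number of samples per bin
counts = binned_statistic(data, data, statistic="count", bins=edges).statistic
assert counts.sum() == len(data)
```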


2014-11-12 10:19:26 divenex

The numpy_indexed package (disclaimer: I am its author) contains functionality to perform this type of operation efficiently:

import numpy as np
import numpy_indexed as npi
print(npi.group_by(np.digitize(data, bins)).mean(data))

This is essentially the same as the solution I posted earlier, but now wrapped in a nice interface, with tests and all :)
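If you would rather avoid the extra dependency, the same group-by-mean idea can be approximated in plain NumPy with `np.unique(..., return_inverse=True)` and `np.bincount`. This is my own substitute, not how numpy_indexed is implemented; like a group-by, it only reports bins that contain at least one value. The seeded data is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
data = rng.random(100)
bins = np.linspace(0, 1, 10)

# Group key (bin label) for each value
keys = np.digitize(data, bins)

# Unique keys, plus for each value the position of its key in `labels`
labels, inverse = np.unique(keys, return_inverse=True)
sums = np.bincount(inverse, weights=data)
counts = np.bincount(inverse)
means = sums / counts  # every label present has count >= 1, so no division by zero

print(labels)
print(means)
```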


2016-04-02 15:40:17

I'd like to add, also in answer to the question "find mean bin values using histogram2d python", that SciPy also has a function specially designed to compute a bidimensional binned statistic for one or more sets of data:

import numpy as np 
from scipy.stats import binned_statistic_2d 

x = np.random.rand(100) 
y = np.random.rand(100) 
values = np.random.rand(100) 
bin_means = binned_statistic_2d(x, y, values, bins=10).statistic
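Cells with no points come back as `nan`. As a sanity check, the means should match the sum/count ratio from `numpy.histogram2d` on the same edges (`res.x_edge`, `res.y_edge`), which is the 2-D analogue of the `histogram` answer. A sketch with seeded data (an assumption for reproducibility):

```python
import numpy as np
from scipy.stats import binned_statistic_2d

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
x = rng.random(100)
y = rng.random(100)
values = rng.random(100)

res = binned_statistic_2d(x, y, values, statistic="mean", bins=10)
print(res.statistic.shape)  # (10, 10): first axis follows x, second follows y

# Cross-check with the histogram2d sum/count idea on the same edges
sums, _, _ = np.histogram2d(x, y, bins=[res.x_edge, res.y_edge], weights=values)
counts, _, _ = np.histogram2d(x, y, bins=[res.x_edge, res.y_edge])
with np.errstate(invalid="ignore"):
    ratio = sums / counts  # NaN where a cell is empty (0/0)
assert np.allclose(res.statistic, ratio, equal_nan=True)
```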

The function scipy.stats.binned_statistic_dd is a generalization of this function to higher-dimensional datasets.
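A brief sketch of the N-dimensional version: `sample` holds one column per dimension, `bins=4` means 4 bins along each dimension, and `statistic="count"` should reproduce `numpy.histogramdd` on the same edges. The 3-D shape and seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import binned_statistic_dd

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
sample = rng.random((500, 3))   # 500 points in 3-D, one column per dimension
values = rng.random(500)

res = binned_statistic_dd(sample, values, statistic="mean", bins=4)
print(res.statistic.shape)  # (4, 4, 4)
print(len(res.bin_edges))   # 3: one edge array per dimension

# statistic="count" should match numpy.histogramdd on the same edges
counts = binned_statistic_dd(sample, values, statistic="count", bins=res.bin_edges).statistic
hist, _ = np.histogramdd(sample, bins=res.bin_edges)
assert np.array_equal(counts, hist)
```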


2016-07-26 10:50:33 Chmeul
