2013-08-29 321 views
14

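Weighted moving average in Python

I am sampling data at essentially random intervals. I would like to compute a weighted moving average using numpy (or another Python package). I have a crude implementation of a moving average, but I am having trouble finding a good way to do a weighted moving average, so that values toward the center of the bin are weighted more than values toward the edges.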

Here I generate some sample data and then take a moving average. What is the easiest way to implement a weighted moving average? Thanks!

import numpy as np 
import matplotlib.pyplot as plt 

#first generate some datapoint for a randomly sampled noisy sinewave 
x = np.random.random(1000)*10 
noise = np.random.normal(scale=0.3,size=len(x)) 
y = np.sin(x) + noise 

#plot the data 
plt.plot(x,y,'ro',alpha=0.3,ms=4,label='data') 
plt.xlabel('Time') 
plt.ylabel('Intensity') 

#define a moving average function 
def moving_average(x, y, step_size=.1, bin_size=1):
    bin_centers = np.arange(np.min(x), np.max(x) - 0.5*step_size, step_size) + 0.5*step_size
    bin_avg = np.zeros(len(bin_centers))

    for index in range(0, len(bin_centers)):
        bin_center = bin_centers[index]
        items_in_bin = y[(x > (bin_center - bin_size*0.5)) & (x < (bin_center + bin_size*0.5))]
        bin_avg[index] = np.mean(items_in_bin)

    return bin_centers, bin_avg

#plot the moving average 
bins, average = moving_average(x,y) 
plt.plot(bins, average,label='moving average') 

plt.show() 

Output: Data and moving average

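Following the suggestion from crs17 to use the "weights=" argument of np.average, I came up with a weighted average function that uses a Gaussian function to weight the data: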

def weighted_moving_average(x, y, step_size=0.05, width=1):
    bin_centers = np.arange(np.min(x), np.max(x) - 0.5*step_size, step_size) + 0.5*step_size
    bin_avg = np.zeros(len(bin_centers))

    # We're going to weight with a Gaussian function
    def gaussian(x, amp=1, mean=0, sigma=1):
        return amp*np.exp(-(x - mean)**2/(2*sigma**2))

    for index in range(0, len(bin_centers)):
        bin_center = bin_centers[index]
        weights = gaussian(x, mean=bin_center, sigma=width)
        bin_avg[index] = np.average(y, weights=weights)

    return (bin_centers, bin_avg)

The result looks good: Working weighted average using numpy
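Written out as a formula, the function above appears to evaluate the following at each bin center. The symbols are introduced here for illustration only: c is the bin center, sigma corresponds to the `width` argument, and the `amp` factor cancels between numerator and denominator.

```latex
\bar{y}(c) = \frac{\sum_{i} w_i(c)\, y_i}{\sum_{i} w_i(c)},
\qquad
w_i(c) = \exp\!\left(-\frac{(x_i - c)^2}{2\sigma^2}\right)
```

Every sample contributes to every bin, with a weight that decays smoothly with its distance from the bin center.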

+0

Try searching for information on the weights used in digital low-pass filters. –

+2

pandas already implements [exponentially weighted moment functions](http://pandas.pydata.org/pandas-docs/dev/computation.html#exponentially-weighted-moment-functions). –

Answers

6

You can use numpy.average, which lets you specify the weights:

>>> bin_avg[index] = np.average(items_in_bin, weights=my_weights) 

So, to compute the weights, you could take the x coordinate of each data point in the bin and calculate its distance from the bin center.
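As a rough sketch of that idea, the snippet below weights the points in a single bin by how close they are to the bin center. The function name `triangular_bin_average` and the linear (triangular) fall-off are assumptions made for illustration; any weight that decreases with distance would fit the description above.

```python
import numpy as np

def triangular_bin_average(x, y, bin_center, bin_size=1.0):
    """Weighted mean of y inside one bin (hypothetical helper).

    Weights fall off linearly from 1 at the bin center to 0 at the bin edges.
    """
    half_width = 0.5 * bin_size
    distance = np.abs(x - bin_center)
    in_bin = distance < half_width
    if not np.any(in_bin):
        return np.nan  # no samples fall inside this bin
    weights = 1.0 - distance[in_bin] / half_width
    return np.average(y[in_bin], weights=weights)

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
# Points at x = 0.5, 1.0, 1.5 are in the bin, with weights 0.5, 1.0, 0.5.
result = triangular_bin_average(x, y, bin_center=1.0, bin_size=2.0)
print(result)
```

Inside the loop of `moving_average` from the question, a call like this could stand in for `np.mean(items_in_bin)`.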

+0

Yes! I did not know about this average function or that it could apply weights! I have posted my full solution at the bottom of my question. – DanHickstein

4

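This won't give an exact solution, but it will make your life easier, and will probably be good enough... First, average your samples in small bins. Once your data has been resampled onto an evenly spaced grid, you can use stride tricks and np.average to compute a weighted average: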

import numpy as np
from numpy.lib.stride_tricks import as_strided

def moving_weighted_average(x, y, step_size=.1, steps_per_bin=10,
                            weights=None):
    # This ensures that all samples are within a bin
    number_of_bins = int(np.ceil(np.ptp(x)/step_size))
    bins = np.linspace(np.min(x), np.min(x) + step_size*number_of_bins,
                       num=number_of_bins+1)
    bins -= (bins[-1] - np.max(x))/2
    bin_centers = bins[:-steps_per_bin] + step_size*steps_per_bin/2

    # Per-bin mean of y: sum of y in each small bin divided by the sample count.
    # Empty bins give 0/0, i.e. nan, at this step.
    counts, _ = np.histogram(x, bins=bins)
    vals, _ = np.histogram(x, bins=bins, weights=y)
    bin_avgs = vals/counts
    n = len(bin_avgs)
    # Each row is a view of steps_per_bin consecutive small-bin averages
    windowed_bin_avgs = as_strided(bin_avgs,
                                   (n-steps_per_bin+1, steps_per_bin),
                                   bin_avgs.strides*2)

    weighted_average = np.average(windowed_bin_avgs, axis=1, weights=weights)

    return bin_centers, weighted_average

You can now do something like this:

#plot the moving average with triangular weights
weights = np.concatenate((np.arange(0, 5), np.arange(0, 5)[::-1]))
bins, average = moving_weighted_average(x, y, steps_per_bin=len(weights),
                                        weights=weights)
plt.plot(bins, average,label='moving average')

plt.show() 

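To see the windowing step in isolation, here is a small sketch on evenly spaced values. The helper names `windowed_weighted_average` and `convolved_weighted_average` are made up for this illustration. The first mirrors the as_strided plus np.average step used above; the second is an alternative (not part of the answer) that gets the same numbers from np.convolve with a normalized, reversed kernel.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def windowed_weighted_average(values, weights):
    """Weighted average over every full-length sliding window of `values`."""
    values = np.ascontiguousarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n, k = len(values), len(weights)
    # Each row is a read-only view of k consecutive samples (no data is copied).
    windows = as_strided(values, shape=(n - k + 1, k),
                         strides=values.strides * 2, writeable=False)
    return np.average(windows, axis=1, weights=weights)

def convolved_weighted_average(values, weights):
    """Same result via convolution; the kernel is reversed because
    np.convolve flips its second argument."""
    weights = np.asarray(weights, dtype=float)
    kernel = weights[::-1] / weights.sum()
    return np.convolve(values, kernel, mode="valid")

values = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
weights = np.array([1.0, 2.0, 1.0])
strided = windowed_weighted_average(values, weights)      # 2.25, 4.5, 9.0
convolved = convolved_weighted_average(values, weights)   # same values
```

as_strided does no bounds checking, so the shape has to be computed carefully; passing writeable=False at least prevents accidental writes through the overlapping view.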

+0

Thanks for the solution! This looks like it would work too, but I find the "weights" approach a bit more intuitive. – DanHickstein

+0

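@DanHickstein It looks like what you have coded would be very slow even for moderately sized datasets, but you are the only one who can decide whether it is fast enough. – Jaime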

+0

Ah, good point! I did not check the speed, only that it worked for the demo. – DanHickstein
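On the speed concern raised in the comments, one possible way to drop the Python loop from the Gaussian-weighted version is to build all weights at once with broadcasting. This is only a sketch: the function name `gaussian_weighted_average` and the sample parameters are assumptions, and the weight matrix needs memory proportional to the number of centers times the number of samples, so it may not suit very large datasets.

```python
import numpy as np

def gaussian_weighted_average(x, y, centers, sigma=1.0):
    """Gaussian-weighted average of y at each center, without a Python loop."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # z has shape (n_centers, n_samples): scaled distance of every sample
    # from every center.
    z = (x[np.newaxis, :] - centers[:, np.newaxis]) / sigma
    weights = np.exp(-0.5 * z**2)
    # Row-wise weighted mean: sum(w * y) / sum(w) for each center.
    return weights @ y / weights.sum(axis=1)

# Same kind of data as in the question: a randomly sampled noisy sine wave.
rng = np.random.default_rng(0)
x = rng.random(1000) * 10
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
centers = np.arange(x.min(), x.max(), 0.05)
smoothed = gaussian_weighted_average(x, y, centers, sigma=0.3)
```

For each center this computes the same quantity as np.average(y, weights=...) with Gaussian weights, so the output should match the looped function up to floating-point rounding.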