2013-02-12 38 views

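I am working with time series data that represents vectors (magnitude and direction). I want to resample my data and use the describe function as the how parameter. How can I implement my own describe() function for use in resample()?

However, the describe method uses the standard mean, and I want to use a special function to average the direction. For that reason, I implemented my own describe method, based on the implementation of pandas.Series.describe():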

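import numpy as np
from pandas import Series

def directionAverage(x):
    # Circular mean: average the sines and cosines, then convert back to an angle (radians)
    result = np.arctan2(np.mean(np.sin(x)), np.mean(np.cos(x)))
    if result < 0:
        # arctan2 can return a negative angle; shift it to be non-negative
        result += 2*np.pi
    return result

def directionDescribe(x):
    # Same statistics as describe(), with the circular mean in place of the standard mean
    data = [directionAverage(x), x.std(), x.min(), x.quantile(0.25), x.median(), x.quantile(0.75), x.max()]
    names = ['mean', 'std', 'min', '25%', '50%', '75%', 'max']
    return Series(data, index=names)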
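
To illustrate why the standard mean is a poor fit for directions, here is a minimal sketch that compares it with the circular mean for two angles on either side of the 0 / 2*pi wrap. The sample angles are made up for illustration; directionAverage is copied from the question.

```python
import numpy as np

def directionAverage(x):
    """Circular mean of angles in radians, returned as a non-negative angle."""
    result = np.arctan2(np.mean(np.sin(x)), np.mean(np.cos(x)))
    if result < 0:
        result += 2 * np.pi
    return result

# Hypothetical sample: two directions 0.4 rad either side of 0.1 rad,
# so one of them sits just below 2*pi and the other just above 0.
angles = np.array([2 * np.pi - 0.3, 0.5])

arithmetic = angles.mean()           # near pi + 0.1, pointing almost the opposite way
circular = directionAverage(angles)  # near 0.1, between the two directions
print(arithmetic, circular)
```

The arithmetic mean treats the angles as plain numbers and ignores the wrap, which is why a describe-style summary of directions needs its own mean.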

The problem is that when I do this:

df['direction'].resample('10Min', how=directionDescribe) 

I get this exception (only the last few lines are shown):

File "C:\Python26\lib\site-packages\pandas\core\generic.py", line 234, in resample 
    return sampler.resample(self) 
    File "C:\Python26\lib\site-packages\pandas\tseries\resample.py", line 83, in resample 
    rs = self._resample_timestamps(obj) 
    File "C:\Python26\lib\site-packages\pandas\tseries\resample.py", line 217, in _resample_timestamps 
    result = grouped.aggregate(self._agg_method) 
    File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1626, in aggregate 
    result = self._aggregate_generic(arg, *args, **kwargs) 
    File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1681, in _aggregate_generic 
    return self._aggregate_item_by_item(func, *args, **kwargs) 
    File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1706, in _aggregate_item_by_item 
    result[item] = colg.aggregate(func, *args, **kwargs) 
    File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1357, in aggregate 
    result = self._aggregate_named(func_or_funcs, *args, **kwargs) 
    File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1441, in _aggregate_named 
    raise Exception('Must produce aggregated value') 

The question is: how can I implement my own describe function so that it works with resample?


Maybe df['direction'].resample('10Min').apply(directionDescribe) – zach 2013-02-12 17:45:55


Thanks for the suggestion. It returns a result, but not one for each time bin. – Pablo 2013-02-14 15:42:54

Answers


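Okay, I think I've got it. Instead of resampling, you can use groupby, where each group is a unit of time. You can then apply a function of your choice to each group, for example your directionAverage function.

Note that I am importing TimeGrouper to allow grouping by time interval.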

import pandas as pd 
import numpy as np 
from pandas.tseries.resample import TimeGrouper 

#group your data 
new_data = df['direction'].groupby(TimeGrouper('10min')) 
#apply your function to the grouped data 
new_data.apply(directionDescribe) 
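
In more recent pandas releases, TimeGrouper and the how= argument of resample are no longer available, and pd.Grouper(freq=...) plays the same grouping role. The following is a sketch of the same idea under that assumption, not a definitive implementation. The sample data is hypothetical, and the summary table is built by iterating over the groups explicitly, so it does not depend on the shape that apply returns.

```python
import numpy as np
import pandas as pd

def directionAverage(x):
    # Circular mean in radians, as defined in the question
    result = np.arctan2(np.mean(np.sin(x)), np.mean(np.cos(x)))
    if result < 0:
        result += 2 * np.pi
    return result

def directionDescribe(x):
    # describe()-style summary with the circular mean in place of the standard mean
    data = [directionAverage(x), x.std(), x.min(), x.quantile(0.25),
            x.median(), x.quantile(0.75), x.max()]
    names = ['mean', 'std', 'min', '25%', '50%', '75%', 'max']
    return pd.Series(data, index=names)

# Hypothetical sample: one direction reading per minute for 20 minutes
index = pd.date_range('2013-02-12 00:00', periods=20, freq='min')
values = np.tile([2 * np.pi - 0.3, 0.5], 10)
direction = pd.Series(values, index=index)

# Group into 10-minute bins, then summarise each bin into one row
grouped = direction.groupby(pd.Grouper(freq='10min'))
summary = pd.DataFrame(
    {start: directionDescribe(group) for start, group in grouped}
).T
print(summary)
```

Each row of summary is labelled with the start of its 10-minute bin, and each column holds one of the statistics returned by directionDescribe.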