2017-08-01 38 views

Applying multiple rolling functions to multiple columns of a pandas groupby via a rolling object?

I am looking to do the following:

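  1. Group a DataFrame.

  2. For each group, generate time windows (of a given time unit).

  3. In the resulting structure, take all columns and apply multiple rolling summary-statistic functions, so that the result contains the summary statistics for each group/time-window combination.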

Here is an example dataset:

gps_time,name,val_x,val_y 
2017-07-04 11:20:23.423,bob,0.963,0.201 
2017-07-04 11:20:24.492,bob,0.964,0.203 
2017-07-04 11:20:24.499,bob,0.962,0.210 
2017-07-04 11:20:25.627,sarah,0.893,0.010 
2017-07-04 11:20:28.627,sarah,0.894,0.012 
2017-07-04 11:20:29.613,sarah,0.895,0.014 
2017-07-04 11:20:29.630,larry,-0.423,0.231 
2017-07-04 11:20:30.423,larry,-0.431,0.22 
2017-07-04 11:20:30.428,larry,-0.432,0.222 
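
For reference, a minimal sketch of loading the sample above into a DataFrame. It assumes the data is available as CSV text (inlined here so the snippet runs on its own); the variable names `csv_text` and `df` are illustrative only. The important detail is that `gps_time` must be parsed as a datetime column before any time-based grouping can work.

```python
import io

import pandas as pd

# The sample rows from above, inlined so the snippet is self-contained.
csv_text = """gps_time,name,val_x,val_y
2017-07-04 11:20:23.423,bob,0.963,0.201
2017-07-04 11:20:24.492,bob,0.964,0.203
2017-07-04 11:20:24.499,bob,0.962,0.210
2017-07-04 11:20:25.627,sarah,0.893,0.010
2017-07-04 11:20:28.627,sarah,0.894,0.012
2017-07-04 11:20:29.613,sarah,0.895,0.014
2017-07-04 11:20:29.630,larry,-0.423,0.231
2017-07-04 11:20:30.423,larry,-0.431,0.22
2017-07-04 11:20:30.428,larry,-0.432,0.222
"""

# parse_dates converts gps_time to datetime64, which time-based grouping needs.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["gps_time"])
print(df.dtypes)
```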

And here is the desired output for the data above, grouped by name and with a 1-second window:

name,gps_time,val_x_mean,val_x_med,val_y_mean,val_y_med 
bob,2017-07-04 11:20:23.423,0.963,0.963,0.201,0.201 
bob,2017-07-04 11:20:24.492,0.963,0.963,0.2065,0.2065 
sarah,2017-07-04 11:20:25.627,0.893,0.89,0.010,0.010 
sarah,2017-07-04 11:20:28.627,0.8945,0.8945,0.013,0.013 
larry,2017-07-04 11:20:30.423,-0.4287,-0.431,0.336,0.222 

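I have tried using list comprehensions to generate a bunch of DataFrames, but that process is very slow, and I have to call it once for every column.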

Answer

Let's use groupby with pd.Grouper:

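# Assumes df['gps_time'] is already datetime64 (e.g. converted with pd.to_datetime).
# Bucket rows into 1-second bins per name, then take the mean and median of every value column.
df_out = df.groupby([pd.Grouper(freq='S', key='gps_time'), 'name']).agg(['mean', 'median'])
df_out.columns = df_out.columns.map('_'.join)  # flatten the MultiIndex column labels
df_out.reset_index()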

Output:

             gps_time   name  val_x_mean  val_x_median  val_y_mean  val_y_median
0 2017-07-04 11:20:23    bob      0.9630        0.9630      0.2010        0.2010
1 2017-07-04 11:20:24    bob      0.9630        0.9630      0.2065        0.2065
2 2017-07-04 11:20:25  sarah      0.8930        0.8930      0.0100        0.0100
3 2017-07-04 11:20:28  sarah      0.8940        0.8940      0.0120        0.0120
4 2017-07-04 11:20:29  larry     -0.4230       -0.4230      0.2310        0.2310
5 2017-07-04 11:20:29  sarah      0.8950        0.8950      0.0140        0.0140
6 2017-07-04 11:20:30  larry     -0.4315       -0.4315      0.2210        0.2210
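
The approach above places rows into fixed, non-overlapping 1-second bins. If a trailing window evaluated at every row is wanted instead (the question's title mentions a rolling object), a possible sketch is shown below. It is an alternative technique, not the method from the answer, and it assumes `gps_time` is sorted within each name. The names `csv_text` and `rolled` are illustrative only.

```python
import io

import pandas as pd

csv_text = """gps_time,name,val_x,val_y
2017-07-04 11:20:23.423,bob,0.963,0.201
2017-07-04 11:20:24.492,bob,0.964,0.203
2017-07-04 11:20:24.499,bob,0.962,0.210
2017-07-04 11:20:25.627,sarah,0.893,0.010
2017-07-04 11:20:28.627,sarah,0.894,0.012
2017-07-04 11:20:29.613,sarah,0.895,0.014
2017-07-04 11:20:29.630,larry,-0.423,0.231
2017-07-04 11:20:30.423,larry,-0.431,0.22
2017-07-04 11:20:30.428,larry,-0.432,0.222
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["gps_time"])

# Trailing 1-second window per name: each output row aggregates the rows of the
# same name whose timestamp t' satisfies t - 1s < t' <= t.
rolled = (
    df.set_index("gps_time")           # time-based windows need a datetime index
      .groupby("name")[["val_x", "val_y"]]
      .rolling("1s")
      .agg(["mean", "median"])
)
rolled.columns = rolled.columns.map("_".join)  # e.g. ('val_x', 'mean') -> 'val_x_mean'
rolled = rolled.reset_index()
print(rolled)
```

Unlike the Grouper version, this yields one row per input row rather than one row per bin, so it suits cases where the statistic should follow each observation.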

This is perfect! How would I specify a particular percentage of overlap between intervals?

Can you explain percentage overlap? How would I calculate it?

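A 50% overlap means that, given two consecutive intervals, the last 50% of the k-th interval is the first 50% of the (k + 1)-th interval. For example, given the list [1,2,3,4,5,6,7,8], windows of 4 observations with 50% overlap yield [1,2,3,4], [3,4,5,6], [5,6,7,8].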
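
The definition in the comment above can be sketched in plain Python. This only illustrates the count-based case from the example; `overlapping_windows` is a hypothetical helper, not a pandas function. The overlap fraction determines the step between window starts: step = size * (1 - overlap).

```python
def overlapping_windows(items, size, overlap):
    """Split items into windows of `size` observations that share a fraction `overlap`."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between consecutive window starts; at least 1 so the loop advances.
    step = max(1, round(size * (1 - overlap)))
    # Only full windows are kept; a trailing partial window is dropped.
    return [items[i:i + size] for i in range(0, len(items) - size + 1, step)]


windows = overlapping_windows([1, 2, 3, 4, 5, 6, 7, 8], size=4, overlap=0.5)
print(windows)  # [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]
```

With `overlap=0` the step equals the window size, which reduces to the non-overlapping bins used in the answer.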