2016-06-19 40 views
1

Pandas: reindexing with a custom function

I'm looking for a way to reindex data with a custom function. My data looks like this:

                         AAA    BBB    CCC    DDD
Time
2009-01-30 09:30:00  6407.04  43.90  44.01  85.11
2009-01-30 09:39:00  6403.20  43.82  44.01  84.93
2009-01-30 09:40:00  6400.00  43.90  44.03  84.90
2009-01-30 09:45:00  6396.16  43.97  44.04  84.91
2009-01-30 09:48:00  6393.60  44.02  44.07  84.81
2009-01-30 09:55:00  6400.00  44.31  44.14  84.78
2009-01-30 09:56:00  6406.40  44.36  44.16  84.57
2009-01-30 09:59:00  6426.24  44.36  44.11  84.25
2009-01-30 10:00:00  6438.40  44.32  44.09  84.32
2009-01-30 10:06:00  6495.36  44.43  44.16  84.23

It is one-minute data for some stock prices. I want to split the trading day into 5 parts and resample my data. I started by creating a custom index:

import pandas as pd
from datetime import datetime

index_date = pd.date_range('2009-01-30', '2016-03-01')
index_date = pd.Series(index_date)
index_time = pd.date_range('09:30:00', '16:00:00', freq='78min')
index_time = pd.Series(index_time.time)

# Combine every date with every intraday time, then flatten and sort
index = index_date.apply(
    lambda d: index_time.apply(
        lambda t: datetime.combine(d, t)
    )
).unstack().sort_values().reset_index(drop=True)
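The same kind of index can also be built without nested `apply` calls. The following is only a sketch under assumptions: the three-day span is shortened for illustration, and the names `days` and `intraday` are made up here, not taken from my real code.

```python
import pandas as pd

# Assumed inputs: a short date span for illustration and the 78-minute step used above.
days = pd.date_range('2009-01-30', '2009-02-01', freq='D')
intraday = pd.timedelta_range(start='09:30:00', end='16:00:00', freq='78min')

# Add every intraday offset to every day; the result is already in chronological order.
index = pd.DatetimeIndex([day + offset for day in days for offset in intraday])

print(index[:7])
```

Each day contributes six timestamps (09:30, 10:48, 12:06, 13:24, 14:42, 16:00), which are the edges of the five 78-minute parts.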

Let's assume I want to apply a basic percent-change function:

def percent_change(x):
    if len(x):
        return (x[-1] - x[0]) / x[0]

The desired dataset should look like this:

                     AAA  BBB  CCC  DDD
2009-01-30 09:30:00  NaN  NaN  NaN  NaN
2009-01-30 10:48:00    y    y    y    y
2009-01-30 12:06:00    x    x    x    x
2009-01-30 13:24:00
2009-01-30 14:42:00
2009-01-30 16:00:00
2009-01-31 09:30:00
2009-01-31 10:48:00

# where y is the output of the percent_change function from 9:30 to 10:48
# x is the output of the percent_change function from 10:49 to 12:06, etc.

A larger sample of my data can be found here: https://www.dropbox.com/s/h29xlpveb1o7p2u/data.csv?dl=0
How can I do that?

+1

Could you post the desired data set (including the new index)? – MaxU

+0

It has been added to the question –

Answers

3

UPDATE:

In [182]: %paste 
(df.groupby(df.index.date) 
    .apply(lambda x: x.resample('78T', 
           loffset=pd.Timedelta('24minute')).mean()) 
    .ffill() 
    .pct_change() 
) 
## -- End pasted text -- 
Out[182]: 
            vxxc 
      Time 
2009-02-02 2009-02-02 09:30:00  NaN 
      2009-02-02 10:48:00 -0.010745 
      2009-02-02 12:06:00 -0.006372 
      2009-02-02 13:24:00 -0.003701 
      2009-02-02 14:42:00 0.001614 
      2009-02-02 16:00:00 -0.005668 
2009-02-03 2009-02-03 09:30:00 -0.009334 
      2009-02-03 10:48:00 -0.007039 
      2009-02-03 12:06:00 -0.002014 
      2009-02-03 13:24:00 -0.002705 
      2009-02-03 14:42:00 -0.017530 
      2009-02-03 16:00:00 -0.004704 
      2009-02-03 17:18:00 -0.001893 
2009-02-04 2009-02-04 09:30:00 -0.019076 
      2009-02-04 10:48:00 -0.002563 
      2009-02-04 12:06:00 0.002348 
      2009-02-04 13:24:00 0.010099 
      2009-02-04 14:42:00 0.013081 
      2009-02-04 16:00:00 -0.000264 
      2009-02-04 17:18:00 0.007121 
2009-02-05 2009-02-05 09:30:00 0.026527 
      2009-02-05 10:48:00 -0.013580 
      2009-02-05 12:06:00 -0.018056 
      2009-02-05 13:24:00 -0.005020 
      2009-02-05 14:42:00 -0.006316 
      2009-02-05 16:00:00 0.003269 
2009-02-06 2009-02-06 09:30:00 -0.030773 
      2009-02-06 10:48:00 0.001088 
      2009-02-06 12:06:00 0.010469 
      2009-02-06 13:24:00 -0.008337 
...         ... 
2009-02-23 2009-02-23 09:30:00 0.002312 
      2009-02-23 10:48:00 0.012162 
      2009-02-23 12:06:00 0.009785 
      2009-02-23 13:24:00 0.008687 
      2009-02-23 14:42:00 0.000421 
      2009-02-23 16:00:00 0.012550 
2009-02-24 2009-02-24 09:30:00 -0.009290 
      2009-02-24 10:48:00 -0.017526 
      2009-02-24 12:06:00 -0.004194 
      2009-02-24 13:24:00 -0.021528 
      2009-02-24 14:42:00 -0.027898 
      2009-02-24 16:00:00 -0.012646 
2009-02-25 2009-02-25 09:30:00 0.021827 
      2009-02-25 10:48:00 0.001863 
      2009-02-25 12:06:00 -0.012693 
      2009-02-25 13:24:00 -0.006884 
      2009-02-25 14:42:00 -0.013019 
      2009-02-25 16:00:00 -0.008020 
2009-02-26 2009-02-26 09:30:00 -0.015104 
      2009-02-26 10:48:00 -0.011319 
      2009-02-26 12:06:00 0.019160 
      2009-02-26 13:24:00 0.016271 
      2009-02-26 14:42:00 0.003807 
      2009-02-26 16:00:00 0.007333 
2009-02-27 2009-02-27 09:30:00 0.023949 
      2009-02-27 10:48:00 -0.027659 
      2009-02-27 12:06:00 -0.006932 
      2009-02-27 13:24:00 -0.003167 
      2009-02-27 14:42:00 0.005263 
      2009-02-27 16:00:00 0.010594 

[118 rows x 1 columns] 
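On newer pandas versions the `loffset` argument is no longer available (it was deprecated in 1.1 and removed in 2.0). A rough equivalent, assuming pandas 1.1 or later, is to anchor the bins with `origin` and `offset` and to resample each day separately. Note that this moves the bin edges themselves to 09:30, whereas `loffset` only shifted the labels. The data below is hypothetical and the helper name `resample_day` is made up for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute prices for two sessions (09:30 to 16:00 inclusive).
day1 = pd.date_range('2009-01-30 09:30', '2009-01-30 16:00', freq='1min')
day2 = pd.date_range('2009-02-02 09:30', '2009-02-02 16:00', freq='1min')
df = pd.DataFrame({'vxxc': 100.0 + np.tile(np.arange(391), 2)},
                  index=day1.append(day2))

def resample_day(day):
    # Anchor the 78-minute bins at 09:30 of this day instead of midnight.
    return day.resample('78min', origin='start_day',
                        offset=pd.Timedelta(hours=9, minutes=30)).mean()

# Resample each day on its own so the bins restart at 09:30 every morning.
parts = [resample_day(day) for _, day in df.groupby(df.index.normalize())]
out = pd.concat(parts).ffill().pct_change()

print(out)
```

Unlike the `groupby(...).apply(...)` version above, this returns a plain `DatetimeIndex` rather than a (date, time) MultiIndex.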

OLD answer:

You can do it like this:

In [104]: df.resample('18T').pct_change() 
C:\envs\py35\Scripts\ipython:1: FutureWarning: .resample() is now a deferred operation 
use .resample(...).mean() instead of .resample(...) 
Out[104]: 
          AAA  BBB  CCC  DDD 
Time 
2009-01-30 09:18:00  NaN  NaN  NaN  NaN 
2009-01-30 09:36:00 -0.001373 0.000626 0.000625 -0.002614 
2009-01-30 09:54:00 0.005477 0.009755 0.002146 -0.005389 

Or, if we want to get rid of the FutureWarning:

In [109]: df.resample('18T').mean().pct_change() 
Out[109]: 
          AAA  BBB  CCC  DDD 
Time 
2009-01-30 09:18:00  NaN  NaN  NaN  NaN 
2009-01-30 09:36:00 -0.001373 0.000626 0.000625 -0.002614 
2009-01-30 09:54:00 0.005477 0.009755 0.002146 -0.005389 

Note: I used an 18-minute interval instead of 78T because your sample data covers less than 78 minutes, so change 18T to 78T for your real data set.
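If an arbitrary function such as the question's `percent_change` has to be applied per bin, one option is to label each row with the end of its 78-minute bin and group on that label. This is a sketch under assumptions, not a tested solution for the real data set: the prices are hypothetical, bins are taken as right-closed (so 09:31 to 10:48 is labelled 10:48), and `percent_change` is rewritten with `.iloc` for positional access.

```python
import numpy as np
import pandas as pd

def percent_change(x):
    # Variant of the question's function using positional access.
    if len(x):
        return (x.iloc[-1] - x.iloc[0]) / x.iloc[0]
    return np.nan

# Hypothetical one-minute prices for two sessions (09:30 to 16:00 inclusive).
day1 = pd.date_range('2009-01-30 09:30', '2009-01-30 16:00', freq='1min')
day2 = pd.date_range('2009-02-02 09:30', '2009-02-02 16:00', freq='1min')
df = pd.DataFrame({'AAA': 100.0 + np.tile(np.arange(391), 2)},
                  index=day1.append(day2))

open_time = pd.Timedelta('09:30:00')
step = 78  # minutes per bin

# Whole minutes elapsed since 09:30 on each row's own day.
since_open = np.asarray((df.index - df.index.normalize() - open_time)
                        // pd.Timedelta('1min'))
# Ceiling division: 09:30 -> bin 0, 09:31..10:48 -> bin 1, and so on.
bin_no = -(-since_open // step)
# Label every row with the right edge of its bin.
labels = (df.index.normalize() + open_time
          + pd.to_timedelta(bin_no * step, unit='min'))

result = df.groupby(labels).agg(percent_change)
print(result)
```

With this labelling the 09:30 row forms a single-row bin of its own, so it yields 0 rather than the NaN shown in the question; it can be masked afterwards if NaN is preferred.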

+0

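Resample doesn't work in my case, because a day cannot be divided evenly into 78-minute intervals (adding a custom base parameter doesn't help either), so it changes the index (that's why I created a custom index). I want every day to start at 9:30 and end at 16:00. –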

+0

@VitaliHalapjan, OK, I see. Could you post/upload a larger DF containing at least two days of data? – MaxU

+0

Added a Dropbox link to the original question –