2017-08-01 259 views
1

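Pandas resample on OHLC data from 1min to 1H

When I use pandas to resample 1-minute OHLC time series data to 15 minutes, it works well, for example on the following DataFrame: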

ohlc_dict = {'Open':'first', 'High':'max', 'Low':'min', 'Close': 'last'} 
df.resample('15Min').apply(ohlc_dict).dropna(how='any').loc['2011-02-01'] 

Date Time            Open     High     Low      Close
------------------------------------------------------------------
2011-02-01 09:30:00 3081.940 3086.860 3077.832 3081.214
2011-02-01 09:45:00 3082.422 3083.730 3071.922 3073.801
2011-02-01 10:00:00 3073.303 3078.345 3069.130 3078.345
2011-02-01 10:15:00 3078.563 3078.563 3071.522 3072.279
2011-02-01 10:30:00 3071.873 3071.873 3063.497 3067.364
2011-02-01 10:45:00 3066.735 3070.523 3063.402 3069.974
2011-02-01 11:00:00 3069.561 3069.981 3066.286 3069.981
2011-02-01 11:15:00 3070.602 3074.088 3070.373 3073.919
2011-02-01 13:00:00 3074.778 3074.823 3069.925 3069.925
2011-02-01 13:15:00 3070.096 3070.903 3063.457 3063.457
2011-02-01 13:30:00 3063.929 3067.358 3063.929 3067.358
2011-02-01 13:45:00 3067.570 3072.455 3067.570 3072.247
2011-02-01 14:00:00 3072.927 3081.357 3072.767 3080.175
2011-02-01 14:15:00 3078.843 3079.435 3076.733 3076.782
2011-02-01 14:30:00 3076.721 3081.980 3076.721 3081.912
2011-02-01 14:45:00 3082.822 3083.381 3076.722 3077.283

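However, when I resample from 1 minute to 1H, a problem appears. With the default settings, the hourly bins start on the hour at 9:00 AM, but the market opens at 9:30 AM.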

df.resample('1H').apply(ohlc_dict).dropna(how='any').loc['2011-02-01'] 

(Screenshot: 1-hour OHLC, wrong in the morning)

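Then I tried changing the `base` setting, but that fails for the afternoon session. The market should open at 13:00 and close at 15:00, so the timestamps should be 13:00, 14:00, and 15:00, three bars in total.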

df.resample('60MIN',base=30).apply(ohlc_dict).dropna(how='any').loc['2011-02-01'] 

(Screenshot: 1-hour OHLC, wrong in the afternoon)
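A note for readers on newer pandas: the `base` argument of `resample` was deprecated in pandas 1.1 and removed in 2.0, with `offset` as its replacement. The sketch below reproduces the half-hour shift with `offset='30min'` on synthetic 1-minute data. The data, the session times (09:30 to 11:29 and 13:00 to 14:59), and the variable names are assumptions made up for illustration, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute bars for one day (hypothetical data, two sessions).
idx = pd.date_range('2011-02-01 09:30', '2011-02-01 11:29', freq='min').append(
    pd.date_range('2011-02-01 13:00', '2011-02-01 14:59', freq='min'))
price = np.arange(len(idx), dtype=float)
df = pd.DataFrame({'Open': price, 'High': price + 1,
                   'Low': price - 1, 'Close': price + 0.5}, index=idx)

ohlc_dict = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}

# offset='30min' moves every hourly bin edge to the half hour,
# which is what base=30 did for a 60-minute rule in older pandas.
hourly = df.resample('60min', offset='30min').agg(ohlc_dict).dropna(how='any')
print(hourly)
```

Because every bin edge is shifted by the same amount, the afternoon bins on this data also sit on the half hour rather than starting at 13:00, which is the mismatch described above.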

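In short, I want the bars to fit the market sessions, giving 6 bars (9:30, 10:30, 11:30, 1:00, 2:00, 3:00), but pandas `resample` only gives me 5 bars (9:30, 10:30, 11:30, 1:30, 2:30).

I have searched online for a long time without success. Please help or suggest some ideas on how to achieve this. Thanks.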

+0

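I don't think you can do this, because `resample` works on fixed time intervals. You could try resampling into two separate DataFrames, then slicing them and concatenating them together. – Yeile

Answer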
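One way the comment's suggestion might look in practice is sketched here: slice each trading session with `between_time`, resample it with its own `offset` so the bins start at the session open, then concatenate. The function name `resample_by_session`, its `sessions` parameter, and the synthetic data are hypothetical; the session times are inferred from the question and assume 1-minute timestamps with the last bar of each session at 11:29 and 14:59. The `offset` argument requires pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

ohlc_dict = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}

def resample_by_session(df, rule='60min',
                        sessions=(('09:30', '11:29'), ('13:00', '14:59'))):
    """Resample each trading session separately and join the results.

    `sessions` holds (first_minute, last_minute) pairs; both ends are
    included by between_time. These times are assumptions.
    """
    parts = []
    for start, end in sessions:
        session = df.between_time(start, end)
        if session.empty:
            continue
        # Anchor the bins to the session open, e.g. 09:30 or 13:00.
        offset = pd.Timedelta(start + ':00')
        bars = session.resample(rule, offset=offset).agg(ohlc_dict)
        # Bins that contain no rows (overnight gaps) come out as NaN.
        parts.append(bars.dropna(how='any'))
    return pd.concat(parts).sort_index()

# Synthetic 1-minute data for one day (made up for the demonstration).
idx = pd.date_range('2011-02-01 09:30', '2011-02-01 11:29', freq='min').append(
    pd.date_range('2011-02-01 13:00', '2011-02-01 14:59', freq='min'))
price = np.arange(len(idx), dtype=float)
df = pd.DataFrame({'Open': price, 'High': price + 1,
                   'Low': price - 1, 'Close': price + 0.5}, index=idx)

bars = resample_by_session(df)
print(bars)
```

On this synthetic day the result is four left-labelled hourly bars (09:30, 10:30, 13:00, 14:00), each with Open, High, Low, and Close aggregated over its own hour. Whether that matches the six timestamps listed in the question depends on how the bars are meant to be labelled.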

0

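Below is a partial answer that is only correct for the Close column of the DataFrame. As Yeile said, pandas `resample` may not be able to do what I originally intended, so I tried extracting the required rows with `iterrows`.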

import pandas as pd

def extract(df):
    # Collect the 1-minute rows that fall on the desired bar timestamps.
    rows = []
    for index, row in df.iterrows():
        if index.minute == 30 and index.hour < 12:
            # Morning session: 09:30, 10:30
            rows.append(row)
        elif index.minute == 0 and index.hour > 12:
            # Afternoon session: 13:00, 14:00
            rows.append(row)
        elif index.minute == 29 and index.hour == 11:
            # Last minute of the morning session, relabelled as 11:30
            rows.append(row.rename(index + pd.Timedelta(minutes=1)))
        elif index.minute == 59 and index.hour == 14:
            # Last minute of the afternoon session, relabelled as 15:00
            rows.append(row.rename(index + pd.Timedelta(minutes=1)))
    # DataFrame.append was removed in pandas 2.0, so build the frame once.
    return pd.DataFrame(rows)

data = extract(df.loc['2011-02-01'])
data

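However, apart from Close, the other columns are incorrect, because each selected row is a single 1-minute bar. The result is shown below: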

                    Close    High     Low      Open     Volume     turnover
2011-02-01 09:30:00 3081.940 3081.940 3081.940 3081.940 74767100.0 996328900.0
2011-02-01 10:30:00 3071.873 3071.873 3071.873 3071.873 18754100.0 250694100.0
2011-02-01 11:30:00 3073.919 3073.919 3073.919 3073.919 13762700.0 179169200.0
2011-02-01 13:00:00 3074.778 3074.778 3074.778 3074.778 25992700.0 321678500.0
2011-02-01 14:00:00 3072.927 3072.927 3072.927 3072.927 11682300.0 161534600.0
2011-02-01 15:00:00 3077.283 3077.283 3077.283 3077.283 68184500.0 930561900.0