2017-04-04 95 views

Event study in pandas

Suppose I have a time series like this:

s = pd.Series(np.random.rand(20), index=pd.date_range("1990-01-01", periods=20))

which gives:

1990-01-01 0.018363 
1990-01-02 0.288625 
1990-01-03 0.460708 
1990-01-04 0.663063 
1990-01-05 0.434250 
1990-01-06 0.504893 
1990-01-07 0.587743 
1990-01-08 0.412223 
1990-01-09 0.604656 
1990-01-10 0.960338 
1990-01-11 0.606765 
1990-01-12 0.110480 
1990-01-13 0.671683 
1990-01-14 0.178488 
1990-01-15 0.458074 
1990-01-16 0.219303 
1990-01-17 0.172665 
1990-01-18 0.429534 
1990-01-19 0.505891 
1990-01-20 0.242567 
Freq: D, dtype: float64 

Suppose the event dates are 1990-01-05 and 1990-01-15. I want to subset the data down to a window of (-2, +2) days around each event, like this:

1990-01-03 0.460708 
1990-01-04 0.663063 
1990-01-05 0.434250 
1990-01-06 0.504893 
1990-01-07 0.587743 
1990-01-13 0.671683 
1990-01-14 0.178488 
1990-01-15 0.458074 
1990-01-16 0.219303 
1990-01-17 0.172665 
Freq: D, dtype: float64 

How should I go about doing this?

Answers


I think you can use concat on the Series created by a list comprehension that slices each window with loc:

date1 = pd.to_datetime('1990-01-05') 
date2 = pd.to_datetime('1990-01-15') 
window = 2 

dates = [date1, date2] 

s1 = pd.concat([s.loc[date - pd.Timedelta(window, unit='d'): 
         date + pd.Timedelta(window, unit='d')] for date in dates]) 
print(s1)

Output:

1990-01-03 0.284356 
1990-01-04 0.997019 
1990-01-05 0.293225 
1990-01-06 0.451379 
1990-01-07 0.743209 
1990-01-13 0.254926 
1990-01-14 0.339728 
1990-01-15 0.793124 
1990-01-16 0.121002 
1990-01-17 0.930924 
dtype: float64 
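
The same idea can be wrapped in a small helper that takes any number of event dates. This is only a sketch: the function name `event_windows` and the `event`/`date` level names are made up for illustration, and the example series uses `np.arange` instead of random values so the result is reproducible.

```python
import numpy as np
import pandas as pd


def event_windows(s, event_dates, window=2):
    """Concatenate the +/- `window`-day slices of `s` around each event date.

    Hypothetical helper, not part of pandas.
    """
    delta = pd.Timedelta(days=window)
    events = list(pd.to_datetime(list(event_dates)))
    # Label-based slicing with .loc includes both end points.
    pieces = [s.loc[d - delta: d + delta] for d in events]
    # keys= adds an outer index level recording which event each row belongs to.
    return pd.concat(pieces, keys=events, names=["event", "date"])


# Deterministic stand-in for the random series in the question.
s = pd.Series(np.arange(20.0), index=pd.date_range("1990-01-01", periods=20))
out = event_windows(s, ["1990-01-05", "1990-01-15"])
print(out)
```

Keeping the event as an extra index level is a design choice: it makes it possible to tell the windows apart later, and rows are repeated (once per event) if two windows overlap. Calling `out.droplevel("event")` gives back a plain date index like the one shown above.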

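Thanks for your help, but the two dates are the dates of two separate events. Your method handles one event at a time; are you suggesting I write a for loop over the two event dates? – zsljulius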


I think so. 'iloc' could be problematic with a start date of '1990-01-01' and an end date of '1990-01-17'. – jezrael
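
One possible reading of the concern about 'iloc' is that positional windows misbehave when an event sits near the edge of the series, whereas label-based slicing simply clips the window. The sketch below illustrates that reading, again using a deterministic stand-in series; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(20.0), index=pd.date_range("1990-01-01", periods=20))
delta = pd.Timedelta(days=2)
first = s.index[0]  # an event on the very first day of the series

# Label-based slice: bounds outside the index are clipped to the data.
by_label = s.loc[first - delta: first + delta]

# Positional slice: a negative start counts from the end of the series,
# so s.iloc[-2:3] selects nothing here.
pos = s.index.get_loc(first)
naive = s.iloc[pos - 2: pos + 3]

# Clamping the start at 0 avoids the wrap-around.
safe = s.iloc[max(pos - 2, 0): pos + 3]

print(len(by_label), len(naive), len(safe))
```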


Try this:

In [23]: df['A'] 
Out[23]: 
2013-01-01 0.469112 
2013-01-02 1.212112 
2013-01-03 -0.861849 
2013-01-04 0.721555 
2013-01-05 -0.424972 
2013-01-06 -0.673690 
Freq: D, Name: A, dtype: float64 

In [25]: df['20130102':'20130104'] 
Out[25]: 
        A   B   C   D 
2013-01-02 1.212112 -0.173215 0.119209 -1.044236 
2013-01-03 -0.861849 -2.104569 -0.494929 1.071804 
2013-01-04 0.721555 -0.706771 -1.039575 0.271860 

[3 rows x 4 columns] 

From the pandas docs (10 Minutes to pandas): http://pandas.pydata.org/pandas-docs/version/0.13.1/10min.html?highlight=select%20where (see the "Selection" section)
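
Applied to the series in the question, date-string slicing might look like the sketch below. The window bounds are written out by hand here, which is an assumption made for illustration; in practice they would be computed from the event dates. The example uses `.loc` to make the label-based slicing explicit.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random series in the question.
s = pd.Series(np.arange(20.0), index=pd.date_range("1990-01-01", periods=20))

# Hand-written (start, end) bounds for the two event windows.
windows = [("1990-01-03", "1990-01-07"), ("1990-01-13", "1990-01-17")]

# Date strings are parsed against the DatetimeIndex; both ends are inclusive.
subset = pd.concat([s.loc[start:end] for start, end in windows])
print(subset)
```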


I would build a boolean mask to select the values of interest:

import numpy as np 
import pandas as pd 

s = pd.Series(np.random.rand(20), index=pd.date_range("1990-01-01",periods=20)) 
events = [pd.to_datetime('1990-01-05'), pd.to_datetime('1990-01-15')] 
max_delta = pd.Timedelta(2, unit='d') 

mask = np.zeros_like(s, dtype=bool) 
for event in events: 
    mask |= np.abs(s.index - event) <= max_delta 
s_events = s[mask] 

print(s_events) 

Output:

1990-01-03 0.877271 
1990-01-04 0.770214 
1990-01-05 0.427380 
1990-01-06 0.971676 
1990-01-07 0.533582 
1990-01-13 0.060556 
1990-01-14 0.932072 
1990-01-15 0.501966 
1990-01-16 0.081177 
1990-01-17 0.167775 
dtype: float64
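
The loop over events can also be replaced by NumPy broadcasting. This is a sketch of an alternative rather than a claim that it is better: it builds a rows-by-events matrix, so it trades memory for the absence of a Python loop. As with the mask above, each row is selected at most once even if two windows overlap.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random series in the question.
s = pd.Series(np.arange(20.0), index=pd.date_range("1990-01-01", periods=20))
events = pd.to_datetime(["1990-01-05", "1990-01-15"])
max_delta = np.timedelta64(2, "D")

# Matrix of shape (n_rows, n_events): absolute distance of each row to each event.
dist = np.abs(s.index.values[:, None] - events.values[None, :])

# Keep a row if it lies within max_delta of at least one event.
mask = (dist <= max_delta).any(axis=1)
s_events = s[mask]
print(s_events)
```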