2015-12-23 91 views
1

Count categories across two pandas DataFrames for rows within a date range

I have two pandas DataFrames, df1 and df2:

df1 has 12 columns, of which a1, a2, ..., a9 are empty. Here is a sample of df1:

Stock Start_Date   End_Date  a1 a2 a3 a4 .... a9 
A 09-12-2015 20:04 10-12-2015 23:04     
B 09-12-2015 10:04 09-12-2015 20:14     
A 11-12-2015 00:22 11-12-2015 08:04     
C 08-12-2015 06:56 10-12-2015 20:54     

df2 has 4 columns. Here is a sample:

Stock date_time  Opening closing 
A 09-12-2015 21:24 144.3 10 
A 09-12-2015 21:27 225.51 24 
B 09-12-2015 10:20 134.42 11 
A 09-12-2015 20:04 231.22 17 
B 09-12-2015 10:24 399.55 32 
A 09-12-2015 20:04 246.77 21 
B 09-12-2015 14:22 76.23 8 
C 08-12-2015 09:44 232.22 15 
C 09-12-2015 20:04 222.91 12 
A 11-12-2015 02:06 93.21 7 
B 09-12-2015 20:04 211.36 26 
C 09-12-2015 20:04 111.21 8 

The output I want is df1 filled in like this:

Stock Start_Date  End_Date   a1 a2 a3 a4 ....a9 
A 09-12-2015 20:04 10-12-2015 23:04 0 2 2 0  0 
B 09-12-2015 10:04 09-12-2015 20:14 1 1 2 0  0 
A 11-12-2015 00:22 11-12-2015 08:04 1 0 0 0  0 
C 08-12-2015 06:56 10-12-2015 20:54 0 0 0 1  0 

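That is, for every Stock, Start_Date and End_Date combination in df1, the result should hold the count of df2 rows in each category whose date_time falls within that range. In this final output, a1 = count[Opening (0-100) & closing (0-10)], a2 = count[Opening (101-200) & closing (11-20)], a3 = count[Opening (201-400) & closing (21-50)], a4 = count[Opening (0-100) & closing (11-20)], and so on for all 9 combinations.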
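To make the nine combinations concrete, here is a minimal sketch that bands Opening and closing with `pd.cut` and tabulates the resulting 3 x 3 grid with `pd.crosstab`, before any date filtering. It assumes the bands are contiguous (so a value such as 100.5 lands in the 101-200 band), and it deliberately does not guess which of a5 to a9 belongs to which cell, since only a1 to a4 are spelled out above.

```python
import pandas as pd

# Opening and closing values copied from the df2 sample above.
df2 = pd.DataFrame({
    "Opening": [144.3, 225.51, 134.42, 231.22, 399.55, 246.77,
                76.23, 232.22, 222.91, 93.21, 211.36, 111.21],
    "closing": [10, 24, 11, 17, 32, 21, 8, 15, 12, 7, 26, 8],
})

# Assumption: contiguous, right-closed bands; include_lowest=True lets the
# first band also contain its lower edge (0).
open_band = pd.cut(df2["Opening"], bins=[0, 100, 200, 400],
                   labels=["0-100", "101-200", "201-400"], include_lowest=True)
close_band = pd.cut(df2["closing"], bins=[0, 10, 20, 50],
                    labels=["0-10", "11-20", "21-50"], include_lowest=True)

# One cell per Opening/closing combination, holding the number of rows in it.
grid = pd.crosstab(open_band, close_band)
print(grid)
```

Under this reading, a1 is the (0-100, 0-10) cell, a2 the (101-200, 11-20) cell, a3 the (201-400, 21-50) cell and a4 the (0-100, 11-20) cell.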

I have R code for this, but it does not perform well on larger datasets. Can anyone help me do this in python/pandas? Any help is appreciated!

Answers

1

You can try this solution. I removed the empty columns from df1, but it works with them too:

# Merge the DataFrames on Stock, then keep rows whose date_time lies between Start_Date and End_Date
df = df1.merge(df2, on='Stock', how='left')
df = df[(df.date_time >= df.Start_Date) & (df.date_time <= df.End_Date)]
# Remove the date_time column
df = df.drop(['date_time'], axis=1)
print(df)
# Stock   Start_Date   End_Date Opening closing 
#0  A 2015-09-12 20:04:00 2015-10-12 23:04:00 144.30  10 
#1  A 2015-09-12 20:04:00 2015-10-12 23:04:00 225.51  24 
#2  A 2015-09-12 20:04:00 2015-10-12 23:04:00 231.22  17 
#3  A 2015-09-12 20:04:00 2015-10-12 23:04:00 246.77  21 
#5  B 2015-09-12 10:04:00 2015-09-12 20:14:00 134.42  11 
#6  B 2015-09-12 10:04:00 2015-09-12 20:14:00 399.55  32 
#7  B 2015-09-12 10:04:00 2015-09-12 20:14:00 76.23  8 
#8  B 2015-09-12 10:04:00 2015-09-12 20:14:00 211.36  26 
#13  A 2015-11-12 00:22:00 2015-11-12 08:04:00 93.21  7 
#14  C 2015-08-12 06:56:00 2015-10-12 20:54:00 232.22  15 
#15  C 2015-08-12 06:56:00 2015-10-12 20:54:00 222.91  12 
#16  C 2015-08-12 06:56:00 2015-10-12 20:54:00 111.21  8 

# Flag each row per category by casting the boolean mask to int (between() is inclusive on both ends)
df['a1'] = (df.Opening.between(0, 100) & df.closing.between(0, 10)).astype(int)
df['a2'] = (df.Opening.between(100, 200) & df.closing.between(11, 20)).astype(int)
# Add the remaining columns the same way as a1 and a2
print(df)
# Stock   Start_Date   End_Date Opening closing a1 a2 
#0  A 2015-09-12 20:04:00 2015-10-12 23:04:00 144.30  10 0 0 
#1  A 2015-09-12 20:04:00 2015-10-12 23:04:00 225.51  24 0 0 
#2  A 2015-09-12 20:04:00 2015-10-12 23:04:00 231.22  17 0 0 
#3  A 2015-09-12 20:04:00 2015-10-12 23:04:00 246.77  21 0 0 
#5  B 2015-09-12 10:04:00 2015-09-12 20:14:00 134.42  11 0 1 
#6  B 2015-09-12 10:04:00 2015-09-12 20:14:00 399.55  32 0 0 
#7  B 2015-09-12 10:04:00 2015-09-12 20:14:00 76.23  8 1 0 
#8  B 2015-09-12 10:04:00 2015-09-12 20:14:00 211.36  26 0 0 
#13  A 2015-11-12 00:22:00 2015-11-12 08:04:00 93.21  7 1 0 
#14  C 2015-08-12 06:56:00 2015-10-12 20:54:00 232.22  15 0 0 
#15  C 2015-08-12 06:56:00 2015-10-12 20:54:00 222.91  12 0 0 
#16  C 2015-08-12 06:56:00 2015-10-12 20:54:00 111.21  8 0 0 

# Group by the df1 key columns and sum the flags
df = df.groupby(['Stock', 'Start_Date', 'End_Date']).sum()
df = df.drop(['Opening', 'closing'], axis=1)
print(df.reset_index())
# Stock   Start_Date   End_Date a1 a2 
#0  A 2015-09-12 20:04:00 2015-10-12 23:04:00 0 0 
#1  A 2015-11-12 00:22:00 2015-11-12 08:04:00 1 0 
#2  B 2015-09-12 10:04:00 2015-09-12 20:14:00 1 1 
#3  C 2015-08-12 06:56:00 2015-10-12 20:54:00 0 0 
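For reference, the same merge, filter and sum steps can be wrapped into one self-contained sketch that runs on the sample data from the question. Several things here are assumptions rather than part of the answer: the timestamps are parsed as day-first, only the four categories spelled out in the question are included, `count_categories` and `row_id` are hypothetical names, and df1 is assumed to have a unique index. The sketch groups on df1's row label instead of the key columns, so intervals with no matching df2 rows come back as zeros instead of disappearing.

```python
import pandas as pd

# Assumption: only a1..a4 are defined in the question, so a5..a9 are omitted.
# between() is inclusive, so an Opening of exactly 100 would count in a2 and a4.
CATEGORIES = {
    "a1": ((0, 100), (0, 10)),
    "a2": ((100, 200), (11, 20)),
    "a3": ((201, 400), (21, 50)),
    "a4": ((0, 100), (11, 20)),
}


def count_categories(df1, df2, categories=CATEGORIES):
    """Count df2 rows per category inside each df1 interval (hypothetical helper)."""
    # Carry df1's row label through the merge so each interval keeps its identity.
    merged = df1.rename_axis("row_id").reset_index().merge(df2, on="Stock", how="left")
    in_range = (merged["date_time"] >= merged["Start_Date"]) & (
        merged["date_time"] <= merged["End_Date"]
    )
    merged = merged[in_range].copy()
    for name, ((o_lo, o_hi), (c_lo, c_hi)) in categories.items():
        hit = merged["Opening"].between(o_lo, o_hi) & merged["closing"].between(c_lo, c_hi)
        merged[name] = hit.astype(int)
    counts = merged.groupby("row_id")[list(categories)].sum()
    # Intervals without any matching df2 row are filled with zeros.
    return df1.join(counts.reindex(df1.index, fill_value=0))


df1 = pd.DataFrame({
    "Stock": ["A", "B", "A", "C"],
    "Start_Date": ["09-12-2015 20:04", "09-12-2015 10:04",
                   "11-12-2015 00:22", "08-12-2015 06:56"],
    "End_Date": ["10-12-2015 23:04", "09-12-2015 20:14",
                 "11-12-2015 08:04", "10-12-2015 20:54"],
})
df2 = pd.DataFrame({
    "Stock": list("AABABABCCABC"),
    "date_time": ["09-12-2015 21:24", "09-12-2015 21:27", "09-12-2015 10:20",
                  "09-12-2015 20:04", "09-12-2015 10:24", "09-12-2015 20:04",
                  "09-12-2015 14:22", "08-12-2015 09:44", "09-12-2015 20:04",
                  "11-12-2015 02:06", "09-12-2015 20:04", "09-12-2015 20:04"],
    "Opening": [144.3, 225.51, 134.42, 231.22, 399.55, 246.77,
                76.23, 232.22, 222.91, 93.21, 211.36, 111.21],
    "closing": [10, 24, 11, 17, 32, 21, 8, 15, 12, 7, 26, 8],
})

fmt = "%d-%m-%Y %H:%M"  # assumption: the timestamps are day-first
for col in ("Start_Date", "End_Date"):
    df1[col] = pd.to_datetime(df1[col], format=fmt)
df2["date_time"] = pd.to_datetime(df2["date_time"], format=fmt)

result = count_categories(df1, df2)
print(result)
```

Note that merging on Stock alone builds every df1/df2 pair per stock before filtering, so memory use grows with the product of the two row counts for each stock.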
+0

How does it work? – jezrael

+0

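Thanks, it works flawlessly. One more thing: what if I have another column (double or float) in df1? Can I get it into the final output by changing the 'how' in the merge? – warwick12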

+1

I think the `on` argument of `merge` is what is used for matching; a better example, with pictures, is [here](http://pandas.pydata.org/pandas-docs/stable/merging.html#brief-primer-on-merge-methods-relational-algebra). `df = df1.merge(df2, on='Stock', how='left')` is the same as `df = pd.merge(df1, df2, on='Stock', how='left')`. – jezrael
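As a rough illustration of the extra-column question, the sketch below adds a hypothetical float column named `weight` to a cut-down df1. The merge already carries every df1 column onto each matched row whatever `how` is set to, so one way to keep the column in the final output is to add it to the groupby keys. The data is a small subset of the sample with ISO-formatted dates, and the column name is an assumption made for the example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Stock": ["A", "B"],
    "Start_Date": pd.to_datetime(["2015-12-09 20:04", "2015-12-09 10:04"]),
    "End_Date": pd.to_datetime(["2015-12-10 23:04", "2015-12-09 20:14"]),
    "weight": [0.5, 1.5],  # hypothetical extra float column
})
df2 = pd.DataFrame({
    "Stock": ["A", "A", "B"],
    "date_time": pd.to_datetime(["2015-12-09 21:24", "2015-12-09 21:27",
                                 "2015-12-09 14:22"]),
    "Opening": [144.3, 225.51, 76.23],
    "closing": [10, 24, 8],
})

# The method form and the function form produce the same frame.
left = df1.merge(df2, on="Stock", how="left")
same = pd.merge(df1, df2, on="Stock", how="left")
assert left.equals(same)

df = left[(left.date_time >= left.Start_Date) & (left.date_time <= left.End_Date)].copy()
df["a1"] = (df.Opening.between(0, 100) & df.closing.between(0, 10)).astype(int)

# Including the extra column in the keys keeps it next to each interval.
keys = ["Stock", "Start_Date", "End_Date", "weight"]
out = df.groupby(keys)[["a1"]].sum().reset_index()
print(out)
```

One caveat: by default `groupby` drops rows whose key is NaN, so a missing `weight` would silently remove that interval unless `dropna=False` is passed.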