Conditional merge in pandas

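I'm new to pandas and am trying to convert some of my SAS code. I have two datasets. The first (header_mf) contains mutual fund information indexed by crsp_fundno and caldt (fund ID and date). The second (ret_mf) holds the fund returns (column mret) with the same index. I am trying to merge each entry in the first dataset with the returns from the previous 12 months. In SAS, I could do something like this: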

proc sql; 
    create table temp_mf3 as 
    select a.*, b.mret from header_mf as a, 
    ret_mf as b where 
    a.crsp_fundno=b.crsp_fundno and 
    ((year(a.caldt)=year(b.caldt) and month(a.caldt)>month(b.caldt)) or 
    (year(a.caldt)=(year(b.caldt)+1) and month(a.caldt)<=month(b.caldt))); 
    quit;

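In Python, I tried joining the two data frames on crsp_fundno only, hoping to exclude the out-of-range observations in a later step. However, the result quickly becomes too large to handle and I run out of memory (I am using more than 15 years of data).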
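For reference, the merge-then-filter approach described above can be sketched roughly as follows. The helper names `prior_12_months` and `prior_12_months_by_fund` and the toy data are illustrative assumptions, not part of the original code. The SAS condition is rewritten as a month-count difference between 1 and 12, which should be equivalent. The second helper runs the same join one fund at a time, so the intermediate table is bounded by the largest fund rather than by the whole panel; whether that is enough to fit in memory depends on the data.

```python
import pandas as pd

def prior_12_months(header_mf, ret_mf):
    """Join header rows to same-fund returns from the 12 preceding calendar months."""
    # Joining on the fund ID alone pairs every header date with every return date
    # of that fund; this intermediate table is what grows so quickly.
    merged = header_mf.merge(ret_mf, on="crsp_fundno", suffixes=("", "_ret"))
    # Express each date as a running month count so the SAS year/month test
    # becomes a simple difference: 1 <= months apart <= 12.
    month_a = merged["caldt"].dt.year * 12 + merged["caldt"].dt.month
    month_b = merged["caldt_ret"].dt.year * 12 + merged["caldt_ret"].dt.month
    diff = month_a - month_b
    return merged[(diff >= 1) & (diff <= 12)]

def prior_12_months_by_fund(header_mf, ret_mf):
    """Same join, processed one fund at a time to limit the intermediate size."""
    ret_by_fund = {fundno: grp for fundno, grp in ret_mf.groupby("crsp_fundno")}
    pieces = []
    for fundno, hdr in header_mf.groupby("crsp_fundno"):
        if fundno in ret_by_fund:
            pieces.append(prior_12_months(hdr, ret_by_fund[fundno]))
    return pd.concat(pieces, ignore_index=True)

# Toy data (made up for illustration): one header row and five return rows.
header_mf = pd.DataFrame({
    "crsp_fundno": [1],
    "caldt": pd.to_datetime(["1987-03-31"]),
})
ret_mf = pd.DataFrame({
    "crsp_fundno": [1, 1, 1, 1, 2],
    "caldt": pd.to_datetime(["1986-02-28", "1986-03-31", "1987-02-27",
                             "1987-03-31", "1987-01-30"]),
    "mret": [0.01, 0.02, 0.03, 0.04, 0.05],
})
result = prior_12_months_by_fund(header_mf, ret_mf)
print(result)
```

Only the returns dated 1986-03-31 and 1987-02-27 fall inside the 12-month window for the 1987-03-31 header row; the same-month return and the one 13 months back are dropped.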

Is there an efficient way to do this kind of conditional merge in pandas?

Source

2014-02-16 vgregoire

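Sorry if this reply comes too late to help. I don't think you want a conditional merge (at least if I understand the situation correctly). I think you can get the result you want by simply merging header_mf and ret_mf on ['fundno','caldt'] and then using the shift operator in pandas to create the past-return columns.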

So I assume your data basically looks like this:

import pandas as pd
header = pd.read_csv('header.csv')
print(header)

    fundno       caldt  foo
0        1  1986-06-30  100
1        1  1986-07-31  110
2        1  1986-08-29  120
3        1  1986-09-30  115
4        1  1986-10-31  110
5        1  1986-11-28  125
6        1  1986-12-31  137
7        2  1986-06-30  130
8        2  1986-07-31  204
9        2  1986-08-29  192
10       2  1986-09-30  180
11       2  1986-10-31  200
12       2  1986-11-28  205
13       2  1986-12-31  205

ret_mf = pd.read_csv('ret_mf.csv')
print(ret_mf)

    fundno       caldt   mret
0        1  1986-06-30   0.05
1        1  1986-07-31   0.01
2        1  1986-08-29  -0.01
3        1  1986-09-30   0.10
4        1  1986-10-31   0.04
5        1  1986-11-28  -0.02
6        1  1986-12-31  -0.06
7        2  1986-06-30  -0.04
8        2  1986-07-31   0.03
9        2  1986-08-29   0.07
10       2  1986-09-30   0.00
11       2  1986-10-31  -0.05
12       2  1986-11-28   0.09
13       2  1986-12-31   0.04
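Since the approach depends on row order within each fund, it may help to parse caldt as a real date and sort explicitly when loading. Below is a minimal sketch; the CSV layout is an assumption (the actual files are not shown), and `io.StringIO` stands in for the file on disk so the snippet runs on its own.

```python
import io
import pandas as pd

# Stand-in for ret_mf.csv; the column layout is assumed from the printed frame.
csv_text = """fundno,caldt,mret
1,1986-07-31,0.01
1,1986-06-30,0.05
2,1986-06-30,-0.04
"""

# parse_dates turns caldt into datetime64 instead of plain strings.
ret_mf = pd.read_csv(io.StringIO(csv_text), parse_dates=["caldt"])

# Sort by fund and date so that "previous row" means "previous month".
ret_mf = ret_mf.sort_values(["fundno", "caldt"]).reset_index(drop=True)
print(ret_mf.dtypes)
print(ret_mf)
```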

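Obviously there may be many more variables in the header file (besides my made-up foo variable). But if this basically captures the nature of your data, then I think you can merge on ['fundno','caldt'] and then use shift: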

mf = header.merge(ret_mf, how='left', on=['fundno', 'caldt'])
print(mf)

    fundno       caldt  foo   mret
0        1  1986-06-30  100   0.05
1        1  1986-07-31  110   0.01
2        1  1986-08-29  120  -0.01
3        1  1986-09-30  115   0.10
4        1  1986-10-31  110   0.04
5        1  1986-11-28  125  -0.02
6        1  1986-12-31  137  -0.06
7        2  1986-06-30  130  -0.04
8        2  1986-07-31  204   0.03
9        2  1986-08-29  192   0.07
10       2  1986-09-30  180   0.00
11       2  1986-10-31  200  -0.05
12       2  1986-11-28  205   0.09
13       2  1986-12-31  205   0.04
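On a real panel it can be worth checking that the key merge behaves as expected. The sketch below uses two options of `DataFrame.merge`: `validate="one_to_one"` raises an error if a (fundno, caldt) pair is duplicated on either side, and `indicator=True` adds a `_merge` column that marks header rows without a matching return. The tiny frames are made up for illustration.

```python
import pandas as pd

header = pd.DataFrame({
    "fundno": [1, 1, 2],
    "caldt": pd.to_datetime(["1986-06-30", "1986-07-31", "1986-06-30"]),
    "foo": [100, 110, 130],
})
# The return for fund 1 in July is deliberately missing.
ret_mf = pd.DataFrame({
    "fundno": [1, 2],
    "caldt": pd.to_datetime(["1986-06-30", "1986-06-30"]),
    "mret": [0.05, -0.04],
})

mf = header.merge(ret_mf, how="left", on=["fundno", "caldt"],
                  validate="one_to_one", indicator=True)

# Header rows that found no return carry "left_only" in the _merge column.
unmatched = mf[mf["_merge"] == "left_only"]
print(unmatched)
```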

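Now you can create the past-return variables. Because I created such a small example panel, I will only do 3 months of past returns. (This assumes the rows are sorted by fundno and then caldt, as they are in the example.)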

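for lag in range(1, 4):
    # True where the row `lag` positions back belongs to the same fund
    good = mf['fundno'] == mf['fundno'].shift(lag)
    # Keep the lagged return only within the same fund; otherwise NaN
    mf['ret' + str(lag)] = mf['mret'].shift(lag).where(good)
print(mf)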

    fundno       caldt  foo   mret   ret1   ret2   ret3
0        1  1986-06-30  100   0.05    NaN    NaN    NaN
1        1  1986-07-31  110   0.01   0.05    NaN    NaN
2        1  1986-08-29  120  -0.01   0.01   0.05    NaN
3        1  1986-09-30  115   0.10  -0.01   0.01   0.05
4        1  1986-10-31  110   0.04   0.10  -0.01   0.01
5        1  1986-11-28  125  -0.02   0.04   0.10  -0.01
6        1  1986-12-31  137  -0.06  -0.02   0.04   0.10
7        2  1986-06-30  130  -0.04    NaN    NaN    NaN
8        2  1986-07-31  204   0.03  -0.04    NaN    NaN
9        2  1986-08-29  192   0.07   0.03  -0.04    NaN
10       2  1986-09-30  180   0.00   0.07   0.03  -0.04
11       2  1986-10-31  200  -0.05   0.00   0.07   0.03
12       2  1986-11-28  205   0.09  -0.05   0.00   0.07
13       2  1986-12-31  205   0.04   0.09  -0.05   0.00
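An alternative way to get the same lag columns is to shift within each fund using `groupby`, which avoids building the same-fund mask by hand. The sketch below uses a shortened copy of the example panel. It also adds a check for missing months, because a row-based shift only corresponds to "previous month" when each fund has one row per consecutive month; the `has_gap` name and the month-count trick are my own additions.

```python
import pandas as pd

# First four months of the example panel for both funds.
mf = pd.DataFrame({
    "fundno": [1, 1, 1, 1, 2, 2, 2, 2],
    "caldt": pd.to_datetime([
        "1986-06-30", "1986-07-31", "1986-08-29", "1986-09-30",
        "1986-06-30", "1986-07-31", "1986-08-29", "1986-09-30",
    ]),
    "mret": [0.05, 0.01, -0.01, 0.10, -0.04, 0.03, 0.07, 0.00],
})
mf = mf.sort_values(["fundno", "caldt"]).reset_index(drop=True)

# Shifting inside each group never pulls a return across a fund boundary.
by_fund = mf.groupby("fundno")["mret"]
for lag in range(1, 4):
    mf["ret" + str(lag)] = by_fund.shift(lag)

# Sanity check: within a fund, consecutive rows should be exactly one month apart.
month_index = mf["caldt"].dt.year * 12 + mf["caldt"].dt.month
month_step = month_index.groupby(mf["fundno"]).diff()
has_gap = month_step.notna() & (month_step != 1)

print(mf)
print("rows following a gap:", int(has_gap.sum()))
```

If the gap check flags any rows, the lagged columns after those rows would refer to the wrong month, and a date-based join such as the SAS query would be the safer route.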

I apologize if I misunderstood your data.

Source

2014-03-18 22:24:56
