2016-11-16 74 views

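Pandas merge on dates between columns

I have two dataframes: one of customer calls, and another identifying the durations of active services. Each customer can have multiple services, but they will not overlap.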

import pandas as pd

df_calls = pd.DataFrame([['A','2016-02-03',1],['A','2016-05-11',2],['A','2016-10-01',3],['A','2016-11-02',4], 
         ['B','2016-01-10',5],['B','2016-04-25',6]], columns = ['cust_id','call_date','call_id']) 

print(df_calls)

    cust_id call_date call_id 
0  A 2016-02-03  1 
1  A 2016-05-11  2 
2  A 2016-10-01  3 
3  A 2016-11-02  4 
4  B 2016-01-10  5 
5  B 2016-04-25  6 

df_active = pd.DataFrame([['A','2016-01-10','2016-03-15',1],['A','2016-09-10','2016-11-15',2], 
          ['B','2016-01-02','2016-03-17',3]], columns = ['cust_id','service_start','service_end','service_id']) 


print(df_active)

    cust_id service_start service_end service_id 
0  A 2016-01-10 2016-03-15   1 
1  A 2016-09-10 2016-11-15   2 
2  B 2016-01-02 2016-03-17   3 

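I need to find the service_id each call belongs to, as identified by the service_start and service_end dates. If a call does not fall between any pair of dates, it should still remain in the dataset.

Here is what I have tried so far: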

df_test_output = pd.merge(df_calls,df_active, how = 'left',on = ['cust_id']) 
df_test_output = df_test_output[(df_test_output['call_date']>= df_test_output['service_start']) 
         & (df_test_output['call_date']<= df_test_output['service_end'])].drop(['service_start','service_end'],axis = 1) 

print(df_test_output)

    cust_id call_date call_id service_id 
0  A 2016-02-03  1   1 
5  A 2016-10-01  3   2 
7  A 2016-11-02  4   2 
8  B 2016-01-10  5   3 

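This drops all calls that do not fall between service dates. Any ideas on how to merge in the service_id where the condition is met, but keep the remaining records?

The result should look like this: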

#do black magic 

print(df_calls)

cust_id call_date call_id service_id 
0  A 2016-02-03  1   1.0 
1  A 2016-05-11  2   NaN 
2  A 2016-10-01  3   2.0 
3  A 2016-11-02  4   2.0 
4  B 2016-01-10  5   3.0 
5  B 2016-04-25  6   NaN 
+1

You can join 'df_calls2' with 'df_calls' on 'call_id'.

Answers

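You can use merge with a left join (df_calls2 here appears to be the filtered frame that the question calls df_test_output):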

print (pd.merge(df_calls, df_calls2, how='left')) 
    cust_id call_date call_id service_id 
0  A 2016-02-03  1   1.0 
1  A 2016-05-11  2   NaN 
2  A 2016-10-01  3   2.0 
3  A 2016-11-02  4   2.0 
4  B 2016-01-10  5   3.0 
5  B 2016-04-25  6   NaN 
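
For reference, the whole approach can be sketched end to end as below. This is a minimal sketch rather than the answerer's exact code: the intermediate names pairs, in_window, matched and result are made up for illustration, and the date columns are converted with pd.to_datetime first (an assumption; the question compares the dates as plain strings).

```python
import pandas as pd

df_calls = pd.DataFrame(
    [['A', '2016-02-03', 1], ['A', '2016-05-11', 2], ['A', '2016-10-01', 3],
     ['A', '2016-11-02', 4], ['B', '2016-01-10', 5], ['B', '2016-04-25', 6]],
    columns=['cust_id', 'call_date', 'call_id'])
df_active = pd.DataFrame(
    [['A', '2016-01-10', '2016-03-15', 1], ['A', '2016-09-10', '2016-11-15', 2],
     ['B', '2016-01-02', '2016-03-17', 3]],
    columns=['cust_id', 'service_start', 'service_end', 'service_id'])

# Assumption: convert the date strings to real datetimes before comparing.
df_calls['call_date'] = pd.to_datetime(df_calls['call_date'])
for col in ['service_start', 'service_end']:
    df_active[col] = pd.to_datetime(df_active[col])

# Step 1: pair every call with every service of the same customer.
pairs = pd.merge(df_calls, df_active, how='left', on='cust_id')

# Step 2: keep only the pairs where the call falls inside the service window.
in_window = ((pairs['call_date'] >= pairs['service_start'])
             & (pairs['call_date'] <= pairs['service_end']))
matched = pairs.loc[in_window, ['call_id', 'service_id']]

# Step 3: left-join the matches back onto the full call list, so calls
# without a matching service stay in the result with service_id = NaN.
result = pd.merge(df_calls, matched, how='left', on='call_id')
print(result)
```

Because services for one customer do not overlap, each call matches at most one service, so the final left join should not duplicate any call rows.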
+0

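df_calls2 isn't a real table. It is the output of merging df_calls and df_service and then removing the bad rows. It was created to show that the approach I tried doesn't work. – flyingmeatball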

+0

Hmm, so you think it works, but you are looking for a better solution? – jezrael

+0

Ah gotcha - I see what you're saying, that does work, thanks! I had been exploring using graphs https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.sparse.csgraph.connected_components.html – flyingmeatball