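Pandas: merge on dates between two columns

I have two dataframes: one of customer calls and another identifying the periods during which a service was active. Each customer can have multiple services, but they will not overlap.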

import pandas as pd

df_calls = pd.DataFrame([['A','2016-02-03',1],['A','2016-05-11',2],['A','2016-10-01',3],['A','2016-11-02',4],
                         ['B','2016-01-10',5],['B','2016-04-25',6]], columns = ['cust_id','call_date','call_id'])

print(df_calls)

  cust_id   call_date  call_id
0       A  2016-02-03        1
1       A  2016-05-11        2
2       A  2016-10-01        3
3       A  2016-11-02        4
4       B  2016-01-10        5
5       B  2016-04-25        6

and

df_active = pd.DataFrame([['A','2016-01-10','2016-03-15',1],['A','2016-09-10','2016-11-15',2],
                          ['B','2016-01-02','2016-03-17',3]], columns = ['cust_id','service_start','service_end','service_id'])

print(df_active)

  cust_id  service_start  service_end  service_id
0       A     2016-01-10   2016-03-15           1
1       A     2016-09-10   2016-11-15           2
2       B     2016-01-02   2016-03-17           3

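For each call, I need to find the service_id it belongs to, as identified by the service_start and service_end dates. If a call does not fall between any pair of dates, it should still remain in the dataset.

Here is what I have tried so far: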

df_test_output = pd.merge(df_calls, df_active, how = 'left', on = ['cust_id'])
df_test_output = df_test_output[(df_test_output['call_date'] >= df_test_output['service_start'])
                                & (df_test_output['call_date'] <= df_test_output['service_end'])].drop(['service_start','service_end'], axis = 1)

print(df_test_output)

  cust_id   call_date  call_id  service_id
0       A  2016-02-03        1           1
5       A  2016-10-01        3           2
7       A  2016-11-02        4           2
8       B  2016-01-10        5           3

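This drops all calls that do not fall between service dates. Any thoughts on how to merge in the service_id where the condition is met, but keep the remaining records?

The result should look like this: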

#do black magic

print(df_calls)

  cust_id   call_date  call_id  service_id
0       A  2016-02-03        1         1.0
1       A  2016-05-11        2         NaN
2       A  2016-10-01        3         2.0
3       A  2016-11-02        4         2.0
4       B  2016-01-10        5         3.0
5       B  2016-04-25        6         NaN

Source

2016-11-16 flyingmeatball

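You can join 'df_calls2' with 'df_calls' on 'call_id'. –

You can use merge with a left join (here df_calls2 is the filtered result of the attempt in the question):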

print(pd.merge(df_calls, df_calls2, how='left'))

  cust_id   call_date  call_id  service_id
0       A  2016-02-03        1         1.0
1       A  2016-05-11        2         NaN
2       A  2016-10-01        3         2.0
3       A  2016-11-02        4         2.0
4       B  2016-01-10        5         3.0
5       B  2016-04-25        6         NaN
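
For completeness, here is a minimal end-to-end sketch of that approach. It assumes df_calls2 is the filtered merge from the question; the names candidates, in_window, matched and result are made up for illustration. It also converts the date columns with pd.to_datetime, which the original code does not do (the original compares ISO-formatted strings, which happens to sort correctly).

```python
import pandas as pd

df_calls = pd.DataFrame(
    [['A', '2016-02-03', 1], ['A', '2016-05-11', 2], ['A', '2016-10-01', 3],
     ['A', '2016-11-02', 4], ['B', '2016-01-10', 5], ['B', '2016-04-25', 6]],
    columns=['cust_id', 'call_date', 'call_id'])
df_active = pd.DataFrame(
    [['A', '2016-01-10', '2016-03-15', 1], ['A', '2016-09-10', '2016-11-15', 2],
     ['B', '2016-01-02', '2016-03-17', 3]],
    columns=['cust_id', 'service_start', 'service_end', 'service_id'])

# Assumption: parse the dates so comparisons are on real timestamps.
df_calls['call_date'] = pd.to_datetime(df_calls['call_date'])
for col in ['service_start', 'service_end']:
    df_active[col] = pd.to_datetime(df_active[col])

# Step 1: pair every call with every service of the same customer.
candidates = df_calls.merge(df_active, on='cust_id', how='left')

# Step 2: keep only the pairs where the call falls inside the service window
# (both endpoints inclusive, matching the >= / <= test in the question).
in_window = candidates['call_date'].between(
    candidates['service_start'], candidates['service_end'])
matched = candidates.loc[in_window, ['call_id', 'service_id']]

# Step 3: left-join the matches back so unmatched calls are kept with NaN.
result = df_calls.merge(matched, on='call_id', how='left')
print(result)
```

Because services for a customer do not overlap, each call matches at most one service, so the final left join should not duplicate any call. Note that the intermediate table has one row per call and service pair for each customer, which can get large when customers have many services.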

Source

2016-11-16 15:24:18 jezrael

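df_calls2 isn't a real table. It is the result of merging df_calls and df_service and then dropping the bad output. I created it to show that the approach I tried doesn't work. – flyingmeatball

Hmm, so you think it works, but you are looking for a better solution? – jezrael

Ah gotcha, I see what you're saying. That works, thanks! I had been exploring using graphs: https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.sparse.csgraph.connected_components.html – flyingmeatball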
