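Calculate values over the past x days within a group

I can't figure out how to write this code in a more Pythonic and efficient way. I am trying to group observations by customerid and, for each observation, count how many times that customer was declined in the previous 1, 7, and 30 days.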
import pandas as pd

t = pd.DataFrame({'customerid': [1, 1, 1, 3, 3],
                  'leadid': [10, 11, 12, 13, 14],
                  'postdate': ["2017-01-25 10:55:25.727", "2017-02-02 10:55:25.727", "2017-02-27 10:55:25.727", "2017-01-25 10:55:25.727", "2017-01-25 11:55:25.727"],
                  'post_status': ['Declined', 'Declined', 'Declined', 'Declined', 'Declined']})
# Parse the timestamp strings so they can be compared as datetimes
t['postdate'] = pd.to_datetime(t['postdate'])
Here is the output:
customerid leadid post_status postdate
1 10 Declined 2017-01-25 10:55:25.727
1 11 Declined 2017-02-02 10:55:25.727
1 12 Declined 2017-02-27 10:55:25.727
3 13 Declined 2017-01-25 10:55:25.727
3 14 Declined 2017-01-25 11:55:25.727
My current solution is very slow:
from datetime import timedelta

final = []
for customer in t['customerid'].unique():
    # All declined rows for this customer
    temp = t[(t['customerid'] == customer) & (t['post_status'] == 'Declined')].copy()
    for i, row in temp.iterrows():
        date = row['postdate']
        # Count rows inside each window, minus 1 to exclude the row itself
        final.append({
            'leadid': row['leadid'],
            'decline_1': temp[(temp['postdate'] <= date) & (temp['postdate'] >= date - timedelta(days=1))].shape[0] - 1,
            'decline_7': temp[(temp['postdate'] <= date) & (temp['postdate'] >= date - timedelta(days=7))].shape[0] - 1,
            'decline_30': temp[(temp['postdate'] <= date) & (temp['postdate'] >= date - timedelta(days=30))].shape[0] - 1
        })
The expected output looks like this:
decline_1 decline_30 decline_7 leadid
0 0 0 10
0 1 0 11
0 1 0 12
0 0 0 13
1 1 1 14
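I imagine I need some kind of double groupby where I iterate over each row within a group, but apart from this double for loop, which takes a very long time to finish, I can't get anything to work.
Any help would be greatly appreciated.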
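One possible way to avoid the row-by-row filtering is sketched below. It is not the answer the comment refers to, just an illustration under stated assumptions: the function name `decline_counts` and its `windows` parameter are made up for this example, the DataFrame index is assumed to be unique, and the windows are treated as closed on both ends to match the `<=` / `>=` comparisons in the loop above. The idea is to sort each customer's declined rows by `postdate` and use `numpy.searchsorted` to find where each window starts and ends, so the count is a difference of two positions.

```python
import numpy as np
import pandas as pd

# Sample data from the question
t = pd.DataFrame({
    "customerid": [1, 1, 1, 3, 3],
    "leadid": [10, 11, 12, 13, 14],
    "postdate": pd.to_datetime([
        "2017-01-25 10:55:25.727", "2017-02-02 10:55:25.727",
        "2017-02-27 10:55:25.727", "2017-01-25 10:55:25.727",
        "2017-01-25 11:55:25.727",
    ]),
    "post_status": ["Declined"] * 5,
})


def decline_counts(df, windows=(1, 7, 30)):
    """Hypothetical helper: per declined row, count the same customer's
    other declines with postdate in [postdate - N days, postdate]."""
    declined = (df[df["post_status"] == "Declined"]
                .sort_values(["customerid", "postdate"]))
    out = declined[["leadid"]].copy()
    for days in windows:
        out[f"decline_{days}"] = 0

    # Still one Python-level loop per customer, but no loop per row
    for _, group in declined.groupby("customerid"):
        dates = group["postdate"].to_numpy()  # sorted datetime64[ns] array
        # Number of rows with postdate <= this row's postdate
        last = np.searchsorted(dates, dates, side="right")
        for days in windows:
            start = dates - np.timedelta64(days, "D")
            # Position of the first row with postdate >= window start
            first = np.searchsorted(dates, start, side="left")
            # Subtract 1 so a row does not count itself
            out.loc[group.index, f"decline_{days}"] = last - first - 1
    return out.sort_index()


result = decline_counts(t)
print(result)
```

As in the original loop, rows that share an identical timestamp count each other. If the data has many customers with only a few rows each, the per-customer loop may still dominate the run time, so it is worth timing this against the real data before relying on it.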
You're a damn genius! Thank you! – fcol