2017-10-20 21 views

Pandas, two DataFrames: getting grouped data from two DataFrames with a condition

 
print(df1)

userid reg_date 
1  2015-07-21 
2  2015-07-11 
3  2015-07-14 

print(df2)

userid   date    status  amount 
1    2015-07-22   CHARGED  11.68 
1    2015-07-29   CHARGED  21.4 
2    2015-07-13   CHARGED  18.98 
2    2015-07-15   DECLINED  10.96 

For each userid in df1, I need to find sum(amount) from df2 where status == 'CHARGED' and reg_date + 7 days > date.

 
# result 
userid amount 
1  11.68 
2  18.98 
3  0 

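I built the solution this way. However, if no rows in df2 satisfy the conditions for a user, that userid is not returned at all (it should be returned with 0).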


    import pandas as pd
    from datetime import timedelta

    df1 = pd.read_csv('Task2_data1.csv', sep=',', parse_dates=['reg_date'])
    df2 = pd.read_csv('Task2_data2.csv', sep=',', parse_dates=['date'])
    # amounts presumably use a decimal comma in the CSV; convert them to float
    df2['amount'] = df2['amount'].replace(',', '.', regex=True).astype(float)
    df3 = pd.merge(df1, df2, how='outer', on='userid')
    # keep only CHARGED rows dated less than 7 days after registration
    df3 = df3[(df3.status == 'CHARGED') &
              (df3.reg_date + timedelta(days=7) > df3.date)]
    print(df3.groupby('userid')['amount'].sum())

Is there another way to do this?

Answers


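Merge the two frames, filter, group by userid, and then reindex against the userids in df1 so that users with no matching rows get 0: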

In [4974]: dff = df2.merge(df1) 

In [4975]: (dff[dff['status'].eq('CHARGED') & (dff['date']-dff['reg_date']).dt.days.lt(7)] 
       .groupby('userid')['amount'].sum() 
       .reindex(df1['userid'].unique(), fill_value=0) 
       .reset_index()) 
Out[4975]: 
    userid amount 
0  1 11.68 
1  2 18.98 
2  3 0.00
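
For anyone who wants to try this without the CSV files, here is a minimal self-contained sketch of the same merge, filter, groupby, reindex idea. The frames are built inline from the sample data in the question, and the helper name `charged_within` and its `days` parameter are made up for illustration. It uses the strict comparison from the question (`date < reg_date + 7 days`); adjust that if your rule is inclusive.

```python
import pandas as pd

# Sample data copied from the question (built inline instead of read from CSV).
df1 = pd.DataFrame({
    "userid": [1, 2, 3],
    "reg_date": pd.to_datetime(["2015-07-21", "2015-07-11", "2015-07-14"]),
})
df2 = pd.DataFrame({
    "userid": [1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2015-07-22", "2015-07-29", "2015-07-13", "2015-07-15"]
    ),
    "status": ["CHARGED", "CHARGED", "CHARGED", "DECLINED"],
    "amount": [11.68, 21.4, 18.98, 10.96],
})


def charged_within(df1, df2, days=7):
    """Hypothetical helper: per-user sum of CHARGED amounts dated less than
    `days` days after registration, with 0 for users that have no match."""
    merged = df2.merge(df1, on="userid")
    mask = (merged["status"] == "CHARGED") & (
        merged["date"] < merged["reg_date"] + pd.Timedelta(days=days)
    )
    return (
        merged[mask]
        .groupby("userid")["amount"].sum()
        # bring back every userid from df1; users without rows get 0
        .reindex(df1["userid"].unique(), fill_value=0)
        .rename_axis("userid")
        .reset_index()
    )


result = charged_within(df1, df2)
print(result)
```

The `reindex` step is what restores user 3: filtering removes every row for a user with no qualifying charges, so the groupby never sees that userid, and reindexing against `df1['userid']` adds it back with the fill value.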