2017-02-28 54 views

Timedeltas and loyalty calculation

With the help of pandas I was able to build an orders table like this:

identifier gender  Date category 
0   1 female 2016-11-11  Baby 
1   1 female 2017-02-01  Baby 
2   2 female 2016-12-19 Shave 
3   2 female 2016-12-27 Shave 
4   3 female 2016-11-11  Baby 
5   3 female 2016-11-22  Baby 
6   4 male 2016-11-11 Shave 
7   4 male 2017-01-01 Shave 

The result I need is the number of first orders, second orders, and so on, counted per day:

first order: 
11.11.2016 3 
19.12.2016 1 

second orders: 
22.11.2016 1 
21.12.2016 1 
01.01.2017 1 
02.01.2017 1 

third orders: 

I also need to calculate the average time between orders (per customer):

average time between orders = ... 

and to evaluate customers' cross-category loyalty. These tasks look quite similar to me:

Loyalty cross categories: 
    first order: 
    Baby 2 
    second order: 
    Baby - 2 
    third order: 


    first order: 
    Shave 2 
    second order: 
    Shave - 2 
    third order: 

Is it possible to do this kind of analysis with pandas?

Answer


Given this DataFrame

identifier gender  Date category 
0   1 female 2016-11-11  Baby 
1   1 female 2017-02-01  Baby 
2   2 female 2016-12-19 Shave 
3   2 female 2016-12-27 Shave 
4   3 female 2016-11-11  Baby 
5   3 female 2016-11-22  Baby 
6   4 male 2016-11-11 Shave 
7   4 male 2017-01-01 Shave 

you can start by using shift within a groupby to get each customer's previous order date:

df_groups = df.groupby('identifier') 
df['last_order'] = df_groups.Date.shift(1) 
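
To make the shift step reproducible, here is a minimal sketch that rebuilds the sample table and applies it. It assumes that Date has to be converted to a datetime column first and that rows are sorted by date within each customer; neither step is stated in the answer.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "identifier": [1, 1, 2, 2, 3, 3, 4, 4],
    "gender": ["female"] * 6 + ["male"] * 2,
    "Date": ["2016-11-11", "2017-02-01", "2016-12-19", "2016-12-27",
             "2016-11-11", "2016-11-22", "2016-11-11", "2017-01-01"],
    "category": ["Baby", "Baby", "Shave", "Shave",
                 "Baby", "Baby", "Shave", "Shave"],
})

# Assumption: Date must be datetime for date arithmetic to work.
df["Date"] = pd.to_datetime(df["Date"])
# Assumption: orders are sorted chronologically within each customer.
df = df.sort_values(["identifier", "Date"]).reset_index(drop=True)

# Previous order date of the same customer (NaT for a first order).
df["last_order"] = df.groupby("identifier")["Date"].shift(1)
print(df[["identifier", "Date", "last_order"]])
```

Each customer's first order has no predecessor, so its last_order is NaT.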

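Then you can get the time between orders, subtracting the previous order date from the current one so that the gaps are positive:

df['Time_between_orders'] = df['Date'] - df['last_order'] 

Then you can get the average time between orders for each customer like this:

df_groups = df.groupby('identifier') 
df_groups['Time_between_orders'].apply(lambda x: x.sum()/x.notnull().sum()).apply(lambda x: x.days) 

This gives:

identifier 
1   82 
2    8 
3   11 
4   51 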
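
As an alternative sketch of the same calculation, the gap can be taken directly with a grouped diff() instead of shift plus subtraction. This is a swapped-in variant, not the code from the answer, and it assumes Date is already a datetime column sorted within each customer.

```python
import pandas as pd

# Sample data from the table above (only the columns needed here).
df = pd.DataFrame({
    "identifier": [1, 1, 2, 2, 3, 3, 4, 4],
    "Date": pd.to_datetime([
        "2016-11-11", "2017-02-01", "2016-12-19", "2016-12-27",
        "2016-11-11", "2016-11-22", "2016-11-11", "2017-01-01",
    ]),
})
df = df.sort_values(["identifier", "Date"])

# Days since the same customer's previous order (NaN for a first order).
gap_days = df.groupby("identifier")["Date"].diff().dt.days

# Mean gap per customer; mean() skips the NaN of each first order.
avg_gap = gap_days.groupby(df["identifier"]).mean()
print(avg_gap)

# Mean gap over all repeat orders, regardless of customer.
overall = gap_days.mean()
print(overall)
```

Working in whole days as floats keeps the averaging independent of how a given pandas version aggregates timedelta columns.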

If you want this per category, just add the category to every groupby statement: df.groupby('identifier') becomes df.groupby(['identifier', 'category']).
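
The per-category grouping, together with the first/second order counts asked for in the question, could be sketched as follows. The use of cumcount() to number each customer's orders and the column names order_no and order_no_in_cat are assumptions made for illustration; they do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "identifier": [1, 1, 2, 2, 3, 3, 4, 4],
    "Date": pd.to_datetime([
        "2016-11-11", "2017-02-01", "2016-12-19", "2016-12-27",
        "2016-11-11", "2016-11-22", "2016-11-11", "2017-01-01",
    ]),
    "category": ["Baby", "Baby", "Shave", "Shave",
                 "Baby", "Baby", "Shave", "Shave"],
})
df = df.sort_values(["identifier", "Date"])

keys = ["identifier", "category"]

# Gap in days to the previous order of the same customer in the same category.
df["gap_days"] = df.groupby(keys)["Date"].diff().dt.days
avg_gap_by_cat = df.groupby(keys)["gap_days"].mean()

# Hypothetical helper columns: 1 = first order, 2 = second order, ...
df["order_no"] = df.groupby("identifier").cumcount() + 1
df["order_no_in_cat"] = df.groupby(keys).cumcount() + 1

# Number of first, second, ... orders per day.
per_day = df.groupby(["order_no", "Date"]).size()

# Number of first, second, ... orders per category (cross-category loyalty).
loyalty = df.groupby(["order_no_in_cat", "category"]).size()

print(avg_gap_by_cat)
print(per_day)
print(loyalty)
```

In this sample every customer stays within one category, so the per-category averages equal the per-customer ones.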