How to get a distinct count of keys along with other aggregations in pandas
My DataFrame (DF) looks like this:
Customer_number Store_number year month last_buying_date1 amount
1 20 2014 10 2015-10-07 100
1 20 2014 10 2015-10-09 200
2 20 2014 10 2015-10-20 100
2 10 2014 10 2015-10-13 500
and I would like to get an output like this:
year month sum_purchase count_purchases distinct customers
2014 10 900 4 3
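How can I get an output like this using agg and groupby? I am currently using a two-step groupby, but I am struggling to get the distinct customers. Here is my approach: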
#### Step 1 - Aggregating everything at customer_number, store_number level
aggregations = {
    'amount': 'sum',
    'last_buying_date1': 'count',
}
grouped_at_Cust = DF.groupby(['customer_number','store_number','month','year']).agg(aggregations).reset_index()
grouped_at_Cust.columns = ['customer_number','store_number','month','year','total_purchase','num_purchase']
#### Step 2 - Aggregating at year month level
aggregations = {
    'total_purchase': 'sum',
    'num_purchase': 'sum',
    size
}
Monthly_customers = grouped_at_Cust.groupby(['year','month']).agg(aggregations).reset_index()
Monthly_customers.columns = ['year','month','sum_purchase','count_purchase','distinct_customers']
My struggle is in the second step. How do I include size in the second aggregation step?
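One possible way to include the size in the second step is sketched below. It assumes pandas 0.25 or later (for named aggregation) and lowercase column names as used in the code above. The sample frame is rebuilt from the table in the question, with the dates kept as plain strings. Because step 1 leaves exactly one row per (customer_number, store_number) combination, the number of rows per (year, month) group is the distinct customer count.

```python
import pandas as pd

# Sample data rebuilt from the question (lowercase column names are an assumption).
DF = pd.DataFrame({
    'customer_number': [1, 1, 2, 2],
    'store_number': [20, 20, 20, 10],
    'year': [2014, 2014, 2014, 2014],
    'month': [10, 10, 10, 10],
    'last_buying_date1': ['2015-10-07', '2015-10-09', '2015-10-20', '2015-10-13'],
    'amount': [100, 200, 100, 500],
})

# Step 1: one row per (customer_number, store_number, month, year).
grouped_at_Cust = (
    DF.groupby(['customer_number', 'store_number', 'month', 'year'])
      .agg(total_purchase=('amount', 'sum'),
           num_purchase=('last_buying_date1', 'count'))
      .reset_index()
)

# Step 2: 'size' counts the rows in each (year, month) group, i.e. the
# number of distinct customer/store combinations left by step 1.
Monthly_customers = (
    grouped_at_Cust.groupby(['year', 'month'])
      .agg(sum_purchase=('total_purchase', 'sum'),
           count_purchase=('num_purchase', 'sum'),
           distinct_customers=('total_purchase', 'size'))
      .reset_index()
)

print(Monthly_customers)  # one row: 2014, 10, 900, 4, 3
```

Named aggregation also sets the output column names directly, so the separate assignment to .columns is no longer needed in this variant.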
Thanks @Nickil. But my customer is defined as the combination of customer_number and store_number. How do I combine them to do nunique? – sourav
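Are the 'purchase_amt' sum/count computed without using 'store_number' as one of the grouping keys? If that is the case, you need to do the `groupby` twice for the different selections. *See the edit* –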
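A rough sketch of what the "two groupby" idea in the comment above might look like, assuming the same sample data and lowercase column names as in the question's code. The names totals, pairs and result are made up for illustration. One groupby computes the totals at (year, month) level, the other counts the unique (customer_number, store_number) pairs, and the two are joined on the shared index.

```python
import pandas as pd

# Sample data rebuilt from the question (lowercase column names are an assumption).
DF = pd.DataFrame({
    'customer_number': [1, 1, 2, 2],
    'store_number': [20, 20, 20, 10],
    'year': [2014, 2014, 2014, 2014],
    'month': [10, 10, 10, 10],
    'last_buying_date1': ['2015-10-07', '2015-10-09', '2015-10-20', '2015-10-13'],
    'amount': [100, 200, 100, 500],
})

# First groupby: totals per (year, month) over all purchase rows.
totals = DF.groupby(['year', 'month']).agg(
    sum_purchase=('amount', 'sum'),
    count_purchases=('last_buying_date1', 'count'),
)

# Second groupby: keep one row per distinct customer/store pair per month,
# then count how many such pairs fall into each (year, month).
pairs = DF[['year', 'month', 'customer_number', 'store_number']].drop_duplicates()
distinct_customers = pairs.groupby(['year', 'month']).size().rename('distinct_customers')

# Both results are indexed by (year, month), so a plain join lines them up.
result = totals.join(distinct_customers).reset_index()

print(result)  # one row: 2014, 10, 900, 4, 3
```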
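Please see the updated example (edited question). A customer is not just customer_number but the combination of customer_number and store_number. So if I could concatenate customer_number and store number and apply your solution with 'nunique', that would work. But concat leads to other problems. – sourav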
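One way to avoid string concatenation, sketched under the same assumptions (sample data from the question, lowercase column names, pandas 0.25 or later), is to let pandas number each (customer_number, store_number) pair with GroupBy.ngroup() and run 'nunique' on that label. The column name customer_id is hypothetical and only used for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (lowercase column names are an assumption).
DF = pd.DataFrame({
    'customer_number': [1, 1, 2, 2],
    'store_number': [20, 20, 20, 10],
    'year': [2014, 2014, 2014, 2014],
    'month': [10, 10, 10, 10],
    'last_buying_date1': ['2015-10-07', '2015-10-09', '2015-10-20', '2015-10-13'],
    'amount': [100, 200, 100, 500],
})

# ngroup() assigns one integer label per (customer_number, store_number)
# combination, so no string key has to be built by hand.
customer_id = DF.groupby(['customer_number', 'store_number']).ngroup()

# Single groupby at (year, month) level, with nunique on the pair label.
result = (
    DF.assign(customer_id=customer_id)
      .groupby(['year', 'month'])
      .agg(sum_purchase=('amount', 'sum'),
           count_purchases=('last_buying_date1', 'count'),
           distinct_customers=('customer_id', 'nunique'))
      .reset_index()
)

print(result)  # one row: 2014, 10, 900, 4, 3
```

The integer label keeps the original key columns untouched, which sidesteps the type and separator issues that a concatenated string key can introduce.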