2017-10-16 41 views

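Distribute month's quantity equally into weeks (2)

I want to explode df.Month into weeks and distribute each month's quantity equally across those weeks. Weeks start on Monday.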

df

   Country      Item    Month       Qty
   -------------------------------------
0  New Zealand  Apple   2017-10-31  100
1  Puerto Rico  Banana  2017-11-30  200
2  France       Apple   2017-10-31  400
...

The expected output is:

    Country      Item    Week        Qty
    -------------------------------------
0   New Zealand  Apple   2017-10-01  20
1   New Zealand  Apple   2017-10-08  20
2   New Zealand  Apple   2017-10-15  20
3   New Zealand  Apple   2017-10-22  20
4   New Zealand  Apple   2017-10-29  20
5   Puerto Rico  Banana  2017-11-05  50
6   Puerto Rico  Banana  2017-11-12  50
7   Puerto Rico  Banana  2017-11-19  50
8   Puerto Rico  Banana  2017-11-26  50
9   France       Apple   2017-10-01  80
10  France       Apple   2017-10-08  80

...

Using a DataFrame built for the weeks:

mondays = pd.Series(pd.date_range(first_day, last_day, freq='W-Mon'))
weeks = pd.DataFrame({'Week':mondays})

2) weeks

   Week
   ----------
0  2017-10-01
1  2017-10-08
2  2017-10-15
3  2017-10-22
4  2017-10-29
5  2017-11-05
6  2017-11-12
7  2017-11-19
8  2017-11-26

... 
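One detail worth checking in the weeks frame: 2017-10-01 was a Sunday, whereas freq='W-Mon' anchors every tick on a Monday. A minimal sketch, assuming first_day and last_day span October and November 2017 (the question does not state their values):

```python
import pandas as pd

# Assumed range: the two months that appear in the sample data.
first_day, last_day = "2017-10-01", "2017-11-30"

# 'W-MON' yields one timestamp per week, always on a Monday.
mondays = pd.Series(pd.date_range(first_day, last_day, freq="W-MON"))
weeks = pd.DataFrame({"Week": mondays})

# Inspect the weekday names and the first generated date.
print(weeks["Week"].dt.day_name().unique())
print(weeks["Week"].iloc[0].date())
```

With this range the first generated date is 2017-10-02, so a table starting on 2017-10-01 would need a Sunday-anchored frequency instead.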

This is an extension of the question: Distribute month's quantity equally into weeks


Doesn't df1 = df1.drop_duplicates('Month') from the previous solution work? – jezrael

Answer


You can use:

import pandas as pd

# Mondays covering the months present in df
mondays = pd.Series(pd.date_range('2017-10-01', '2017-11-26', freq='W-MON'))
weeks = pd.DataFrame({'Week': mondays})

# convert both date columns to monthly periods to use as the merge key
df['Month'] = pd.to_datetime(df['Month']).dt.to_period('M')
weeks['Month'] = weeks['Week'].dt.to_period('M')

# merge by Month: one row per (original row, Monday in that month)
df = pd.merge(df, weeks, on='Month')
# divide Qty by the number of Mondays found in each month
df['Qty'] = df['Qty'].div(df['Month'].map(weeks['Month'].value_counts()))
df = df.drop(columns='Month')
print(df)
        Country    Item        Qty       Week
0   New Zealand   Apple  20.000000 2017-10-02
1   New Zealand   Apple  20.000000 2017-10-09
2   New Zealand   Apple  20.000000 2017-10-16
3   New Zealand   Apple  20.000000 2017-10-23
4   New Zealand   Apple  20.000000 2017-10-30
5        France   Apple  80.000000 2017-10-02
6        France   Apple  80.000000 2017-10-09
7        France   Apple  80.000000 2017-10-16
8        France   Apple  80.000000 2017-10-23
9        France   Apple  80.000000 2017-10-30
10  Puerto Rico  Banana  66.666667 2017-11-06
11  Puerto Rico  Banana  66.666667 2017-11-13
12  Puerto Rico  Banana  66.666667 2017-11-20
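The 66.666667 for Puerto Rico comes from the date range ending on 2017-11-26, which leaves only three Mondays in November. The sketch below is a self-contained variation, not the exact code above: it assumes the range is extended to 2017-11-30 so that the fourth Monday (2017-11-27) is included, and it stores the per-month week count in a helper column (n_weeks, a name chosen here for illustration) instead of mapping value_counts.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Country": ["New Zealand", "Puerto Rico", "France"],
    "Item": ["Apple", "Banana", "Apple"],
    "Month": ["2017-10-31", "2017-11-30", "2017-10-31"],
    "Qty": [100, 200, 400],
})

# Assumption: the week range runs to the end of November.
weeks = pd.DataFrame(
    {"Week": pd.date_range("2017-10-01", "2017-11-30", freq="W-MON")}
)

# Monthly periods serve as the join key on both sides.
df["Month"] = pd.to_datetime(df["Month"]).dt.to_period("M")
weeks["Month"] = weeks["Week"].dt.to_period("M")

# Number of Mondays in each month, repeated on every week row.
weeks["n_weeks"] = weeks.groupby("Month")["Week"].transform("count")

# One output row per (source row, Monday in that month).
out = df.merge(weeks, on="Month")
out["Qty"] = out["Qty"] / out["n_weeks"]
out = out.drop(columns=["Month", "n_weeks"])

# The weekly quantities should add back up to the monthly totals.
print(out.groupby(["Country", "Item"])["Qty"].sum())
```

Under that assumption each Puerto Rico week receives 50, matching the expected output in the question, and the per-country sums equal the original monthly quantities.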

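Thanks @jezrael. The second solution works because it keeps the duplicate items. There are more fields besides 'Item', such as 'Country', 'Region', etc. In the second solution, how do I show all the fields? print(df1) shows only 'Month' and 'Qty'. – reservoirinvest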


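Can you edit the question? You need to aggregate all columns, otherwise the columns are dropped... so you need df1 = df.groupby('Month', as_index=False).agg({'Qty': sum, 'Country': 'first', 'Region': '_'.join}). I don't have the data, so this only shows possible aggregations for the columns. – jezrael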


I don't think aggregation will work. I'm looking to explode months into weeks, not to aggregate. – reservoirinvest
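On the point about keeping every field without aggregating, one possible approach is to build the list of Mondays per row and call DataFrame.explode, which repeats all other columns for each week. This is only a sketch: the 'Region' column and its values are hypothetical stand-ins for the extra fields mentioned in the comments, and it requires a pandas version that provides explode (0.25 or later).

```python
import pandas as pd

# Hypothetical data: 'Region' stands in for the extra fields.
df = pd.DataFrame({
    "Country": ["New Zealand", "Puerto Rico"],
    "Region": ["Oceania", "Caribbean"],
    "Item": ["Apple", "Banana"],
    "Month": ["2017-10-31", "2017-11-30"],
    "Qty": [100, 200],
})

month = pd.to_datetime(df["Month"]).dt.to_period("M")

# For each row, list every Monday that falls inside its month.
df["Week"] = [
    list(pd.date_range(p.start_time, p.end_time, freq="W-MON")) for p in month
]

# Split the quantity evenly before exploding so each week gets its share.
df["Qty"] = df["Qty"] / df["Week"].str.len()

# explode() repeats every other column for each week; nothing is aggregated.
out = df.explode("Week").drop(columns="Month").reset_index(drop=True)
out["Week"] = pd.to_datetime(out["Week"])
print(out)
```

Because the Mondays are derived from each row's own month, this variant does not need a separate weeks frame, and any additional columns in df are carried through unchanged.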