2017-10-16

Pandas - convert cumulative values to actual values

Say my dataframe looks like this:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count 
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0 
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0 
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0 
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0 
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0 
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0 
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,54.0 
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0 
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0 
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,524.0 
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,527.0 
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,529.0 

The last column, count, is a cumulative count. What I need is the actual count for each date + site + country + kind + ID tuple, which would give:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count 
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0.0 
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0.0 
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0.0 
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,1.0 
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0.0 
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0.0 
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,0.0 
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0.0 
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3.0 
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,0.0 
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,3.0 
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,2.0 

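I know this will involve a groupby call, but beyond that I have no idea. Assume the count for the first instance of a tuple is 0. Any help would be awesome. Thanks.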

Answer

Use groupby + diff, the inverse of cumsum.
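To see why diff undoes cumsum, here is a small sketch on a made-up Series (the values are hypothetical, not from the question). Taking the cumulative sum and then diffing recovers the per-row values, except that the first row becomes NaN because it has no predecessor; that is why the answer fills it with 0, matching the question's assumption that the first instance counts as 0.

```python
import pandas as pd

# hypothetical per-row counts (not from the question's data)
actual = pd.Series([2.0, 0.0, 3.0, 1.0])

# cumulative version, like the 'count' column in the question
cumulative = actual.cumsum()  # 2.0, 2.0, 5.0, 6.0

# diff() subtracts each value's predecessor; the first row has none, so it is NaN
recovered = cumulative.diff()  # NaN, 0.0, 3.0, 1.0

# every row after the first matches the original per-row counts
print(recovered.iloc[1:].tolist())  # [0.0, 3.0, 1.0]

# the question treats the first row of a group as 0, hence fillna(0)
print(recovered.fillna(0).tolist())  # [0.0, 0.0, 3.0, 1.0]
```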

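# group by the identifying columns (date excluded) and diff within each group;
# the first row of each group has no predecessor, so its NaN is filled with 0
cols = ['site', 'country_code', 'kind', 'ID']
df['count'] = df.groupby(cols)['count'].diff().fillna(0)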

print(df['count']) 
0  0.0 
1  0.0 
2  0.0 
3  1.0 
4  0.0 
5  0.0 
6  0.0 
7  0.0 
8  3.0 
9  0.0 
10 3.0 
11 2.0 
Name: count, dtype: float64 
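A self-contained sketch of the same approach, loading the sample data from the question with io.StringIO (the variable name data is just for illustration). One assumption worth flagging: diff() compares each row with the previous row in the current row order, not by the date values, so this version sorts each group by date first and then restores the original order. The sample is already in date order within each group, so the result is the same as above.

```python
import io

import pandas as pd

# sample data copied from the question
data = """date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,54.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,524.0
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,527.0
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,529.0
"""
df = pd.read_csv(io.StringIO(data), parse_dates=['date'])

cols = ['site', 'country_code', 'kind', 'ID']

# put each group's rows in date order, since diff() works on row order
df = df.sort_values(cols + ['date'])

# per-group difference of the cumulative count; first row of each group -> 0
df['count'] = df.groupby(cols)['count'].diff().fillna(0)

# restore the original row order
df = df.sort_index()
print(df['count'].tolist())
# [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 3.0, 2.0]
```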

Thanks to MaxU for pointing out the mistake!

Thanks, but this would give the tuple (2017-02-15, website2, AU, 1, 91) a value of 467, whereas it should be 0. – Craig
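The 467 in this comment is consistent with a plain, ungrouped diff(): 521 (the first website2 row) minus 54 (the last website1 row). A minimal sketch using just those two boundary rows from the sample shows how grouping keeps the sites separate:

```python
import pandas as pd

# last website1 row and first website2 row from the sample data
df = pd.DataFrame({
    'site': ['website1', 'website2'],
    'count': [54.0, 521.0],
})

# ungrouped: the website2 row is differenced against the website1 row
print(df['count'].diff().tolist())  # [nan, 467.0]

# grouped: each site starts fresh, so both rows become 0 after fillna
print(df.groupby('site')['count'].diff().fillna(0).tolist())  # [0.0, 0.0]
```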

I think what the OP wants is: df.groupby('site')['count'].diff().fillna(0) – MaxU
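On the sample data, grouping by site alone gives the same result as grouping by all four identifying columns, because each site appears with only one country_code/kind/ID combination. A quick check on consecutive rows taken from the sample (a sketch; the two groupings would differ if one site had several such combinations):

```python
import pandas as pd

# consecutive rows from the sample: two sites, each with one country/kind/ID combination
df = pd.DataFrame({
    'site': ['website1'] * 3 + ['website2'] * 3,
    'country_code': ['US'] * 3 + ['AU'] * 3,
    'kind': [0] * 3 + [1] * 3,
    'ID': [84] * 3 + [91] * 3,
    'count': [53.0, 53.0, 54.0, 521.0, 524.0, 524.0],
})

by_site = df.groupby('site')['count'].diff().fillna(0)
by_all = df.groupby(['site', 'country_code', 'kind', 'ID'])['count'].diff().fillna(0)

# identical here because site alone already identifies each group
print(by_site.equals(by_all))  # True
print(by_site.tolist())  # [0.0, 0.0, 1.0, 0.0, 3.0, 0.0]
```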

@MaxU Thanks a lot! I misunderstood the question. –