Pandas - Converting cumulative values to actual values

Say my DataFrame looks like this:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count 
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0 
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0 
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0 
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0 
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0 
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0 
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,54.0 
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0 
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0 
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,524.0 
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,527.0 
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,529.0

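The count column at the end is a cumulative count. What I need is the actual (per-row) count for each date + site + country_code + kind + ID tuple, which would give: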

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count 
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0.0 
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0.0 
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0.0 
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,1.0 
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0.0 
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0.0 
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,0.0 
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0.0 
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3.0 
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,0.0 
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,3.0 
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,2.0

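I know this will involve a groupby call, but beyond that I have no idea how to proceed. Assume the count for the first instance of each tuple is 0. Any help would be awesome. Thanks.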

2017-10-16 Craig

Use groupby + diff, which is the inverse of cumsum.

cols = ['site', 'country_code', 'kind', 'ID'] 
df['count'] = df.groupby(cols)['count'].diff().fillna(0) 

print(df['count']) 
0  0.0 
1  0.0 
2  0.0 
3  1.0 
4  0.0 
5  0.0 
6  0.0 
7  0.0 
8  3.0 
9  0.0 
10 3.0 
11 2.0 
Name: count, dtype: float64
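
For anyone who wants to try this without the original file, here is a minimal self-contained sketch. It uses a trimmed copy of the sample rows from the question. The `actual` column name and the sort step are my own additions: `diff()` depends on row order, so I am assuming the rows may not already be in date order within each group.

```python
import io

import pandas as pd

# Trimmed copy of the question's sample data (three rows per site).
csv_text = """date,site,country_code,kind,ID,count
2017-03-22,website1,US,0,84,53.0
2017-03-23,website1,US,0,84,54.0
2017-03-24,website1,US,0,84,54.0
2017-02-15,website2,AU,1,91,521.0
2017-02-16,website2,AU,1,91,524.0
2017-02-17,website2,AU,1,91,524.0
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

cols = ["site", "country_code", "kind", "ID"]

# diff() subtracts the previous row in the same group, so order matters.
# Assumption: sort by date within each group before differencing.
df = df.sort_values(cols + ["date"]).reset_index(drop=True)

# The first row of each group has no predecessor (NaN), which becomes 0.
df["actual"] = df.groupby(cols)["count"].diff().fillna(0)

print(df[["date", "site", "count", "actual"]])
```

Writing to a separate column keeps the cumulative values around for checking. Assign back to `df['count']` instead if you want to overwrite them as in the answer above.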

Thanks to MaxU for pointing out the error!

2017-10-16 20:28:43

Thanks, but this would give the tuple (2017-02-15, website2, AU, 1, 91) a value of 467, when it should be 0 – Craig

I think what the OP wants is: df.groupby('site')['count'].diff().fillna(0) – MaxU

@MaxU Thanks a lot! I misunderstood the question. –
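
To illustrate the point raised in these comments, here is a small sketch comparing a plain `diff()` with the grouped versions. The four rows are taken from the boundary between website1 and website2 in the question's data. The variable names (`plain`, `by_site`, `by_tuple`) are just for illustration.

```python
import pandas as pd

# Last two website1 rows followed by the first two website2 rows.
df = pd.DataFrame({
    "site": ["website1", "website1", "website2", "website2"],
    "country_code": ["US", "US", "AU", "AU"],
    "kind": [0, 0, 1, 1],
    "ID": [84, 84, 91, 91],
    "count": [54.0, 54.0, 521.0, 524.0],
})

# Without groupby, the first website2 row is compared with the last
# website1 row, giving 521 - 54 = 467.
plain = df["count"].diff().fillna(0)

# Grouping restarts the difference at the first row of each group.
by_site = df.groupby("site")["count"].diff().fillna(0)
by_tuple = df.groupby(["site", "country_code", "kind", "ID"])["count"].diff().fillna(0)

print(plain.tolist())
print(by_site.tolist())
print(by_tuple.tolist())
```

In this sample each site has a single country_code/kind/ID combination, so grouping by site alone and grouping by the full tuple give the same result. If a site could carry several combinations, grouping by the full tuple would be the safer choice.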
