2016-07-01 156 views

Pandas: group rows by a date range that depends on each row

I have a DataFrame that looks like this:

df = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["06/24/2014","06/25/2014","06/23/2014","07/02/1999","07/02/1999"], "value": ["3","5","1","7","8"] }) 

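I would like to group by date so that all observations within two days of each other end up in the same group. In this example, the first three rows would form one group and the last two rows would form another.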

So far, I have thought of using something like:

df.groupby(df['date'].map(lambda x: x.month)) 

What is a general way to do this kind of "fuzzy groupby"?

Thanks,


Possible duplicate of http://stackoverflow.com/questions/22769047/pandas-group-by-time-windows – Jeff

Answer


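You can sort the rows by date and then take the difference between consecutive dates. Test whether each difference is greater than 2 days; this marks the rows where a new group starts. The cumulative sum of those flags then assigns the desired group numbers: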

import pandas as pd 
df = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["06/24/2014","06/25/2014","06/23/2014","07/02/1999","07/02/1999"], "value": ["3","5","1","7","8"] }) 
df['date'] = pd.to_datetime(df['date']) 
df = df.sort_values(by='date') 
df['group'] = (df['date'].diff() > pd.Timedelta(days=2)).cumsum() 
print(df) 

which yields

  ID       date value  group
3  B 1999-07-02     7      0
4  B 1999-07-02     8      0
2  C 2014-06-23     1      1
0  A 2014-06-24     3      1
1  A 2014-06-25     5      1
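
Once the group column exists, it can be passed to groupby like any other key. The following is a minimal sketch of that step, not part of the original answer: the helper name assign_gap_groups, its max_gap_days parameter, and the summary column names are made up for illustration, and value is converted to integers only so that it can be summed. The last lines illustrate one property of this approach that is worth knowing: groups are built from gaps between neighbouring dates, so a chain of dates that are each 2 days apart stays in one group even though the first and last are 4 days apart.

```python
import pandas as pd


def assign_gap_groups(dates, max_gap_days=2):
    """Hypothetical helper: label each date with a group number.

    A new group starts whenever the gap to the previous (sorted) date
    exceeds max_gap_days. The result keeps the original index labels.
    """
    dates = pd.to_datetime(dates).sort_values()
    starts_new_group = dates.diff() > pd.Timedelta(days=max_gap_days)
    return starts_new_group.cumsum()


df = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "date": ["06/24/2014", "06/25/2014", "06/23/2014", "07/02/1999", "07/02/1999"],
    "value": ["3", "5", "1", "7", "8"],
})
df["date"] = pd.to_datetime(df["date"])
df["value"] = df["value"].astype(int)  # assumption: values are meant to be numeric

# Assignment aligns on the index, so df itself does not need to be sorted.
df["group"] = assign_gap_groups(df["date"])

# Aggregate per fuzzy group: date span and total value.
summary = df.groupby("group").agg(
    start=("date", "min"),
    end=("date", "max"),
    total=("value", "sum"),
)
print(summary)

# Gaps chain: each step is exactly 2 days (not greater than 2), so all three
# dates share one group although the first and last are 4 days apart.
chained = assign_gap_groups(pd.Series(["2020-01-01", "2020-01-03", "2020-01-05"]))
print(chained.tolist())
```

If groups must never span more than two days in total, a different rule is needed, for example comparing each date with the first date of its current group instead of with its immediate predecessor.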