2013-10-17 123 views
5

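Pandas groupby date

I have a DataFrame of events. One or more events can occur on the same date (so the date cannot be the index). The dates span several years. I want to group by year and by month, and count the Category values. Thanks.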

In [12]: df = pd.read_excel('Pandas_Test.xls', 'sheet1') 
In [13]: df 
Out[13]: 
    EventRefNr  DateOccurence  Type Category 
0  86596 2010-01-02 00:00:00  3 Small 
1  86779 2010-01-09 00:00:00 13 Medium 
2  86780 2010-02-10 00:00:00  6 Small 
3  86781 2010-02-09 00:00:00 17 Small 
4  86898 2010-02-10 00:00:00  6 Small 
5  86898 2010-02-11 00:00:00  6 Small 
6  86902 2010-02-17 00:00:00  9 Small 
7  86908 2010-02-19 00:00:00  3 Medium 
8  86908 2010-03-05 00:00:00  3 Medium 
9  86909 2010-03-06 00:00:00  8 Small 
10  86930 2010-03-12 00:00:00 29 Small 
11  86934 2010-03-16 00:00:00  9 Small 
12  86940 2010-04-08 00:00:00  9  High 
13  86941 2010-04-09 00:00:00 17 Small 
14  86946 2010-04-14 00:00:00 10 Small 
15  86950 2011-01-19 00:00:00 12 Small 
16  86956 2011-01-24 00:00:00 13 Small 
17  86959 2011-01-27 00:00:00 17 Small 

I tried:

df.groupby(df['DateOccurence']) 
+0

Can you show the code you have already tried? – Jeff

Answers

4

You can apply value_counts to the SeriesGroupBy (for the column):

In [11]: g = df.groupby('DateOccurence') 

In [12]: g.Category.apply(pd.value_counts) 
Out[12]: 
DateOccurence   
2010-01-02  Small  1 
2010-01-09  Medium 1 
2010-02-09  Small  1 
2010-02-10  Small  2 
2010-02-11  Small  1 
2010-02-17  Small  1 
2010-02-19  Medium 1 
2010-03-05  Medium 1 
2010-03-06  Small  1 
2010-03-12  Small  1 
2010-03-16  Small  1 
2010-04-08  High  1 
2010-04-09  Small  1 
2010-04-14  Small  1 
2011-01-19  Small  1 
2011-01-24  Small  1 
2011-01-27  Small  1 
dtype: int64 

I actually hoped this would return the following DataFrame directly, but you need to unstack it:

In [13]: g.Category.apply(pd.value_counts).unstack(-1).fillna(0) 
Out[13]: 
       High Medium Small 
DateOccurence      
2010-01-02  0  0  1 
2010-01-09  0  1  0 
2010-02-09  0  0  1 
2010-02-10  0  0  2 
2010-02-11  0  0  1 
2010-02-17  0  0  1 
2010-02-19  0  1  0 
2010-03-05  0  1  0 
2010-03-06  0  0  1 
2010-03-12  0  0  1 
2010-03-16  0  0  1 
2010-04-08  1  0  0 
2010-04-09  0  0  1 
2010-04-14  0  0  1 
2011-01-19  0  0  1 
2011-01-24  0  0  1 
2011-01-27  0  0  1 

If there were multiple different categories on the same date, they would appear on the same row...
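As a small sketch of that same-row behaviour, assuming toy data that is not from the question (2010-02-10 is given both Small and Medium events) and calling the `value_counts()` method on the grouped column directly instead of `apply(pd.value_counts)`:

```python
import pandas as pd

# Hypothetical toy data: 2010-02-10 has two Small events and one Medium event.
df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(
        ["2010-02-10", "2010-02-10", "2010-02-10", "2010-02-11"]
    ),
    "Category": ["Small", "Medium", "Small", "Small"],
})

# Count each Category per date, then pivot the Category level into columns.
# fill_value=0 fills date/category pairs that never occur.
counts = (
    df.groupby("DateOccurence")["Category"]
    .value_counts()
    .unstack(fill_value=0)
)
print(counts)
```

Both categories for 2010-02-10 end up as columns of a single row.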

+0

Great, now how do I group by month? – ArtDijk

+0

@ArtDijk I think the trick here is to use a DatetimeIndex, 'di = pd.DatetimeIndex(df.DateOccurence); g = df.groupby([di.month, di.year])' –
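A minimal sketch of that DatetimeIndex idea, assuming a small made-up subset of the question's data. The `rename` calls and the year-before-month ordering are my own choices, only there to give the group levels readable, distinct names:

```python
import pandas as pd

# Hypothetical subset of the question's data.
df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(
        ["2010-01-02", "2010-01-09", "2010-02-10", "2010-02-10", "2011-01-19"]
    ),
    "Category": ["Small", "Medium", "Small", "Small", "Small"],
})

di = pd.DatetimeIndex(df["DateOccurence"])

# di.year and di.month are integer indexes aligned positionally with df's rows.
year = di.year.rename("year")
month = di.month.rename("month")

# Group by (year, month), count categories, and spread them into columns.
by_month = (
    df.groupby([year, month])["Category"]
    .value_counts()
    .unstack(fill_value=0)
)
print(by_month)
```

The result has one row per (year, month) pair and one column per category.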

6

For breaking things out by year and month, I often add extra columns to the DataFrame that split the date into its individual parts:

df['year'] = [t.year for t in df.DateOccurence] 
df['month'] = [t.month for t in df.DateOccurence] 
df['day'] = [t.day for t in df.DateOccurence] 

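This adds space complexity (extra columns on the df) but takes less time (less processing at groupby time) than the DatetimeIndex approach, so the choice is really up to you. The DatetimeIndex is the more pandas-like way of doing things.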
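If the column already has a datetime64 dtype (an assumption; `read_excel` usually parses dates, but that is worth checking), the same three columns can probably be built with the `.dt` accessor instead of list comprehensions. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows; DateOccurence must be datetime64 for .dt to work.
df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(["2010-01-02", "2010-02-10", "2011-01-19"]),
    "Category": ["Small", "Small", "Medium"],
})

# Vectorized equivalents of the list comprehensions.
df["year"] = df["DateOccurence"].dt.year
df["month"] = df["DateOccurence"].dt.month
df["day"] = df["DateOccurence"].dt.day
print(df)
```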

Once year, month, and day are broken out, you can do whatever grouping you need.

df.groupby(['year', 'month']).Category.apply(pd.value_counts) 

To get months across multiple years:

df.groupby('month').Category.apply(pd.value_counts) 

Or with Andy Hayden's DatetimeIndex:

# di = pd.DatetimeIndex(df.DateOccurence), as in the comment above
df.groupby(di.month).Category.apply(pd.value_counts) 

You can simply pick whichever approach suits your needs better.
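For what it's worth, a third option that neither answer uses is a monthly period key via `.dt.to_period("M")`, which keeps year and month together in one value. A rough sketch on made-up rows, assuming a datetime64 column:

```python
import pandas as pd

# Hypothetical rows, not the question's full data.
df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(
        ["2010-01-02", "2010-02-10", "2010-02-10", "2011-01-19"]
    ),
    "Category": ["Small", "Small", "Medium", "Small"],
})

# One key per calendar month, so 2010-01 and 2011-01 stay separate.
period = df["DateOccurence"].dt.to_period("M")

monthly = df.groupby(period)["Category"].value_counts().unstack(fill_value=0)
print(monthly)
```

This gives one row per calendar month without adding columns to the df, at the cost of a period-typed index instead of plain integers.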