2017-02-15

Python pandas: sorting after groupby and aggregate

I want to sort my data (pandas) after grouping and aggregating, and I am stuck. My data:

import numpy as np
import pandas as pd

data = {'from_year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'name': ['John', 'John1', 'John', 'John', 'John4', 'John', 'John1', 'John6'],
        'out_days': [11, 8, 10, 15, 11, 6, 10, 4]}
persons = pd.DataFrame(data, columns=["from_year", "name", "out_days"])

# sum out_days per (from_year, name); the result has a MultiIndex on both axes
days_off_yearly = persons.groupby(["from_year", "name"]).agg({"out_days": [np.sum]})

print(days_off_yearly)

After that, my data is grouped like this:

                out_days
                     sum
from_year name
2010      John        17
2011      John        15
          John1       18
2012      John        10
          John4       11
          John6        4

I want to sort my data by from_year and by the out_days sum. The expected result is:

                out_days
                     sum
from_year name
2012      John4       11
          John        10
          John6        4
2011      John1       18
          John        15
2010      John        17

I am trying:

print(days_off_yearly.sort_values(["from_year", ("out_days", "sum")], ascending=False).head(10)) 

but I get KeyError: 'from_year'.

Any help is appreciated.

Answer


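You can use sort_values, but call reset_index first and set_index afterwards (after the groupby, from_year and name are index levels rather than columns, which is why sort_values raised the KeyError):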

# simpler aggregation: select the column and sum it, which returns a Series
days_off_yearly = persons.groupby(["from_year", "name"])['out_days'].sum()
print(days_off_yearly)
from_year  name
2010       John     17
2011       John     15
           John1    18
2012       John     10
           John4    11
           John6     4
Name: out_days, dtype: int64

print(days_off_yearly.reset_index()
                     .sort_values(['from_year', 'out_days'], ascending=False)
                     .set_index(['from_year', 'name']))
                 out_days
from_year name
2012      John4        11
          John         10
          John6         4
2011      John1        18
          John         15
2010      John         17
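
A possible variation on the same idea, offered only as a sketch: passing as_index=False to groupby keeps from_year and name as ordinary columns, so the reset_index step is not needed before sort_values. The variable names totals and result are my own and do not come from the question.

```python
import pandas as pd

data = {
    "from_year": [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
    "name": ["John", "John1", "John", "John", "John4", "John", "John1", "John6"],
    "out_days": [11, 8, 10, 15, 11, 6, 10, 4],
}
persons = pd.DataFrame(data)

# as_index=False leaves the grouping keys as regular columns,
# so sort_values can refer to them by name.
totals = persons.groupby(["from_year", "name"], as_index=False)["out_days"].sum()

# Sort by year, then by the summed days, both descending,
# and only then move the keys back into the index.
result = (
    totals.sort_values(["from_year", "out_days"], ascending=False)
          .set_index(["from_year", "name"])
)
print(result)
```

Both sort keys are descending here, so a single ascending=False is enough. If the two keys needed different directions, ascending also accepts a list with one boolean per key.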