2017-05-08 38 views
3

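Flattening the hierarchical index of a pandas.DataFrame after a groupby with multiple aggregations

I am grouping a DataFrame by multiple columns and aggregating to get several statistics. How can I get a completely flat structure, where every possible combination of keys is enumerated as a row and every statistic appears as a column?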

import numpy as np 
import pandas as pd 

cities = ['Berlin', 'Oslo'] 
days = ['Monday', 'Friday'] 

data = pd.DataFrame({ 
     'city': np.random.choice(cities, 12), 
     'day': np.random.choice(days, 12), 
     'people': np.random.normal(loc=10, size=12), 
     'cats': np.random.normal(loc=6, size=12)}) 
grouped = data.groupby(['city', 'day']).agg([np.mean, np.std]) 

This gives me:

                   cats              people
                   mean       std      mean       std
city   day
Berlin Friday  6.146924  0.721263 10.445606  0.730992
       Monday  5.239267       NaN  9.022811       NaN
Oslo   Friday  6.322276  0.866899 11.579813  0.114341
       Monday  5.028919  0.815674 10.458439  1.182689

What I need is the flattened form:

city   day     cats_mean  cats_std  people_mean  people_std
Berlin Friday   6.146924  0.721263    10.445606    0.730992
Berlin Monday   5.239267       NaN     9.022811         NaN
Oslo   Friday   6.322276  0.866899    11.579813    0.114341
Oslo   Monday   5.028919  0.815674    10.458439    1.182689
+0

You can simply call 'grouped.reset_index()' to turn the index back into columns – EdChum

+0

@EdChum That still leaves a MultiIndex on the columns, which makes it hard to work with the aggregated statistics as columns –

+0

Assign the result back with 'grouped = grouped.reset_index()'. OK, so you want to flatten the columns as well – EdChum

Answers

5
In [36]: grouped.columns = grouped.columns.map('_'.join) 

In [37]: grouped = grouped.reset_index() 

In [38]: grouped 
Out[38]: 
     city     day  cats_mean  cats_std  people_mean  people_std
0  Berlin  Friday   5.852991  1.085163    11.078541    0.839688
1  Berlin  Monday   6.978343  0.630983     9.876106    1.846204
2    Oslo  Friday   6.096773  1.278176     9.710216    0.691672
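
The session above shows only three rows, which is what groupby returns when a city/day pair never occurs in the data (presumably the case for that random sample). Since the question asks for every possible combination, here is a minimal sketch that flattens the columns the same way and then reindexes against the full product of keys. The hand-made values and the names `all_keys` and `flat` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Small hand-made frame (hypothetical values); Oslo/Monday is deliberately absent.
data = pd.DataFrame({
    'city':   ['Berlin', 'Berlin', 'Berlin', 'Oslo', 'Oslo'],
    'day':    ['Friday', 'Friday', 'Monday', 'Friday', 'Friday'],
    'people': [10.0, 12.0, 9.0, 11.0, 13.0],
    'cats':   [6.0, 8.0, 5.0, 7.0, 5.0],
})

grouped = data.groupby(['city', 'day']).agg(['mean', 'std'])

# Each column label is a tuple such as ('cats', 'mean'); join its parts.
grouped.columns = grouped.columns.map('_'.join)

# Build every city/day pair, including pairs that have no rows in the data.
all_keys = pd.MultiIndex.from_product(
    [sorted(data['city'].unique()), sorted(data['day'].unique())],
    names=['city', 'day'])

# Missing pairs become rows of NaN; reset_index turns the keys into columns.
flat = grouped.reindex(all_keys).reset_index()
print(flat)
```

With this input, `flat` has four rows, and the Oslo/Monday row holds NaN in every statistic column.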
+0

This is cleaner than my solution, +1 – EdChum

+0

@EdChum, thank you! – MaxU

+0

@ScottBoston, thanks, I appreciate your comment! – MaxU

2

You can run a list comprehension over the column levels, join each pair with an underscore, and then call reset_index:

In [39]:  
grouped.columns= ['_'.join(x) for x in list(zip(grouped.columns.get_level_values(0), grouped.columns.get_level_values(1)))] 
grouped = grouped.reset_index() 
grouped 

Out[39]: 
     city     day  cats_mean  cats_std  people_mean  people_std
0  Berlin  Friday   6.140710  0.555981    10.187634    0.359724
1  Berlin  Monday   6.420175  0.986568    10.134376    0.963938
2    Oslo  Friday   6.978572  0.573297    11.345484    1.454762
3    Oslo  Monday   4.594814       NaN    10.842988         NaN
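
A slightly shorter variant of the same idea, as a sketch: iterating over a MultiIndex already yields one tuple per column, so the explicit zip over the level values can be skipped. The one-row frame below is a hypothetical stand-in for the aggregated result.

```python
import pandas as pd

# Hypothetical two-level columns, shaped like the output of .agg(['mean', 'std']).
columns = pd.MultiIndex.from_product([['cats', 'people'], ['mean', 'std']])
frame = pd.DataFrame([[6.1, 0.7, 10.4, 0.7]], columns=columns)

# Each element of frame.columns is a tuple such as ('cats', 'mean').
frame.columns = ['_'.join(col) for col in frame.columns]
print(list(frame.columns))
```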
1

You can rename the columns by passing a dictionary to .agg, then drop the outer column level and call reset_index(). See this SO post.

import numpy as np 
import pandas as pd 

cities = ['Berlin', 'Oslo'] 
days = ['Monday', 'Friday'] 

data = pd.DataFrame({ 
     'city': np.random.choice(cities, 12), 
     'day': np.random.choice(days, 12), 
     'people': np.random.normal(loc=10, size=12), 
     'cats': np.random.normal(loc=6, size=12)}) 
grouped = data.groupby(['city', 'day']).agg({'cats':{'cats_mean':np.mean,'cats_std':np.std},'people':{'people_mean':np.mean,'people_std':np.std}}) 

grouped.columns = grouped.columns.droplevel() 
grouped.reset_index() 

     city     day  people_mean  people_std  cats_std  cats_mean
0  Berlin  Friday     9.645190    0.699684  0.973866   6.478510
1  Berlin  Monday     9.556898    0.126810  0.336654   6.624288
2    Oslo  Friday    11.593491         NaN       NaN   6.206595
3    Oslo  Monday    10.202183    1.058651  0.657939   6.019748
+1

Note the 'FutureWarning: using a dict with renaming is deprecated and will be removed in a future version'. Here is the [link](http://pandas.pydata.org/pandas-docs/version/0.20/whatsnew.html#deprecate-groupby-agg-with-a-dictionary-when-renaming) – MaxU
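
Since the dictionary-renaming form is deprecated, one possible replacement is named aggregation, sketched here on the assumption that pandas 0.25 or later is installed. Each keyword names an output column and maps to a (source column, function) pair, so the result already has flat column names. The seeded generator `rng` and the name `flat` are illustrative choices, not from the thread.

```python
import numpy as np
import pandas as pd

cities = ['Berlin', 'Oslo']
days = ['Monday', 'Friday']

# Seeded generator so the sketch is repeatable (an illustrative choice).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'city': rng.choice(cities, 12),
    'day': rng.choice(days, 12),
    'people': rng.normal(loc=10, size=12),
    'cats': rng.normal(loc=6, size=12)})

# Named aggregation: output_name=(source_column, aggregation_function).
flat = (data.groupby(['city', 'day'])
            .agg(cats_mean=('cats', 'mean'),
                 cats_std=('cats', 'std'),
                 people_mean=('people', 'mean'),
                 people_std=('people', 'std'))
            .reset_index())
print(flat)
```

No droplevel step is needed here, because the columns never become a MultiIndex.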