2016-04-17

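Pandas: dividing a MultiIndex DataFrame by rows

I have a MultiIndex (panel) DataFrame, and within each group (county) I want to divide every row by the values of one particular year.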

>>> fields 
Out[39]: ['emplvl', 'population', 'estab', 'estab_pop', 'emp_pop'] 
>>> df[fields] 
Out[40]: 
        emplvl population estab estab_pop emp_pop 
county year               
1001 2003 11134.500000  46800 801.75 0.017131 0.237917 
     2004 11209.166667  48366 824.00 0.017037 0.231757 
     2005 11452.166667  49676 870.75 0.017529 0.230537 
     2006 11259.250000  51328 862.50 0.016804 0.219359 
     2007 11403.333333  52405 879.25 0.016778 0.217600 
     2008 11272.833333  53277 890.25 0.016710 0.211589 
     2009 11003.833333  54135 877.00 0.016200 0.203267 
     2010 10693.916667  54632 877.00 0.016053 0.195745 
     2011 10627.000000   NaN 862.00  NaN  NaN 
     2012 10136.916667   NaN 841.75  NaN  NaN 
1003 2003 51372.250000  151509 4272.00 0.028196 0.339071 
     2004 53450.583333  156266 4536.25 0.029029 0.342049 
     2005 56110.250000  162183 4880.50 0.030093 0.345969 
     2006 59291.000000  168121 5067.50 0.030142 0.352669 
     2007 62600.083333  172404 5337.25 0.030958 0.363101 
     2008 62611.500000  175827 5529.25 0.031447 0.356097 
     2009 58947.666667  179406 5273.75 0.029396 0.328571 
     2010 58139.583333  183195 5171.25 0.028228 0.317364 
     2011 59581.000000   NaN 5157.75  NaN  NaN 
     2012 60440.250000   NaN 5171.75  NaN  NaN 

I can select the 2007 rows with

>>> df[fields].loc[df.index.get_level_values('year') == 2007, fields] 
Out[32]: 
        emplvl population estab estab_pop emp_pop 
county year               
1001 2007 11403.333333  52405 879.25 0.016778 0.217600 
1003 2007 62600.083333  172404 5337.25 0.030958 0.363101 
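A possibly simpler way to pull out a single year is DataFrame.xs, which by default also drops the year level, so the result is indexed by county alone. This is a minimal sketch; the frame below is a hypothetical sample built from a few values of the table above (only emplvl and estab), and the name base is my own.

```python
import pandas as pd

# Hypothetical sample: a few (county, year) rows taken from the table above
index = pd.MultiIndex.from_product(
    [[1001, 1003], [2006, 2007, 2008]], names=["county", "year"]
)
df = pd.DataFrame(
    {
        "emplvl": [11259.25, 11403.333333, 11272.833333,
                   59291.0, 62600.083333, 62611.5],
        "estab": [862.50, 879.25, 890.25, 5067.50, 5337.25, 5529.25],
    },
    index=index,
)

# Cross-section at year == 2007; drop_level=True (the default) removes 'year'
base = df.xs(2007, level="year")
print(base)
```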

But dividing by these rows, with either

df[fields].div(df.loc[df.index.get_level_values('year') == 2007, fields], axis=0) 
df[fields].div(df.loc[df.index.get_level_values('year') == 2007, fields], axis=1) 

gives me a DataFrame full of NaN, probably because pandas also tries to align on the year level of the index and finds nothing to divide by.
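The NaN result can be reproduced on a small sample. As far as I understand the pandas alignment rules, when the divisor is a DataFrame both frames are aligned on their full (county, year) row labels, and axis only matters when the divisor is a Series. So only the 2007 rows find a match (and come out as 1.0), while every other row becomes NaN. The sample frame is hypothetical, built from a few values of the table above.

```python
import pandas as pd

# Hypothetical sample: a few (county, year) rows taken from the table above
index = pd.MultiIndex.from_product(
    [[1001, 1003], [2006, 2007, 2008]], names=["county", "year"]
)
df = pd.DataFrame(
    {
        "emplvl": [11259.25, 11403.333333, 11272.833333,
                   59291.0, 62600.083333, 62611.5],
        "estab": [862.50, 879.25, 890.25, 5067.50, 5337.25, 5529.25],
    },
    index=index,
)

# Same selection as in the question: the 2007 rows, still indexed by (county, year)
slice_2007 = df.loc[df.index.get_level_values("year") == 2007]

# DataFrame / DataFrame aligns on the full row labels, so only the
# (county, 2007) rows have a partner; all other rows are divided by NaN
result = df.div(slice_2007, axis=0)
print(result)
```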

To work around this, I also tried

df[fields].div(df.loc[df.index.get_level_values('year') == 2007, fields].values) 

which gives me ValueError: Shape of passed values is (5, 2), indices imply (5, 20)
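The ValueError comes from the shapes: .values on the 2007 slice is an array with one row per county (2 rows), while df[fields] has 20 rows, and a 2-row array cannot be broadcast against 20 rows. One way to keep the .values idea, sketched here on a hypothetical 6-row sample, is to repeat each county's 2007 row once for every row of df, in df's own order, so the divisor has exactly df's shape. The names slice_2007, counties and divisor are my own.

```python
import pandas as pd

# Hypothetical sample: a few (county, year) rows taken from the table above
index = pd.MultiIndex.from_product(
    [[1001, 1003], [2006, 2007, 2008]], names=["county", "year"]
)
df = pd.DataFrame(
    {
        "emplvl": [11259.25, 11403.333333, 11272.833333,
                   59291.0, 62600.083333, 62611.5],
        "estab": [862.50, 879.25, 890.25, 5067.50, 5337.25, 5529.25],
    },
    index=index,
)

slice_2007 = df.loc[df.index.get_level_values("year") == 2007]
print(slice_2007.to_numpy().shape)  # (2, 2): one row per county
print(df.shape)                     # (6, 2): the 2-row array cannot broadcast to this

# Index the 2007 rows by county only, then look up one row per row of df
counties = df.index.get_level_values("county")
divisor = slice_2007.droplevel("year").loc[counties].to_numpy()

# divisor now has df's shape, so plain positional division works
result = df.div(divisor)
print(result)
```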

Answer


I think you can select the 2007 rows into df1, move the year level out of the index with reset_index, and then use div:

fields = ['emplvl', 'population', 'estab', 'estab_pop', 'emp_pop'] 

df1 = df.loc[df.index.get_level_values('year') == 2007, fields].reset_index(level=1) 
print(df1)
     year  emplvl population estab estab_pop emp_pop 
county                
1001 2007 11403.333333  52405.0 879.25 0.016778 0.217600 
1003 2007 62600.083333 172404.0 5337.25 0.030958 0.363101 

print(df[fields].div(df1[fields], axis=0))
       emplvl population  estab estab_pop emp_pop 
county year              
1001 2003 0.976425 0.893045 0.911857 1.021039 1.093369 
     2004 0.982973 0.922927 0.937162 1.015437 1.065060 
     2005 1.004282 0.947925 0.990333 1.044761 1.059453 
     2006 0.987365 0.979449 0.980950 1.001550 1.008084 
     2007 1.000000 1.000000 1.000000 1.000000 1.000000 
     2008 0.988556 1.016640 1.012511 0.995947 0.972376 
     2009 0.964966 1.033012 0.997441 0.965550 0.934131 
     2010 0.937789 1.042496 0.997441 0.956789 0.899563 
     2011 0.931920   NaN 0.980381  NaN  NaN 
     2012 0.888943   NaN 0.957350  NaN  NaN 
1003 2003 0.820642 0.878802 0.800412 0.910782 0.933820 
     2004 0.853842 0.906394 0.849923 0.937690 0.942022 
     2005 0.896329 0.940715 0.914422 0.972059 0.952818 
     2006 0.947139 0.975157 0.949459 0.973642 0.971270 
     2007 1.000000 1.000000 1.000000 1.000000 1.000000 
     2008 1.000182 1.019855 1.035974 1.015796 0.980711 
     2009 0.941655 1.040614 0.988102 0.949545 0.904902 
     2010 0.928746 1.062591 0.968898 0.911816 0.874038 
     2011 0.951772   NaN 0.966368  NaN  NaN 
     2012 0.965498   NaN 0.968992  NaN  NaN
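Without reset_index, I believe the same result can be obtained by passing level='county' to div, which the pandas documentation describes as broadcasting across a level of the MultiIndex. This is a minimal sketch on a hypothetical sample built from a few values of the table above; base is my own name for the 2007 cross-section.

```python
import pandas as pd

# Hypothetical sample: a few (county, year) rows taken from the table above
index = pd.MultiIndex.from_product(
    [[1001, 1003], [2006, 2007, 2008]], names=["county", "year"]
)
df = pd.DataFrame(
    {
        "emplvl": [11259.25, 11403.333333, 11272.833333,
                   59291.0, 62600.083333, 62611.5],
        "estab": [862.50, 879.25, 890.25, 5067.50, 5337.25, 5529.25],
    },
    index=index,
)

base = df.xs(2007, level="year")  # indexed by county only

# Match base's county index against the 'county' level of df's MultiIndex,
# so every (county, year) row is divided by that county's 2007 row
result = df.div(base, level="county")
print(result)
```

Here the 2007 rows come out as exactly 1.0, and every other year is expressed relative to its own county's 2007 value, as in the output above.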