2013-09-26

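Reindexing a level of a MultiIndex in an arbitrary order in pandas

I have some code that aggregates a DataFrame containing the well-known Titanic dataset, as follows: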

import pandas as pd

# titanic: DataFrame with (at least) age, pclass, sex and survived columns
titanic['agecat'] = pd.cut(titanic.age, [0, 13, 20, 64, 100],
                           labels=['child', 'adolescent', 'adult', 'senior'])
titanic.groupby(['agecat', 'pclass', 'sex'])['survived'].mean()
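Here is a minimal sketch of what the `pd.cut` call does. It uses a few made-up ages rather than the real Titanic data. Bins are right-closed by default, so an age of exactly 13 falls into `child` and 14 falls into `adolescent`.

```python
import pandas as pd

# Made-up ages, not taken from the Titanic dataset
ages = pd.Series([5, 13, 14, 30, 70])

# Same bin edges and labels as above; intervals are right-closed by default:
# (0, 13] -> child, (13, 20] -> adolescent, (20, 64] -> adult, (64, 100] -> senior
agecat = pd.cut(ages, [0, 13, 20, 64, 100],
                labels=['child', 'adolescent', 'adult', 'senior'])

print(list(agecat))
# ['child', 'child', 'adolescent', 'adult', 'senior']
print(list(agecat.cat.categories))
# ['child', 'adolescent', 'adult', 'senior']
```

The result is a categorical whose categories keep the order of `labels`. That is exactly the order the question wants the `agecat` level to follow.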

This produces the following Series, with a MultiIndex built from the groupby keys:

agecat      pclass  sex   
adolescent  1       female    1.000000
                    male      0.200000
            2       female    0.923077
                    male      0.117647
            3       female    0.542857
                    male      0.125000
adult       1       female    0.965517
                    male      0.343284
            2       female    0.868421
                    male      0.078125
            3       female    0.441860
                    male      0.159184
child       1       female    0.000000
                    male      1.000000
            2       female    1.000000
                    male      1.000000
            3       female    0.483871
                    male      0.324324
senior      1       female    1.000000
                    male      0.142857
            2       male      0.000000
            3       male      0.000000
Name: survived, dtype: float64

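However, I would like the `agecat` level of the MultiIndex to follow its natural order rather than alphabetical order, i.e. ['child', 'adolescent', 'adult', 'senior']. But if I try to do this with `reindex`: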

titanic.groupby(['agecat', 'pclass','sex'])['survived'].mean().reindex(
    ['child', 'adolescent', 'adult', 'senior'], level='agecat') 

it has no effect on the MultiIndex of the resulting Series. Is this supposed to work, or am I using the wrong method?

Answer


You need to pass `reindex` a MultiIndex that is already in the order you want:

In [36]: index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                    ['one', 'two', 'three']],
                            labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                                    [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                            names=['first', 'second'])

In [37]: df = DataFrame(np.random.randn(10, 3), index=index,
                        columns=Index(['A', 'B', 'C'], name='exp'))

In [38]: df 
Out[38]: 
exp                  A         B         C
first second
foo   one    -1.007742  2.594146  1.211697
      two     1.280218  0.799940  0.039380
      three  -0.501615 -0.136437  0.997753
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
      three   0.395894  1.128850 -1.126649
qux   one    -0.353886 -1.200079  0.493888
      two    -0.124532  0.114733  1.991793
      three  -1.042094  1.079344 -0.153037

Take the index sorted on the second level:

In [39]: idx = df.sortlevel(level='second').index 

In [40]: idx 
Out[40]: 
MultiIndex 
[(u'foo', u'one'), (u'bar', u'one'), (u'qux', u'one'), (u'foo', u'two'), (u'bar', u'two'), (u'baz', u'two'), (u'qux', u'two'), (u'foo', u'three'), (u'baz', u'three'), (u'qux', u'three')] 

In [41]: df.reindex(idx) 
Out[41]: 
exp                  A         B         C
first second
foo   one    -1.007742  2.594146  1.211697
bar   one    -0.201222  0.060552  0.480552
qux   one    -0.353886 -1.200079  0.493888
foo   two     1.280218  0.799940  0.039380
bar   two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
qux   two    -0.124532  0.114733  1.991793
foo   three  -0.501615 -0.136437  0.997753
baz   three   0.395894  1.128850 -1.126649
qux   three  -1.042094  1.079344 -0.153037
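Later pandas releases renamed the `labels=` argument of `MultiIndex` to `codes=` and removed `DataFrame.sortlevel`. Below is a minimal sketch of the same steps on a recent pandas. It assumes `MultiIndex.sortlevel`, which returns a `(sorted_index, indexer)` pair and orders rows by each level's own value order (foo, bar, baz, qux and one, two, three), not alphabetically.

```python
import numpy as np
import pandas as pd

# Same index as In [36]; newer pandas calls the `labels` argument `codes`
index = pd.MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                              ['one', 'two', 'three']],
                      codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                             [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                      names=['first', 'second'])
df = pd.DataFrame(np.random.randn(10, 3), index=index,
                  columns=pd.Index(['A', 'B', 'C'], name='exp'))

# Sort on 'second' first, then on the remaining level, using the
# positions of the values within each level (the codes)
idx, _ = df.index.sortlevel(level='second')

# Reindexing with the fully specified MultiIndex reorders the rows
reordered = df.reindex(idx)
print(list(reordered.index))
```

The printed order should match Out[40] above.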

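Now a different order. Here a sort stands in for an arbitrary reordering: in pandas of this era, `+` between two Index objects is a set union, which returns the labels sorted.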

In [42]: idx = idx[5:] + idx[:5] 

In [43]: idx 
Out[43]: 
MultiIndex 
[(u'bar', u'one'), (u'bar', u'two'), (u'baz', u'three'), (u'baz', u'two'), (u'foo', u'one'), (u'foo', u'three'), (u'foo', u'two'), (u'qux', u'one'), (u'qux', u'three'), (u'qux', u'two')] 

In [44]: df.reindex(idx) 
Out[44]: 
exp                  A         B         C
first second
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   three   0.395894  1.128850 -1.126649
      two    -0.326620  1.046366 -2.047380
foo   one    -1.007742  2.594146  1.211697
      three  -0.501615 -0.136437  0.997753
      two     1.280218  0.799940  0.039380
qux   one    -0.353886 -1.200079  0.493888
      three  -1.042094  1.079344 -0.153037
      two    -0.124532  0.114733  1.991793
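Applied to the question's grouping, the same idea means building the complete reordered MultiIndex yourself and passing it to `reindex`, rather than relying on `level=`. Below is a minimal sketch on a small Series with made-up values. The helper `reorder_level_values` is hypothetical and is not a pandas API.

```python
import pandas as pd

def reorder_level_values(s, level, order):
    """Hypothetical helper: return `s` with the values of `level` in `order`.

    Every value of `level` must appear in `order` (otherwise KeyError).
    """
    pos = s.index.names.index(level)
    rank = {label: i for i, label in enumerate(order)}
    # sorted() is stable, so the other levels keep their current
    # relative order within each group of `level`
    new_tuples = sorted(s.index, key=lambda t: rank[t[pos]])
    new_index = pd.MultiIndex.from_tuples(new_tuples, names=s.index.names)
    # Reindex with an explicit MultiIndex instead of using `level=`
    return s.reindex(new_index)

# Made-up survival rates, in the alphabetical order groupby produced
s = pd.Series(
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3],
    index=pd.MultiIndex.from_tuples(
        [('adolescent', 1), ('adolescent', 2), ('adult', 1), ('adult', 2),
         ('child', 1), ('child', 2), ('senior', 1)],
        names=['agecat', 'pclass']),
    name='survived')

out = reorder_level_values(s, 'agecat', ['child', 'adolescent', 'adult', 'senior'])
print(list(out.index))
print(list(out))
```

For the question's Series, the call would be `reorder_level_values(titanic.groupby(['agecat', 'pclass', 'sex'])['survived'].mean(), 'agecat', ['child', 'adolescent', 'adult', 'senior'])`.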

I think what you are suggesting *should* work; see the comment here: https://github.com/pydata/pandas/blob/master/pandas/core/index.py#L1346. Please open an issue. – Jeff


Unfortunately, the OP is right: `DataFrame.reindex()` is broken when used with the `level` keyword, even on the latest pandas development branch as of this date. See https://github.com/pydata/pandas/issues/4088 –
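On pandas releases newer than this thread, another route avoids `reindex` entirely. The approach assumes current groupby behaviour for categorical keys. `pd.cut` returns an ordered categorical, and grouping on a categorical column orders that level by its categories rather than alphabetically. The 2013-era versions discussed above may not behave this way. Below is a sketch with made-up rows standing in for the Titanic data. `observed=True` drops combinations that never occur.

```python
import pandas as pd

# Made-up rows standing in for the Titanic data
df = pd.DataFrame({
    'age': [30, 5, 70, 15, 40, 8],
    'sex': ['male', 'female', 'male', 'female', 'female', 'male'],
    'survived': [0, 1, 0, 1, 1, 1],
})
df['agecat'] = pd.cut(df.age, [0, 13, 20, 64, 100],
                      labels=['child', 'adolescent', 'adult', 'senior'])

# The categorical 'agecat' key is ordered by its categories;
# the plain string 'sex' key is ordered alphabetically within it
result = df.groupby(['agecat', 'sex'], observed=True)['survived'].mean()
print(result)
```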
