2013-01-07

Reset column MultiIndex level

Is there an easier way to drop a column MultiIndex level (in my case, basic_amt) than transposing twice?

In [704]: test
Out[704]:
          basic_amt
Faculty         NSW QLD VIC All
All               1   1   2   4
Full Time         0   1   0   1
Part Time         1   0   2   3

In [705]: test.reset_index(level=0, drop=True)
Out[705]:
        basic_amt
Faculty       NSW QLD VIC All
0               1   1   2   4
1               0   1   0   1
2               1   0   2   3

In [711]: test.transpose().reset_index(level=0, drop=True).transpose()
Out[711]:
Faculty   NSW QLD VIC All
All         1   1   2   4
Full Time   0   1   0   1
Part Time   1   0   2   3
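Newer pandas versions (0.24 and later) also provide DataFrame.droplevel, which removes a column level in a single call without any transposing. The sketch below rebuilds the question's frame, assuming its column levels are named [None, 'Faculty'] as the output suggests, and checks that both approaches give the same result.

```python
import pandas as pd

# Rebuild the frame from the question: two column levels, the outer one constant.
cols = pd.MultiIndex.from_arrays(
    [["basic_amt"] * 4, ["NSW", "QLD", "VIC", "All"]],
    names=[None, "Faculty"],
)
test = pd.DataFrame(
    [(1, 1, 2, 4), (0, 1, 0, 1), (1, 0, 2, 3)],
    index=["All", "Full Time", "Part Time"],
    columns=cols,
)

# The double-transpose workaround from the question.
via_transpose = test.transpose().reset_index(level=0, drop=True).transpose()

# DataFrame.droplevel (pandas 0.24+) drops the outer column level directly.
via_droplevel = test.droplevel(0, axis=1)

print(via_droplevel)
```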

Answers

Another solution is to use MultiIndex.droplevel:

import pandas as pd 

cols = pd.MultiIndex.from_arrays([['basic_amt']*4, 
            ['NSW','QLD','VIC','All']], 
            names = [None, 'Faculty']) 
idx = pd.Index(['All', 'Full Time', 'Part Time']) 

df = pd.DataFrame([(1,1,2,4), 
        (0,1,0,1), 
        (1,0,2,3)], index = idx, columns=cols) 

print(df)
          basic_amt
Faculty         NSW QLD VIC All
All               1   1   2   4
Full Time         0   1   0   1
Part Time         1   0   2   3

df.columns = df.columns.droplevel(0)
# pandas 0.18.0 and higher
df = df.rename_axis(None, axis=1)
# pandas below 0.18.0
# df.columns.name = None

print(df)
          NSW QLD VIC All
All         1   1   2   4
Full Time   0   1   0   1
Part Time   1   0   2   3

print(df.columns)
Index(['NSW', 'QLD', 'VIC', 'All'], dtype='object')
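The rename_axis step matters because the level that survives droplevel keeps its own name. A minimal sketch, assuming the inner level is named "Faculty" as in the frame above:

```python
import pandas as pd

# Same shape as the answer's frame: the inner column level is named "Faculty".
cols = pd.MultiIndex.from_arrays(
    [["basic_amt"] * 4, ["NSW", "QLD", "VIC", "All"]],
    names=[None, "Faculty"],
)
df = pd.DataFrame([(1, 1, 2, 4), (0, 1, 0, 1), (1, 0, 2, 3)],
                  index=["All", "Full Time", "Part Time"], columns=cols)

df.columns = df.columns.droplevel(0)
name_after_drop = df.columns.name  # the surviving level keeps its name
print(name_after_drop)             # Faculty

df = df.rename_axis(None, axis=1)  # clear the columns' axis name
print(df.columns.name)             # None
```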

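If you need to keep both levels in the column names, join them with a list comprehension (this applies to the original MultiIndex columns, before droplevel):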

df.columns = ['_'.join(col) for col in df.columns] 
print(df)
          basic_amt_NSW basic_amt_QLD basic_amt_VIC basic_amt_All
All                   1             1             2             4
Full Time             0             1             0             1
Part Time             1             0             2             3

print (df.columns) 
Index(['basic_amt_NSW', 'basic_amt_QLD', 'basic_amt_VIC', 'basic_amt_All'], dtype='object') 
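The join above assumes every level value is a string; str.join raises TypeError on non-string parts such as integer years. A hedged sketch using a hypothetical frame whose inner level holds integers, converting each part with str first:

```python
import pandas as pd

# Hypothetical frame: the inner column level holds integers (e.g. years).
cols = pd.MultiIndex.from_arrays([["basic_amt"] * 3, [2011, 2012, 2013]])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols)

# A plain "_".join fails because the tuples contain ints.
try:
    ["_".join(col) for col in df.columns]
except TypeError as exc:
    print("plain join fails:", exc)

# Convert every part of each column tuple to str before joining.
df.columns = ["_".join(map(str, col)) for col in df.columns]
print(df.columns.tolist())
```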
Another thing to note: if you use "_" as the separator and want to recreate a MultiIndex, you can do my_tuples = [i.split("_") for i in my_df.columns] and then pd.MultiIndex.from_tuples(my_tuples). – RobinL
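One caveat with splitting on every "_": the outer label basic_amt itself contains an underscore, so each name splits into three parts instead of two. A minimal sketch that splits only on the last underscore, assuming the inner labels (NSW, QLD, ...) contain no underscore themselves:

```python
import pandas as pd

flat = ["basic_amt_NSW", "basic_amt_QLD", "basic_amt_VIC", "basic_amt_All"]

# Naive split: "basic_amt" also contains "_", so each name yields 3 parts.
print(flat[0].split("_"))  # ['basic', 'amt', 'NSW']

# Split only on the last underscore so the outer label survives intact.
my_tuples = [tuple(name.rsplit("_", 1)) for name in flat]
restored = pd.MultiIndex.from_tuples(my_tuples)
print(list(restored))
```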

How about simply reassigning df.columns:

levels = df.columns.levels
codes = df.columns.codes  # called labels before pandas 0.24
df.columns = levels[1][codes[1]]

For example:

import pandas as pd 

columns = pd.MultiIndex.from_arrays([['basic_amt']*4, 
            ['NSW','QLD','VIC','All']]) 
index = pd.Index(['All', 'Full Time', 'Part Time'], name = 'Faculty') 
df = pd.DataFrame([(1,1,2,4), 
        (0,1,0,1),
        (1,0,2,3)]) 
df.columns = columns 
df.index = index 

Before:

print(df) 

          basic_amt
                NSW QLD VIC All
Faculty
All               1   1   2   4
Full Time         0   1   0   1
Part Time         1   0   2   3

After:

levels = df.columns.levels
codes = df.columns.codes  # called labels before pandas 0.24
df.columns = levels[1][codes[1]]
print(df) 

          NSW QLD VIC All
Faculty
All         1   1   2   4
Full Time   0   1   0   1
Part Time   1   0   2   3
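MultiIndex.labels was renamed to codes in pandas 0.24 and later removed, so modern pandas needs codes here. The codes hold one integer per column pointing into that level's unique values, so indexing a level by its codes rebuilds the labels in the frame's own column order; get_level_values(1) returns the same thing. A minimal sketch on the example frame:

```python
import pandas as pd

columns = pd.MultiIndex.from_arrays([["basic_amt"] * 4, ["NSW", "QLD", "VIC", "All"]])
df = pd.DataFrame([(1, 1, 2, 4), (0, 1, 0, 1), (1, 0, 2, 3)], columns=columns)

levels = df.columns.levels  # unique values of each level
codes = df.columns.codes    # per-column integer positions into those levels

# Indexing a level by its codes rebuilds that level's labels column by column.
by_codes = levels[1][codes[1]]
by_values = df.columns.get_level_values(1)

print(levels[1].tolist(), codes[1].tolist())
print(by_codes.tolist())  # ['NSW', 'QLD', 'VIC', 'All']
```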
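If there are multiple categories in MultiIndex level 0, the approach in your example won't work, and it also scrambles the column order. Can you think of a more general (and fail-proof) solution? – dmvianna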

I just tried it and it seems to work fine for me. Can you give an example of the DataFrame you're using? – unutbu

df = pd.DataFrame(np.array(np.mat('0 1 0 1; 1 0 2 3; 1 1 2 4'))) – dmvianna
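The thread does not settle exactly what failed for dmvianna. One concrete pitfall with several outer categories is that dropping level 0 keeps the column order but can leave duplicate column names. A hedged sketch with a hypothetical two-category frame (bonus_amt is an invented label for illustration):

```python
import pandas as pd

# Hypothetical frame: two outer categories sharing the same inner labels.
cols = pd.MultiIndex.from_arrays(
    [["basic_amt", "basic_amt", "bonus_amt", "bonus_amt"],
     ["NSW", "QLD", "NSW", "QLD"]]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=cols)

# Dropping the outer level keeps column order but repeats names.
inner = df.columns.get_level_values(1)
print(inner.tolist())   # ['NSW', 'QLD', 'NSW', 'QLD']
print(inner.is_unique)  # False

# Joining both levels keeps every column name distinct.
joined = ["_".join(col) for col in df.columns]
print(joined)
```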

Zip levels together

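Here is an alternative solution that zips the levels together and joins them with an underscore.

Derived from the answers above, this is what I actually wanted to do when I found this question. I thought I'd share it, even though it doesn't answer the question as asked.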

["_".join(pair) for pair in df.columns] 

gives

['basic_amt_NSW', 'basic_amt_QLD', 'basic_amt_VIC', 'basic_amt_All'] 

Then just set this as the columns:

df.columns = ["_".join(pair) for pair in df.columns] 

          basic_amt_NSW basic_amt_QLD basic_amt_VIC basic_amt_All
Faculty
All                   1             1             2             4
Full Time             0             1             0             1
Part Time             1             0             2             3
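The comprehension iterates the column tuples directly. An equivalent sketch that literally zips the two level arrays, pulled out with get_level_values, gives the same flat names:

```python
import pandas as pd

cols = pd.MultiIndex.from_arrays([["basic_amt"] * 4, ["NSW", "QLD", "VIC", "All"]])
df = pd.DataFrame(
    [(1, 1, 2, 4), (0, 1, 0, 1), (1, 0, 2, 3)],
    index=pd.Index(["All", "Full Time", "Part Time"], name="Faculty"),
    columns=cols,
)

# Pull each level out as a flat sequence, then zip them column by column.
outer = df.columns.get_level_values(0)
inner = df.columns.get_level_values(1)
df.columns = [f"{a}_{b}" for a, b in zip(outer, inner)]
print(df.columns.tolist())
```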