2016-10-03

Convert part of a DataFrame to a MultiIndex in pandas

I have data in XLS format in this form:

+--------+---------+-------------+---------------+---------+
| ID     | Branch  | Customer ID | Customer Name | Balance |
+--------+---------+-------------+---------------+---------+
| 111111 | Branch1 | 1           | Company A     | 10      |
+--------+---------+-------------+---------------+---------+
| 222222 | Branch2 | 2           | Company B     | 20      |
+--------+---------+-------------+---------------+---------+
| 111111 | Branch1 | 2           | Company B     | 30      |
+--------+---------+-------------+---------------+---------+
| 222222 | Branch2 | 3           | Company C     | 10      |
+--------+---------+-------------+---------------+---------+

I want to process it with pandas. pandas reads it as a single flat sheet, but I would like to use a MultiIndex here, like this:

+--------+---------+-------------+---------------+---------+
| ID     | Branch  | Customer ID | Customer Name | Balance |
+--------+---------+-------------+---------------+---------+
|        |         | 1           | Company A     | 10      |
+ 111111 + Branch1 +-------------+---------------+---------+
|        |         | 2           | Company B     | 30      |
+--------+---------+-------------+---------------+---------+
|        |         | 2           | Company B     | 20      |
+ 222222 + Branch2 +-------------+---------------+---------+
|        |         | 3           | Company C     | 10      |
+--------+---------+-------------+---------------+---------+

Here 111111 and Branch1 form the level 1 index, and 1 and Company A form the level 2 index. Is there a built-in way to do this?

Answer

If you only need set_index and sort_index, use:

df.set_index(['ID','Branch', 'Customer ID','Customer Name'], inplace=True) 
df.sort_index(inplace=True) 
print (df) 
                                          Balance
ID     Branch  Customer ID Customer Name
111111 Branch1 1           Company A           10
               2           Company B           30
222222 Branch2 2           Company B           20
               3           Company C           10
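
As a self-contained sketch of the approach above: the original XLS file is not available, so the question's table is recreated here from CSV text (an assumption made only for illustration). The snippet also shows a variation that builds the MultiIndex at read time through index_col. pd.read_excel documents an index_col parameter that takes a list of column positions as well, so a similar call may work on the real file, although that is not exercised here.

```python
import io

import pandas as pd

# Sample rows from the question, as CSV text so the example runs on its own.
CSV_TEXT = """ID,Branch,Customer ID,Customer Name,Balance
111111,Branch1,1,Company A,10
222222,Branch2,2,Company B,20
111111,Branch1,2,Company B,30
222222,Branch2,3,Company C,10
"""

key_cols = ["ID", "Branch", "Customer ID", "Customer Name"]

# Option 1: read the flat table, then move the key columns into the index.
flat = pd.read_csv(io.StringIO(CSV_TEXT))
indexed = flat.set_index(key_cols).sort_index()

# Option 2: build the MultiIndex while reading, using column positions.
direct = pd.read_csv(io.StringIO(CSV_TEXT), index_col=[0, 1, 2, 3]).sort_index()

# Select every customer of one branch through the first two index levels.
branch1 = indexed.loc[(111111, "Branch1"), :]
print(branch1)
```

Both options avoid inplace=True and return new objects, which keeps the flat frame available for later steps.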

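But if you need a MultiIndex with only two levels (a and b in my solution), you have to concatenate the first column with the second, and the third with the fourth: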

df['a'] = df.ID.astype(str) + '_' + df.Branch 
df['b'] = df['Customer ID'].astype(str) + '_' + df['Customer Name'] 
#delete original columns 
df.drop(['ID','Branch', 'Customer ID','Customer Name'], axis=1, inplace=True) 

df.set_index(['a','b'], inplace=True) 
df.sort_index(inplace=True) 
print (df) 
                            Balance
a              b
111111_Branch1 1_Company A       10
               2_Company B       30
222222_Branch2 2_Company B       20
               3_Company C       10
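
The same two-level result can be sketched without modifying the original frame. The helper join_columns below is a hypothetical name introduced for this example, and the sample DataFrame is rebuilt from the question's table. Note that the joined labels are strings, so the numeric ID and Customer ID values are no longer available as separate typed levels.

```python
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    "ID": [111111, 222222, 111111, 222222],
    "Branch": ["Branch1", "Branch2", "Branch1", "Branch2"],
    "Customer ID": [1, 2, 2, 3],
    "Customer Name": ["Company A", "Company B", "Company B", "Company C"],
    "Balance": [10, 20, 30, 10],
})


def join_columns(frame, left, right, sep="_"):
    """Hypothetical helper: join two columns into one string Series."""
    return frame[left].astype(str) + sep + frame[right].astype(str)


two_level = (
    df.assign(
        a=join_columns(df, "ID", "Branch"),
        b=join_columns(df, "Customer ID", "Customer Name"),
    )
    # Drop the original key columns now that they live inside a and b.
    .drop(columns=["ID", "Branch", "Customer ID", "Customer Name"])
    .set_index(["a", "b"])
    .sort_index()
)
print(two_level)

# A single row is addressed by one label per level.
print(two_level.loc[("111111_Branch1", "2_Company B"), "Balance"])
```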

If you need to aggregate the last column by the preceding columns, use groupby with GroupBy.mean:

df = df.groupby(['ID','Branch', 'Customer ID','Customer Name'])['Balance'].mean().to_frame() 
print (df) 
                                          Balance
ID     Branch  Customer ID Customer Name
111111 Branch1 1           Company A           10
               2           Company B           30
222222 Branch2 2           Company B           20
               3           Company C           10
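
With the question's data every key combination occurs once, so the mean leaves the balances unchanged. The sketch below adds one extra row for Company A at Branch1 (an assumption, not part of the original data) to show what the aggregation does when keys repeat.

```python
import pandas as pd

key_cols = ["ID", "Branch", "Customer ID", "Customer Name"]

# Question data plus one assumed duplicate key (last row) for illustration.
df = pd.DataFrame({
    "ID": [111111, 222222, 111111, 222222, 111111],
    "Branch": ["Branch1", "Branch2", "Branch1", "Branch2", "Branch1"],
    "Customer ID": [1, 2, 2, 3, 1],
    "Customer Name": ["Company A", "Company B", "Company B",
                      "Company C", "Company A"],
    "Balance": [10, 20, 30, 10, 20],
})

# groupby moves the key columns into a sorted MultiIndex and collapses
# repeated keys into a single row holding the mean balance.
mean_balance = df.groupby(key_cols)["Balance"].mean().to_frame()
print(mean_balance)
```

Unlike set_index, this guarantees a unique index, at the cost of losing the individual rows that shared a key.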

If you are working with MultiIndex columns, set_index needs tuples:

df.columns = pd.MultiIndex.from_arrays([['a'] * 2 + ['b']* 2 + ['c'], df.columns]) 
print (df) 
        a                     b                       c
       ID   Branch  Customer ID  Customer Name  Balance
0  111111  Branch1            1      Company A       10
1  222222  Branch2            2      Company B       20
2  111111  Branch1            2      Company B       30
3  222222  Branch2            3      Company C       10

df.set_index([('a','ID'), ('a','Branch'), 
       ('b','Customer ID'), ('b','Customer Name')], inplace=True) 
df.sort_index(inplace=True) 
print (df) 
                                                               c
                                                         Balance
(a, ID) (a, Branch) (b, Customer ID) (b, Customer Name)
111111  Branch1     1                Company A                10
                    2                Company B                30
222222  Branch2     2                Company B                20
                    3                Company C                10
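
A runnable sketch of the MultiIndex-column case follows, with the sample data rebuilt from the question. The final renaming step is an optional addition, not part of the original answer: it replaces the tuple-valued index level names with the plain second-level labels, which some may find easier to read.

```python
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    "ID": [111111, 222222, 111111, 222222],
    "Branch": ["Branch1", "Branch2", "Branch1", "Branch2"],
    "Customer ID": [1, 2, 2, 3],
    "Customer Name": ["Company A", "Company B", "Company B", "Company C"],
    "Balance": [10, 20, 30, 10],
})

# Add a top column level: a, a, b, b, c.
df.columns = pd.MultiIndex.from_arrays(
    [["a"] * 2 + ["b"] * 2 + ["c"], df.columns]
)

# Each column is now addressed by a (top, bottom) tuple.
key_cols = [
    ("a", "ID"),
    ("a", "Branch"),
    ("b", "Customer ID"),
    ("b", "Customer Name"),
]
indexed = df.set_index(key_cols).sort_index()

# Optional: use the bottom-level labels as index level names.
indexed.index.names = [bottom for _, bottom in key_cols]
print(indexed)
```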