Formatting column names of a pandas pivot table

I used the pandas.pivot_table function on a pandas DataFrame, and my output looks something like this:

    Winners     Runnerup    
     year  2016 2015 2014 2016 2015 2014 
Country Sport        
india badminton        
india wrestling

What I actually need is something like the following:

Country Sport Winners_2016 Winners_2015 Winners_2014 Runnerup_2016 Runnerup_2015 Runnerup_2014 
india badminton 1 1 1 1 1 1 
india wrestling 1 0 1 0 1 0

I have a lot of columns and years, so I can't edit them by hand. Can anyone tell me how to do this?

Source

2016-08-20 Supreeth Meka

You can also use a list comprehension:

df.columns = ['_'.join(col) for col in df.columns] 
print (df) 
        Winners_2016 Winners_2015 Winners_2014 Runnerup_2016 \ 
Country Sport                 
india badminton    1    1    1    1 
     wrestling    1    1    1    1 

        Runnerup_2015 Runnerup_2014 
Country Sport          
india badminton    1    1 
     wrestling    1    1
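For reference, here is a minimal end-to-end sketch of the list-comprehension approach. The sample data is made up to match the question's desired output, and the name `raw` is hypothetical. One assumption worth flagging: if a column level holds non-string values (for example integer years), `'_'.join(col)` raises a TypeError, so the sketch converts each part with `str` first. `reset_index()` then turns Country and Sport back into ordinary columns.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's input (years are integers here).
raw = pd.DataFrame({
    "Country": ["india"] * 6,
    "Sport": ["badminton"] * 3 + ["wrestling"] * 3,
    "year": [2016, 2015, 2014] * 2,
    "Winners": [1, 1, 1, 1, 0, 1],
    "Runnerup": [1, 1, 1, 0, 1, 0],
})

# Pivoting on several value columns produces two-level column labels.
df = raw.pivot_table(index=["Country", "Sport"], columns="year",
                     values=["Winners", "Runnerup"], aggfunc="sum")

# Join the two levels; map(str, ...) guards against non-string level values.
df.columns = ["_".join(map(str, col)) for col in df.columns]

# Move Country and Sport out of the index into regular columns.
flat = df.reset_index()
print(flat)
```

The column order after pivoting may differ from the order in the question; reorder with `flat[[...]]` if a specific order matters.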

Another solution is to convert the columns with to_series and then call str.join:

df.columns = df.columns.to_series().str.join('_') 
print (df) 
        Winners_2016 Winners_2015 Winners_2014 Runnerup_2016 \ 
Country Sport                 
india badminton    1    1    1    1 
     wrestling    1    1    1    1 

        Runnerup_2015 Runnerup_2014 
Country Sport          
india badminton    1    1 
     wrestling    1    1

I was really curious about the timings:

In [45]: %timeit ['_'.join(col) for col in df.columns] 
The slowest run took 7.82 times longer than the fastest. This could mean that an intermediate result is being cached. 
100000 loops, best of 3: 4.05 µs per loop 

In [44]: %timeit ['{}_{}'.format(x,y) for x,y in zip(df.columns.get_level_values(0),df.columns.get_level_values(1))] 
The slowest run took 4.56 times longer than the fastest. This could mean that an intermediate result is being cached. 
10000 loops, best of 3: 131 µs per loop 

In [46]: %timeit df.columns.to_series().str.join('_') 
The slowest run took 4.31 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 452 µs per loop
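The `%timeit` calls above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard-library `timeit` module. The small frame built here is an assumption standing in for the real pivot table, and absolute numbers will vary by machine and pandas version, so treat them as a relative comparison only.

```python
import timeit

import pandas as pd

# Assumed stand-in for the pivoted frame: two-level string column labels.
cols = pd.MultiIndex.from_product([["Winners", "Runnerup"],
                                   ["2016", "2015", "2014"]])
df = pd.DataFrame([[1] * 6, [1] * 6], columns=cols)

# The three approaches from this thread, wrapped as zero-argument callables.
candidates = {
    "list comprehension": lambda: ["_".join(col) for col in df.columns],
    "get_level_values": lambda: [
        "{}_{}".format(x, y)
        for x, y in zip(df.columns.get_level_values(0),
                        df.columns.get_level_values(1))
    ],
    "to_series + str.join": lambda: df.columns.to_series().str.join("_"),
}

loops = 1000
timings = {
    # Best of 3 runs, converted to seconds per single call.
    name: min(timeit.repeat(fn, number=loops, repeat=3)) / loops
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print("{}: {:.1f} µs per loop".format(name, seconds * 1e6))
```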

Source

2016-08-20 05:32:16 jezrael

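Really interesting: the first list comprehension is about 30 times faster than the get_level_values version. – jezrael

Yes, this is very useful because I am working with larger datasets. Thanks! –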

Try this:

df.columns=['{}_{}'.format(x,y) for x,y in zip(df.columns.get_level_values(0),df.columns.get_level_values(1))]

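get_level_values is what you need to pull out just one level of the resulting MultiIndex.

Remark: you might try working with the data as it is. I really hated the pandas MultiIndex at first, but it has grown on me.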
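To illustrate both points, here is a small sketch of what get_level_values returns and of how the MultiIndex columns can be used as they are, without flattening. The frame is built by hand as an assumption standing in for the pivot table output; the values follow the question's desired result.

```python
import pandas as pd

# Assumed stand-in for the pivot table: two-level columns, two-level row index.
cols = pd.MultiIndex.from_product([["Winners", "Runnerup"],
                                   ["2016", "2015", "2014"]],
                                  names=[None, "year"])
idx = pd.MultiIndex.from_tuples([("india", "badminton"),
                                 ("india", "wrestling")],
                                names=["Country", "Sport"])
df = pd.DataFrame([[1, 1, 1, 1, 1, 1],
                   [1, 0, 1, 0, 1, 0]], index=idx, columns=cols)

# Each call returns one level's label for every column, in column order.
top = df.columns.get_level_values(0)    # Winners/Runnerup labels
years = df.columns.get_level_values(1)  # year labels

# Working with the MultiIndex directly:
winners = df["Winners"]                 # all years for Winners
y2016 = df.xs("2016", axis=1, level=1)  # Winners and Runnerup for 2016

print(list(top))
print(list(years))
print(winners)
print(y2016)
```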

Source

2016-08-20 04:35:29
