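Exporting a pandas DataFrame with a MultiIndex

I have just discovered pandas and am impressed by its capabilities. I am having difficulty understanding how to work with a DataFrame that has a MultiIndex.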

I have two questions:

(1) Exporting the DataFrame

Here is my problem, using this dataset:

import pandas as pd
from io import StringIO  # Python 3 location; the Python 2 original used "import StringIO"

d1 = StringIO(
"""Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
)

df = pd.read_csv(d1)

# Frequency tables
tab1 = pd.crosstab(df.Gender, df.Region)
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])
tab3 = pd.crosstab([df.Gender, df.Employed], [df.Region, df.Degree])

# Now we export the tables
tab1.to_excel('H:/test_tab1.xlsx') # OK
tab2.to_excel('H:/test_tab2.xlsx') # fails
tab3.to_excel('H:/test_tab3.xlsx') # fails

One workaround I can think of is to change the columns (the way R does):

def NewColums(DFwithMultiIndex):
    # Join each column tuple into one label, e.g. ('east', 'ba') -> 'east-ba'
    NewCol = []
    for item in DFwithMultiIndex.columns:
        NewCol.append('-'.join(item))
    return NewCol

# New columns
tab2.columns = NewColums(tab2)
tab3.columns = NewColums(tab3)

# New export
tab2.to_excel('H:/test_tab2.xlsx') # OK
tab3.to_excel('H:/test_tab3.xlsx') # OK

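My question is: is there a more efficient way to do this in pandas that I have missed in the documentation?

(2) Selecting columns

This new structure does not allow selecting columns on a given variable (which was the advantage of the hierarchical index in the first place). How can I select the columns that contain a given string (e.g. '-ba')?

PS: I have seen this question, which is related, but I did not understand the answer proposed there.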

Source

2013-01-15 user1043144

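Interestingly, tab2.T.to_excel works, so it is only the MultiIndex on the columns that is a problem.

@hayden: Thanks for updating the link. The function is indeed handy for display. – user1043144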

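This looks like a bug in to_excel. For the time being, as a workaround, I would suggest using to_csv (which does not seem to show this problem).

I have added this as an issue on GitHub.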
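The to_csv workaround can be sketched roughly as follows. This is only an illustration, not a tested recipe for the setup in the question: it writes to an in-memory buffer rather than a file under H:/, and the read-back step with header=[0, 1] is an addition here, included to show that both column levels survive the round trip.

```python
from io import StringIO

import pandas as pd

# Same data as in the question.
csv_text = """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
df = pd.read_csv(StringIO(csv_text))
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

# Write the table with its two-level column header to a buffer
# (a file path such as 'test_tab2.csv' could be used instead).
buf = StringIO()
tab2.to_csv(buf)

# Read it back: the first two rows are the column levels,
# the first column is the row index.
buf.seek(0)
restored = pd.read_csv(buf, header=[0, 1], index_col=0)
```

The resulting CSV file has one header row per column level, so it can also be opened directly in a spreadsheet program.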

To answer the second question, if you really need to use to_excel...

You can use filter to select only those columns which include '-ba':

In [21]: list(filter(lambda x: '-ba' in x, tab2.columns))
Out[21]: ['east-ba', 'north-ba', 'south-ba']

In [22]: tab2[list(filter(lambda x: '-ba' in x, tab2.columns))]
Out[22]:
        east-ba  north-ba  south-ba
Gender
f             1         0         1
m             1         1         0
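Two other ways to get the same columns, offered as a sketch rather than as part of the original answer. The first assumes the MultiIndex columns are kept and takes a cross-section on the Degree level with DataFrame.xs. The second assumes the columns have already been flattened to strings such as 'east-ba' and uses DataFrame.filter with like=. The variable names ba_by_level, flat and ba_by_name are made up for this example.

```python
from io import StringIO

import pandas as pd

# Same data as in the question.
csv_text = """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
df = pd.read_csv(StringIO(csv_text))
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

# Option 1: keep the MultiIndex and select on the 'Degree' column level.
# The selected level is dropped, leaving only the Region labels.
ba_by_level = tab2.xs('ba', axis=1, level='Degree')

# Option 2: flatten the columns first, then keep the columns whose
# label contains the substring '-ba'.
flat = tab2.copy()
flat.columns = ['-'.join(col) for col in flat.columns]
ba_by_name = flat.filter(like='-ba')
```

Option 1 avoids string matching altogether, which matters if one label happens to be a substring of another. Option 2 fits the case where the flattened names are needed for the export anyway.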

Source

2013-01-15 19:29:30

Thanks. It is also good to know that I had not overlooked something in the documentation. – user1043144
