2015-03-25 67 views
1

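I need to create and save a Pandas DataFrame with a hierarchical index. Below, I create two DataFrames and then concatenate them to produce a new DataFrame with a hierarchical index. How can I save and retrieve a Pandas DataFrame while keeping its hierarchical index?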

import numpy as np
import pandas as pd

data1 = np.random.rand(5, 5)
data2 = np.random.rand(5, 5)
df1 = pd.DataFrame(data1, columns=['a', 'b', 'c', 'd', 'e'], index=['i1', 'i2', 'i3', 'i4', 'i5'])
df2 = pd.DataFrame(data2, columns=['a', 'b', 'c', 'd', 'e'], index=['i1', 'i2', 'i3', 'i4', 'i5'])

# Stack the two frames; the keys become the outer index level.
df = pd.concat([df1, df2], keys=['first', 'second'])

print("Original Data frame")
print(df)

# Save to file.
df.to_csv('test')

# Read from file. pd.DataFrame.from_csv was removed in pandas 1.0;
# read_csv with index_col=0 is the closest equivalent call.
df_new = pd.read_csv('test', index_col=0)

print("Saved Data frame")
print(df_new)

Here is the output I get:

Original Data frame 
        a   b   c   d   e 
first i1 0.926553 0.180306 0.182887 0.783061 0.832914 
     i2 0.899054 0.130367 0.615534 0.965580 0.669495 
     i3 0.931004 0.425528 0.068938 0.166522 0.714399 
     i4 0.082365 0.587194 0.993864 0.187864 0.066035 
     i5 0.668671 0.294744 0.136317 0.358732 0.529674 
second i1 0.916310 0.361423 0.700380 0.386119 0.273667 
     i2 0.102542 0.454106 0.565760 0.259323 0.104743 
     i3 0.410280 0.379986 0.288921 0.177819 0.919343 
     i4 0.447279 0.113711 0.032273 0.335358 0.717824 
     i5 0.995781 0.356817 0.146785 0.972401 0.169360 

Saved Data frame 
     Unnamed: 1   a   b   c   d   e 
first   i1 0.926553 0.180306 0.182887 0.783061 0.832914 
first   i2 0.899054 0.130367 0.615534 0.965580 0.669495 
first   i3 0.931004 0.425528 0.068938 0.166522 0.714399 
first   i4 0.082365 0.587194 0.993864 0.187864 0.066035 
first   i5 0.668671 0.294744 0.136317 0.358732 0.529674 
second   i1 0.916310 0.361423 0.700380 0.386119 0.273667 
second   i2 0.102542 0.454106 0.565760 0.259323 0.104743 
second   i3 0.410280 0.379986 0.288921 0.177819 0.919343 
second   i4 0.447279 0.113711 0.032273 0.335358 0.717824 
second   i5 0.995781 0.356817 0.146785 0.972401 0.169360 

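When I save this new DataFrame to a CSV file ('test') and read it back, I lose the hierarchical index: the second index level comes back as an ordinary column named "Unnamed: 1". Is there a way to save the data to a file so that the hierarchical index is preserved when I read it back?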

Answers

3

Save it in a format other than CSV, for example pickle:

df.to_pickle('dataframe.pickle') 

This preserves the hierarchical index. To read it back:

pd.read_pickle('dataframe.pickle') 
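
The pickle round trip can be checked end to end with a short script. This is only a minimal sketch: the seeded random generator and the temporary directory are assumptions added for reproducibility and cleanup, not part of the answer above.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Build a small frame with a two-level index, as in the question.
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
cols = ['a', 'b', 'c', 'd', 'e']
rows = ['i1', 'i2', 'i3', 'i4', 'i5']
df1 = pd.DataFrame(rng.random((5, 5)), columns=cols, index=rows)
df2 = pd.DataFrame(rng.random((5, 5)), columns=cols, index=rows)
df = pd.concat([df1, df2], keys=['first', 'second'])

# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'dataframe.pickle')
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# Check that the index type and the values both survive the round trip.
print(isinstance(restored.index, pd.MultiIndex))
print(restored.equals(df))
```

Keep in mind that pickle is a Python-specific format and that unpickling data from an untrusted source can execute arbitrary code, so it suits local caching better than data exchange.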

pandas has several other IO methods as well; you can read about them in the documentation.

1

You can:

Reset the index and save the DataFrame to CSV, read it back from CSV, and then set the index back to the original columns (in place).

df 
Out[11]: 
        a   b   c   d   e 
first i1 0.935478 0.455757 0.607418 0.850291 0.704326 
     i2 0.675752 0.339017 0.999949 0.508480 0.888817 
     i3 0.463371 0.803389 0.048469 0.599697 0.423603 
     i4 0.935294 0.933699 0.843289 0.182535 0.255847 
     i5 0.321236 0.120010 0.647876 0.000517 0.032592 
second i1 0.172044 0.691660 0.799164 0.194785 0.302880 
     i2 0.432988 0.511229 0.451268 0.203145 0.560563 
     i3 0.442584 0.771483 0.839945 0.716374 0.533183 
     i4 0.167898 0.962646 0.152245 0.400280 0.210355 
     i5 0.736365 0.511057 0.256672 0.619250 0.790739 

df.reset_index() 
Out[12]: 
    level_0 level_1   a   b   c   d   e 
0 first  i1 0.935478 0.455757 0.607418 0.850291 0.704326 
1 first  i2 0.675752 0.339017 0.999949 0.508480 0.888817 
2 first  i3 0.463371 0.803389 0.048469 0.599697 0.423603 
3 first  i4 0.935294 0.933699 0.843289 0.182535 0.255847 
4 first  i5 0.321236 0.120010 0.647876 0.000517 0.032592 
5 second  i1 0.172044 0.691660 0.799164 0.194785 0.302880 
6 second  i2 0.432988 0.511229 0.451268 0.203145 0.560563 
7 second  i3 0.442584 0.771483 0.839945 0.716374 0.533183 
8 second  i4 0.167898 0.962646 0.152245 0.400280 0.210355 
9 second  i5 0.736365 0.511057 0.256672 0.619250 0.790739 

df.reset_index().to_csv('test.csv', index=False) 
df3 = pd.read_csv('test.csv') 
df3.set_index(['level_0', 'level_1'], inplace=True) 

df3
Out[15]: 
         a   b   c   d   e 
level_0 level_1             
first i1  0.935478 0.455757 0.607418 0.850291 0.704326 
     i2  0.675752 0.339017 0.999949 0.508480 0.888817 
     i3  0.463371 0.803389 0.048469 0.599697 0.423603 
     i4  0.935294 0.933699 0.843289 0.182535 0.255847 
     i5  0.321236 0.120010 0.647876 0.000517 0.032592 
second i1  0.172044 0.691660 0.799164 0.194785 0.302880 
     i2  0.432988 0.511229 0.451268 0.203145 0.560563 
     i3  0.442584 0.771483 0.839945 0.716374 0.533183 
     i4  0.167898 0.962646 0.152245 0.400280 0.210355 
     i5  0.736365 0.511057 0.256672 0.619250 0.790739
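
As a possible shortcut for the reset/set-index steps above, `pd.read_csv` accepts a list for `index_col`, which rebuilds the hierarchical index while reading. The following is a minimal sketch, assuming the first two CSV columns hold the index levels. The level names `group` and `item`, the seeded generator, and the temporary file path are illustrative choices, not taken from the original post.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question.
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
cols = ['a', 'b', 'c', 'd', 'e']
rows = ['i1', 'i2', 'i3', 'i4', 'i5']
df1 = pd.DataFrame(rng.random((5, 5)), columns=cols, index=rows)
df2 = pd.DataFrame(rng.random((5, 5)), columns=cols, index=rows)
df = pd.concat([df1, df2], keys=['first', 'second'])

# Hypothetical level names, added so the CSV header labels the index columns.
df.index.names = ['group', 'item']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.csv')
    df.to_csv(path)  # both index levels are written as the first two columns
    # Treat the first two columns as the two index levels when reading.
    restored = pd.read_csv(path, index_col=[0, 1])

print(restored.index.names)
print(restored.head())
```

Unlike pickle, the CSV file stays readable by other tools, but floating-point values pass through text, so comparing the restored frame with the original is safer with a tolerance (for example `np.allclose`) than with exact equality.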