2016-04-21 101 views
1

考慮以下兩個DataFrames:合併MultIndex DataFrames

arrays1 = [['foo', 'bar', 'bar', 'bar'], 
      ['A', 'D', 'E', 'F']] 
tuples1 = list(zip(*arrays1))   
columnValues1 = pd.MultiIndex.from_tuples(tuples1) 
df1 = pd.DataFrame(np.random.rand(4,4), columns = columnValues1) 
print(df1) 
     foo  bar      
      A   D   E   F 
0 0.833444 0.354676 0.468294 0.173005 
1 0.409730 0.275342 0.595433 0.322785 
2 0.515161 0.340063 0.117509 0.491957 
3 0.285594 0.970524 0.322902 0.628351 

arrays2 = [['foo', 'foo', 'bar', 'bar'], 
      ['B', 'C', 'G', 'H']] 
tuples2 = list(zip(*arrays2))   
columnValues2 = pd.MultiIndex.from_tuples(tuples2) 
df2 = pd.DataFrame(np.random.rand(4,4), columns = columnValues2) 
print(df2) 
     foo     bar   
      B   C   G   H 
0 0.208822 0.762884 0.424412 0.583324 
1 0.767560 0.884583 0.716843 0.329719 
2 0.147991 0.424748 0.560599 0.828155 
3 0.376050 0.436354 0.704379 0.406324 

說我想合併這些得到這個:

  foo        bar     
      A   B   C   D   E   F   G   H 
0 0.833444 0.208822 0.762884 0.354676 0.468294 0.173005 0.424412 0.583324 
1 0.409730 0.767560 0.884583 0.275342 0.595433 0.322785 0.716843 0.329719 
2 0.515161 0.147991 0.424748 0.340063 0.117509 0.491957 0.560599 0.828155 
3 0.285594 0.376050 0.436354 0.970524 0.322902 0.628351 0.704379 0.406324 

我試圖通過合併:

pd.merge(df1.reset_index(), df2.reset_index(), on=df1.columns.levels[0], 
how='inner').set_index(df1.columns.levels[0]) 

不幸的是我收到以下錯誤信息:

ValueError: The truth value of an array with more than one element is ambiguous. 
Use a.any() or a.all() 

如何合併2個MultiIndex DataFrame? `

回答

1

UPDATE:動態選擇列:

In [57]: join = df1.join(df2) 

In [58]: cols = join.columns.get_level_values(0).unique() 

In [59]: cols 
Out[59]: array(['foo', 'bar'], dtype=object) 

In [60]: join = join[cols] 

In [61]: join 
Out[61]: 
     foo       bar        \ 
      A   B   C   D   E   F   G 
0 0.176934 0.694937 0.947164 0.510407 0.085626 0.162183 0.382840 
1 0.973283 0.743907 0.886495 0.028961 0.740759 0.330742 0.961932 
2 0.898224 0.966278 0.131551 0.517563 0.026104 0.624047 0.848640 
3 0.713660 0.704461 0.419997 0.718130 0.252294 0.336838 0.016916 


      H 
0 0.929695 
1 0.444762 
2 0.338168 
3 0.635817 

joined = df1.join(df2)[['foo','bar']] 

說明:

可以先加入您的DF的:

In [47]: join = df1.join(df2) 

In [48]: join 
Out[48]: 
     foo  bar       foo     bar \ 
      A   D   E   F   B   C   G 
0 0.176934 0.510407 0.085626 0.162183 0.694937 0.947164 0.382840 
1 0.973283 0.028961 0.740759 0.330742 0.743907 0.886495 0.961932 
2 0.898224 0.517563 0.026104 0.624047 0.966278 0.131551 0.848640 
3 0.713660 0.718130 0.252294 0.336838 0.704461 0.419997 0.016916 


      H 
0 0.929695 
1 0.444762 
2 0.338168 
3 0.635817 

和然後在des中選擇列(級別:0) ired order:

In [49]: join = join[['foo','bar']] 

In [50]: join 
Out[50]: 
     foo       bar        \ 
      A   B   C   D   E   F   G 
0 0.176934 0.694937 0.947164 0.510407 0.085626 0.162183 0.382840 
1 0.973283 0.743907 0.886495 0.028961 0.740759 0.330742 0.961932 
2 0.898224 0.966278 0.131551 0.517563 0.026104 0.624047 0.848640 
3 0.713660 0.704461 0.419997 0.718130 0.252294 0.336838 0.016916 


      H 
0 0.929695 
1 0.444762 
2 0.338168 
3 0.635817 
+0

實際上,0級除了'foo'和'bar'之外還有更多的標籤。有什麼辦法可以傳遞df1的列順序嗎? – BdB

+0

@BdB,當然,請參閱我的回答中的「更新」 – MaxU

+0

太棒了,感謝您的更新! – BdB

1

這不是一個真正的「合併」,因爲你並不真正匹配數據框之間的值,你只是並排添加一些列。所以pd.concat做了你需要的東西:

combined = pd.concat([df1, df2], axis=1) 
combined.sort_index(axis=1, inplace=True) 

combined 
Out[13]: 
     bar            foo   \ 
      D   E   F   G   H   A   B 
0 0.915879 0.712345 0.460795 0.529782 0.161578 0.803505 0.133896 
1 0.234319 0.317113 0.477687 0.525108 0.495104 0.107596 0.374732 
2 0.149397 0.244950 0.866735 0.501562 0.758321 0.508689 0.635703 
3 0.330018 0.204695 0.598899 0.522993 0.306496 0.936768 0.638874 


      C 
0 0.614592 
1 0.824297 
2 0.482161 
3 0.792035 
+0

這幾乎可以工作,有沒有辦法保持'foo'和'bar'的順序是? – BdB