Merge 2 pandas DataFrames; if the index matches, take the record from one over the other

I want to merge two pandas DataFrames, but wherever the index matches I only want to keep the row from one specific df.

So, if I have

df1
               A    B
type   model
apple  v1     10  xyz
orange v2     11  pqs

df2
              A    B
type  model
apple v3     11  xyz
grape v4     12  def

I would get

df3
               A    B
type   model
apple  v1     10  xyz
orange v2     11  pqs
grape  v4     12  def

because df1.ix['apple'] takes precedence over df2.ix['apple'], and orange and grape are unique.

I have been trying to do this by comparing indexes, but df2.drop(df1.index[[0]]) just removes the entire contents of df2.

Both DataFrames have a MultiIndex with a similar structure, created with:

pd.read_csv(..., index_col=[3, 1])

which results in an index like this:

MultiIndex(
    levels=[[u'apple', u'orange', u'grape', ...], [u'v1', u'v2', u'v3', ... ]], 
    labels=[[0, 1, 2, 3, 4, 6, 7, 8, 9, 10, ...]], 
    names=[u'type', u'model'] 
)

Source

2016-06-29 getglad

This is what DataFrame.combine_first() is for:

import pandas as pd 

df1 = pd.DataFrame({'A': [10, 11], 'B': ['xyz', 'pqs']}, index=['apple', 'orange']) 
df2 = pd.DataFrame({'A': [11, 12], 'B': ['xyz', 'def']}, index=['apple', 'grape']) 

df3 = df1.combine_first(df2)

which produces

df3
           A    B
apple   10.0  xyz
grape   12.0  def
orange  11.0  pqs

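Edit: after I posted the answer above, the question was substantially revised: a model level was added to the index, effectively turning it into a MultiIndex.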

import pandas as pd 

# Create the df1 in the question 
df1 = pd.DataFrame({'model': ['v1', 'v2'], 'A': [10, 11], 'B': ['xyz', 'pqs']}, 
        index=['apple', 'orange']) 
df1.index.name = 'type' 
df1.set_index('model', append=True, inplace=True) 

# Create the df2 in the question 
df2 = pd.DataFrame({'model': ['v3', 'v4'], 'A': [11, 12], 'B': ['xyz', 'def']}, 
        index=['apple', 'grape']) 
df2.index.name = 'type' 
df2.set_index('model', append=True, inplace=True) 

# Solution: remove the `model` from the index and apply the above 
#  technique. Restore it to the index at the end if you want. 
df1.reset_index(level=1, inplace=True) 
df2.reset_index(level=1, inplace=True) 
df3 = df1.combine_first(df2).set_index('model', append=True)

Result:

df3
                 A    B
type   model
apple  v1     10.0  xyz
grape  v4     12.0  def
orange v2     11.0  pqs
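
As a possible alternative that keeps the MultiIndex in place, the two frames could be stacked and every row whose `type` has already been seen dropped. This is only a minimal sketch, not part of the original answer: the helper `make_frame` is a hypothetical name introduced here, and the approach assumes each `type` appears at most once within df1 (otherwise df1's own repeats would be dropped too).

```python
import pandas as pd

def make_frame(types, models, a, b):
    # Hypothetical helper: builds a frame indexed by (type, model),
    # mirroring the frames shown in the question.
    index = pd.MultiIndex.from_arrays([types, models], names=['type', 'model'])
    return pd.DataFrame({'A': a, 'B': b}, index=index)

df1 = make_frame(['apple', 'orange'], ['v1', 'v2'], [10, 11], ['xyz', 'pqs'])
df2 = make_frame(['apple', 'grape'], ['v3', 'v4'], [11, 12], ['xyz', 'def'])

# Stack df1 on top of df2 so that df1's rows come first.
stacked = pd.concat([df1, df2])

# Flag every row whose `type` already appeared earlier in the stack;
# because df1 comes first, its row wins whenever `type` matches.
is_repeat = stacked.index.get_level_values('type').duplicated(keep='first')
df3 = stacked[~is_repeat]

print(df3)
```

Because nothing is reindexed, column A stays an integer column here, and the rows keep the order apple, orange, grape from the question's expected output.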

Source

2016-06-29 18:35:09

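I updated my question, but I have a MultiIndex. When I use combine_first, it just merges the indexes together, so I end up with two apples - is it possible to use '.combine_first' together with '.groupby(level=0)'? – getglad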

@getglad: I suggest you post this as a new question, with more information and a minimal, complete, and verifiable example (http://stackoverflow.com/help/mcve) –

If you want to keep the NaN values in the cells of df1, or if you have a MultiIndex and combine_first() fails with NotImplementedError: merging with both multi-indexes is not implemented, you can try this:

In [53]: df1
Out[53]:
              A    B
ind1 ind2
foo  apple   10  NaN
bar  orange  11  pqs
baz  grape   12  def

In [54]: df2
Out[54]:
             A    B
ind1 ind2
foo  apple  11  xyz
baz  grape  12  def

In [55]: pd.concat([df1, df2.ix[df2.index.difference(df1.index)]])
Out[55]:
              A    B
ind1 ind2
foo  apple   10  NaN
bar  orange  11  pqs
baz  grape   12  def
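
The session above relies on `.ix`, which has since been removed from pandas. A minimal sketch of the same idea for current versions follows. It swaps in a boolean mask built with `isin` in place of `.ix` plus `index.difference`, and it adds one hypothetical row, ('qux', 'pear'), to df2 purely so that something from df2 actually gets appended; that row is not in the original data.

```python
import numpy as np
import pandas as pd

idx1 = pd.MultiIndex.from_tuples(
    [('foo', 'apple'), ('bar', 'orange'), ('baz', 'grape')],
    names=['ind1', 'ind2'])
df1 = pd.DataFrame({'A': [10, 11, 12], 'B': [np.nan, 'pqs', 'def']}, index=idx1)

# ('qux', 'pear') is a hypothetical extra row, added for illustration only.
idx2 = pd.MultiIndex.from_tuples(
    [('foo', 'apple'), ('baz', 'grape'), ('qux', 'pear')],
    names=['ind1', 'ind2'])
df2 = pd.DataFrame({'A': [11, 12, 13], 'B': ['xyz', 'def', 'ghi']}, index=idx2)

# Keep only the df2 rows whose full (ind1, ind2) key is absent from df1,
# then append them below df1. Rows of df1 are never modified, so the NaN stays.
only_in_df2 = ~df2.index.isin(df1.index)
result = pd.concat([df1, df2[only_in_df2]])

print(result)
```

Note that this matches on the complete index tuple, so two rows that share `ind1` but differ in `ind2` would both be kept.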

OLD answer:

For example (pay attention to the apple row in df1):

In [33]: df1
Out[33]:
         A    B
apple   10  NaN
orange  11  pqs
grape   12  def

In [34]: df2
Out[34]:
        A    B
apple  11  xyz
grape  12  def

In [35]: df1.combine_first(df2)
Out[35]:
         A    B
apple   10  xyz
grape   12  def
orange  11  pqs

In [36]: pd.concat([df1, df2.ix[df2.index.difference(df1.index)]])
Out[36]:
         A    B
apple   10  NaN
orange  11  pqs
grape   12  def

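Otherwise the solution from @Alberto Garcia-Raboso (for a regular index) is definitely better and faster. It may also work with a MultiIndex in a future version of pandas...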
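
The difference between the two approaches in the old answer can be reproduced as a standalone script. This is a sketch under the assumption of a current pandas version, so it uses an `isin` mask rather than `.ix`; the variable names `filled` and `kept` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [10, 11, 12], 'B': [np.nan, 'pqs', 'def']},
                   index=['apple', 'orange', 'grape'])
df2 = pd.DataFrame({'A': [11, 12], 'B': ['xyz', 'def']},
                   index=['apple', 'grape'])

# combine_first patches holes in df1 with values from df2,
# so the NaN in the apple row is replaced by 'xyz'.
filled = df1.combine_first(df2)

# concat with a mask only appends df2 rows whose label is missing from df1,
# so df1's apple row is kept untouched, NaN included.
kept = pd.concat([df1, df2[~df2.index.isin(df1.index)]])

print(filled)
print(kept)
```

Which of the two is appropriate depends on whether a NaN in df1 should count as a real record or as a gap to be filled from df2.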

Source

2016-06-29 18:27:02 MaxU

Just curious - what is wrong with your first answer, '.groupby(level=0).first()'? – getglad

@getglad, in the resulting DF we would start from 'df2' and then look up the same indexes starting from 'df1'... – MaxU
