2015-12-08 123 views

Pandas: filling gaps of missing data

I have two pandas DataFrames:

>>> import pandas as pd 
>>> import numpy as np 

>>> df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, np.nan, 3, 4]},
...                    index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])

>>> df1 
     a   b
A 1  1 NaN
  2  2 NaN
B 1  3   3
  2  4   4

>>> df2 = pd.DataFrame({'b': [1, 2]}, index=[['A','A'], [1, 2]]) 
>>> df2
     b
A 1  1
  2  2

where df2 contains the data that is missing from df1. How can I merge the two DataFrames to obtain

     a  b
A 1  1  1
  2  2  2
B 1  3  3
  2  4  4

? I tried pd.concat([df1, df2], axis=1), which results in

     a   b   b
A 1  1 NaN   1
  2  2 NaN   2
B 1  3   3 NaN
  2  4   4 NaN

In my case it is guaranteed that there are no overlapping values.

You can try 'df1.combine_first(df2)' or 'df1.fillna(df2)' – jezrael

Answers

You can try combine_first or fillna:

print(df1.combine_first(df2))
     a  b
A 1  1  1
  2  2  2
B 1  3  3
  2  4  4

print(df1.fillna(df2))
     a  b
A 1  1  1
  2  2  2
B 1  3  3
  2  4  4
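
As a Python 3 sketch of the same two calls: both methods agree on the question's data, but they can differ once the second frame has rows that df1 lacks, because combine_first returns the union of both indexes while fillna keeps the shape of df1. The frame `extra` and the result names below are made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Frames from the question.
df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, np.nan, 3, 4]},
                   index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])
df2 = pd.DataFrame({'b': [1, 2]}, index=[['A', 'A'], [1, 2]])

# Both calls return a new frame; df1 itself is left unchanged.
filled_cf = df1.combine_first(df2)
filled_fn = df1.fillna(df2)

# Hypothetical frame with a row key ('C', 1) that df1 does not have.
extra = pd.DataFrame({'b': [9]}, index=[['C'], [1]])
wider = df1.combine_first(extra)   # index is the union: gains the ('C', 1) row
same_shape = df1.fillna(extra)     # index stays that of df1; nothing to fill

print(filled_cf)
print(len(wider), len(same_shape))
```

With non-overlapping values, as guaranteed in the question, either call should do; the choice mainly matters if df2 might carry extra rows or columns.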

Timings:

In [5]: %timeit df1.combine_first(df2) 
The slowest run took 6.01 times longer than the fastest. This could mean that an intermediate result is being cached 
100 loops, best of 3: 2.15 ms per loop 

In [6]: %timeit df1.fillna(df2) 
The slowest run took 5.23 times longer than the fastest. This could mean that an intermediate result is being cached 
100 loops, best of 3: 2.76 ms per loop 
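
These timings were taken on a 4-row frame, so they probably say more about fixed call overhead than about how the methods scale. A rough way to repeat the comparison outside IPython, assuming the standard-library timeit module; the helper `best_ms` and its repeat/number settings are my own choices, not something from the answer:

```python
import timeit

import numpy as np
import pandas as pd

# Frames from the question.
df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, np.nan, 3, 4]},
                   index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])
df2 = pd.DataFrame({'b': [1, 2]}, index=[['A', 'A'], [1, 2]])


def best_ms(func, number=100, repeat=3):
    """Best per-call time in milliseconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(func, number=number, repeat=repeat)
    return min(runs) / number * 1000


t_cf = best_ms(lambda: df1.combine_first(df2))
t_fn = best_ms(lambda: df1.fillna(df2))

print(f"combine_first: {t_cf:.3f} ms per call")
print(f"fillna:        {t_fn:.3f} ms per call")
```

Results will vary with the pandas version and the machine, so it is worth rerunning this on data of realistic size before picking one method for speed.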

You can also use update, which modifies df1 in place:

In [36]: df1.update(df2) 

In [37]: df1 
Out[37]: 
     a  b
A 1  1  1
  2  2  2
B 1  3  3
  2  4  4
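
Since update changes the frame it is called on and returns None, a possible variant is to run it on a copy. The sketch below also passes overwrite=False, which restricts the update to positions that are NaN in the target; the name `patched` is only illustrative.

```python
import numpy as np
import pandas as pd

# Frames from the question.
df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, np.nan, 3, 4]},
                   index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])
df2 = pd.DataFrame({'b': [1, 2]}, index=[['A', 'A'], [1, 2]])

patched = df1.copy()                            # keep the original df1 intact
result = patched.update(df2, overwrite=False)   # in place; only NaN cells are filled

print(result)    # update does not return the frame
print(patched)
```

With the default overwrite=True, non-NaN values in df2 would replace existing values in the target as well, which makes no difference here because the question guarantees there is no overlap.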