2016-02-25 63 views

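How do you interpolate NaN values when the first few rows are NaN?

Description: I want to interpolate missing values (represented as NaN), but interpolate only works on NaN values that sit between known values. I am also confused about how bfill computes the values it fills in. As far as I can tell, it just copies the first known value that follows each missing one. Here is an example: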

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([['M', '2014-01-01 00:26:00', '2'],
...                    ['M', 'M', 'M'],
...                    ['M', '2014-01-01 00:26:30', 9],
...                    [5, '2014-01-01 00:26:50', 'M'],
...                    [6, '2014-01-01 00:26:50', 'M']],
...                   columns=['x', 'y', 'z'])
>>> df
   x                    y  z
0  M  2014-01-01 00:26:00  2
1  M                    M  M
2  M  2014-01-01 00:26:30  9
3  5  2014-01-01 00:26:50  M
4  6  2014-01-01 00:26:50  M
>>> df = df.replace(['M'], [np.nan])  # 'M' marks a missing value
>>> df
     x                    y    z
0  NaN  2014-01-01 00:26:00    2
1  NaN                  NaN  NaN
2  NaN  2014-01-01 00:26:30    9
3    5  2014-01-01 00:26:50  NaN
4    6  2014-01-01 00:26:50  NaN
>>> df['x'] = df['x'].astype(np.float64)
>>> df['z'] = df['z'].astype(np.float64)
>>> df['y'] = pd.to_datetime(df['y'])
>>> df
     x                   y    z
0  NaN 2014-01-01 00:26:00    2
1  NaN                 NaT  NaN
2  NaN 2014-01-01 00:26:30    9
3    5 2014-01-01 00:26:50  NaN
4    6 2014-01-01 00:26:50  NaN
>>> df.interpolate()
     x                   y    z
0  NaN 2014-01-01 00:26:00  2.0
1  NaN                 NaT  5.5
2  NaN 2014-01-01 00:26:30  9.0
3    5 2014-01-01 00:26:50  9.0
4    6 2014-01-01 00:26:50  9.0
>>> df.interpolate(method='bfill')  # try to fill the first three rows of x
     x                   y    z
0    2 2014-01-01 00:26:00    2
1  NaN                 NaT  NaN
2    9 2014-01-01 00:26:30    9
3    5 2014-01-01 00:26:50  NaN
4    6 2014-01-01 00:26:50  NaN

Goal: I want to fill in x and z and, if possible, also y, which has a datetime dtype.

Maybe you want: print df.fillna(method='bfill') – jezrael

Answer

IIUC, you can use interpolate to get the values for column z and then fillna with bfill for whatever is still missing:

In [122]: df.interpolate().fillna(method='bfill')
Out[122]:
   x                   y    z
0  5 2014-01-01 00:26:00  2.0
1  5 2014-01-01 00:26:30  5.5
2  5 2014-01-01 00:26:30  9.0
3  5 2014-01-01 00:26:50  9.0
4  6 2014-01-01 00:26:50  9.0

Or:

In [128]: df.fillna(method='bfill').interpolate()
Out[128]:
   x                   y  z
0  5 2014-01-01 00:26:00  2
1  5 2014-01-01 00:26:30  9
2  5 2014-01-01 00:26:30  9
3  5 2014-01-01 00:26:50  9
4  6 2014-01-01 00:26:50  9
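
For what it's worth, here is a minimal sketch of the two orderings on the numeric columns only. It assumes a recent pandas release, where fillna(method='bfill') is deprecated in favour of .bfill(). The frame is rebuilt by hand from the question's data and the variable names are purely illustrative. With the default linear method, interpolate() only fills forward from the first known value, so the leading NaNs in x survive it and still need the back-fill.

```python
import numpy as np
import pandas as pd

# Numeric columns from the question, rebuilt by hand (illustrative only).
numeric = pd.DataFrame({
    "x": [np.nan, np.nan, np.nan, 5.0, 6.0],
    "z": [2.0, np.nan, 9.0, np.nan, np.nan],
})

# interpolate() alone: z is filled, but the leading NaNs in x remain.
interp_only = numeric.interpolate()

# Order 1: interpolate, then back-fill whatever is still missing.
interp_then_bfill = numeric.interpolate().bfill()

# Order 2: back-fill first, then interpolate whatever is still missing.
bfill_then_interp = numeric.bfill().interpolate()

print(interp_then_bfill)
print(bfill_then_interp)
```

Only z differs between the two results: row 1 comes out as 5.5 in the first order and as 9.0 in the second, which matches the outputs above.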

The order of the two methods depends on how you want the last column to be filled.
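
If you would rather have y interpolated in time than back-filled, one possible approach (an assumption on my part, not something taken from the question) is to interpolate the elapsed seconds since the earliest timestamp and then convert back. A rough sketch, with the y values copied from the question and hypothetical variable names:

```python
import pandas as pd

# y column from the question; None becomes NaT.
y = pd.Series(pd.to_datetime([
    "2014-01-01 00:26:00", None, "2014-01-01 00:26:30",
    "2014-01-01 00:26:50", "2014-01-01 00:26:50",
]))

start = y.min()                            # min() skips NaT
elapsed = (y - start).dt.total_seconds()   # NaT turns into NaN here
filled = elapsed.interpolate().bfill()     # linear fill, then any leading gaps
y_filled = start + pd.to_timedelta(filled, unit="s")

print(y_filled)
```

Here row 1 lands halfway between its neighbours, at 00:26:15, whereas a plain bfill would copy 00:26:30 into it.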