我有兩個數據幀(logs
和failures
),我想合併,以便我在logs
中添加一個包含'失敗'中找到的最接近日期值的列。熊貓合併數據幀到最近的時間
的代碼來生成logs
,failures
,和所需output
低於:
import pandas as pd
logs=pd.DataFrame({'date-time':pd.Series(['23/10/2015 10:20:54','22/10/2015 09:51:32','21/10/2015 06:51:32','28/10/2015 16:59:32','25/10/2015 04:41:32','24/10/2015 11:50:11']),'var1':pd.Series([0,1,3,1,2,4])})
logs['date-time']=pd.to_datetime(logs['date-time'])
failures=pd.DataFrame({'date':pd.Series(['23/10/2015 00:00:00','22/10/2015 00:00:00','21/10/2015 00:00:00']),'failure':pd.Series([1,1,1])})
failures['date']=pd.to_datetime(failures['date'])
output=pd.DataFrame({'date-time':pd.Series(['23/10/2015 10:20:54','22/10/2015 09:51:32','21/10/2015 06:51:32','28/10/2015 16:59:32','25/10/2015 04:41:32','24/10/2015 11:50:11']),'var1':pd.Series([0,1,3,1,2,4]),'closest_failure':pd.Series(['23/10/2015 00:00:00','22/10/2015 00:00:00','21/10/2015 00:00:00','23/10/2015 00:00:00','23/10/2015 00:00:00','23/10/2015 00:00:00'])})
output['date-time']=pd.to_datetime(output['date-time'])
任何想法?真正的數據集非常大,所以效率也是一個問題。