2017-10-17 32 views

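Joining two time series DataFrames and matching timestamps as closely as possible

How can I join two pandas DataFrames that have datetime indexes so that the timestamps match as closely as possible? Is there a fill method that can be used for this?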

An example could be:

#required packages 
import pandas as pd 
import numpy as np 

# defining stuff 
num_periods_1 = 11 
num_periods_2 = 4 

# create sample time series 
dates1 = pd.date_range('1/1/2000', periods=num_periods_1, freq='10min') 
dates2 = pd.date_range('1/1/2000 00:40:00', periods=num_periods_2, freq='10min') 

column_names_1 = ['B', 'C', 'A'] 
column_names_2 = ['B', 'C', 'D'] 

df1 = pd.DataFrame(np.random.randn(num_periods_1, len(column_names_1)), index=dates1, columns=column_names_1) 
df2 = pd.DataFrame(np.random.randn(num_periods_2, len(column_names_2)), index=dates2, columns=column_names_2) 

print("\nData Frame One:\n", df1) 
print("\nData Frame Two:\n", df2) 

df3 = pd.concat([df1.reset_index().add_suffix('_x'), df2.reset_index().add_suffix('_y')], axis=1).set_index(['index_x', 'index_y']).sort_index(axis=1) 
print("\nData Frame Three:\n", df3) 

The output looks like this:

                                              A_x       B_x       B_y \
index_x             index_y
2000-01-01 00:00:00 2000-01-01 00:40:00  0.878508 -0.608439 -0.468326
2000-01-01 00:10:00 2000-01-01 00:50:00 -1.056812  0.070073  0.802728
2000-01-01 00:20:00 2000-01-01 01:00:00 -0.085436  0.577973  1.278077
2000-01-01 00:30:00 2000-01-01 01:10:00 -0.061046 -0.410809 -1.913346
2000-01-01 00:40:00 NaT                 -0.522415 -1.128558       NaN
2000-01-01 00:50:00 NaT                 0.1.266240                NaN
2000-01-01 01:00:00 NaT                 -2.411029 -0.303869       NaN
2000-01-01 01:10:00 NaT                  0.050969 -0.807989       NaN
2000-01-01 01:20:00 NaT                 -0.466958  0.311464       NaN
2000-01-01 01:30:00 NaT                 -0.137329 -0.234095       NaN
2000-01-01 01:40:00 NaT                 -1.089133 -0.173481       NaN

                                              C_x       C_y       D_y
index_x             index_y
2000-01-01 00:00:00 2000-01-01 00:40:00  2.298649  0.673585 -1.586648
2000-01-01 00:10:00 2000-01-01 00:50:00 -1.791427  0.907333  0.950786
2000-01-01 00:20:00 2000-01-01 01:00:00 -0.980498 -0.625798  0.284694
2000-01-01 00:30:00 2000-01-01 01:10:00  1.337427 -0.859036 -0.237332
2000-01-01 00:40:00 NaT                 -1.493857       NaN       NaN
2000-01-01 00:50:00 NaT                  0.455737       NaN       NaN
2000-01-01 01:00:00 NaT                  0.393388       NaN       NaN
2000-01-01 01:10:00 NaT                 -1.612417       NaN       NaN
2000-01-01 01:20:00 NaT                  2.471329       NaN       NaN
2000-01-01 01:30:00 NaT                 -0.541828       NaN       NaN
2000-01-01 01:40:00 NaT                 -0.162694       NaN       NaN

What I want to do is shift the second index so that its timestamps line up with those of the first index. Can this be done with concat, join, or merge?


Maybe do `df2 = df2.reindex_axis(df1.index, 0, method='nearest')` before the concat? –
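A minimal sketch of what this comment suggests, under some assumptions: the sample frames below are hypothetical (df2 is offset by 3 minutes so that no timestamp matches exactly), and `reindex` is used in place of `reindex_axis`, which as far as I know was deprecated and later removed from pandas. `reindex` accepts the same `method='nearest'` argument.

```python
import pandas as pd

# Hypothetical sample data: df2 is shifted by 3 minutes relative to df1.
dates1 = pd.date_range('2000-01-01 00:00', periods=6, freq='10min')
dates2 = pd.date_range('2000-01-01 00:23', periods=3, freq='10min')
df1 = pd.DataFrame({'B': [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates1)
df2 = pd.DataFrame({'B': [10.0, 20.0, 30.0]}, index=dates2)

# For every timestamp in df1, take the df2 row whose timestamp is closest.
# The index of df2 must be sorted for method='nearest' to work.
df2_nearest = df2.reindex(df1.index, method='nearest')
print(df2_nearest)
```

Note that without a limit on the distance, every row of df1 receives some row of df2, even when the closest timestamp is far away.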


It depends on which direction you want when the timestamps do not match exactly: `pd.merge_asof(df1, df2, left_on='index_x', right_on='index_y', direction='backward')` – Wen
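The `merge_asof` call in this comment refers to `index_x` and `index_y`, which only exist after `reset_index().add_suffix(...)` as in the question. A rough sketch of how that could be wired up follows; the sample data is hypothetical, and `direction='nearest'` is shown alongside the commenter's `direction='backward'` purely for comparison.

```python
import pandas as pd

# Hypothetical sample data: df2 is shifted by 3 minutes relative to df1.
dates1 = pd.date_range('2000-01-01 00:00', periods=6, freq='10min')
dates2 = pd.date_range('2000-01-01 00:23', periods=3, freq='10min')
df1 = pd.DataFrame({'B': [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates1)
df2 = pd.DataFrame({'B': [10.0, 20.0, 30.0]}, index=dates2)

# Turn the indexes into ordinary columns named index_x / index_y.
left = df1.reset_index().add_suffix('_x')
right = df2.reset_index().add_suffix('_y')

# 'nearest' picks the closest right-hand timestamp in either direction.
merged = pd.merge_asof(left, right, left_on='index_x', right_on='index_y',
                       direction='nearest')

# 'backward' only looks at right-hand timestamps at or before the left one,
# so rows earlier than the first df2 timestamp get NaT / NaN.
backward = pd.merge_asof(left, right, left_on='index_x', right_on='index_y',
                         direction='backward')

# Rebuild the two-level index used in the question.
df3 = merged.set_index(['index_x', 'index_y']).sort_index(axis=1)
print(df3)
```

Both key columns must be sorted before calling `merge_asof`. Here `index_y` keeps the original df2 timestamp, so it stays visible which row was matched.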

Answer


Not sure whether this does what you want, but you could reindex before using concat:

df2 = df2.reindex(df1.index) 
df3 = pd.concat([df1.reset_index().add_suffix('_x'),\ 
df2.reset_index().add_suffix('_y')], axis=1)\ 
.set_index(['index_x', 'index_y']).sort_index(axis=1) 
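A plain `reindex(df1.index)` only keeps rows whose timestamps coincide exactly. As a hedged variation, not the answerer's code, the sketch below reindexes to the nearest timestamp within a tolerance and keeps the original df2 timestamp in a helper column. The sample data, the `source_time` column name, and the 5 minute tolerance are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: df2 is shifted by 3 minutes relative to df1.
dates1 = pd.date_range('2000-01-01 00:00', periods=6, freq='10min')
dates2 = pd.date_range('2000-01-01 00:23', periods=3, freq='10min')
df1 = pd.DataFrame({'B': [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates1)
df2 = pd.DataFrame({'B': [10.0, 20.0, 30.0]}, index=dates2)

# Remember where each df2 row came from (source_time is a made-up name).
df2_tagged = df2.assign(source_time=df2.index)

# Snap df2 onto df1's timestamps, but only if the nearest row is
# at most 5 minutes away; otherwise the row is filled with NaN / NaT.
aligned = df2_tagged.reindex(df1.index, method='nearest',
                             tolerance=pd.Timedelta('5min'))

# Join on the now shared index; overlapping column names get suffixes.
df3 = df1.join(aligned, lsuffix='_x', rsuffix='_y')
print(df3)
```

The tolerance keeps distant rows from being matched, at the cost of leaving gaps where df2 has no sufficiently close timestamp.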