2017-05-04 79 views
1

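Is there a conditional way to index/match dates in a DataFrame (Python Pandas)?

I have two DataFrames (df_small and df_large) with a DatetimeIndex and a similar number of rows. However, the timestamps (ns granularity) are not identical. Let's say df_large spans a much longer time than df_small, but it includes df_small's time period.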

How can I match the time periods so that I can, for example, plot them in the same chart?

Shouldn't something along these lines work somehow?

df_small[df_small < df_large[-1:].index] 

This one gets me a "raise ValueError('Lengths must match to compare')", though...

Turning df_large[-1:].index into a numpy array, on the other hand, turns all the other columns to NaN, i.e.

>>> df_small[df_large < numpy.array(df_small[-1:].index)] 
Out: DataFrame with all NaN's 
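
A hedged aside on the two attempts above: both compare the DataFrame's values, rather than its index, against the timestamp. One way around this is to build the mask from the index and compare it with a scalar Timestamp. The sketch below uses small stand-in frames; the data and the names small, large, end and trimmed are illustrative assumptions, not the asker's.

```python
import pandas as pd

# Illustrative stand-ins for df_small and df_large (not the asker's data).
small = pd.DataFrame(
    {"Price": [3526, 3525, 3524]},
    index=pd.to_datetime([
        "2017-05-03 06:00:10.867023",
        "2017-05-03 06:00:10.880251",
        "2017-05-03 06:00:10.888418",
    ]),
)
large = pd.DataFrame(
    {"Last trade price": [3526.0, 3526.0, 3527.0]},
    index=pd.to_datetime([
        "2017-05-03 06:00:10.682447",
        "2017-05-03 06:00:10.870000",
        "2017-05-03 06:00:11.551196",
    ]),
)

# index[-1] is a scalar Timestamp, so the comparison yields one boolean
# per row of `large` instead of a DataFrame-shaped mask.
end = small.index[-1]
trimmed = large[large.index <= end]
```

Here trimmed keeps only the rows of large whose timestamp is not later than the last timestamp in small.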

For reference:

>>> df_small[-1:].index 
Out: DatetimeIndex(['2017-05-03 06:02:39.369627'], dtype='datetime64[ns]', name='time', freq=None) 

>>> df_large[-1:].index 
Out: DatetimeIndex(['2017-05-03 07:11:41.067240'], dtype='datetime64[ns]', name='time', freq=None) 

Sample data:

>>> df_small
                            Position  Price  Side  Size
time
2017-05-03 06:00:10.867023         0   3526     1     6
2017-05-03 06:00:10.880251         1   3525     1   349
2017-05-03 06:00:10.888418         2   3524     1   462
2017-05-03 06:00:10.896323         3   3523     1   733
2017-05-03 06:00:10.903938         4   3522     1   962
2017-05-03 06:00:10.913828         0   3527     0   311
2017-05-03 06:00:10.922124         1   3528     0    55
2017-05-03 06:00:10.930258         2   3529     0   440


>>> df_large
                            Last trade price  Last trade size
time
2017-05-03 06:00:10.682447            3526.0                2
2017-05-03 06:00:11.033645            3526.0                8
2017-05-03 06:00:11.233167            3526.0                6
2017-05-03 06:00:11.551196            3527.0               14
2017-05-03 06:00:12.471409            3526.0                8
2017-05-03 06:00:13.199685            3526.0               11
2017-05-03 06:00:14.462006            3527.0              237
2017-05-03 06:00:15.405271            3527.0                1
+1

Please include sample data that demonstrates the problem. – piRSquared

+0

Fair point... – Bython

Answers

1

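I like to build an index that is the union of the two and use interpolate to fill in the gaps. Note the use of the 'index' option, which makes the interpolation depend on the index values (here, the timestamps) rather than on row position.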

uidx = df_small.index.union(df_large.index) 
df = pd.concat([ 
     df_small.Price.reindex(uidx).interpolate('index'), 
     df_large['Last trade price'].reindex(uidx).interpolate('index'), 
    ], axis=1, keys=['Small', 'Large']) 

df 

                              Small   Large
time
2017-05-03 06:00:10.682447     NaN  3526.0
2017-05-03 06:00:10.867023  3526.0  3526.0
2017-05-03 06:00:10.880251  3525.0  3526.0
2017-05-03 06:00:10.888418  3524.0  3526.0
2017-05-03 06:00:10.896323  3523.0  3526.0
2017-05-03 06:00:10.903938  3522.0  3526.0
2017-05-03 06:00:10.913828  3527.0  3526.0
2017-05-03 06:00:10.922124  3528.0  3526.0
2017-05-03 06:00:10.930258  3529.0  3526.0
2017-05-03 06:00:11.033645  3529.0  3526.0
2017-05-03 06:00:11.233167  3529.0  3526.0
2017-05-03 06:00:11.551196  3529.0  3527.0
2017-05-03 06:00:12.471409  3529.0  3526.0
2017-05-03 06:00:13.199685  3529.0  3526.0
2017-05-03 06:00:14.462006  3529.0  3527.0
2017-05-03 06:00:15.405271  3529.0  3527.0
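
To illustrate what the 'index' option changes, here is a minimal sketch on a made-up series with unevenly spaced timestamps (the values and the names s, by_position and by_index are assumptions for illustration only). Note also that, as the table above shows, values after a column's last observation are held constant, while the row before its first observation stays NaN.

```python
import pandas as pd

# Unevenly spaced timestamps: a gap of 1 s, then a gap of 3 s.
idx = pd.to_datetime([
    "2017-05-03 06:00:10",
    "2017-05-03 06:00:11",
    "2017-05-03 06:00:14",
])
s = pd.Series([0.0, None, 4.0], index=idx)

# 'linear' ignores the index and treats the rows as equally spaced,
# so the gap is filled with the midpoint of its neighbours.
by_position = s.interpolate("linear")

# 'index' weights by the actual time gaps: the missing point sits
# one quarter of the way from the first to the last timestamp.
by_index = s.interpolate("index")
```

With these inputs the middle value is 2.0 under 'linear' and approximately 1.0 under 'index', which is why 'index' is the better fit for irregular tick data.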

df.plot() 
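
If df_large covers a much longer span than df_small, it may help to restrict the combined frame to the window where both overlap before plotting. The following is a sketch under that assumption, not part of the original answer; the stand-in series and the names combined, start, end and window are hypothetical.

```python
import pandas as pd

# Illustrative stand-ins for the two price columns (not the real data).
small = pd.Series(
    [3526.0, 3529.0],
    index=pd.to_datetime([
        "2017-05-03 06:00:10.867023",
        "2017-05-03 06:00:10.930258",
    ]),
    name="Small",
)
large = pd.Series(
    [3526.0, 3526.0, 3527.0],
    index=pd.to_datetime([
        "2017-05-03 06:00:10.682447",
        "2017-05-03 06:00:10.900000",
        "2017-05-03 06:00:11.551196",
    ]),
    name="Large",
)

# Same union-and-interpolate approach as in the answer.
uidx = small.index.union(large.index)
combined = pd.concat(
    [
        small.reindex(uidx).interpolate("index"),
        large.reindex(uidx).interpolate("index"),
    ],
    axis=1,
)

# Overlap window: latest start and earliest end of the two indexes.
start = max(small.index[0], large.index[0])
end = min(small.index[-1], large.index[-1])

# Label-based slicing on a sorted DatetimeIndex includes both endpoints.
window = combined.loc[start:end]
```

Calling window.plot() instead of df.plot() would then draw only the period covered by both frames.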



Setup

from io import StringIO 
import pandas as pd 

# Columns are separated by two or more spaces, so the single spaces inside
# the timestamps and the column names are not treated as separators.
small_txt = """time                        Position  Price  Side  Size
2017-05-03 06:00:10.867023  0  3526  1  6
2017-05-03 06:00:10.880251  1  3525  1  349
2017-05-03 06:00:10.888418  2  3524  1  462
2017-05-03 06:00:10.896323  3  3523  1  733
2017-05-03 06:00:10.903938  4  3522  1  962
2017-05-03 06:00:10.913828  0  3527  0  311
2017-05-03 06:00:10.922124  1  3528  0  55
2017-05-03 06:00:10.930258  2  3529  0  440"""

large_txt = """time                        Last trade price  Last trade size
2017-05-03 06:00:10.682447  3526.0  2
2017-05-03 06:00:11.033645  3526.0  8
2017-05-03 06:00:11.233167  3526.0  6
2017-05-03 06:00:11.551196  3527.0  14
2017-05-03 06:00:12.471409  3526.0  8
2017-05-03 06:00:13.199685  3526.0  11
2017-05-03 06:00:14.462006  3527.0  237
2017-05-03 06:00:15.405271  3527.0  1"""

# Raw strings keep the regex separator free of invalid-escape warnings.
df_small = pd.read_csv(StringIO(small_txt), sep=r'\s{2,}', parse_dates=[0], index_col=0, engine='python')
df_large = pd.read_csv(StringIO(large_txt), sep=r'\s{2,}', parse_dates=[0], index_col=0, engine='python')