2013-09-25

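Combining pandas DataFrames with different sampling rates

I have three pandas DataFrames containing data recorded during a test: one for temperature, one for vacuum, and one for voltage.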

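The data was captured independently, so the timestamps in the three frames don't line up. Only occasionally does a timestamp from one frame also appear in another.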

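What I want to do is combine these into a single DataFrame and then interpolate the missing values, so that I end up with a complete DataFrame.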

I'm new to pandas and have been searching around, but I don't feel I'm getting anywhere, and I'm not sure I'm even on the right track.


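The basic idea is to settle on a common datetime index that all the DataFrames use. It sounds like you want the union of all the observation times, so that should be easy: something like pd.concat([df1, df2, df3], axis=1).fillna(). However, unless you post some sample data and what you expect, you won't get a complete answer from anyone. – TomAugspurger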
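The comment's concat idea can be sketched as below, assuming three small made-up frames named temp, vacuum and voltage (the question posted no data). pd.concat with axis=1 lines the frames up on the union of their timestamps. Here interpolate(method="time") stands in for the bare fillna(), which needs to be told what to fill with before it will do anything.

```python
import pandas as pd

# Hypothetical readings: three sensors logged on independent clocks.
temp = pd.DataFrame(
    {"temp": [0.0, 1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2012-01-01 00:00", "2012-01-01 01:00",
                          "2012-01-01 02:00", "2012-01-01 03:00"]),
)
vacuum = pd.DataFrame(
    {"vacuum": [1e-3, 5e-4, 2e-4]},
    index=pd.to_datetime(["2012-01-01 00:30", "2012-01-01 01:30",
                          "2012-01-01 02:30"]),
)
voltage = pd.DataFrame(
    {"voltage": [5.0, 7.0]},
    index=pd.to_datetime(["2012-01-01 01:00", "2012-01-01 03:00"]),
)

# axis=1 aligns the frames on their indexes: the result's index is the union
# of every timestamp, with NaN wherever a sensor has no reading at that time.
combined = pd.concat([temp, vacuum, voltage], axis=1).sort_index()

# Fill the gaps by interpolating against the elapsed time between readings.
# NaNs before a sensor's first reading are left as NaN.
filled = combined.interpolate(method="time")
print(filled)
```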

Answer

import pandas as pd 
import numpy as np 

rng1 = pd.date_range(
    '1/1/2012', 
    periods=10, 
    freq='h'
) 

s1 = pd.Series(
    np.arange(10), 
    index=rng1 
) 

df1 = pd.DataFrame(
    {'temp': s1} 
) 

s2 = pd.Series(
    np.arange(5, 10),
    # Parse the strings into Timestamps so this index can align with df1's DatetimeIndex
    index=pd.to_datetime(['1/1/2012 01:20:00',
                          '1/1/2012 01:40:00',
                          '1/1/2012 02:00:00',
                          '1/1/2012 05:30:00',
                          '1/1/2012 06:00:00'])
)

df2 = pd.DataFrame(
    {'voltage': s2}
)

print(df1)
print(df2)

--output:--
                     temp
2012-01-01 00:00:00     0
2012-01-01 01:00:00     1
2012-01-01 02:00:00     2
2012-01-01 03:00:00     3
2012-01-01 04:00:00     4
2012-01-01 05:00:00     5
2012-01-01 06:00:00     6
2012-01-01 07:00:00     7
2012-01-01 08:00:00     8
2012-01-01 09:00:00     9

                     voltage
2012-01-01 01:20:00        5
2012-01-01 01:40:00        6
2012-01-01 02:00:00        7
2012-01-01 05:30:00        8
2012-01-01 06:00:00        9


# An outer join keeps every timestamp from both frames
combined = df1.join(df2, how='outer')
print(combined)

--output:-- 
                     temp  voltage
2012-01-01 00:00:00     0      NaN
2012-01-01 01:00:00     1      NaN
2012-01-01 01:20:00   NaN        5
2012-01-01 01:40:00   NaN        6
2012-01-01 02:00:00     2        7
2012-01-01 03:00:00     3      NaN
2012-01-01 04:00:00     4      NaN
2012-01-01 05:00:00     5      NaN
2012-01-01 05:30:00   NaN        8
2012-01-01 06:00:00     6        9
2012-01-01 07:00:00     7      NaN
2012-01-01 08:00:00     8      NaN
2012-01-01 09:00:00     9      NaN
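DataFrame.join also accepts a list of frames, so the same outer join extends to all three sensors from the question in one call. A minimal sketch, assuming made-up readings and a hypothetical vacuum frame since the question posted no data; the sort_index() call is a precaution so the merged timeline is in chronological order before interpolating.

```python
import pandas as pd

# Hypothetical readings; only the structure (one column per sensor,
# timestamps that rarely coincide) mirrors the question.
temp = pd.DataFrame(
    {"temp": [0.0, 1.0, 2.0]},
    index=pd.to_datetime(["2012-01-01 00:00", "2012-01-01 01:00",
                          "2012-01-01 02:00"]),
)
vacuum = pd.DataFrame(
    {"vacuum": [0.9, 0.4]},
    index=pd.to_datetime(["2012-01-01 00:45", "2012-01-01 02:00"]),
)
voltage = pd.DataFrame(
    {"voltage": [5.0, 6.0]},
    index=pd.to_datetime(["2012-01-01 01:20", "2012-01-01 01:40"]),
)

# Passing a list joins all frames at once; how="outer" keeps every timestamp.
combined = temp.join([vacuum, voltage], how="outer").sort_index()
print(combined)
```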

# Interpolate each column against the timestamps, so uneven gaps are weighted by elapsed time
combined = combined.interpolate(method='time')

print(combined)

--output:-- 
                         temp   voltage
2012-01-01 00:00:00  0.000000       NaN
2012-01-01 01:00:00  1.000000       NaN
2012-01-01 01:20:00  1.333333  5.000000
2012-01-01 01:40:00  1.666667  6.000000
2012-01-01 02:00:00  2.000000  7.000000
2012-01-01 03:00:00  3.000000  7.285714
2012-01-01 04:00:00  4.000000  7.571429
2012-01-01 05:00:00  5.000000  7.857143
2012-01-01 05:30:00  5.500000  8.000000
2012-01-01 06:00:00  6.000000  9.000000
2012-01-01 07:00:00  7.000000  9.000000
2012-01-01 08:00:00  8.000000  9.000000
2012-01-01 09:00:00  9.000000  9.000000
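Where a number like 7.285714 comes from: method='time' weights each gap by elapsed time. 03:00 is 1 hour into the 3.5-hour gap between the 02:00 reading (7) and the 05:30 reading (8), so the estimate is 7 + (8 - 7) * 1/3.5, about 7.285714. The sketch below redoes that arithmetic with numpy.interp on elapsed seconds and compares it with Series.interpolate; to_seconds is a hypothetical helper name used only for illustration.

```python
import numpy as np
import pandas as pd

# The two voltage readings that bracket the gap (values from the output above).
known_times = pd.to_datetime(["2012-01-01 02:00", "2012-01-01 05:30"])
known_volts = np.array([7.0, 8.0])

# Timestamps where voltage is missing and has to be estimated.
query_times = pd.to_datetime(["2012-01-01 03:00", "2012-01-01 04:00",
                              "2012-01-01 05:00"])


def to_seconds(times):
    """Hypothetical helper: seconds elapsed since the first known reading."""
    return (times - known_times[0]).total_seconds().to_numpy()


# Linear interpolation on elapsed seconds is time-weighted interpolation.
estimates = np.interp(to_seconds(query_times), to_seconds(known_times), known_volts)

# The same gap filled by pandas itself, for comparison.
series = pd.Series(
    [7.0, np.nan, np.nan, np.nan, 8.0],
    index=known_times[:1].append(query_times).append(known_times[1:]),
)
by_pandas = series.interpolate(method="time").iloc[1:4].to_numpy()

print(estimates.round(6))
print(by_pandas.round(6))
```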

# Back-fill the leading NaNs in 'voltage' with the first reading that follows them
print(combined.bfill())

--output:-- 
                         temp   voltage
2012-01-01 00:00:00  0.000000  5.000000
2012-01-01 01:00:00  1.000000  5.000000
2012-01-01 01:20:00  1.333333  5.000000
2012-01-01 01:40:00  1.666667  6.000000
2012-01-01 02:00:00  2.000000  7.000000
2012-01-01 03:00:00  3.000000  7.285714
2012-01-01 04:00:00  4.000000  7.571429
2012-01-01 05:00:00  5.000000  7.857143
2012-01-01 05:30:00  5.500000  8.000000
2012-01-01 06:00:00  6.000000  9.000000
2012-01-01 07:00:00  7.000000  9.000000
2012-01-01 08:00:00  8.000000  9.000000
2012-01-01 09:00:00  9.000000  9.000000
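Back-filling copies 5.0 into the early voltage rows as a separate step. A possible alternative, assuming a reasonably recent pandas, is to pass limit_direction='both' to interpolate so that NaNs before the first reading and after the last one are filled in the same call; with method='time' those edge rows take the nearest reading's value. The sketch uses a Series built from some of the voltage values above; DataFrame.interpolate accepts the same arguments.

```python
import numpy as np
import pandas as pd

# Voltage readings from the answer: the first reading arrives at 01:20,
# so the 00:00 and 01:00 rows start out empty, as does the trailing 07:00 row.
times = pd.to_datetime([
    "2012-01-01 00:00", "2012-01-01 01:00", "2012-01-01 01:20",
    "2012-01-01 02:00", "2012-01-01 05:30", "2012-01-01 06:00",
    "2012-01-01 07:00",
])
voltage = pd.Series([np.nan, np.nan, 5.0, 7.0, 8.0, 9.0, np.nan], index=times)

# limit_direction="both" also fills NaNs before the first valid value
# and after the last one, instead of leaving the leading ones as NaN.
filled = voltage.interpolate(method="time", limit_direction="both")
print(filled)
```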