用numpy減去兩個交錯的，基於不同時間序列的數組？

我有兩個數據集，a[ts1]和b[ts2]，其中ts1和ts2是在不同時間（以不同的基地？）拍攝的時間戳。我想繪製b[ts2]-a[ts1]，但我認爲我犯了一個錯誤，因爲繪圖軟件明白我想要b[i]-a[i]，而i是值的索引順序。用numpy減去兩個交錯的，基於不同時間序列的數組？

所以我想用numpy做一個小例子，我意識到我不知道是否以及如何能夠執行此操作 - 但使用向量，並避免for循環。我做了一個例子（如下圖），它定義a[ts1]和b[ts2]爲numpy結構陣列題爲a_np和b_np：

array([(0.0, 0.0), (0.8865606188774109, 0.30000001192092896), 
     (1.6939274072647095, 0.6000000238418579), 
     (2.3499808311462402, 0.8999999761581421)], ... 
     dtype=[('a', '<f4'), ('ts1', '<f4')]) 
array([(0.3973386585712433, 0.10000000149011612), 
     (0.7788366675376892, 0.20000000298023224), 
     (1.4347121715545654, 0.4000000059604645), (1.6829419136047363, 0.5)], ... 
     dtype=[('b', '<f4'), ('ts2', '<f4')])

所以在這裏我的問題是：

這是什麼類的陣列/問題叫什麼名字？它只是「時間序列」數組嗎？基本上，它們描述的是一維信號，但由於時間戳必須保留，所以它是一個二維數組;並由於「時間」列可以有任何意義，我想它可以推廣到插值（值）列中的數組值，在（時間/索引）列中的「索引」值。
可以numpy做一個向量化的操作，其中的數組在正確插入「時間」之前，被扣除？

尋找相關資訊，我找到了pandas: Python Data Analysis Library;我想我應該用它來代替它，因爲它具有「時間序列」功能 - 但在這種情況下，我不需要任何樣本值的奇特插值 - 只是一個「步」或「保持」一個（基本上不是插值）;這就是爲什麼我在徘徊，如果numpy可以做矢量化的方式。否則，下面的示例使用for循環。

的示例將導致像這樣的圖像：

array-timeseries

陣列a和b表示在不同時間拍攝的值，這是由它們各自的指示impulses; a繪製與lines（因此，繪圖之前線性內插），但是b與steps（以指示存在的實際值）

陣列d1表示「原始」差b[t]-a[t]構建陣列時採取 - 顯然，我在現實中無法訪問這些數據，所以我必須從採樣值開始工作。在這種情況下，差異b[ts2]-a[ts1]被顯示爲陣列/信號d2，再次作爲steps以強調關於「原始」的錯誤。這d2是我想用numpy（但在下面，它是在相同的for循環中計算）計算。

我用繪圖軟件的錯誤是得到一個索引的指數差異b和a或b[i]-a[i];這顯示爲陣列/信號e - 並且如圖所示，它是方式偏離本來應該表示的。只有在兩個信號中的採樣間隔不均勻時纔是這種情況;在代碼中嘗試modulowith = 2，那麼e實際上並沒有那麼糟糕 - 但是，我的真實生活中有不均勻的時間戳，所以b[i]-a[i]根本沒有幫助我。

下面是代碼，這也要求gnuplot（Python的2.7測試，numpy 1.5我認爲）：

import subprocess 
import math, random 
import numpy as np 
from pprint import pprint 
from numpy.lib.recfunctions import append_fields 

step = 0.1 
modulowith = 3 
# must init all arrays separately; 
# a=b=[] makes a==b by reference! 
ts1 = []; ts2 = [] ; tsd = [] 
valsa = []; valsb = []; valsd1 = []; valsd2 = [] 
stra = strb = strd1 = strd2 = "" ; kval1 = kval2 = 0 
for ix in range(0, 100, 1): 
    ts = ix*step 
    val1 = 3.0*math.sin(ts) #+random.random() 
    val2 = 2.0*math.sin(2.0*ts) 
    if (ix%modulowith == 0): 
    ts1.append(ts) ; valsa.append(val1) 
    stra += "%.03f %.06f\n" % (ts, val1) 
    kval1 = val1 
    else: 
    ts2.append(ts) ; valsb.append(val2) 
    strb += "%.03f %.06f\n" % (ts, val2) 
    kval2 = val2 
    tsd.append(ts) 
    valb = val2 - val1 ; valsd1.append(valb) 
    strd1 += "%.03f %.06f\n" % (ts, valb) 
    valc = kval2 - kval1 ; valsd2.append(valc) 
    strd2 += "%.03f %.06f\n" % (ts, valc) 

a_np = np.array(
    [(_valsa,) for _valsa in valsa], 
    dtype=[('a','f4')] 
) 
b_np = np.array(
    [(_valsb,) for _valsb in valsb], 
    dtype=[('b','f4')] 
) 
a_np = append_fields(a_np, names='ts1', data=ts1, dtypes='f4', usemask=False) 
b_np = append_fields(b_np, names='ts2', data=ts2, dtypes='f4', usemask=False) 

pprint(a_np[:4]) 
pprint(b_np[:4]) 

# e_np = np.subtract(b_np['b'],a_np['a']) 
# (via field reference) is same as doing: 
# e_np = np.subtract(np.array(valsa, dtype="f4"), np.array(valsb, dtype="f4")) 
# but for different sized arrays, must do: 
e_np = b_np['b'] - np.resize(a_np, b_np.shape)['a'] 
pprint(e_np[:4]) 
e_str = "" 
for ts, ie in zip(ts2, e_np): 
    e_str += "%.03f %.06f\n" % (ts, ie) 


gpscript = """ 
plot "-" using 1:2 with lines lc rgb "green" t"a", \\ 
"" using 1:2 with impulses lc rgb "green" t"", \\ 
"" using 1:2 with steps lc rgb "blue" t"b", \\ 
"" using 1:2 with impulses lc rgb "blue" t"", \\ 
"" using 1:2 with lines lc rgb "red" t"d1", \\ 
"" using 1:2 with steps lc rgb "orange" t"d2", \\ 
"" using 1:2 with steps lc rgb "brown" t"e" 
- 
{0} 
e 
{0} 
e 
{1} 
e 
{1} 
e 
{2} 
e 
{3} 
e 
{4} 
e 
""".format(stra, strb, strd1, strd2, e_str) 

proc = subprocess.Popen(
    ['gnuplot','--persist'], 
    shell=False, 
    stdin=subprocess.PIPE, 
) 
proc.communicate(gpscript)

多虧了@runnerup答案，這裏是一個稍微詳細（語法示例的目的）numpy -only溶液：

# create union of both timestamp arrays as tsz 
ntz = np.union1d(b_np['ts2'], a_np['ts1']) 
# interpolate `a` values over tsz 
a_z = np.interp(ntz, a_np['ts1'], a_np['a']) 
# interpolate `b` values over tsz 
b_z = np.interp(ntz, b_np['ts2'], b_np['b']) 
# create structured arrays for resampled `a` and `b`, 
# indexed against tsz timestamps 
a_npz = np.array([ (tz,az) for tz,az in zip(ntz,a_z) ], 
    dtype=[('tsz', 'f4'), ('a', 'f4')]) 
b_npz = np.array([ (tz,bz) for tz,bz in zip(ntz,b_z) ], 
    dtype=[('tsz', 'f4'), ('b', 'f4')]) 
# subtract resized array 
e_npz = np.subtract(b_npz['b'], a_npz['a']) 
e_str = "" 

# check: 
pprint(e_npz[:4]) 
# gnuplot string: 
for ts, ie in zip(ntz, e_npz): 
    e_str += "%.03f %.06f\n" % (ts, ie)

這是線性內插，因此這將是從不同以上，但仍然很適合。

如果沒有爲那些for循環創建數組，那麼它是矢量化的 - 原則上我甚至不需要創建這些數組 - 只是想看看它們將如何看起來像結構化的。總而言之，我想我希望能夠做到這一點的單線程結構化數組（即處理字段名稱）。

來源

2013-12-08 sdaau

還涉及： http://stackoverflow.com/questions/12240634/what-is-the-best-drop-in-replacement-for-numpy-interp-if-i-want-the-null-interpo – sdaau

這是賣你切換到pandas嘗試：）

import numpy as np 
import pandas as pd 
import datetime as dt 
import matplotlib.pyplot as plt 

# one minute interval 
start = dt.datetime.now() 
end = start + dt.timedelta(minutes=1) 

# sin curve at seconds frequancy 
idx1 = pd.date_range(start, end, freq='S') 
ts1 = pd.Series(np.sin(np.linspace(0, 4 * np.pi, len(idx1))), index=idx1) 

# cosine curve at milisecond frequency 
idx2 = pd.date_range(start, end, freq='L') 
ts2 = pd.Series(np.cos(np.linspace(0, 4 * np.pi, len(idx2))), index=idx2)

現在len(ts1) = 61和len(ts2) = 6001，不同頻率

fig = plt.figure(figsize=(8, 6)) 
ax = fig.add_axes([.05, .05, .9, .9]) 

ts1.plot(ax, color='DarkBlue') 
ts2.plot(ax, color='DarkRed') 

# reindex ts2 like ts1 
ts2 = ts2.reindex_like(ts1) 
(ts1 - ts2).plot(ax, color='DarkGreen')

，你會得到：

time series

編輯：爲插值目的，你可以使用在statsmodels非參數方法，所以基本上你可以在另一個的頻率內插一個系列，然後減去兩個：

import statsmodels.api as sm 
n = 1000 
x = np.linspace(0, 1, n) 
y = np.random.randn(n).cumsum() 
z = sm.nonparametric.lowess(y, x, return_sorted=False, frac=.05) 

ax.plot(x, y, 'Blue', linestyle='--') 
ax.plot(x, z, color='DarkRed')

來源

2013-12-08 02:16:58

非常感謝@ runnerup - 很好的例子！對我來說，現在進入'pandas'會有一點認知負擔，但是它肯定在列表中：'）'如果只有'numpy'的方法，我仍然感興趣，但是...再次感謝 - 歡呼！ – sdaau

@sdaau好的，我編輯了我的答案替代解決方案 –

非常感謝那個@runnerup - 你的編輯給了我一個只有在numpy才能做到的想法，所以我編輯了OP（可能還有一個關於如何處理繪圖軟件）。偉大的提示知道，但在這種情況下，我想保留確切的時間戳和差異值，所以沒有平滑（'lowess'明顯）。最後，我想我希望能夠在OP編輯中使用一行代碼 - 但是三行並不是那麼糟糕：'''非常感謝 - 歡呼！ – sdaau

用numpy減去兩個交錯的，基於不同時間序列的數組？

回答

相關問題