2014-07-02 38 views
3

假設我有這兩個數組:發現順序排列最佳匹配的數值序列

float arr[] = {40.4357,40.6135,40.2477,40.2864,39.3449,39.8901,40.103,39.9959,39.7863,39.9102,39.2652,39.2688,39.5147,38.2246,38.5376,38.4512,38.9951,39.0999,39.3057,38.53,38.2761,38.1722,37.8816,37.6521,37.8306,38.0853,37.9644,38.0626,38.0567,38.3518,38.4044,38.3553,38.4978,38.3768,38.2058,38.3175,38.3123,38.262,38.0093,38.3685,38.0111,38.4539,38.8122,39.1413,38.9409,39.2043,39.3538,39.4123,39.3628,39.2825,39.1898,39.0431,39.0634,38.5993,38.252,37.3793,36.6334,36.4009,35.2822,34.4262,34.2119,34.1552,34.3325,33.9626,33.2661,32.3819,35.1959,36.7602,37.9039,37.8103,37.5832,37.9718,38.3111,38.9323,38.6763,39.1163,38.8469,39.805,40.2627,40.3689,40.4064,40.0558,40.815,41.0234,41.0128,41.0296,41.0927,40.7046,40.6775,40.2711,40.1283,39.7518,40.0145,40.0394,39.8461,39.6317,39.5548,39.1996,38.9861,38.8507,38.8603,38.483,38.4711,38.4214,38.4286,38.5766,38.7532,38.7905,38.6029,38.4635,38.1403,36.6844,36.616,36.4053,34.7934,34.0226,33.0505,33.4978,34.6106,35.284,35.7535,35.3541,35.5481,35.4086,35.7096,36.0526,36.1222,35.9408,36.1007,36.7952,36.99,37.1024,37.0993,37.3144,36.6951,37.1213,38.0026,38.1266,39.2538,38.8963,39.0158,38.6235,38.7908,38.6041,38.4489,38.3207,37.7398,38.5304,38.925,38.7249,38.9221,39.1704,39.5113,40.0613,39.3602,39.8689,39.973,40.0524,40.0025,40.7584,40.9714,40.9106,40.9685,40.6554,39.7314,39.0044,38.7183,38.5163,38.6101,38.2004,38.7606,38.7532,37.8903,37.8403,38.5368,39.0462,38.8279,39.0748,39.2907,38.5447,38.423,38.5624,38.476,38.5784,39.0905,39.379,39.4739,39.5774,40.7036,40.3044,39.6162,39.9967,40.0562,39.3426,38.666,38.7561,39.2823,38.8548,37.6214,37.8188,38.1086,38.3619,38.5472,38.1357,38.1422,37.95,37.1837,37.4636,36.8852,37.1617,37.5051,37.7724,38.0879,37.7197,38.0422,37.8551,38.5688,38.8388}; 
float pattern[] = {38.6434,38.1409,37.3391,37.5457,37.7487,37.7499,37.6121,37.4789,37.5821,37.6541,38.0365,37.7907,37.9932,37.9945,37.7032,37.3556,37.6359,37.5412,37.5296,37.8829,38.3797,38.4452,39.0929,39.1233,39.3014,39.0317,38.903,38.8221,39.045,38.6944,39.0699,39.0978,38.9877,38.8123,38.7491,38.5888,38.7875,38.2086,37.7484,37.3961,36.8663,36.2607,35.8838,35.3297,35.5574,35.7239}; 

艾夫斯上傳這個例子圖:

正如你可以在圖形格式看到幾乎適合在陣列中在指數17

什麼是最好和最快的方式來找到這個指數?那麼有沒有辦法讓那裏的匹配保險絲有信心的價值是不相等的,你可以看到?

+2

您對圖案之間「距離」的測量是什麼?有效值,泰姬陵,歐式距離,還是別的什麼? – Marcin

+1

[This](http://nicoschertler.wordpress.com/2014/06/13/matching-error-prone-sequences-of-numbers-eg-time-codes-to-each-other/)可能會給你一個的想法,但它不完全是你所需要的。你可能不需要移位近似,因此只需要一次通過。 –

+0

有什麼限制?模式中是否存在差距,或者您是否需要找到最佳的起始指數? –

回答

1

如果起始索引是您唯一的自由度,您可以嘗試每個索引並計算每個數據點的平方誤差總和。在Python這可能是這樣的:

data = [40.4357,40.6135,40.2477,...] 
pattern = [38.6434,38.1409,37.3391,37.5457,37.7487,...] 

best_ind, best_err = 0, 1e9999 
for i in range(len(data) - len(pattern)): 
    subdata = data[i : i + len(pattern)] 
    err = sum((d-p)**2 for (d, p) in zip(subdata, pattern)) 
    if err < best_err: 
     best_ind, best_err = i, err 

結果:

>>> print best_ind, best_err 
17 21.27929269 
+0

如果有更多的自由度,例如如果在將模式與數據匹配時可能存在「空白」,則可以使用[最小編輯距離](http://en.wikipedia.org/wiki/Levenshtein_distance)的變體,使用平方誤差而不是固定成本替代。 –

+0

@ user3794234那麼,這是否適合你,或者還有其他什麼? –

+0

這些算法僅在單元格中的數據爲1:1時纔會匹配,但如果數據或模式發生移位,但仍然可以看到它們匹配,我會上傳示例圖表:http://screencast.com/t/CZ3y1jvsB72 –

0

的簡單算法是選擇趨同,(你如何描述的相似,這可能是一般的錯誤,或者他們的平方值或適合你的目的的任何其它功能),並應用步驟

  1. I = 0是一個整數指數,μm的尺寸=長度(數據)的容器 - 長度(圖案)+ 1存儲次測量
  2. 如果我<大小然後通過改變你的圖案,否則轉到步驟5
  3. 計算相似性量度和存儲到中號
  4. I = I + 1,轉到圖2和重複
  5. 之所以選擇最小值的索引中
0

這是一個一行在Python中,使用的事實,元組字典順序排序:

In [1]: 

import numpy as np 
arr = np.array([ 40.4357,40.6135,40.2477,40.2864,39.3449,39.8901,40.103,39.9959,39.7863,39.9102,39.2652,39.2688,39.5147,38.2246,38.5376,38.4512,38.9951,39.0999,39.3057,38.53,38.2761,38.1722,37.8816,37.6521,37.8306,38.0853,37.9644,38.0626,38.0567,38.3518,38.4044,38.3553,38.4978,38.3768,38.2058,38.3175,38.3123,38.262,38.0093,38.3685,38.0111,38.4539,38.8122,39.1413,38.9409,39.2043,39.3538,39.4123,39.3628,39.2825,39.1898,39.0431,39.0634,38.5993,38.252,37.3793,36.6334,36.4009,35.2822,34.4262,34.2119,34.1552,34.3325,33.9626,33.2661,32.3819,35.1959,36.7602,37.9039,37.8103,37.5832,37.9718,38.3111,38.9323,38.6763,39.1163,38.8469,39.805,40.2627,40.3689,40.4064,40.0558,40.815,41.0234,41.0128,41.0296,41.0927,40.7046,40.6775,40.2711,40.1283,39.7518,40.0145,40.0394,39.8461,39.6317,39.5548,39.1996,38.9861,38.8507,38.8603,38.483,38.4711,38.4214,38.4286,38.5766,38.7532,38.7905,38.6029,38.4635,38.1403,36.6844,36.616,36.4053,34.7934,34.0226,33.0505,33.4978,34.6106,35.284,35.7535,35.3541,35.5481,35.4086,35.7096,36.0526,36.1222,35.9408,36.1007,36.7952,36.99,37.1024,37.0993,37.3144,36.6951,37.1213,38.0026,38.1266,39.2538,38.8963,39.0158,38.6235,38.7908,38.6041,38.4489,38.3207,37.7398,38.5304,38.925,38.7249,38.9221,39.1704,39.5113,40.0613,39.3602,39.8689,39.973,40.0524,40.0025,40.7584,40.9714,40.9106,40.9685,40.6554,39.7314,39.0044,38.7183,38.5163,38.6101,38.2004,38.7606,38.7532,37.8903,37.8403,38.5368,39.0462,38.8279,39.0748,39.2907,38.5447,38.423,38.5624,38.476,38.5784,39.0905,39.379,39.4739,39.5774,40.7036,40.3044,39.6162,39.9967,40.0562,39.3426,38.666,38.7561,39.2823,38.8548,37.6214,37.8188,38.1086,38.3619,38.5472,38.1357,38.1422,37.95,37.1837,37.4636,36.8852,37.1617,37.5051,37.7724,38.0879,37.7197,38.0422,37.8551,38.5688,38.8388]) 
pattern = np.array([ 38.6434,38.1409,37.3391,37.5457,37.7487,37.7499,37.6121,37.4789,37.5821,37.6541,38.0365,37.7907,37.9932,37.9945,37.7032,37.3556,37.6359,37.5412,37.5296,37.8829,38.3797,38.4452,39.0929,39.1233,39.3014,39.0317,38.903,38.8221,39.045,38.6944,39.0699,39.0978,38.9877,38.8123,38.7491,38.5888,38.7875,38.2086,37.7484,37.3961,36.8663,36.2607,35.8838,35.3297,35.5574,35.7239 ]) 

min((((arr[i:i+len(pattern)] - pattern) ** 2).mean(), i) for i in xrange(len(arr)-len(pattern))) 

Out[5]: 
(0.46259331934782588, 17) 

其中0.46是最小均方誤差,17是最小的arr位置。