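Find the position of minimal difference between numpy arrays

I have two music files: one is lossless with a small gap in the sound (at the moment it is just silence, but it could be anything: a sine wave or just some noise), and the other is an MP3: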

In [1]: plt.plot(y[:100000])

[Plot: lossless file]

In [2]: plt.plot(y2[:100000])

[Plot: mp3 file]

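The two arrays are similar but not identical, so I need to cut out this gap: find the first offset at which one array matches the other with the lowest difference error.

Below is my solution (5.7065 s):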

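import numpy as np

# Try every shift i in [0, 25000) and measure the mean absolute error
# between y with its first i samples dropped and the start of y2.
error = []
for i in range(25000):
    y_n = y[i:100000]
    y2_n = y2[:100000-i]
    error.append(abs(y_n - y2_n).mean())
start = np.array(error).argmin()  # first shift with the smallest error

print(start, error[start])  # 23057 0.0100046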
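The loop above can be wrapped in a function that takes the candidate shifts as a parameter, so the same code serves both the full search and a narrowed range like the one in the edit. A minimal sketch; the name `best_offset_mae` and the synthetic data (y2 delayed by 300 samples of silence plus small noise) are hypothetical:

```python
import numpy as np

def best_offset_mae(y, y2, offsets, n=100000):
    """Return (offset, error) for the shift i minimising mean |y[i:n] - y2[:n-i]|.

    Ties keep the first offset in iteration order, like np.argmin on the error list.
    """
    n = min(n, len(y), len(y2))
    best_i, best_err = None, np.inf
    for i in offsets:
        # Overlapping parts: y with its first i samples dropped vs. the start of y2.
        err = np.abs(y[i:n] - y2[:n - i]).mean()
        if err < best_err:
            best_i, best_err = i, err
    return best_i, best_err

# Hypothetical test data: y is y2 delayed by 300 samples of silence,
# plus a little noise standing in for MP3 coding error.
rng = np.random.default_rng(1)
y2 = rng.standard_normal(5000)
y = np.concatenate([np.zeros(300), y2 + 0.01 * rng.standard_normal(5000)])[:5000]

print(best_offset_mae(y, y2, range(1000)))      # full search
print(best_offset_mae(y, y2, range(200, 400)))  # narrowed search
```

Narrowing `offsets` is what makes the edit's approach fast: the cost is proportional to the number of candidate shifts times the overlap length.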

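Is there a Pythonic way to solve this?

Edit: after computing the mean distance between special points (e.g. where data == 0.5), I reduced the search range from 25000 to 2000 offsets, which gives me a reasonable time of 0.3871 s: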

a = np.where(y[:100000].round(1) == 0.5)[0] 
b = np.where(y2[:100000].round(1) == 0.5)[0] 

mean = int((a - b[:len(a)]).mean()) 
delta = 1000 

error = [] 
for i in range(mean - delta, mean + delta): 
...


2015-07-19 Vlad Mironov

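What if we compare only the most distinctive part of the arrays instead of the whole arrays? –

What you are trying to do is a cross-correlation of the two signals.

This can easily be done with signal.correlate from the scipy library: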

import scipy.signal 
import numpy as np 

# limit your signal length to speed things up 
lim = 25000 

# do the actual correlation 
corr = scipy.signal.correlate(y[:lim], y2[:lim], mode='full') 

# The offset is the maximum of your correlation array, 
# itself being offset by (lim - 1): 
offset = np.argmax(corr) - (lim - 1)
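For reference, SciPy documents `mode='full'` correlation of real inputs a and b of equal length N as the sum below (samples outside b are treated as 0). Writing it in terms of a lag d shows where the `lim - 1` comes from; here a = y[:lim], b = y2[:lim] and N = lim:

$$
z[k] = \sum_{l=0}^{N-1} a[l]\, b[l - k + N - 1], \qquad k = 0, 1, \dots, 2N - 2 .
$$

Substituting d = k - (N - 1):

$$
z[d + N - 1] = \sum_{l=0}^{N-1} a[l]\, b[l - d], \qquad d = -(N - 1), \dots, N - 1 .
$$

So the peak index k maps to the offset d = k - (N - 1), and a positive d means y[l] ≈ y2[l - d], i.e. y is y2 delayed by d samples, which is the same meaning as `start` in the question.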

You may want to look at this answer to a similar question.


2015-07-19 10:51:40 Finwood

It is three times slower than my solution: 18.4793 s vs 5.7065 s :) –

Let's first generate some data:
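A likely reason for the slowdown is that the correlation was computed directly, which is quadratic in the signal length. Newer SciPy versions (0.19 and later) accept `method='fft'` in `scipy.signal.correlate` (the default `method='auto'` picks it for large inputs), and `scipy.signal.correlation_lags` (SciPy 1.6 and later) converts an index of the result into a lag. A minimal sketch, assuming y is y2 delayed by leading silence as in the question; the function name `estimate_offset`, the mean removal and the synthetic data are illustrative assumptions:

```python
import numpy as np
import scipy.signal

def estimate_offset(y, y2, lim=100000):
    """Estimate how many samples y lags behind y2 via FFT-based cross-correlation."""
    a = np.asarray(y[:lim], dtype=float)
    b = np.asarray(y2[:lim], dtype=float)
    # Remove the mean so a DC offset does not bias the correlation peak.
    a = a - a.mean()
    b = b - b.mean()
    corr = scipy.signal.correlate(a, b, mode='full', method='fft')
    # Map each index of corr to its lag (requires SciPy >= 1.6).
    lags = scipy.signal.correlation_lags(len(a), len(b), mode='full')
    return int(lags[np.argmax(corr)])

# Hypothetical test data: y is y2 delayed by 2305 samples of silence.
rng = np.random.default_rng(0)
y2 = rng.standard_normal(50000)
shift = 2305
y = np.concatenate([np.zeros(shift), y2])[:50000]

print(estimate_offset(y, y2))  # 2305
```

The FFT route costs O(n log n) instead of O(n²), so the whole 100000-sample prefix can be used rather than a 25000-sample slice.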

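import numpy as np

N = 1000
y1 = np.random.randn(N)               # reference signal
y2 = y1 + np.random.randn(N) * 0.05   # y1 plus a little additive noise
y2[0:int(N/10)] = 0                   # silence the first 10% of y2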

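In this data, y1 and y2 are almost identical (note the small additive noise), but the first 10% of y2 is empty (similar to your example).

We can now compute the absolute difference between the two vectors and find the first element whose absolute difference is below a sensitivity threshold: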

abs_delta = np.abs(y1 - y2) 
THRESHOLD = 1e-2 
sel = abs_delta < THRESHOLD 
ix_start = np.where(sel)[0][0] 


import matplotlib.pyplot as plt

# Plot both signals and their absolute difference, marking the detected start in red.
fig, axes = plt.subplots(3, 1)
ax = axes[0]
ax.plot(y1, '-')
ax.set_title('y1')
ax.axvline(ix_start, color='red')
ax = axes[1]
ax.plot(y2, '-')
ax.axvline(ix_start, color='red')
ax.set_title('y2')

ax = axes[2]
ax.plot(abs_delta)
ax.axvline(ix_start, color='red')
ax.set_title('abs diff')

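[Plot: sample data in three panels (y1, y2, abs diff), with ix_start marked by a red vertical line]

This method works if the overlapping parts really are "almost identical". If they are less similar, you will need a smarter alignment method.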


2015-07-19 10:45:40

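Thanks for your answer, but this is not exactly my problem: you should change 'y2[0:int(N/10)] = 0' to 'y2 = np.hstack([np.zeros(int(N/10)), y2])[:N]' –

If you know that 'y2' is basically 'y1' except for the fact that its initial fraction is "quiet", why can't you simply find where the absolute value of 'y2' exceeds some threshold? –

Because for me it is very important to minimize the difference between the tracks, not just to cut off the junk (e.g. I don't mind having 5 seconds of silence in both files). –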
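The threshold suggestion in the comment can be sketched as follows; `first_loud_sample` and the threshold value are hypothetical, and in practice the threshold would have to be tuned to the noise floor of the recording:

```python
import numpy as np

def first_loud_sample(x, threshold):
    """Index of the first sample with |x| > threshold, or None if there is none."""
    above = np.abs(np.asarray(x)) > threshold
    if not above.any():
        return None
    # argmax on a boolean array returns the index of the first True.
    return int(np.argmax(above))

# Hypothetical example: 3 near-silent samples followed by signal.
print(first_loud_sample([0.0, 0.001, -0.002, 0.5, -0.7], threshold=0.01))  # 3
```

This only trims leading silence; it does not by itself align the two tracks so that their difference is minimal.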

I think what you are looking for is correlation. Here is a small example:

import numpy as np 

equal_part = [0, 1, 2, 3, -2, -4, 5, 0] 
y1 = equal_part + [0, 1, 2, 3, -2, -4, 5, 0] 
y2 = [1, 2, 4, -3, -2, -1, 3, 2]+y1 

np.argmax(np.correlate(y1, y2, 'same'))

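Output:

So this returns the time shift at which the correlation between the two signals is maximal. As you can see, in this example the time shift should be 8, but this depends on your data... Also pay attention to the lengths of the two signals.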


2015-07-19 10:54:01 koffein

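Yes, I forgot about correlation, but unfortunately your example doesn't work: 'In [109]: np.correlate([0, 1, 2, 3, 4], [1, 2, 3, 4, 5], "same")  Out[109]: array([14, 26, 40, 30, 20])', but it should be the second element, not the third. –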
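One way to make the lag explicit is to use `mode='full'` and subtract `len(v) - 1` from the argmax index: numpy defines the correlation as c[k] = sum_n a[n+k] * v[n], and index 0 of the `'full'` output corresponds to k = -(len(v) - 1). This way the result no longer depends on how `'same'` crops the output. A minimal sketch on koffein's example data; `lag_of` is a hypothetical helper name:

```python
import numpy as np

def lag_of(a, v):
    """Shift k maximising sum_n a[n + k] * v[n], i.e. how far a lags behind v."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    corr = np.correlate(a, v, mode='full')
    # Index 0 of the 'full' output corresponds to k = -(len(v) - 1).
    return int(np.argmax(corr)) - (len(v) - 1)

equal_part = [0, 1, 2, 3, -2, -4, 5, 0]
y1 = equal_part + [0, 1, 2, 3, -2, -4, 5, 0]
y2 = [1, 2, 4, -3, -2, -1, 3, 2] + y1

print(lag_of(y2, y1))  # 8: y2 is y1 delayed by the 8-sample prefix
```

Swapping the arguments flips the sign of the lag, so `lag_of(y1, y2)` gives -8.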
