2017-05-13 158 views

NumPy "vectorizing" a for loop: slow row-wise dot products. Given a matrix A with shape (n, k) and a vector s of size n, I want to compute a matrix G with shape (k, k) as follows:

G += s[i] * A[i].T * A[i], for all i in {0, ..., n-1}

I tried implementing this with a for loop (Method 1) and in a vectorized way (Method 2), but the for-loop implementation turns out to be faster for larger values of k (especially when k > 500).

The code is as follows:

import numpy as np 
k = 200 
n = 50000 
A = np.random.randint(0, 1000, (n,k)) # generates random data for the matrix A (n,k) 
G1 = np.zeros((k,k)) # initialize G1 as a (k,k) matrix 
s = np.random.randint(0, 1000, n) * 1.0 # initialize a random vector of size n 

# METHOD 1 
for i in xrange(n): 
    G1 += s[i] * np.dot(np.array([A[i]]).T, np.array([A[i]])) 

# METHOD 2 
G2 = np.dot(A[:,np.newaxis].T, s[:,np.newaxis]*A) 
G2 = np.squeeze(G2) # reduces dimension from (k,1,k) to (k,k) 

The matrices G1 and G2 are identical (they are both the matrix G); the only difference is how they are computed. Is there a smarter, more efficient way to compute this?

Finally, these are the times I got for random sizes of k and n:

Test #: 1 
k,n: (866, 45761) 
Method1: 337.457569838s 
Method2: 386.290487051s 
-------------------- 
Test #: 2 
k,n: (690, 48011) 
Method1: 152.999140978s 
Method2: 226.080267191s 
-------------------- 
Test #: 3 
k,n: (390, 5317) 
Method1: 5.28722500801s 
Method2: 4.86999702454s 
-------------------- 
Test #: 4 
k,n: (222, 5009) 
Method1: 1.73456382751s 
Method2: 0.929286956787s 
-------------------- 
Test #: 5 
k,n: (915, 16561) 
Method1: 101.782826185s 
Method2: 159.167108059s 
-------------------- 
Test #: 6 
k,n: (130, 11283) 
Method1: 1.53138184547s 
Method2: 0.64450097084s 
-------------------- 
Test #: 7 
k,n: (57, 37863) 
Method1: 1.44776391983s 
Method2: 0.494270086288s 
-------------------- 
Test #: 8 
k,n: (110, 34599) 
Method1: 3.51851701736s 
Method2: 1.61688089371s 

Answer


Two further improved versions would be -

(A.T*s).dot(A) 
(A.T).dot(A*s[:,None]) 
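As a quick sanity check (my own small example, not part of the original answer), the same weighted sum G[p, q] = sum_i s[i] * A[i, p] * A[i, q] can also be written with np.einsum, and it matches the expression above:

```python
import numpy as np

# Small hypothetical sizes, for illustration only
rng = np.random.RandomState(0)
n, k = 100, 8
A = rng.randint(0, 10, (n, k)).astype(float)
s = rng.randint(0, 10, n).astype(float)

# G[p, q] = sum_i s[i] * A[i, p] * A[i, q], spelled out as an einsum contraction
G_einsum = np.einsum('i,ip,iq->pq', s, A, A)

# The improved expression from above: scale the columns of A.T by s, then matmul
G_dot = (A.T * s).dot(A)

print(np.allclose(G_einsum, G_dot))  # True
```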

Problem(s) with method2

With method2, we are creating A[:,np.newaxis].T, which has shape (k,1,n) and is therefore a 3D array. I think that with 3D arrays, np.dot falls into some kind of loop and isn't truly vectorized (the source code could reveal more information here).
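To make the shapes concrete, here is a minimal sketch (with small hypothetical dimensions of my own) of what method2 actually feeds into np.dot:

```python
import numpy as np

n, k = 6, 4  # small hypothetical sizes
A = np.ones((n, k))
s = np.ones(n)

lhs = A[:, np.newaxis].T    # (n,1,k) transposed -> 3D array of shape (k, 1, n)
rhs = s[:, np.newaxis] * A  # row-scaled A, 2D array of shape (n, k)

# np.dot with a 3D first operand contracts its last axis
# against the second-to-last axis of rhs, giving (k, 1, k)
out = np.dot(lhs, rhs)

print(lhs.shape, rhs.shape, out.shape)  # (4, 1, 6) (6, 4) (4, 1, 4)
```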

For such 3D tensor multiplications, it is better to use the tensor equivalent: np.tensordot. So, the improved version of method2 becomes:

G2 = np.tensordot(A[:,np.newaxis].T, s[:,np.newaxis]*A, axes=((2),(0))) 
G2 = np.squeeze(G2) 

Since we are sum-reducing just one axis from each of those inputs with np.tensordot, we don't really need tensordot here; plain np.dot on the squeezed-in versions would suffice. That leads us back to the two improved versions listed at the start.
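That equivalence is easy to verify directly (a sketch with my own small random arrays): tensordot over a single axis pair performs the same contraction as np.dot on 2D inputs.

```python
import numpy as np

rng = np.random.RandomState(1)
n, k = 50, 5  # small hypothetical sizes
A = rng.rand(n, k)
s = rng.rand(n)

# tensordot contracting one axis pair: axis 1 of A.T against axis 0 of the rhs
G_td = np.tensordot(A.T, s[:, None] * A, axes=((1,), (0,)))

# ...is exactly the contraction np.dot performs on 2D inputs
G_np = np.dot(A.T, s[:, None] * A)

print(np.allclose(G_td, G_np))  # True
```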

Runtime tests

Approaches -

def method1(A, s): 
    G1 = np.zeros((k,k)) # initialize G1 as a (k,k) matrix 
    for i in xrange(n): 
        G1 += s[i] * np.dot(np.array([A[i]]).T, np.array([A[i]])) 
    return G1 

def method2(A, s): 
    G2 = np.dot(A[:,np.newaxis].T, s[:,np.newaxis]*A) 
    G2 = np.squeeze(G2) # reduces dimension from (k,1,k) to (k,k) 
    return G2 

def method3(A, s): 
    return (A.T*s).dot(A) 

def method4(A, s): 
    return (A.T).dot(A*s[:,None]) 

def method2_improved(A, s): 
    G2 = np.tensordot(A[:,np.newaxis].T, s[:,np.newaxis]*A, axes=((2),(0))) 
    G2 = np.squeeze(G2) 
    return G2 

Timings and verification -

In [56]: k = 200 
    ...: n = 5000 
    ...: A = np.random.randint(0, 1000, (n,k)) 
    ...: s = np.random.randint(0, 1000, n) * 1.0 
    ...: 

In [72]: print np.allclose(method1(A, s), method2(A, s)) 
    ...: print np.allclose(method1(A, s), method3(A, s)) 
    ...: print np.allclose(method1(A, s), method4(A, s)) 
    ...: print np.allclose(method1(A, s), method2_improved(A, s)) 
    ...: 
True 
True 
True 
True 

In [73]: %timeit method1(A, s) 
    ...: %timeit method2(A, s) 
    ...: %timeit method3(A, s) 
    ...: %timeit method4(A, s) 
    ...: %timeit method2_improved(A, s) 
    ...: 
1 loops, best of 3: 1.12 s per loop 
1 loops, best of 3: 693 ms per loop 
100 loops, best of 3: 8.12 ms per loop 
100 loops, best of 3: 8.17 ms per loop 
100 loops, best of 3: 8.28 ms per loop