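Optimizing a NumPy sum of matrices shifted by one element
I am using numpy 1.9 and Python 2.7 together with OpenCV to process large matrices, and I have to perform the following operation many times:
def sumShifted(A): # A: numpy array 1000*1000*10
    return A[:, 0:-1] + A[:, 1:]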
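I would like to optimize this operation if possible. I tried Cython but did not get any significant improvement, although I cannot rule out that this is due to a poor implementation on my part.
Is there a way to make it faster?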
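One cheap thing to try before reaching for Cython is reusing an output buffer, so that each call does not allocate a fresh result array. The following is only a minimal sketch under the assumption that the result shape stays the same between calls; `sum_shifted_into` and `buf` are names made up for illustration, and whether this gives a measurable gain depends on the machine and the array size.

```python
import numpy as np

def sum_shifted_into(A, out):
    """Write A[:, :-1] + A[:, 1:] into the preallocated array `out`."""
    # Passing out= to np.add avoids allocating a new result array on each call.
    np.add(A[:, :-1], A[:, 1:], out=out)
    return out

# Small stand-in array; the arrays in the question are 1000x1000x10.
A = np.arange(24, dtype=np.float64).reshape(2, 4, 3)

# The buffer has one column fewer than A along axis 1 (hypothetical helper buffer).
buf = np.empty((A.shape[0], A.shape[1] - 1) + A.shape[2:], dtype=A.dtype)
result = sum_shifted_into(A, buf)
```

Reusing the buffer is only safe if nothing still holds a reference to the previous result when the next call overwrites it.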
EDIT: sumShifted gets called in a loop like this:
for i in xrange(0, 400):
    # ... Various operations on B
    A = sumShifted(B)
    # ... Other operations on B
# More detailed
for i in xrange(0, 400):
    A = sumShifted(a11)
    B = sumShifted(a12)
    C = sumShifted(b12)
    D = sumShifted(b22)
    v = -upQ12/upQ11
    W, X, Z = self.function1(input_matrix, v, A, C[:,:,4], D[:,:,4])
    S, D, F = self.function2(input_matrix, v, A, C[:,:,5], D[:,:,5])
    AA = self.function3(input_matrix, v, A, C[:,:,6], D[:,:,6])
    BB = self.function4(input_matrix, v, A, C[:,:,7], D[:,:,7])
EDIT2: Following your suggestions, I created these two runnable benchmarks (with Cython) that merge the four sumShifted calls into one function.
A, B, C, D = improvedSumShifted(E, F, G, H)
# E, F: 1000x1000 matrices
# G, H: 1000x1000x8 matrices
# first implementation
def improvedSumShifted(np.ndarray[dtype_t, ndim=2] a, np.ndarray[dtype_t, ndim=2] b, np.ndarray[dtype_t, ndim=3] c, np.ndarray[dtype_t, ndim=3] d):
    cdef unsigned int i, j, k
    cdef unsigned int w = a.shape[0], h = a.shape[1]-1, z = c.shape[2]
    cdef np.ndarray[dtype_t, ndim=2] aa = np.empty((w, h))
    cdef np.ndarray[dtype_t, ndim=2] bb = np.empty((w, h))
    cdef np.ndarray[dtype_t, ndim=3] cc = np.empty((w, h, z))
    cdef np.ndarray[dtype_t, ndim=3] dd = np.empty((w, h, z))
    with cython.boundscheck(False), cython.wraparound(False), cython.overflowcheck(False), cython.nonecheck(False):
        for i in range(w):
            for j in range(h):
                aa[i,j] = a[i,j] + a[i,j+1]
                bb[i,j] = b[i,j] + b[i,j+1]
                for k in range(z):
                    cc[i,j,k] = c[i,j,k] + c[i,j+1,k]
                    dd[i,j,k] = d[i,j,k] + d[i,j+1,k]
    return aa, bb, cc, dd
# second implementation
def improvedSumShifted(np.ndarray[dtype_t, ndim=2] a, np.ndarray[dtype_t, ndim=2] b, np.ndarray[dtype_t, ndim=3] c, np.ndarray[dtype_t, ndim=3] d):
    cdef unsigned int i, j, k
    cdef unsigned int w = a.shape[0], h = a.shape[1]-1, z = c.shape[2]
    cdef np.ndarray[dtype_t, ndim=2] aa = np.copy(a[:, 0:h])
    cdef np.ndarray[dtype_t, ndim=2] bb = np.copy(b[:, 0:h])
    cdef np.ndarray[dtype_t, ndim=3] cc = np.copy(c[:, 0:h])
    cdef np.ndarray[dtype_t, ndim=3] dd = np.copy(d[:, 0:h])
    with cython.boundscheck(False), cython.wraparound(False), cython.overflowcheck(False), cython.nonecheck(False):
        for i in range(w):
            for j in range(h):
                aa[i,j] += a[i,j+1]
                bb[i,j] += b[i,j+1]
                for k in range(z):
                    cc[i,j,k] += c[i,j+1,k]
                    dd[i,j,k] += d[i,j+1,k]
    return aa, bb, cc, dd
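To check that a merged Cython version returns the same values as the original code, a pure-NumPy reference can be handy. This is a sketch only: `improved_sum_shifted_reference` is a hypothetical name, and the small random inputs stand in for the 1000x1000 and 1000x1000x8 arrays.

```python
import numpy as np

def sum_shifted(A):
    # Same operation as sumShifted: add each column to its right-hand neighbour.
    return A[:, 0:-1] + A[:, 1:]

def improved_sum_shifted_reference(a, b, c, d):
    """Pure-NumPy equivalent of the merged function, for checking results."""
    return sum_shifted(a), sum_shifted(b), sum_shifted(c), sum_shifted(d)

rng = np.random.RandomState(0)
# Small stand-ins for the 1000x1000 and 1000x1000x8 inputs.
E, F = rng.rand(5, 6), rng.rand(5, 6)
G, H = rng.rand(5, 6, 8), rng.rand(5, 6, 8)

A, B, C, D = improved_sum_shifted_reference(E, F, G, H)
# The output of a compiled version could then be compared element-wise,
# for example with np.allclose(A, A_from_cython).
```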
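Could you show us some code illustrating how 'sumShifted' gets called? – unutbu 2014-10-06 10:22:28
@Rowandish A [1000,1000,10] matrix is not large, though. **Would you be so kind as to also post your '.timeit()' results showing how fast your initial implementation is, so there is a baseline for judging what counts as an improvement?** – user3666197 2014-10-06 10:33:39
@unutbu Edited the question. – Rowandish 2014-10-06 13:07:12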
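A baseline measurement of the kind requested in the comments could be sketched roughly as follows. The array here is deliberately smaller than the 1000x1000x10 case so that the snippet runs quickly, and the repeat and number values are arbitrary choices, not figures from the question.

```python
import timeit
import numpy as np

def sumShifted(A):
    return A[:, 0:-1] + A[:, 1:]

# Smaller than the array in the question; change the shape to match the real workload.
A = np.random.rand(100, 100, 10)

# Repeat the measurement and keep the best run to reduce timing noise.
number = 10
timings = timeit.repeat(lambda: sumShifted(A), repeat=3, number=number)
best_per_call = min(timings) / number
print("best time per call: %.6f s" % best_per_call)
```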