numba guvectorize target='parallel' slower than target='cpu'

I have been trying to optimize a piece of Python code that involves large multidimensional-array calculations, and I am getting counterintuitive results with numba. I am running on a mid-2015 MBP, 2.5 GHz quad-core i7, OS X 10.10.5, Python 2.7.11. Consider the following:
import numpy as np
from numba import jit, vectorize, guvectorize
import numexpr as ne
import timeit

def add_two_2ds_naive(A, B, res):
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            res[i, j] = A[i, j] + B[i, j]

@jit
def add_two_2ds_jit(A, B, res):
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            res[i, j] = A[i, j] + B[i, j]

@guvectorize(['float64[:,:],float64[:,:],float64[:,:]'],
             '(n,m),(n,m)->(n,m)', target='cpu')
def add_two_2ds_cpu(A, B, res):
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            res[i, j] = A[i, j] + B[i, j]

@guvectorize(['(float64[:,:],float64[:,:],float64[:,:])'],
             '(n,m),(n,m)->(n,m)', target='parallel')
def add_two_2ds_parallel(A, B, res):
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            res[i, j] = A[i, j] + B[i, j]

def add_two_2ds_numexpr(A, B, res):
    res = ne.evaluate('A+B')

if __name__ == "__main__":
    np.random.seed(69)
    A = np.random.rand(10000, 100)
    B = np.random.rand(10000, 100)
    res = np.zeros((10000, 100))
I can now run timeit on the various functions:
%timeit add_two_2ds_jit(A,B,res)
1000 loops, best of 3: 1.16 ms per loop
%timeit add_two_2ds_cpu(A,B,res)
1000 loops, best of 3: 1.19 ms per loop
%timeit add_two_2ds_parallel(A,B,res)
100 loops, best of 3: 6.9 ms per loop
%timeit add_two_2ds_numexpr(A,B,res)
1000 loops, best of 3: 1.62 ms per loop
It seems that 'parallel' is not even making use of most of a single core: top shows Python hitting ~40% CPU for 'parallel', ~100% for 'cpu', and ~300% for numexpr.
But the point of 'guvectorize' is that the operation you define gets applied over any _extra_ dimensions (and that is what is done in parallel). The code you write is not itself parallelized. So if you changed 'A', 'B', and 'res' to have shape '(10000,100,100)', the 100 different iterations over the third dimension would be run in parallel. – DavidW
Thanks, I see that I misunderstood the usage. –
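A minimal sketch of the point in DavidW's comment, assuming the same data as above (the function name add_rows_parallel and the 1-D core signature are illustrative, not from the original post): by declaring a 1-D core signature, the 10,000 rows become the extra dimension that target='parallel' distributes across threads, while the function body only handles a single row.

import numpy as np
from numba import guvectorize

# Core signature is 1-D, so numba loops over the leading (10000-row)
# dimension and, with target='parallel', splits that loop across threads.
@guvectorize(['float64[:],float64[:],float64[:]'],
             '(n),(n)->(n)', target='parallel')
def add_rows_parallel(a, b, res):
    # This body only handles one row of length n.
    for j in range(a.shape[0]):
        res[j] = a[j] + b[j]

A = np.random.rand(10000, 100)
B = np.random.rand(10000, 100)
res = np.zeros((10000, 100))
add_rows_parallel(A, B, res)   # rows are processed in parallel

Equivalently, keeping the 2-D core signature '(n,m),(n,m)->(n,m)' but passing 3-D arrays (e.g. shape (10000,100,100)) gives guvectorize an extra leading dimension to loop over in parallel, which is what the comment describes.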