2013-11-23 100 views
3

我需要此功能進行優化,因爲我試圖讓我的OpenGL模擬運行速度更快。我想使用Parakeet,但我無法完全理解我需要如何修改下面的代碼才能這樣做。你能看到我應該做什麼嗎?使用Parakeet優化Python函數

def distanceMatrix(self,x,y,z): 
    " ""Computes distances between all particles and places the result in a matrix such that the ij th matrix entry corresponds to the distance between particle i and j"" " 
    xtemp = tile(x,(self.N,1)) 
    dx = xtemp - xtemp.T 
    ytemp = tile(y,(self.N,1)) 
    dy = ytemp - ytemp.T 
    ztemp = tile(z,(self.N,1)) 
    dz = ztemp - ztemp.T 

    # Particles 'feel' each other across the periodic boundaries 
    if self.periodicX: 
     dx[dx>self.L/2]=dx[dx > self.L/2]-self.L 
     dx[dx<-self.L/2]=dx[dx < -self.L/2]+self.L 
    if self.periodicY: 
     dy[dy>self.L/2]=dy[dy>self.L/2]-self.L 
     dy[dy<-self.L/2]=dy[dy<-self.L/2]+self.L 
    if self.periodicZ: 
     dz[dz>self.L/2]=dz[dz>self.L/2]-self.L 
     dz[dz<-self.L/2]=dz[dz<-self.L/2]+self.L 

    # Total Distances 
    d = sqrt(dx**2+dy**2+dz**2) 

    # Mark zero entries with negative 1 to avoid divergences 
    d[d==0] = -1 

    return d, dx, dy, dz 

從我可以告訴,鸚鵡應該能夠使用上面的功能沒有改變 - 它僅使用NumPy的和數學。但是,我總是從鸚鵡JIT包裝調用函數時出現以下錯誤:

AssertionError: Unsupported function: <bound method Particles.distanceMatrix of <particles.Particles instance at 0x04CD8E90>> 
+1

你也可以做的一種可能的優化方法是用一個局部變量替換'self.L/2'的12個用法。這可能會做點什麼。 :-)(雖然這對鸚鵡問題沒有幫助......抱歉) –

+1

我猜布爾索引不支持,但我不確定。我知道分配像xtemp和dx L/2這樣的中間數組是浪費的。儘管你的向量化代碼可能是常規numpy中的快速代碼,但使用JIT編譯器,最好使用鈍的for-loops! – 2013-11-24 14:49:55

回答

4

鸚鵡還很年輕,其NumPy的支持是不完整的,並且你的代碼倒是幾個特點,還不能正常工作。

1)你正在包裝一個方法,而Parakeet到目前爲止只知道如何處理函數。常用的解決方法是創建一個@jit包裝的幫助函數,並將您的方法調用到所有必需的成員數據中。方法不起作用的原因是將有意義的類型分配給「自我」是不平凡的。這不是不可能的,但棘手的是,方法不會進入長尾鸚鵡,直到下垂果實被採摘爲止。說起低掛的水果......

2)布爾索引。尚未實施,但將在下一個版本中發佈。

3)np.tile:也行不通,也可能會在下一個版本。如果您想查看哪些內置函數和NumPy庫函數可以工作,請查看Parakeet的mappings模塊。

我重寫你的代碼是一個小友好給鸚鵡:

@jit 
def parakeet_dist(x, y, z, L, periodicX, periodicY, periodicZ): 
    # perform all-pairs computations more explicitly 
    # instead of tile + broadcasting 
    def periodic_diff(x1, x2, periodic): 
    diff = x1 - x2 
    if periodic: 
     if diff > (L/2): diff -= L 
     if diff < (-L/2): diff += L 
    return diff 
    dx = np.array([[periodic_diff(x1, x2, periodicX) for x1 in x] for x2 in x]) 
    dy = np.array([[periodic_diff(y1, y2, periodicY) for y1 in y] for y2 in y]) 
    dz = np.array([[periodic_diff(z1, z2, periodicZ) for z1 in z] for z2 in z]) 
    d= np.sqrt(dx**2 + dy**2 + dz**2) 

    # since we can't yet use boolean indexing for masking out zero distances 
    # have to fall back on explicit loops instead 
    for i in xrange(len(x)): 
    for j in xrange(len(x)): 
     if d[i,j] == 0: d[i,j] = -1 
    return d, dx, dy, dz 

在我的機器本只運行〜3倍的速度比NumPy的爲N = 2000(0.39s爲NumPy的對比0.14s的鸚鵡) 。如果我重寫遍歷數組更明確地使用循環則表現上升到6倍〜比NumPy的更快(鸚鵡運行在〜0.06S):

@jit 
def loopy_dist(x, y, z, L, periodicX, periodicY, periodicZ): 
    N = len(x) 
    dx = np.zeros((N,N)) 
    dy = np.zeros((N,N)) 
    dz = np.zeros((N,N)) 
    d = np.zeros((N,N)) 

    def periodic_diff(x1, x2, periodic): 
    diff = x1 - x2 
    if periodic: 
     if diff > (L/2): diff -= L 
     if diff < (-L/2): diff += L 
    return diff 

    for i in xrange(N): 
    for j in xrange(N): 
     dx[i,j] = periodic_diff(x[j], x[i], periodicX) 
     dy[i,j] = periodic_diff(y[j], y[i], periodicY) 
     dz[i,j] = periodic_diff(z[j], z[i], periodicZ) 
     d[i,j] = dx[i,j] ** 2 + dy[i,j] ** 2 + dz[i,j] ** 2 
     if d[i,j] == 0: d[i,j] = -1 
     else: d[i,j] = np.sqrt(d[i,j]) 
    return d, dx, dy, dz 

隨着一點點的創意重寫你也可以得到上面的代碼在Numba運行,但它只比NumPy(0.25秒)快1.5倍。編譯時間是Parakeet w/comprehensions:1秒,Parakeet w /循環:0.5秒,Numba w /循環:0.9秒。

希望接下來的幾個版本能夠更多地使用NumPy庫函數,但現在理解或循環往往是要走的路。

+1

謝謝!儘管你也在reddit上回應了這個問題,但我恢復了這個帳戶以提高你的答案。遲到總比不到好。 – bogdan