2016-10-12 39 views
5

我一直在研究這個函數,該函數生成一些我需要的參數,用於我正在開發的模擬代碼,並且已經在增強其性能方面遇到困難。這個python函數可以被矢量化嗎?

分析代碼表明這是主要的瓶頸,所以我可以對它做出的任何增強都會很好。

我想嘗試矢量化這個函數的一部分,但我不確定它是否可能。

主要挑戰是存儲在我的數組params中的參數取決於params的索引。我看到的唯一直接的解決方案是使用np.ndenumerate,但這似乎很慢。

是否有可能矢量化這種類型的操作,其中存儲在數組中的值取決於它們的存儲位置?或者,創建一個能夠給我數組索引的元組的生成器會更智能/更快嗎?

import numpy as np 
from scipy.sparse import linalg as LA 

def get_params(num_bonds, energies): 
    """ 
    Returns the interaction parameters of different pairs of atoms. 

    Parameters 
    ---------- 
    num_bonds : ndarray, shape = (M, 20) 
     Sparse array containing the number of nearest neighbor bonds for 
     different pairs of atoms (denoted by their column) and next- 
     nearest neighbor bonds. Columns 0-9 contain nearest neighbors, 
     10-19 contain next-nearest neighbors 

    energies : ndarray, shape = (M,) 
     Energy vector corresponding to each atomic system stored in each 
     row of num_bonds. 
    """ 

    # -- Compute the bond energies 
    x = LA.lsqr(num_bonds, energies, show=False)[0] 

    params = np.zeros([4, 4, 4, 4, 4, 4, 4, 4, 4]) 

    nn = {(0,0): x[0], (1,1): x[1], (2,2): x[2], (3,3): x[3], (0,1): x[4], 
      (1,0): x[4], (0,2): x[5], (2,0): x[5], (0,3): x[6], (3,0): x[6], 
      (1,2): x[7], (2,1): x[7], (1,3): x[8], (3,1): x[8], (2,3): x[9], 
      (3,2): x[9]} 

    nnn = {(0,0): x[10], (1,1): x[11], (2,2): x[12], (3,3): x[13], (0,1): x[14], 
      (1,0): x[14], (0,2): x[15], (2,0): x[15], (0,3): x[16], (3,0): x[16], 
      (1,2): x[17], (2,1): x[17], (1,3): x[18], (3,1): x[18], (2,3): x[19], 
      (3,2): x[19]} 

    """ 
    params contains the energy contribution of each site due to its 
    local environment. The shape is given by the number of possible atom 
    types and the number of sites in the lattice. 
    """ 
    for (i,j,k,l,m,jj,kk,ll,mm), val in np.ndenumerate(params): 

     params[i,j,k,l,m,jj,kk,ll,mm] = nn[(i,j)] + nn[(i,k)] + nn[(i,l)] + \ 
             nn[(i,m)] + nnn[(i,jj)] + \ 
             nnn[(i,kk)] + nnn[(i,ll)] + nnn[(i,mm)] 

return np.ascontiguousarray(params) 

回答

3

下面是使用broadcasted求和一個量化的方法 -

# Gather the elements sorted by the keys in (row,col) order of a dense 
# 2D array for both nn and nnn 
sidx0 = np.ravel_multi_index(np.array(nn.keys()).T,(4,4)).argsort() 
a0 = np.array(nn.values())[sidx0].reshape(4,4) 

sidx1 = np.ravel_multi_index(np.array(nnn.keys()).T,(4,4)).argsort() 
a1 = np.array(nnn.values())[sidx1].reshape(4,4) 

# Perform the summations keep the first axis aligned for nn and nnn parts 
parte0 = a0[:,:,None,None,None] + a0[:,None,:,None,None] + \ 
    a0[:,None,None,:,None] + a0[:,None,None,None,:] 

parte1 = a1[:,:,None,None,None] + a1[:,None,:,None,None] + \ 
    a1[:,None,None,:,None] + a1[:,None,None,None,:]  

# Finally add up sums from nn and nnn for final output  
out = parte0[...,None,None,None,None] + parte1[:,None,None,None,None] 

運行測試

功能defintions -

def vectorized_approach(nn,nnn): 
    sidx0 = np.ravel_multi_index(np.array(nn.keys()).T,(4,4)).argsort() 
    a0 = np.array(nn.values())[sidx0].reshape(4,4)  
    sidx1 = np.ravel_multi_index(np.array(nnn.keys()).T,(4,4)).argsort() 
    a1 = np.array(nnn.values())[sidx1].reshape(4,4) 
    parte0 = a0[:,:,None,None,None] + a0[:,None,:,None,None] + \ 
     a0[:,None,None,:,None] + a0[:,None,None,None,:]  
    parte1 = a1[:,:,None,None,None] + a1[:,None,:,None,None] + \ 
     a1[:,None,None,:,None] + a1[:,None,None,None,:] 
    return parte0[...,None,None,None,None] + parte1[:,None,None,None,None] 

def original_approach(nn,nnn): 
    params = np.zeros([4, 4, 4, 4, 4, 4, 4, 4, 4]) 
    for (i,j,k,l,m,jj,kk,ll,mm), val in np.ndenumerate(params):  
     params[i,j,k,l,m,jj,kk,ll,mm] = nn[(i,j)] + nn[(i,k)] + nn[(i,l)] + \ 
             nn[(i,m)] + nnn[(i,jj)] + \ 
             nnn[(i,kk)] + nnn[(i,ll)] + nnn[(i,mm)] 
    return params 

設置輸入 -

# Setup inputs 
x = np.random.rand(30) 
nn = {(0,0): x[0], (1,1): x[1], (2,2): x[2], (3,3): x[3], (0,1): x[4], 
     (1,0): x[4], (0,2): x[5], (2,0): x[5], (0,3): x[6], (3,0): x[6], 
     (1,2): x[7], (2,1): x[7], (1,3): x[8], (3,1): x[8], (2,3): x[9], 
     (3,2): x[9]} 

nnn = {(0,0): x[10], (1,1): x[11], (2,2): x[12], (3,3): x[13], (0,1): x[14], 
     (1,0): x[14], (0,2): x[15], (2,0): x[15], (0,3): x[16], (3,0): x[16], 
     (1,2): x[17], (2,1): x[17], (1,3): x[18], (3,1): x[18], (2,3): x[19], 
     (3,2): x[19]} 

計時 -

In [98]: np.allclose(original_approach(nn,nnn),vectorized_approach(nn,nnn)) 
Out[98]: True 

In [99]: %timeit original_approach(nn,nnn) 
1 loops, best of 3: 884 ms per loop 

In [100]: %timeit vectorized_approach(nn,nnn) 
1000 loops, best of 3: 708 µs per loop 

歡迎1000x+加速!


對於這樣產品的通用號碼的系統,這裏有一個通用的解決方案,通過這些尺寸迭代 -

m,n = a0.shape # size of output array along each axis 
N = 4 # Order of system 
out = a0.copy() 
for i in range(1,N): 
    out = out[...,None] + a0.reshape((m,)+(1,)*i+(n,)) 

for i in range(N): 
    out = out[...,None] + a1.reshape((m,)+(1,)*(i+n)+(n,)) 
+0

這真是太棒了!我從來沒有遇到過這種設計模式,非常感謝你的幫助。 – Steve

+0

@Steve它很有趣!如果你真的感興趣的話,請查看'NumPy broadcast'獲取更多信息,並在SO上尋找更多的使用方法! :) – Divakar

+0

我以前使用過'NumPy broadcast',但只適用於更簡單的情況。這真的很棒,但我仍然在學習如何矢量化更復雜的功能。 – Steve