2017-04-22 51 views
7

我想在numpy中實現以下問題,這裏是我的代碼。如何優化numpy中此函數的計算?

我試過以下numpy代碼,用於循環的這個問題。我想知道是否有更有效的方法來做這種計算?我真的很感激!

k, d = X.shape 
m = Y.shape[0] 

c1 = 2.0*sigma**2 
c2 = 0.5*np.log(np.pi*c1) 
c3 = np.log(1.0/k) 

L_B = np.zeros((m,)) 
for i in xrange(m): 
    if i % 100 == 0: 
     print i 
    L_B[i] = np.log(np.sum(np.exp(np.sum(-np.divide(
       np.power(X-Y[i,:],2), c1)-c2,1)+c3))) 

print np.mean(L_B) 

我已經通過創建一個三維張量,因此下面的計算可以通過廣播進行想到np.expand_dims(X, 2).repeat(Y.shape[0], 2)-Y,但是當m是大,會浪費大量的內存。

我也相信,np.einsum()只利用for循環,所以可能效率不高,糾正我,如果我錯了。

有沒有想法?

回答

5

#1

我使用多圈代碼的直接轉換到broadcasting優化的第一級的基於一個在引入新的軸線,因此不那麼存儲器高效的,如下面列出的優化階段 -

p1 = (-((X[:,None] - Y)**2)/c1)-c2 
p11 = p1.sum(2) 
p2 = np.exp(p11+c3) 
out = np.log(p2.sum(0)).mean() 

優化階段#2

在考慮一些優化飼養日瞻在我們打算分離出的常量的操作,我結束了以下 -

c10 = -c1 
c20 = X.shape[1]*c2 

subs = (X[:,None] - Y)**2 
p00 = subs.sum(2) 
p10 = p00/c10 
p11 = p10-c20 
p2 = np.exp(p11+c3) 
out = np.log(p2.sum(0)).mean() 

優化階段#3

它進一步去和和看到過的地方的操作可能優化,我結束了使用Scipy's cdist來取代平方和sum-reduction的重量級的工作。這應該是相當的內存使用效率,給我們最終的實現,如下圖所示 -

from scipy.spatial.distance import cdist 

# Setup constants 
c10 = -c1 
c20 = X.shape[1]*c2 
c30 = c20-c3 
c40 = np.exp(c30) 
c50 = np.log(c40) 

# Get stagewise operations corresponding to loopy ones 
p1 = cdist(X, Y, 'sqeuclidean') 
p2 = np.exp(p1/c10).sum(0) 
out = np.log(p2).mean() - c50 

運行測試

途徑 -

def loopy_app(X, Y, sigma): 
    k, d = X.shape 
    m = Y.shape[0] 

    c1 = 2.0*sigma**2 
    c2 = 0.5*np.log(np.pi*c1) 
    c3 = np.log(1.0/k) 

    L_B = np.zeros((m,)) 
    for i in xrange(m): 
     L_B[i] = np.log(np.sum(np.exp(np.sum(-np.divide(
        np.power(X-Y[i,:],2), c1)-c2,1)+c3))) 

    return np.mean(L_B) 

def vectorized_app(X, Y, sigma): 
    # Setup constants 
    k, d = D_A.shape 
    c1 = 2.0*sigma**2 
    c2 = 0.5*np.log(np.pi*c1) 
    c3 = np.log(1.0/k) 

    c10 = -c1 
    c20 = X.shape[1]*c2 
    c30 = c20-c3 
    c40 = np.exp(c30) 
    c50 = np.log(c40) 

    # Get stagewise operations corresponding to loopy ones 
    p1 = cdist(X, Y, 'sqeuclidean') 
    p2 = np.exp(p1/c10).sum(0) 
    out = np.log(p2).mean() - c50 
    return out 

時序和驗證 -

In [294]: # Setup inputs with m(=D_B.shape[0]) being a large number 
    ...: X = np.random.randint(0,9,(100,10)) 
    ...: Y = np.random.randint(0,9,(10000,10)) 
    ...: sigma = 2.34 
    ...: 

In [295]: np.allclose(loopy_app(X, Y, sigma),vectorized_app(X, Y, sigma)) 
Out[295]: True 

In [296]: %timeit loopy_app(X, Y, sigma) 
1 loops, best of 3: 225 ms per loop 

In [297]: %timeit vectorized_app(X, Y, sigma) 
10 loops, best of 3: 23.6 ms per loop 

In [298]: # Setup inputs with m(=Y.shape[0]) being a much large number 
    ...: X = np.random.randint(0,9,(100,10)) 
    ...: Y = np.random.randint(0,9,(100000,10)) 
    ...: sigma = 2.34 
    ...: 

In [299]: np.allclose(loopy_app(X, Y, sigma),vectorized_app(X, Y, sigma)) 
Out[299]: True 

In [300]: %timeit loopy_app(X, Y, sigma) 
1 loops, best of 3: 2.27 s per loop 

In [301]: %timeit vectorized_app(X, Y, sigma) 
1 loops, best of 3: 243 ms per loop 

約在10x加速!

+0

令人驚歎!超過10倍! – xxx222

+0

@ xxx222你對實際數據集的加速度是多​​少? – Divakar

+0

約20倍左右,因爲我有一個非常大的數據集,所以計算距離矩陣變得困難。 – xxx222