Vectorizing iterative addition in a NumPy array

For each element of a 2-D array of random indices (which may contain duplicates), I want to do "+= 1" on the corresponding cell of a 2-D grid of zeros. I don't see how to optimize this computation beyond the standard for loop shown below:

import numpy as np

def interadd():
    U = 100
    input = np.random.random(size=(5000,2)) * U
    idx = np.floor(input).astype(int)   # integer grid coordinates in [0, U)

    grids = np.zeros((U,U))
    for i in range(len(input)):
        grids[idx[i,0], idx[i,1]] += 1  # one increment per sample, duplicates included
    return grids

The runtime is quite significant:

>> timeit(interadd, number=5000) 
43.69953393936157 

Is there a way to vectorize this iterative process?

Answers


You can speed it up a bit by using np.add.at, which correctly handles the case of repeated indices. Comparing the original loop (interadd) against an np.add.at version (interadd2, both defined below) gives:

>>> U = 100 
>>> idx = np.floor(np.random.random(size=(5000,2))*U).astype(int)
>>> (interadd(U, idx) == interadd2(U, idx)).all() 
True 
>>> %timeit interadd(U, idx) 
100 loops, best of 3: 8.48 ms per loop 
>>> %timeit interadd2(U, idx) 
100 loops, best of 3: 2.62 ms per loop 

Here are the definitions, including YXD's suggestion from the comments (interadd3):

def interadd(U, idx):
    # Original loop-based version.
    grids = np.zeros((U,U))
    for i in range(len(idx)):
        grids[idx[i,0], idx[i,1]] += 1
    return grids

def interadd2(U, idx):
    # np.add.at with the index pairs passed as a nested list.
    grids = np.zeros((U,U))
    np.add.at(grids, idx.T.tolist(), 1)
    return grids

def interadd3(U, idx):
    # YXD's suggestion: pass a tuple of index arrays directly.
    grids = np.zeros((U,U))
    np.add.at(grids, (idx[:,0], idx[:,1]), 1)
    return grids

>>> (interadd(U, idx) == interadd3(U, idx)).all() 
True 
>>> %timeit interadd3(U, idx) 
1000 loops, best of 3: 1.09 ms per loop 
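
The reason a plain vectorized assignment doesn't work here is that fancy-indexed augmented assignment is buffered: repeated indices receive only a single increment, whereas np.add.at applies one increment per occurrence. A minimal sketch illustrating the difference (the toy indices are made up for the example):

import numpy as np

dup = np.array([[0, 0], [0, 0], [1, 2]])            # cell (0, 0) appears twice

buffered = np.zeros((3, 3))
buffered[dup[:,0], dup[:,1]] += 1                    # duplicates collapse into one increment
print(buffered[0, 0])                                # 1.0

unbuffered = np.zeros((3, 3))
np.add.at(unbuffered, (dup[:,0], dup[:,1]), 1)       # one increment per occurrence
print(unbuffered[0, 0])                              # 2.0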

I was typing almost the same thing. You can change idx.T.tolist() to (idx[:,0], idx[:,1]), which should be faster. – YXD


(Typo just corrected in the comment above.) – YXD


You can convert the (row, column) indices in idx to linear indices, get the unique linear indices together with their counts, and finally store those counts in the flattened view of the output grids. Here's an implementation:

# Calculate linear indices corresponding to idx
lin_idx = idx[:,0]*U + idx[:,1] 

# Get unique linear indices and their counts 
unq_lin_idx,idx_counts = np.unique(lin_idx,return_counts=True) 

# Setup output array and store index counts in raveled/flattened version 
grids = np.zeros((U,U)) 
grids.ravel()[unq_lin_idx] = idx_counts 
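
As a small worked illustration of the mapping (U and the index values below are made up for the example): a pair (r, c) maps to the linear index r*U + c, and np.unique with return_counts=True then tells you how many times each flattened cell was hit.

import numpy as np

U = 4
idx = np.array([[1, 2], [1, 2], [0, 3]])         # cell (1, 2) is hit twice, (0, 3) once
lin_idx = idx[:,0]*U + idx[:,1]                  # array([6, 6, 3])
unq_lin_idx, idx_counts = np.unique(lin_idx, return_counts=True)
print(unq_lin_idx, idx_counts)                   # [3 6] [1 2]

grids = np.zeros((U, U))
grids.ravel()[unq_lin_idx] = idx_counts          # grids[0, 3] == 1.0, grids[1, 2] == 2.0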

Runtime tests

Here are runtime tests covering all of the listed approaches (including @DSM's), using the same definitions as in that answer:

In [63]: U = 100 
    ...: idx = np.floor(np.random.random(size=(5000,2))*U).astype(int)
    ...: 

In [64]: %timeit interadd(U, idx) 
100 loops, best of 3: 7.57 ms per loop 

In [65]: %timeit interadd2(U, idx) 
100 loops, best of 3: 2.59 ms per loop 

In [66]: %timeit interadd3(U, idx) 
1000 loops, best of 3: 1.24 ms per loop 

In [67]: def unique_counts(U, idx): 
    ...:  lin_idx = idx[:,0]*U + idx[:,1] 
    ...:  unq_lin_idx,idx_counts = np.unique(lin_idx,return_counts=True) 
    ...:  grids = np.zeros((U,U)) 
    ...:  grids.ravel()[unq_lin_idx] = idx_counts 
    ...:  return grids 
    ...: 

In [68]: %timeit unique_counts(U, idx) 
1000 loops, best of 3: 595 µs per loop 

The timings show that the proposed np.unique-based approach is more than 50% faster than the next fastest one.


np.unique uses sorting under the hood, so its time complexity is worse than np.add.at's, but on the other hand your approach has a faster memory access pattern on the grids array. –


@moarningsun Yes, I think it uses sorting and differencing under the hood, which I guess explains the faster runtimes. It would be interesting to find out what add.at does under the hood. – Divakar


That got me thinking about an interesting approach: grids = np.bincount(lin_idx, minlength=U**2).reshape(U, U) –


Divakar's answer led me to try the following, which looks to be the fastest approach yet:

lin_idx = idx[:,0]*U + idx[:,1] 
grids = np.bincount(lin_idx, minlength=U**2).reshape(U, U) 
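
For reference, np.bincount counts how many times each nonnegative integer occurs in its input, and minlength pads the result so it can always be reshaped to the full U-by-U grid even when the highest-numbered cells are never hit. A tiny sketch (the values are made up for the example):

import numpy as np

hits = np.array([1, 1, 3])                       # linear indices of the cells that were hit
counts = np.bincount(hits, minlength=6)          # array([0, 2, 0, 1, 0, 0])
print(counts.reshape(2, 3))                      # [[0 2 0]
                                                 #  [1 0 0]]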

Timings:

In [184]: U = 100 
    ...: input = np.random.random(size=(5000,2)) * U 
    ...: idx = np.floor(input).astype(int)

In [185]: %timeit interadd3(U, idx) # By DSM/XYD 
1000 loops, best of 3: 1.68 ms per loop 

In [186]: %timeit unique_counts(U, idx) # By Divakar 
1000 loops, best of 3: 676 µs per loop 

In [187]: %%timeit 
    ...: lin_idx = idx[:,0]*U + idx[:,1] 
    ...: grids = np.bincount(lin_idx, minlength=U*U).reshape(U, U) 
    ...: 
10000 loops, best of 3: 97.5 µs per loop 

A huge improvement, it seems! – Divakar
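
Putting the fastest variant into a self-contained function, for a direct comparison against interadd from the first answer (a sketch; the name interadd_bincount and the final check are mine):

import numpy as np

def interadd_bincount(U, idx):
    # Flatten (row, col) pairs into linear indices and count hits per cell.
    lin_idx = idx[:,0]*U + idx[:,1]
    return np.bincount(lin_idx, minlength=U*U).reshape(U, U).astype(float)

U = 100
idx = np.floor(np.random.random(size=(5000, 2)) * U).astype(int)
print((interadd_bincount(U, idx) == interadd(U, idx)).all())   # True: matches the loop version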