使用numpy矩陣加速計算

我有兩個矩陣。兩者都充滿了零和一個。一個是大的（3000 x 2000元素），另一個是較小的（20 x 20）元素。我正在做這樣的事情：使用numpy矩陣加速計算

newMatrix = (size of bigMatrix), filled with zeros 
l = (a constant) 

for y in xrange(0, len(bigMatrix[0])): 
    for x in xrange(0, len(bigMatrix)): 

     for b in xrange(0, len(smallMatrix[0])): 
      for a in xrange(0, len(smallMatrix)): 

       if (bigMatrix[x, y] == smallMatrix[x + a - l, y + b - l]): 
        newMatrix[x, y] = 1

這是痛苦的緩慢。我做錯了什麼？有沒有一種巧妙的方法可以使這項工作更快？

編輯：基本上我是，對於大矩陣中的每個（x，y），檢查大矩陣和（x，y）周圍的小矩陣的所有像素以查看它們是否爲1。 1，然後我在newMatrix上設置該值。我正在做一種碰撞檢測。

來源

2009-11-26 devoured elysium

你確實有這樣的表達： smallMatrix [X + A - 1， y + b - l]）當您使用「大矩陣索引」x，y來處理小矩陣上的元素時，這是正確的嗎？ – jsbueno

這是正確的。 –

我可以考慮一些優化 - 由於您使用4個嵌套python「for」語句，因此您的速度可能會很慢。

我無法弄清楚你正在尋找什麼 - 但是，首先，如果你的大矩陣「1」的密度很低，你肯定可以在bigMtarix的slice上使用python的「any」函數來快速檢查如果有任何一組元素有 - 你可以得到有數倍的速度增加：

step = len(smallMatrix[0]) 
for y in xrange(0, len(bigMatrix[0], step)): 
    for x in xrange(0, len(bigMatrix), step): 
     if not any(bigMatrix[x: x+step, y: y + step]): 
      continue 
     (...)

在這一點上，如果還需要對每個元素進行交互，你做的另外一對指標走各的位置在步驟內 - 但我認爲你有這個想法。

除了像使用「任何」用法一樣使用內部數值操作外，當找到第一個匹配像素時，您肯定可以添加一些控制流代碼來中斷（b，a）循環。（例如，在最後一個「if」中插入一個「break」語句，並在另一個if..break對中爲「b」循環插入

我真的無法弄清楚你的意圖是什麼 - 所以我可以給你更多的特定代碼

來源

2009-11-26 13:36:49 jsbueno

我不知道如何忘記你在最後一段提到的那個突破想法。但我需要打破2個循環。有沒有辦法在Python中做到這一點，而不必打破內部循環，並且必須使用標誌來檢查每當我打破外部循環時？ –

你的示例代碼沒有任何意義，但是你的問題的描述聽起來像是你正在嘗試在一個大的數組中進行一個小的數組的2d卷積，在scipy中有一個convolve2d函數.signal包完全做到這一點，只需做convolve2d(bigMatrix, smallMatrix)就可以得到結果。不幸的是，scipy實現沒有布爾型數組的特殊情況，所以完整的卷積非常慢。下面是一個函數，它利用了數組只包含1和0：

import numpy as np 

def sparse_convolve_of_bools(a, b): 
    if a.size < b.size: 
     a, b = b, a 
    offsets = zip(*np.nonzero(b)) 
    n = len(offsets) 
    dtype = np.byte if n < 128 else np.short if n < 32768 else np.int 
    result = np.zeros(np.array(a.shape) + b.shape - (1,1), dtype=dtype) 
    for o in offsets: 
     result[o[0]:o[0] + a.shape[0], o[1]:o[1] + a.shape[1]] += a 
    return result

在我的機器上，它對於3000x2000的20x20卷積運行時間少於9秒。運行時間取決於較小陣列中的數量，每個非零元素爲20毫秒。

來源

2009-11-26 14:59:43

我沒有得到那些代碼的作用，但我試過了，它似乎崩潰說結果= numpy.zeros（a.shape）+ b.shape - （1,1），dtype = dtype ） ValueError：shape mismatch：objects can be broadcast to a single shape –

嘗試用替代它 result = np.zeros（（a.shape [0] + b.shape [0] - 1，a.shape [1 ] + b.shape [1] - 1，dtype = dtype） – dwf

如果您是位真正打包每個字節三十二分之八每INT，並且可以減少你smallMatrix到20x16，
然後嘗試以下方法，這裏單行。
（newMatrix[x, y] = 1當任何位x周圍的20x16的，y爲1 ?? 什麼是你真正需要的？）

python -m timeit -s ' 
""" slide 16-bit mask across 32-bit pairs bits[j], bits[j+1] """ 

import numpy as np 

bits = np.zeros(2000 // 16, np.uint16) # 2000 bits 
bits[::8] = 1 
mask = 32+16 
nhit = 16 * [0] 

def hit16(bits, mask, nhit): 
    """ 
     slide 16-bit mask across 32-bit pairs bits[j], bits[j+1] 
     bits: long np.array(uint16) 
     mask: 16 bits, int 
     out: nhit[j] += 1 where pair & mask != 0 
    """ 
    left = bits[0] 
    for b in bits[1:]: 
     pair = (left << 16) | b 
     if pair: # np idiom for non-0 words ? 
      m = mask 
      for j in range(16): 
       if pair & m: 
        nhit[j] += 1 
        # hitposition = jb*16 + j 
       m <<= 1 
     left = b 
    # if any(nhit): print "hit16:", nhit 

' \ 
' 
hit16(bits, mask, nhit) 
' 

# 15 msec per loop, bits[::4] = 1 
# 11 msec per loop, bits[::8] = 1 
# mac g4 ppc

來源

2009-12-04 15:39:12 denis

使用numpy矩陣加速計算

回答

相關問題