2016-09-23 74 views
1

我試圖儘可能快地計算多個3x1向量對的交叉積。這與Einsums的交叉產品

n = 10000 
a = np.random.rand(n, 3) 
b = np.random.rand(n, 3) 
numpy.cross(a, b) 

給出了正確的答案,但this answer to a similar question動機,我認爲einsum會得到我的地方。我發現,無論

eijk = np.zeros((3, 3, 3)) 
eijk[0, 1, 2] = eijk[1, 2, 0] = eijk[2, 0, 1] = 1 
eijk[0, 2, 1] = eijk[2, 1, 0] = eijk[1, 0, 2] = -1 

np.einsum('ijk,aj,ak->ai', eijk, a, b) 
np.einsum('iak,ak->ai', np.einsum('ijk,aj->iak', eijk, a), b) 

計算叉積,但他們的表現是令人失望的:這兩種方法進行多不如np.cross

%timeit np.cross(a, b) 
1000 loops, best of 3: 628 µs per loop 
%timeit np.einsum('ijk,aj,ak->ai', eijk, a, b) 
100 loops, best of 3: 9.02 ms per loop 
%timeit np.einsum('iak,ak->ai', np.einsum('ijk,aj->iak', eijk, a), b) 
100 loops, best of 3: 10.6 ms per loop 

任何如何改進0123的想法s?

回答

2

einsum()乘法運算的次數更多的則是cross(),並在最新的版本與NumPy,cross()不會創建許多臨時陣列。所以einsum()不能比cross()更快。

這裏是交叉的舊代碼:

x = a[1]*b[2] - a[2]*b[1] 
y = a[2]*b[0] - a[0]*b[2] 
z = a[0]*b[1] - a[1]*b[0] 

這裏是交叉的新代碼:

multiply(a1, b2, out=cp0) 
tmp = array(a2 * b1) 
cp0 -= tmp 
multiply(a2, b0, out=cp1) 
multiply(a0, b2, out=tmp) 
cp1 -= tmp 
multiply(a0, b1, out=cp2) 
multiply(a1, b0, out=tmp) 
cp2 -= tmp 

提速它,你需要用Cython或numba。

+0

而且由於這些不涉及循環,所以'cython'的改進可能並不重要。當這樣表達時,「交叉」比陣列更像是一種代數運算。 – hpaulj

2

可以使用np.tensordot失去維度之一在第一層,然後用np.einsum失去了其他維度,像這樣的矩陣乘法帶來 -

np.einsum('aik,ak->ai',np.tensordot(a,eijk,axes=([1],[1])),b) 

或者,我們可以進行廣播的elementwise ab之間的乘法使用np.einsum,然後失去了兩個維度一氣呵成與np.tensordot,像這樣 -

np.tensordot(np.einsum('aj,ak->ajk', a, b),eijk,axes=([1,2],[1,2])) 

我們也可以通過引入新的軸來執行元素乘法運算,例如a[...,None]*b[:,None],但它似乎會減慢它的速度。


雖然,這些都說明了所提出的np.einsum唯一基礎的辦法很好的改善,但拗不過np.cross

運行測試 -

In [26]: # Setup input arrays 
    ...: n = 10000 
    ...: a = np.random.rand(n, 3) 
    ...: b = np.random.rand(n, 3) 
    ...: 

In [27]: # Time already posted approaches 
    ...: %timeit np.cross(a, b) 
    ...: %timeit np.einsum('ijk,aj,ak->ai', eijk, a, b) 
    ...: %timeit np.einsum('iak,ak->ai', np.einsum('ijk,aj->iak', eijk, a), b) 
    ...: 
1000 loops, best of 3: 298 µs per loop 
100 loops, best of 3: 5.29 ms per loop 
100 loops, best of 3: 9 ms per loop 

In [28]: %timeit np.einsum('aik,ak->ai',np.tensordot(a,eijk,axes=([1],[1])),b) 
1000 loops, best of 3: 838 µs per loop 

In [30]: %timeit np.tensordot(np.einsum('aj,ak->ajk',a,b),eijk,axes=([1,2],[1,2])) 
1000 loops, best of 3: 882 µs per loop