Is there a way to visit each element in memory only once (and so make this time-efficient) without writing a custom ufunc?
Yes, this is exactly what numexpr was designed for.
import numpy as np
import numexpr as ne

def func1(A, B):
    # Plain NumPy: each step allocates a full-size temporary array.
    A = A ** 3
    A = np.maximum(A, 0.001)
    return np.divide(B, A)

def func2(A, B):
    # numexpr evaluates the whole expression in a single pass over the data.
    return ne.evaluate("B/where(A**3 > 0.001, A**3, 0.001)",
                       local_dict={'A': A, 'B': B})

A, B = np.random.randn(2, 1000, 1000, 3)
print(np.allclose(func1(A, B), func2(A, B)))
# True
numexpr gives roughly a factor of 70 improvement over the original code:
In [1]: %%timeit A, B = np.random.randn(2, 1000, 1000, 3)
func1(A, B)
....:
1 loop, best of 3: 837 ms per loop
In [2]: %%timeit A, B = np.random.randn(2, 1000, 1000, 3)
func2(A, B)
....:
The slowest run took 8.87 times longer than the fastest. This could mean that an
intermediate result is being cached.
100 loops, best of 3: 11.5 ms per loop
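Part of this is because numexpr uses multiple threads for the computation by default, but even with a single thread it still crushes the naive vectorized version: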
In [3]: ne.set_num_threads(1)
Out[3]: 8
In [4]: %%timeit A, B = np.random.randn(2, 1000, 1000, 3)
func2(A, B)
....:
10 loops, best of 3: 47.3 ms per loop
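Note that `ne.set_num_threads` returns the previous thread count (the `Out[3]: 8` above), which makes it easy to restore the default after a single-threaded run. A minimal sketch of that pattern, using smaller arrays than the benchmark so it runs quickly; the array shape and the variable names `expr`, `previous`, `single` and `multi` are illustrative choices, not part of the original timing:

```python
import numpy as np
import numexpr as ne

# Smaller arrays than in the benchmark above, just to keep the sketch fast.
A, B = np.random.randn(2, 100, 100, 3)
expr = "B/where(A**3 > 0.001, A**3, 0.001)"

# set_num_threads returns the previous setting, so it can be restored later.
previous = ne.set_num_threads(1)
try:
    single = ne.evaluate(expr, local_dict={'A': A, 'B': B})
finally:
    ne.set_num_threads(previous)

# Evaluate again with the restored thread count and compare the results.
multi = ne.evaluate(expr, local_dict={'A': A, 'B': B})
print(np.allclose(single, multi))
```

The thread count should only affect speed, not the values, so the two results are expected to agree.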
Would the datatype allow using `np.einsum('ijk,ijk,ijk->ijk', A, A, A)`, as in the third link, to emulate `A ** 3`? That should be quite efficient. – Divakar
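For reference, the `einsum` form suggested in this comment can be checked against `A ** 3` directly. This is only a correctness sketch on a small, arbitrarily shaped 3-D array (the shape and the name `cubed` are assumptions for illustration); it says nothing about whether `einsum` is actually faster than numexpr for this problem, which would need to be timed separately.

```python
import numpy as np

# Small 3-D array; the shape is arbitrary and chosen only for illustration.
A = np.random.randn(50, 60, 3)

# Repeating the same subscripts for every operand and the output
# gives an elementwise product, i.e. A * A * A.
cubed = np.einsum('ijk,ijk,ijk->ijk', A, A, A)

print(np.allclose(cubed, A ** 3))
```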