Speed up structured NumPy array

NumPy arrays are great for both performance and ease of use (slicing and indexing are easier than with lists).

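I tried to build a data container as a NumPy structured array instead of a dict of NumPy arrays. The problem is that performance is much worse: about 2.5 times slower with homogeneous data and about 32 times slower with heterogeneous data (I'm talking about NumPy data types).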

Is there a way to speed up the structured array? I tried changing the memory order from 'c' to 'f', but that had no effect.
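For a 1-D array the `order` argument has nothing to choose between: a contiguous 1-D array is both C- and Fortran-contiguous, so 'C' and 'F' produce the same memory layout. A minimal check, using the question's homogeneous dtype (the array size here is arbitrary):

```python
import numpy as np

dt = [('a', np.double), ('b', np.double)]
c_arr = np.zeros(5, dtype=dt, order='C')
f_arr = np.zeros(5, dtype=dt, order='F')

# A contiguous 1-D array satisfies both contiguity definitions at once.
print(c_arr.flags.c_contiguous, c_arr.flags.f_contiguous)  # True True
# Same strides either way: one 16-byte record per step.
print(c_arr.strides, f_arr.strides)  # (16,) (16,)
# The field view also steps over whole records, not just the 8-byte field.
print(c_arr['a'].strides)  # (16,)
```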

Here is my profiling code:

import time 
import numpy as np 

NP_SIZE = 100000 
N_REP = 100 

np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='c') 
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='c') 
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)} 
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)} 

t0 = time.time() 
for i in range(N_REP): 
    np_homo['a'] += i 

t1 = time.time() 
for i in range(N_REP): 
    np_hetro['a'] += i 

t2 = time.time() 
for i in range(N_REP): 
    dict_homo['a'] += i 

t3 = time.time() 
for i in range(N_REP): 
    dict_hetro['a'] += i 
t4 = time.time() 

print('Homogeneous Numpy struct array took {:.4f}s'.format(t1 - t0)) 
print('Heterogeneous Numpy struct array took {:.4f}s'.format(t2 - t1)) 
print('Homogeneous Dict of numpy arrays took {:.4f}s'.format(t3 - t2)) 
print('Heterogeneous Dict of numpy arrays took {:.4f}s'.format(t4 - t3))

EDIT: Forgot to include my timings:

Homogeneous Numpy struct array took 0.0101s 
Heterogeneous Numpy struct array took 0.1367s 
Homogeneous Dict of numpy arrays took 0.0042s 
Heterogeneous Dict of numpy arrays took 0.0042s
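The "about 2.5x" and "about 32x" figures follow from these timings if the dict of arrays is taken as the baseline:

```python
# Timings copied from the output above (seconds for 100 repetitions).
struct_homo = 0.0101
struct_hetero = 0.1367
dict_time = 0.0042  # both dict cases measured the same

print('homogeneous slowdown:   {:.1f}x'.format(struct_homo / dict_time))    # 2.4x
print('heterogeneous slowdown: {:.1f}x'.format(struct_hetero / dict_time))  # 32.5x
```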

EDIT2: I added some extra test cases using the timeit module:

import numpy as np 
import timeit 

NP_SIZE = 1000000 

def time(data, txt, n_rep=1000): 
    def intern(): 
        data['a'] += 1 

    elapsed = timeit.timeit(intern, number=n_rep) 
    print('{} {:.4f}'.format(txt, elapsed)) 


np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='c') 
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='c') 
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)} 
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)} 

time(np_homo, 'Homogeneous Numpy struct array') 
time(np_hetro, 'Heterogeneous Numpy struct array') 
time(dict_homo, 'Homogeneous Dict of numpy arrays') 
time(dict_hetro, 'Heterogeneous Dict of numpy arrays')

Results:

Homogeneous Numpy struct array 0.7989 
Heterogeneous Numpy struct array 13.5253 
Homogeneous Dict of numpy arrays 0.3750 
Heterogeneous Dict of numpy arrays 0.3744
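One difference between the two structured dtypes that may be worth ruling out (this thread does not establish it as the cause) is field alignment. With a `float64` followed by an `int32`, each record is 12 bytes, so every other `'a'` value starts at an address that is not a multiple of 8, and NumPy flags the field view as unaligned. Passing `align=True` to `np.dtype` pads each record to 16 bytes. A sketch for inspecting this, assuming a typical 64-bit platform where `float64` requires 8-byte alignment:

```python
import numpy as np

packed = np.dtype([('a', np.double), ('b', np.int32)])
padded = np.dtype([('a', np.double), ('b', np.int32)], align=True)

print(packed.itemsize, padded.itemsize)  # 12 16

packed_arr = np.zeros(1000, dtype=packed)
padded_arr = np.zeros(1000, dtype=padded)

# Assumes 8-byte float64 alignment: a 12-byte stride breaks it, 16 keeps it.
print(packed_arr['a'].strides, packed_arr['a'].flags.aligned)  # (12,) False
print(padded_arr['a'].strides, padded_arr['a'].flags.aligned)  # (16,) True
```

Re-running the timeit benchmark above with `padded` in place of the heterogeneous dtype would show whether alignment accounts for much of the gap on a given machine.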

The ratios seem fairly stable between runs, with both methods and with different array sizes.

In case it matters: Python 3.4, NumPy 1.9.2

2016-01-21 magu_

Since this question is about a specific NumPy performance issue rather than a general code critique, it was migrated from Code Review to Stack Overflow.

If you really want to use structured arrays, I'd suggest trying pandas (http://pandas.pydata.org/).

See this issue: https://github.com/numpy/numpy/issues/6467 – MaxNoe

In my quick timing tests the difference is not that large:

In [717]: dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)} 
In [718]: timeit dict_homo['a']+=1 
10000 loops, best of 3: 25.9 µs per loop 
In [719]: np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)]) 
In [720]: timeit np_homo['a'] += 1 
10000 loops, best of 3: 29.3 µs per loop

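In the dict_homo case, the fact that the array is stored in a dictionary matters very little. A simple dictionary lookup like this is fast, essentially the same as accessing the array through a variable name.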

So the first case is basically a test of += on a 1d array.
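The dict adds only a hash lookup: `dict_homo['a']` returns the very array object stored in the dict, and `+=` then modifies that array in place. A small sketch of that identity (the names `arr` and `container` are illustrative):

```python
import numpy as np

arr = np.zeros(4)
container = {'a': arr}

# The lookup returns the stored object itself, not a copy.
print(container['a'] is arr)  # True

container['a'] += 1  # in-place add on the stored array
print(arr)  # [1. 1. 1. 1.]
```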

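In the structured case, the a and b values alternate in the data buffer, so np_homo['a'] is a view that "pulls out" every other number. So it is not surprising that it is a bit slower.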
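The interleaving shows up in the view's strides: the `'a'` field view shares the record buffer and steps 16 bytes (one full record) per element, while a plain `float64` array steps 8 bytes:

```python
import numpy as np

np_homo = np.zeros(10, dtype=[('a', np.double), ('b', np.double)])
plain = np.zeros(10)

field = np_homo['a']
print(field.strides, plain.strides)      # (16,) (8,)
print(np.shares_memory(field, np_homo))  # True: a view, not a copy
print(field.flags.c_contiguous)          # False: it skips over the 'b' values
```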

In [721]: np_homo 
Out[721]: 
array([(41111.0, 0.0), (41111.0, 0.0), (41111.0, 0.0), ..., (41111.0, 0.0), 
     (41111.0, 0.0), (41111.0, 0.0)], 
     dtype=[('a', '<f8'), ('b', '<f8')])

A 2d array also interleaves column values.

In [722]: np_twod=np.zeros((10000,2), np.double) 
In [723]: timeit np_twod[:,0]+=1 
10000 loops, best of 3: 36.8 µs per loop

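Surprisingly, it is actually a bit slower than the structured case. Using order='F' or a (2,10000) shape speeds it up, but it is still not as good as the structured case.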
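The layouts compared here differ in the stride of the column being updated: with the default row-major layout the column values are 16 bytes apart, while `order='F'` and the `(2, 10000)` shape both make the updated values contiguous:

```python
import numpy as np

n = 10000
c_2d = np.zeros((n, 2))             # row-major: columns interleaved
f_2d = np.zeros((n, 2), order='F')  # column-major: each column contiguous
t_2d = np.zeros((2, n))             # transposed shape: each row contiguous

print(c_2d[:, 0].strides)  # (16,)
print(f_2d[:, 0].strides)  # (8,)
print(t_2d[0].strides)     # (8,)
```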

These are small test times, so I wouldn't make any major claims. But the structured array doesn't come off badly.

Another trial, initializing the array or dict fresh for each test:

In [730]: %%timeit np_twod=np.zeros((10000,2), np.double) 
np_twod[:,0] += 1 
    .....: 
10000 loops, best of 3: 36.7 µs per loop 
In [731]: %%timeit np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)]) 
np_homo['a'] += 1 
    .....: 
10000 loops, best of 3: 38.3 µs per loop 
In [732]: %%timeit dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)} 
dict_homo['a'] += 1 
    .....: 
10000 loops, best of 3: 25.4 µs per loop
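The same comparison can be made with the standard-library `timeit` module, whose `setup` argument plays the role of the first `%%timeit` line. Note that in `timeit` the setup code runs once per repeat, not before every loop, so the arrays are not literally rebuilt for each `+=`. A sketch (the `repeat` and `number` values are arbitrary choices):

```python
import timeit

setups = {
    '2d': "import numpy as np; x = np.zeros((10000, 2))",
    'struct': "import numpy as np; x = np.zeros(10000, dtype=[('a', float), ('b', float)])",
    'dict': "import numpy as np; x = {'a': np.zeros(10000), 'b': np.zeros(10000)}",
}
stmts = {'2d': "x[:, 0] += 1", 'struct': "x['a'] += 1", 'dict': "x['a'] += 1"}

results = {}
for name in setups:
    # Best of 3 repeats of 1000 loops each, converted to seconds per loop.
    best = min(timeit.repeat(stmts[name], setup=setups[name], repeat=3, number=1000))
    results[name] = best / 1000
    print('{:7s} {:.1f} us per loop'.format(name, results[name] * 1e6))
```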

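The 2d and structured cases are closer, with the dict (1d) case performing slightly better. I also tried np.ones, since np.zeros can allocate lazily, but the behavior was no different.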

2016-01-21 20:46:20 hpaulj

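Hmm, that's interesting, especially the first result. Did you try increasing the array size? Just to make sure the measured time isn't dominated by some constant overhead.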
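One way to follow this suggestion is to time the update over several array sizes and compare the time per element: if a fixed per-call overhead dominated, the per-element cost would drop sharply as the size grows. A sketch with arbitrarily chosen sizes; `per_element_times` is a hypothetical helper, not part of NumPy:

```python
import timeit

import numpy as np


def per_element_times(dtype, sizes=(10**3, 10**4, 10**5, 10**6), number=20):
    """Return {size: best seconds per element} for the update arr['a'] += 1."""
    out = {}
    for n in sizes:
        arr = np.zeros(n, dtype=dtype)

        def update():
            arr['a'] += 1  # in-place update of one field

        best = min(timeit.repeat(update, repeat=3, number=number))
        out[n] = best / (number * n)
    return out


hetero = [('a', np.double), ('b', np.int32)]
for n, t in per_element_times(hetero).items():
    print('{:>8d} elements: {:.2f} ns per element'.format(n, t * 1e9))
```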