Filling values in a numpy compound dataset is slow; why?
I have the following numpy compound data type:
mytype = numpy.dtype([('x', 'f8'),
                      ('y', 'f8'),
                      ('z', 'f8')])
However, when I try to fill an array of this type, it is 60x slower than filling three separate arrays:
#!/usr/bin/env python3
import time
import random
import numpy
mytype = numpy.dtype([('x', 'f8'),
                      ('y', 'f8'),
                      ('z', 'f8')])
size = 1000000
v = numpy.empty(shape=(size,), dtype=mytype)
print("Start inserting into compound type:")
start = time.time()
for i in range(size):
    v[i]['x'] = random.random()
    v[i]['y'] = random.random()
    v[i]['z'] = random.random()
end = time.time()
print("Done inserting into compound type: Time elapsed: {}.\n".format(end - start))
x = numpy.empty(shape=(size,), dtype='f8')
y = numpy.empty(shape=(size,), dtype='f8')
z = numpy.empty(shape=(size,), dtype='f8')
print("Inserting into three arrays:")
start = time.time()
for i in range(size):
    x[i] = random.random()
    y[i] = random.random()
    z[i] = random.random()
end = time.time()
print("Done inserting into three arrays. Time elapsed: {}".format(end - start))
print("Reading from compound type:")
start = time.time()
for i in range(size):
    x1 = v[i]['x']
    y1 = v[i]['y']
    z1 = v[i]['z']
end = time.time()
print("Done reading compound type: Time elapsed: {}.\n".format(end - start))
print("Reading from three arrays:")
start = time.time()
for i in range(size):
    x1 = x[i]
    y1 = y[i]
    z1 = z[i]
end = time.time()
print("Done reading three arrays. Time elapsed: {}.\n".format(end - start))
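In addition, I found that reading from the numpy compound data type is 70x slower than reading from the corresponding separate arrays. How can I improve the performance of numpy compound data types?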
Edit: After cloning numpy from master, this performance bug went away.
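I think I picked an overly symmetric example, because IRL my compound dataset really does have fields of different types. Still, I would expect a vector of C-style structs to be as fast as multiple arrays, so why is numpy different? – user14717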
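It doesn't use 'c' structs, at least not directly. The data storage may be compact (a simple array of bytes), but it still has to move data into and out of Python objects (including tuples). My guess is that for the more general dtype, more of the processing has to happen at the Python level and less in compiled code. – hpaulj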
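To make the point about compact storage concrete, here is a minimal sketch (not from the original post; the small `size` and the helper variable names are made up for illustration). It shows that each record is a packed 24-byte block, that selecting a field first returns a strided view on the same buffer, and that a whole column can be filled with one assignment instead of a per-element Python loop:

```python
import numpy as np

mytype = np.dtype([('x', 'f8'), ('y', 'f8'), ('z', 'f8')])
size = 1000  # small illustrative size (assumption, not the 1000000 from the question)
v = np.zeros(size, dtype=mytype)

# Each record is three packed 8-byte floats.
itemsize = mytype.itemsize

# Selecting a field gives a view on the same buffer, strided by the record size.
xs = v['x']
shares_buffer = np.shares_memory(xs, v)
stride = xs.strides[0]

# Whole-column assignment: the copy runs in compiled code,
# with no per-element Python objects created.
rng = np.random.default_rng(0)
v['x'] = rng.random(size)
v['y'] = rng.random(size)
v['z'] = rng.random(size)

# v[i] creates an intermediate record scalar (numpy.void) before the field lookup;
# v['x'][i] looks up the field first and then indexes a plain float64 view.
record = v[5]
same_value = record['x'] == v['x'][5]
```

If the per-element loop cannot be avoided, writing `v['x'][i]` rather than `v[i]['x']` is worth timing, since it avoids building the intermediate record object. How much that helps will depend on the numpy version.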
Hmm, it looks like 'numpy.dtype' has some support for specifying byte offsets... I'll give it a try... – user14717
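For reference, a minimal sketch of the byte-offset support mentioned above, using the dictionary form of the `numpy.dtype` constructor. The field names `id` and `value` and the 16-byte padded layout are hypothetical, chosen only to mimic a C struct with mixed field types; this illustrates the layout control and is not a claim that it fixes the slowdown:

```python
import numpy as np

# Packed layout: fields are placed back to back (4 + 8 = 12 bytes per record).
packed = np.dtype([('id', 'i4'), ('value', 'f8')])

# Explicit layout: 'value' is placed at byte 8 and the record is padded to 16 bytes.
# (Hypothetical field names and offsets, for illustration only.)
padded = np.dtype({
    'names': ['id', 'value'],
    'formats': ['i4', 'f8'],
    'offsets': [0, 8],
    'itemsize': 16,
})

# dtype.fields maps each field name to a tuple starting with (dtype, byte offset).
packed_offsets = {name: packed.fields[name][1] for name in packed.names}
padded_offsets = {name: padded.fields[name][1] for name in padded.names}

a = np.zeros(4, dtype=padded)
a['value'] = [1.0, 2.0, 3.0, 4.0]
value_stride = a['value'].strides[0]  # one step per padded record
```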