Why is numpy.array so slow? I am baffled by this:

def main():
    for i in xrange(2560000):
        a = [0.0, 0.0, 0.0]

main()

$ time python test.py 

real  0m0.793s

Now let's look at the numpy version:

import numpy

def main():
    for i in xrange(2560000):
        a = numpy.array([0.0, 0.0, 0.0])

main()

$ time python test.py 

real 0m39.338s

Holy CPU cycles, Batman!

Using numpy.zeros(3) improves things, but still not enough, IMHO:

$ time python test.py 

real 0m5.610s 
user 0m5.449s 
sys 0m0.070s

numpy.version.version = '1.5.1'

If you are wondering whether the list creation is skipped by an optimization in the first example, it is not:

5   19 LOAD_CONST    2 (0.0) 
      22 LOAD_CONST    2 (0.0) 
      25 LOAD_CONST    2 (0.0) 
      28 BUILD_LIST    3 
      31 STORE_FAST    1 (a)
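A listing like the one above can be reproduced with the standard dis module. This is a minimal sketch written for Python 3 (the thread's snippets are Python 2), so the exact opcodes and offsets will differ from the output shown. The loop count of 3 is only a placeholder.

```python
import dis

def main():
    for i in range(3):
        a = [0.0, 0.0, 0.0]

# Collect the opcode names compiled for main(); they vary by CPython version.
opnames = [ins.opname for ins in dis.get_instructions(main)]
print(opnames)

# A BUILD_LIST opcode in the body means the list is really constructed.
list_built = any(name.startswith("BUILD_LIST") for name in opnames)
print("BUILD_LIST present:", list_built)
```

If BUILD_LIST shows up in the loop body, the interpreter is not optimizing the list creation away.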

Source

2011-07-02 Stefano Borini

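A quick thought: 'numpy.array' is actually a more complex data structure than a list. Also, in the second snippet you create a list **and** a numpy array (in the first, only a list). Whether that alone explains such a big difference, I cannot say.

@Felix: OK, but creating the list is fast, so even if I create both a list and a numpy array in the second case, the numpy creation is still the hot spot here. However complex the structure is, it is still damn expensive...

But consider: creating data is rarely the bottleneck in an application complex enough to use numpy. I don't know what happens under the hood either, but it evidently makes math-heavy programs faster at the end of the day, so there is no reason to complain ;) – delnan

Numpy is optimised for large amounts of data. Give it a tiny array of length 3 and, unsurprisingly, it performs poorly.

Consider a separate test: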

import timeit 

reps = 100 

pythonTest = timeit.Timer('a = [0.] * 1000000') 
numpyTest = timeit.Timer('a = numpy.zeros(1000000)', setup='import numpy') 
uninitialised = timeit.Timer('a = numpy.empty(1000000)', setup='import numpy') 
# empty simply allocates the memory, so the initial contents of the array
# are arbitrary (whatever happened to be in that memory)

print 'python list:', pythonTest.timeit(reps), 'seconds' 
print 'numpy array:', numpyTest.timeit(reps), 'seconds' 
print 'uninitialised array:', uninitialised.timeit(reps), 'seconds'

and the output is:

python list: 1.22042918205 seconds 
numpy array: 1.05412316322 seconds 
uninitialised array: 0.0016028881073 seconds

It seems that zeroing the array is what takes all the time for numpy. So unless you need the array to be initialised, try using empty.

Source
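A minimal Python 3 sketch of the same comparison follows, with one caveat attached: an array from numpy.empty must be written before it is read. The sizes and repetition counts are arbitrary choices, and on some platforms zeros and empty may time much closer together than in the output above.

```python
import timeit
import numpy as np

n = 1000000
reps = 20

# Time allocation with and without zero-initialisation.
t_zeros = timeit.timeit(lambda: np.zeros(n), number=reps)
t_empty = timeit.timeit(lambda: np.empty(n), number=reps)
print("zeros:", t_zeros, "seconds")
print("empty:", t_empty, "seconds")

# np.empty leaves the buffer uninitialised, so fill it before reading it.
buf = np.empty(3)
buf[:] = 0.0
print(buf)
```

Skipping the initialisation only pays off when every element is going to be overwritten anyway.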

2011-07-02 21:16:43 Dunes

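To be fair, you should have used 'pythonTest = timeit.Timer('a = [0] * 1000000')'. It is still slower than numpy, but much faster than the list comprehension. It is also closer to a list literal (as in the question), because it does not run a Python loop.

@Rosh Good point. I think I always steer clear of the '*' operator for lists because it puts the same object in every index, though in this case it does not matter since numbers are immutable. Try a bulk operation on the list/array, though, and numpy comes out ahead again (e.g. arr += 1). – Dunes

Very good point, thanks. Given the results, what would you suggest for small arrays? I mean, lists and tuples are not great for basic array arithmetic (e.g. vector-vector products, scalar multiplication, determinants of small matrices). Of course I can reimplement the algorithms myself, which is not a big deal, but if something already exists for this I would consider it the preferred solution.

Holy CPU cycles batman!, indeed.

But please consider something very fundamental about numpy: its sophisticated, linear-algebra-based functionality (like random numbers or singular value decomposition). Now, consider these seemingly simple calculations: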
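Both points in that comment can be sketched in a few lines of Python 3. The array size and repetition count below are arbitrary, and the timings depend on the machine, so no particular ratio is claimed.

```python
import timeit
import numpy as np

# '*' repeats references to one object: harmless for immutable floats...
flat = [0.0] * 3
flat[0] = 1.0            # rebinding one slot leaves the others untouched

# ...but surprising for mutable elements such as nested lists.
rows = [[0.0] * 2] * 3
rows[0][0] = 1.0         # all three "rows" are the same inner list
print(flat, rows)

# Bulk operation: add 1 to every element.
n = 100000
lst = [0.0] * n
arr = np.zeros(n)

t_list = timeit.timeit(lambda: [x + 1 for x in lst], number=20)
t_arr = timeit.timeit(lambda: arr + 1, number=20)
print("list comprehension:", t_list, "seconds")
print("numpy arr + 1:     ", t_arr, "seconds")
```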

In []: A= rand(2560000, 3) 
In []: %timeit rand(2560000, 3) 
1 loops, best of 3: 296 ms per loop 
In []: %timeit u, s, v= svd(A, full_matrices= False) 
1 loops, best of 3: 571 ms per loop

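and please believe me, this kind of performance will not be beaten significantly by any package currently available.

So, please describe your real problem, and I'll try to figure out a decent numpy-based solution for it.

Update:
Here is some simple code for ray-sphere intersection: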

import numpy as np 

def mag(X): 
    # magnitude 
    return (X** 2).sum(0)** .5 

def closest(R, c): 
    # closest point on ray to center and its distance 
    P= np.dot(c.T, R)* R 
    return P, mag(P- c) 

def intersect(R, P, h, r): 
    # intersection of rays and sphere 
    return P- (h* (2* r- h))** .5* R 

# set up 
c, r= np.array([10, 10, 10])[:, None], 2. # center, radius 
n= int(5e5) # number of rays; must be an int for np.random.rand
R= np.random.rand(3, n) # some random rays in first octant 
R= R/ mag(R) # normalized to unit length 

# find rays which will intersect sphere 
P, b= closest(R, c) 
wi= b<= r 

# and for those which will, find the intersection 
X= intersect(R[:, wi], P[:, wi], r- b[wi], r)

Evidently, we calculated correctly:

In []: allclose(mag(X- c), r) 
Out[]: True

and some timings:

In []: %timeit P, b= closest(R, c)
10 loops, best of 3: 93.4 ms per loop
In []: n/ 0.0934
Out[]: 5353319 #=> more than 5 million detections of possible intersections/ s
In []: %timeit X= intersect(R[:, wi], P[:, wi], r- b[wi], r)
10 loops, best of 3: 32.7 ms per loop
In []: X.shape[1]/ 0.0327
Out[]: 874037 #=> almost 1 million actual intersections/ s

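These timings were taken on a very modest machine. With a modern machine, a significant speed-up can still be expected.

Anyway, this is just a short demonstration of how to code with numpy.

Source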

2011-07-02 23:03:52 eat

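My real problem: http://stackoverflow.com/questions/6528214/improving-performance-of-raytracing-hit-function

@Stefano Borini: Updated my answer. Thanks – eat

Nice. However, it does not really let you work directly with Sphere objects this way. You need a backend that converts the high-level design into a set of aggregated coordinates, which are then fed into numpy.

Late answer, but it could be important for other viewers.

This problem has also been considered in the kwant project. Indeed, small arrays are not optimised in numpy, and quite frequently small arrays are exactly what you need.

For this reason they created a substitute for small arrays that behaves like, and coexists with, numpy arrays (any operation not implemented in the new data type is handled by numpy).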
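One way such a backend could look is sketched below in Python 3. The Sphere class and the pack_spheres and hits helpers are hypothetical names invented for illustration; they come from neither the question nor the answer. The hit test reuses the projection idea from closest() above and, like it, treats the ray as an infinite line through the origin.

```python
import numpy as np

class Sphere:
    """Hypothetical high-level scene object (illustration only)."""
    def __init__(self, center, radius):
        self.center = center
        self.radius = radius

def pack_spheres(spheres):
    """Gather object attributes into arrays: centers (3, m), radii (m,)."""
    centers = np.array([s.center for s in spheres], dtype=float).T
    radii = np.array([s.radius for s in spheres], dtype=float)
    return centers, radii

def hits(ray, centers, radii):
    """Flag spheres whose centre lies within one radius of the ray's line."""
    ray = np.asarray(ray, dtype=float)[:, None]   # unit direction, shape (3, 1)
    P = np.dot(ray.T, centers) * ray              # closest points, shape (3, m)
    b = ((P - centers) ** 2).sum(0) ** 0.5        # distance from each centre
    return b <= radii

spheres = [Sphere((10, 0, 0), 2.0), Sphere((0, 10, 0), 2.0)]
centers, radii = pack_spheres(spheres)
hit_mask = hits((1, 0, 0), centers, radii)
print(hit_mask)   # only the first sphere lies on the x axis
```

The objects stay convenient to work with at the design level, while the per-ray arithmetic runs on the packed arrays.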

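You should look into this project:
https://pypi.python.org/pypi/tinyarray/1.0.5
Its main purpose is to perform well for small arrays. Of course, some of the fancier things you can do with numpy are not supported. But numerics seems to be what you are asking for.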

I have made some small tests:

python

I added the numpy import here too, so that the load time is comparable

import numpy

def main():
    for i in xrange(2560000):
        a = [0.0, 0.0, 0.0]

main()

numpy

import numpy

def main():
    for i in xrange(2560000):
        a = numpy.array([0.0, 0.0, 0.0])

main()

numpy zero

import numpy

def main():
    for i in xrange(2560000):
        a = numpy.zeros((3,1))

main()

tinyarray

import numpy,tinyarray 

def main():
    for i in xrange(2560000):
        a = tinyarray.array([0.0, 0.0, 0.0])

main()

tinyarray zero

import numpy,tinyarray

def main():
    for i in xrange(2560000):
        a = tinyarray.zeros((3,1))

main()

I ran them with this:

for f in python numpy numpy_zero tiny tiny_zero ; do
    echo $f
    for i in `seq 5` ; do
        time python ${f}_test.py
    done
done

and got:

python 
python ${f}_test.py 0.31s user 0.02s system 99% cpu 0.339 total 
python ${f}_test.py 0.29s user 0.03s system 98% cpu 0.328 total 
python ${f}_test.py 0.33s user 0.01s system 98% cpu 0.345 total 
python ${f}_test.py 0.31s user 0.01s system 98% cpu 0.325 total 
python ${f}_test.py 0.32s user 0.00s system 98% cpu 0.326 total 
numpy 
python ${f}_test.py 2.79s user 0.01s system 99% cpu 2.812 total 
python ${f}_test.py 2.80s user 0.02s system 99% cpu 2.832 total 
python ${f}_test.py 3.01s user 0.02s system 99% cpu 3.033 total 
python ${f}_test.py 2.99s user 0.01s system 99% cpu 3.012 total 
python ${f}_test.py 3.20s user 0.01s system 99% cpu 3.221 total 
numpy_zero 
python ${f}_test.py 1.04s user 0.02s system 99% cpu 1.075 total 
python ${f}_test.py 1.08s user 0.02s system 99% cpu 1.106 total 
python ${f}_test.py 1.04s user 0.02s system 99% cpu 1.065 total 
python ${f}_test.py 1.03s user 0.02s system 99% cpu 1.059 total 
python ${f}_test.py 1.05s user 0.01s system 99% cpu 1.064 total 
tiny 
python ${f}_test.py 0.93s user 0.02s system 99% cpu 0.955 total 
python ${f}_test.py 0.98s user 0.01s system 99% cpu 0.993 total 
python ${f}_test.py 0.93s user 0.02s system 99% cpu 0.953 total 
python ${f}_test.py 0.92s user 0.02s system 99% cpu 0.944 total 
python ${f}_test.py 0.96s user 0.01s system 99% cpu 0.978 total 
tiny_zero 
python ${f}_test.py 0.71s user 0.03s system 99% cpu 0.739 total 
python ${f}_test.py 0.68s user 0.02s system 99% cpu 0.711 total 
python ${f}_test.py 0.70s user 0.01s system 99% cpu 0.721 total 
python ${f}_test.py 0.70s user 0.02s system 99% cpu 0.721 total 
python ${f}_test.py 0.67s user 0.01s system 99% cpu 0.687 total

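Now, these tests are (as already pointed out) not the best tests. However, they still show that tinyarray is better suited for small arrays.
Another point is that the most common operations should also be faster with tinyarray, so the benefit may extend beyond data creation.

I have never tried it in a full-fledged project, but the kwant project uses it.

Source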
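One weakness of timing whole scripts with time is that interpreter start-up and module import are included. A hedged alternative is to time only the creation calls with timeit, as in this Python 3 sketch. It is restricted to the list and numpy cases so that it runs without tinyarray installed, and the repetition count is arbitrary.

```python
import timeit
import numpy as np

reps = 20000

# Each candidate is wrapped in a lambda, which adds the same small call
# overhead to all of them.
candidates = {
    "list literal": lambda: [0.0, 0.0, 0.0],
    "numpy.array": lambda: np.array([0.0, 0.0, 0.0]),
    "numpy.zeros": lambda: np.zeros(3),
}

results = {}
for name, fn in candidates.items():
    # Best of 3 runs, converted to microseconds per call.
    best = min(timeit.repeat(fn, number=reps, repeat=3))
    results[name] = best / reps * 1e6
    print("%-12s %.3f us per call" % (name, results[name]))
```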

2014-04-28 02:51:51 zeroth

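If some 'numpy' function incurs too much overhead, it can sometimes help to bind it to a local name rather than looking it up in the module each time, e.g. 'd = numpy.array; a = d([0., 0., 0.])'. – zeroth

Of course numpy takes more time in this case, because a = np.array([0.0, 0.0, 0.0]) is roughly equivalent to a = [0.0, 0.0, 0.0]; a = np.array(a), which is two steps. But numpy arrays have many good qualities: their speed shows in the operations performed on them, not in creating them. Just my personal thoughts :).

Source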
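The aliasing trick from that comment can be sketched as follows, in Python 3. The function names and loop counts are made up for illustration. Any saving is likely to be small next to the cost of building the array itself, so it is worth measuring before relying on it.

```python
import timeit
import numpy

def with_lookup(n=10000):
    for _ in range(n):
        a = numpy.array([0.0, 0.0, 0.0])   # global + attribute lookup each pass
    return a

def with_alias(n=10000):
    array = numpy.array                    # look the function up once
    for _ in range(n):
        a = array([0.0, 0.0, 0.0])
    return a

t_lookup = timeit.timeit(with_lookup, number=5)
t_alias = timeit.timeit(with_alias, number=5)
print("module lookup:", t_lookup, "seconds")
print("local alias:  ", t_alias, "seconds")
```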
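To see how the two steps contribute, the list creation and the conversion can be timed separately. This is a rough Python 3 sketch with an arbitrary repetition count; the split will vary with the numpy version and the machine.

```python
import timeit
import numpy as np

reps = 20000
values = [0.0, 0.0, 0.0]   # pre-built list, so only the conversion is timed

t_list = timeit.timeit(lambda: [0.0, 0.0, 0.0], number=reps)
t_convert = timeit.timeit(lambda: np.array(values), number=reps)
t_both = timeit.timeit(lambda: np.array([0.0, 0.0, 0.0]), number=reps)

print("build list only:   ", t_list, "seconds")
print("convert list only: ", t_convert, "seconds")
print("build and convert: ", t_both, "seconds")
```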

2017-12-08 11:43:14 ZhengPeng
