Slow array operation

My question is probably simple, but I can't figure out a way to make this operation faster:

print a[(b==c[i]) for i in arange(0,len(c))]

where a, b and c are three numpy arrays. I'm working with arrays of millions of records, and the code above is the bottleneck of my program.


2013-03-29 Brian

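To answer this with anything better than a guess, we would need at least the shapes of a, b and c: vectors, matrices, etc. – jedwards

a, b and c are one-dimensional arrays – Brian

Your code results in a syntax error. Can you show a small example of the code that is slow? –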

Are you trying to get the values of a where b==c?

If so, you can just do a[b==c]:

import numpy as np

a = np.arange(11)
b = 11 * a
c = b[::-1]

print(a)  # [ 0  1  2  3  4  5  6  7  8  9 10]
print(b)  # [  0  11  22  33  44  55  66  77  88  99 110]
print(c)  # [110  99  88  77  66  55  44  33  22  11   0]
print(a[b == c])  # [5]


2013-03-29 17:52:18 tom10

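Thanks for your answer, but that's not what I'm looking for. In your example, I would want the result to be [10 9 8 7 6 5 4 3 2 1 0], because those are the values of a where b = c. – Brian

@Matteo: What if b[j] == c[i] for multiple values of i or j? – tom10

Let's assume there are no duplicates. Sorry that my question wasn't very detailed. – Brian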
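To make the requested result concrete, here is a minimal sketch of the per-element lookup, not taken from any of the answers. It assumes, as stated in the comment above, that b has no duplicates, and additionally that every value of c occurs somewhere in b. The function name `lookup_in_c_order` is hypothetical.

```python
import numpy as np

def lookup_in_c_order(a, b, c):
    """For each c[i], return a[j] where b[j] == c[i].

    Assumption: b has no duplicates and every c[i] is present in b.
    """
    order = np.argsort(b)               # indices that would sort b
    pos = np.searchsorted(b[order], c)  # position of each c[i] within sorted b
    return a[order[pos]]                # map back to the original index into a

a = np.arange(11)
b = 11 * a
c = b[::-1]
result = lookup_in_c_order(a, b, c)
print(result.tolist())  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```

Sorting once and binary-searching each element of c avoids building a len(c) by len(b) comparison matrix, which matters for arrays with millions of entries.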

You should look into broadcasting. I assume you are looking for something like the following?

>>> b=np.arange(5) 
>>> c=np.arange(6).reshape(-1,1) 
>>> b 
array([0, 1, 2, 3, 4]) 
>>> c 
array([[0], 
     [1], 
     [2], 
     [3], 
     [4], 
     [5]]) 
>>> b==c 
array([[ True, False, False, False, False], 
     [False, True, False, False, False], 
     [False, False, True, False, False], 
     [False, False, False, True, False], 
     [False, False, False, False, True], 
     [False, False, False, False, False]], dtype=bool) 
>>> np.any(b==c,axis=1) 
array([ True, True, True, True, True, False], dtype=bool)

Then, for large arrays, you can try:

import timeit 

s=""" 
import numpy as np 
array_size=500 
a=np.random.randint(500, size=(array_size)) 
b=np.random.randint(500, size=(array_size)) 
c=np.random.randint(500, size=(array_size)) 
""" 

ex1=""" 
a[np.any(b==c.reshape(-1,1),axis=0)] 
""" 

ex2=""" 
a[np.in1d(b,c)] 
""" 

print('Example 1 took', timeit.timeit(ex1, setup=s, number=100), 'seconds.')
print('Example 2 took', timeit.timeit(ex2, setup=s, number=100), 'seconds.')

With array_size = 50:

Example 1 took 0.00323104858398 seconds. 
Example 2 took 0.0125901699066 seconds.

With array_size = 500:

Example 1 took 0.142632007599 seconds. 
Example 2 took 0.0283041000366 seconds.

With array_size = 5000:

Example 1 took 16.2110910416 seconds. 
Example 2 took 0.170011043549 seconds.

With array_size = 50000 (number=5):

Example 1 took 33.0327301025 seconds. 
Example 2 took 0.0996031761169 seconds.

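Note that I had to change the axis for np.any() so that the two results are identical. Reverse the argument order of np.in1d, or switch the axis of np.any, to get the desired effect. You could take the reshape out of example 1, but reshape is really quite fast. Really interesting; I will have to use this in the future.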
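As a quick sanity check of the claim that the two examples select the same elements, the following sketch compares both masks on small random arrays. It uses np.isin, the newer spelling of np.in1d, and assumes a NumPy version that provides it together with np.random.default_rng; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
a = rng.integers(0, 500, size=50)
b = rng.integers(0, 500, size=50)
c = rng.integers(0, 500, size=50)

# Broadcasting: rows index c, columns index b; reducing over rows (axis=0)
# gives one flag per element of b.
mask_broadcast = np.any(b == c.reshape(-1, 1), axis=0)

# Membership test: True where b[j] occurs anywhere in c.
mask_isin = np.isin(b, c)

same = np.array_equal(mask_broadcast, mask_isin)
print(same)
```

With axis=1 the broadcast mask would instead hold one flag per element of c, which corresponds to np.isin(c, b).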


2013-03-29 17:46:14 Daniel

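Thanks, this is what I was looking for, but it is still very slow for very large arrays – Brian

@Matteo: For integer arrays of size 1 million, b == c is 1 terabyte, so it could be a bit slow. (Not to put down this answer, though: kudos and +1 to Ophion for correctly guessing what you were looking for!) – tom10

Added a faster method. @tom10 When you posted it, I was 90% sure your solution was what he wanted :). – Daniel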
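The 1-terabyte figure in the comments can be reproduced with a little arithmetic. This sketch only computes the size of the broadcast comparison result; it does not allocate it.

```python
import numpy as np

n = 1_000_000                        # length of b and of c
itemsize = np.dtype(bool).itemsize   # one byte per boolean element
full_mask_bytes = n * n * itemsize   # n-by-n result of b == c.reshape(-1, 1)
print(full_mask_bytes)               # 1000000000000 bytes, about 1 TB
```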

How about np.where():

>>> a = np.array([2,4,8,16]) 
>>> b = np.array([0,0,0,0]) 
>>> c = np.array([1,0,0,1]) 
>>> bc = np.where(b==c)[0] #indices where b == c 
>>> a[bc] 
array([4, 8])

This should do the trick. I'm not sure whether the timing is good enough for your purposes:

>>> a = np.random.randint(0,10000,1000000) 
>>> b = np.random.randint(0,10000,1000000) 
>>> c = np.random.randint(0,10000,1000000) 
>>> %timeit(a[ np.where(b == c)[0] ] ) 
100 loops, best of 3: 11.3 ms per loop
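Note that b == c here compares the arrays position by position, so it needs b and c to have the same length, and it selects the same elements as the plain boolean mask a[b == c]. A small check, reusing the arrays from this answer:

```python
import numpy as np

a = np.array([2, 4, 8, 16])
b = np.array([0, 0, 0, 0])
c = np.array([1, 0, 0, 1])

via_where = a[np.where(b == c)[0]]  # integer indices k where b[k] == c[k]
via_mask = a[b == c]                # boolean mask over the same positions
print(via_where.tolist(), via_mask.tolist())  # [4, 8] [4, 8]
```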


2013-03-29 23:55:48 dermen
