Efficiently getting pairs of positive values with numpy

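I have a Python function that takes two lists and looks for pairs where both inputs have a positive value at the same index. It builds two output lists by appending each of those two positive values to its own list. I have a working function: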

def get_pairs_in_first_quadrant(x_in, y_in):
    """If both x_in[i] and y_in[i] are > 0 then both will be appended to the
    output lists. If either is zero or negative then the pair will be absent
    from the output lists.
    :param x_in: A list of positive or negative floats
    :param y_in: A list of positive or negative floats
    :return: Two lists of positive floats, each <= in length to the inputs.
    """
    x_filtered, y_filtered = [], []
    for x, y in zip(x_in, y_in):
        if x > 0 and y > 0:
            x_filtered.append(x)
            y_filtered.append(y)
    return x_filtered, y_filtered

How can I make this faster using numpy?


2015-05-11 ayeayeron

Use [numpy.logical_and](http://docs.scipy.org/doc/numpy/reference/routines.logic.html).

How big are the lists we are talking about here? – koukouviou

The length could be around 100000. – ayeayeron
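The numpy.logical_and suggestion in the comments could look roughly like the sketch below. The function name get_pairs_logical_and and the sample values are made up for illustration, and the np.asarray conversion is an added assumption so that plain lists are accepted as in the question.

```python
import numpy as np

def get_pairs_logical_and(x_in, y_in):
    # Hypothetical helper: convert inputs so plain Python lists work too.
    x = np.asarray(x_in)
    y = np.asarray(y_in)
    # Boolean mask that is True only where both values are strictly positive.
    mask = np.logical_and(x > 0, y > 0)
    # Boolean indexing keeps the elements where the mask is True.
    return x[mask], y[mask]

xs, ys = get_pairs_logical_and([1.0, -2.0, 3.0, 4.0], [5.0, 6.0, -7.0, 8.0])
print(xs, ys)
```

Note that this returns numpy arrays rather than lists; call .tolist() on the results if lists are needed.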

You can do this by simply finding the indices where both are positive:

import numpy as np 

a = np.random.random(10) - .5 
b = np.random.random(10) - .5 

def get_pairs_in_first_quadrant(x_in, y_in): 
    i = np.nonzero((x_in>0) & (y_in>0)) # main line of interest 
    return x_in[i], y_in[i] 

print(a) # [-0.18012451 -0.40924713 -0.3788772 0.3186816 0.14811581 -0.04021951 -0.21278312 -0.36762629 -0.45369899 -0.46374929]
print(b) # [ 0.33005969 -0.03167875 0.11387641 0.22101336 0.38412264 -0.3880842 0.08679424 0.3126209 -0.08760505 -0.40921421]
print(get_pairs_in_first_quadrant(a, b)) # (array([ 0.3186816 , 0.14811581]), array([ 0.22101336, 0.38412264]))
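The function above expects numpy arrays, while the question passes lists. As a rough sanity check, the sketch below wraps the inputs in np.asarray and compares the result with the original loop on random data. The names pairs_loop and pairs_numpy, the seed, and the array size are all illustrative choices, not part of the original answer.

```python
import numpy as np

def pairs_loop(x_in, y_in):
    # Pure-Python reference with the same logic as the question's function.
    x_out, y_out = [], []
    for x, y in zip(x_in, y_in):
        if x > 0 and y > 0:
            x_out.append(x)
            y_out.append(y)
    return x_out, y_out

def pairs_numpy(x_in, y_in):
    # np.asarray lets the function accept lists as well as arrays.
    x, y = np.asarray(x_in), np.asarray(y_in)
    i = np.nonzero((x > 0) & (y > 0))
    return x[i], y[i]

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
x_list = (rng.random(1000) - 0.5).tolist()
y_list = (rng.random(1000) - 0.5).tolist()

x_ref, y_ref = pairs_loop(x_list, y_list)
x_np, y_np = pairs_numpy(x_list, y_list)

# Both versions should keep exactly the same pairs, in the same order.
same = x_np.tolist() == x_ref and y_np.tolist() == y_ref
print(same)
```

If the data already lives in arrays, the np.asarray calls do not copy it.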

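I was interested in Jaime's suggestion to just use boolean indexing without calling nonzero, so I ran some timing tests. The results are somewhat interesting because the speed ratio between the two is non-monotonic in the number of positive matches, but basically, at least for speed, it doesn't really matter which one is used (although nonzero is usually a bit faster, and can be about twice as fast):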

from timeit import timeit
import numpy as np

threshold = .6
a = np.random.random(10000) - threshold
b = np.random.random(10000) - threshold

def f1(x_in, y_in):
    i = np.nonzero((x_in>0) & (y_in>0)) # index arrays from nonzero
    return x_in[i], y_in[i]

def f2(x_in, y_in):
    i = (x_in>0) & (y_in>0) # boolean mask used directly
    return x_in[i], y_in[i]

print(threshold, len(f1(a,b)[0]), len(f2(a,b)[0]))
print(timeit("f1(a, b)", "from __main__ import a, b, f1, f2", number = 1000))
print(timeit("f2(a, b)", "from __main__ import a, b, f1, f2", number = 1000))

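This gives, for different thresholds (each block shows the threshold and the two match counts, then the f1 time, then the f2 time):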

0.05 9086 9086 
0.0815141201019 
0.104746818542 

0.5 2535 2535 
0.0715141296387 
0.153401851654 

0.95 21 21 
0.027126789093 
0.0324990749359
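To reproduce a table like the one above in a single run, the thresholds can be swept in a loop. This is only a sketch: the loop, the results list, and the use of lambdas with timeit are assumptions added here, and the timings will differ by machine and numpy version.

```python
from timeit import timeit
import numpy as np

def f1(x_in, y_in):
    # Index with the integer positions returned by np.nonzero.
    i = np.nonzero((x_in > 0) & (y_in > 0))
    return x_in[i], y_in[i]

def f2(x_in, y_in):
    # Index with the boolean mask directly.
    i = (x_in > 0) & (y_in > 0)
    return x_in[i], y_in[i]

results = []  # one (threshold, n_pairs, f1 time, f2 time) tuple per threshold
for threshold in (0.05, 0.5, 0.95):
    a = np.random.random(10000) - threshold
    b = np.random.random(10000) - threshold
    n_pairs = len(f1(a, b)[0])
    # timeit accepts a callable; each timing covers 1000 calls.
    t1 = timeit(lambda: f1(a, b), number=1000)
    t2 = timeit(lambda: f2(a, b), number=1000)
    results.append((threshold, n_pairs, t1, t2))
    print(threshold, n_pairs, t1, t2)
```

A higher threshold leaves fewer positive pairs, which is why the match count drops as the threshold increases.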


2015-05-11 02:00:54 tom10

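Don't call `np.nonzero` on the boolean array: use it directly to index `x_in` and `y_in`. – Jaime

@Jaime: I ran speed tests and there doesn't seem to be much of a difference. Is there another reason to prefer the boolean array? – tom10

Are you using numpy 1.9? Indexing was greatly improved in that version, and it looks like my intuition is outdated... If you are indexing a single array, the boolean mask still comes out slightly ahead, but apparently not when indexing more arrays. – Jaime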
