Python version of ismember with 'rows' and index

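Similar questions have been asked, but none of the answers does quite what I need: some allow multidimensional searches (the 'rows' option in MATLAB) but do not return the index, and some return the index but do not allow rows. My arrays are very large (1M x 2), and I have managed to write a loop that works, but it is obviously very slow. In MATLAB, the built-in ismember function takes about 10 seconds.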

Here is what I am looking for:

a=np.array([[4, 6],[2, 6],[5, 2]]) 

b=np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])

The exact MATLAB function that does the trick is:

[~,index] = ismember(a,b,'rows')

where

index = [6, 3, 9]
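To pin down the behaviour being asked for, here is a minimal plain-Python sketch (not taken from any of the answers). The name ismember_rows_dict is hypothetical, and returning -1 for a row that has no match is an assumption made for this sketch; MATLAB returns 0 in that case. It returns, for each row of a, the 0-based index of the first matching row of b.

```python
import numpy as np

def ismember_rows_dict(a, b):
    """For each row of a, return the 0-based index of its first match in b, or -1."""
    first_seen = {}
    for i, row in enumerate(map(tuple, b.tolist())):
        first_seen.setdefault(row, i)  # keep only the first occurrence of each row
    return np.array([first_seen.get(row, -1) for row in map(tuple, a.tolist())])

a = np.array([[4, 6], [2, 6], [5, 2]])
b = np.array([[1, 7], [1, 8], [2, 6], [2, 1], [2, 4],
              [4, 6], [4, 7], [5, 9], [5, 2], [5, 1]])

index = ismember_rows_dict(a, b)
print(index)      # 0-based: [5 2 8]
print(index + 1)  # 1-based, as in MATLAB: [6 3 9]
```

A dictionary lookup like this uses memory proportional to the number of rows, but it still loops in Python, so it mainly serves as a reference to check faster versions against.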

Source

2014-03-27 claudiaann1

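What is the dtype of your arrays? Are both 'a' and 'b' of length ~1M? Does the order of the values in 'index' matter to you? – unutbu

They are both dtype('int64'). b has length ~1M and a has length ~750k. Every entry in a will be in b, but not the other way around. Ideally the index output would be the same length as b, with its values giving the index in a. – claudiaann1

Sorry... I meant that the output should be the same length as a. The other way around makes no sense. – claudiaann1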

import numpy as np 

def asvoid(arr): 
    """ 
    View the array as dtype np.void (bytes) 
    This views the last axis of ND-arrays as bytes so you can perform comparisons on 
    the entire row. 
    http://stackoverflow.com/a/16840350/190597 (Jaime, 2013-05) 
    Warning: When using asvoid for comparison, note that float zeros may compare UNEQUALLY 
    >>> asvoid([-0.]) == asvoid([0.]) 
    array([False], dtype=bool) 
    """ 
    arr = np.ascontiguousarray(arr) 
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1]))) 


def in1d_index(a, b): 
    voida, voidb = map(asvoid, (a, b)) 
    return np.where(np.in1d(voidb, voida))[0]  

a = np.array([[4, 6],[2, 6],[5, 2]]) 
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]]) 

print(in1d_index(a, b))

prints

[2 5 8]

This is equivalent to MATLAB's [3, 6, 9], since Python uses 0-based indexing.

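Some caveats:

The indices are returned in increasing order. They do not correspond to the positions of the rows of a in b.
asvoid works for integer dtypes, but be careful when using asvoid with float dtypes, since asvoid([-0.]) == asvoid([0.]) returns array([False]).
asvoid works best on contiguous arrays. If the arrays are not contiguous, the data is copied into a contiguous array, which slows performance.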
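The float caveat can be checked directly. The sketch below is not part of the original answer: it compares the raw bytes with tobytes(), which is effectively what a byte-wise comparison of void views does, and it shows that adding 0.0 is one possible way to normalize negative zeros before comparing.

```python
import numpy as np

neg = np.array([-0.0])
pos = np.array([0.0])

# Numerically the two zeros are equal...
numeric_equal = bool((neg == pos).all())
# ...but their byte patterns differ (the sign bit is set in -0.0).
bytes_equal = neg.tobytes() == pos.tobytes()
print(numeric_equal, bytes_equal)  # True False

# Adding 0.0 turns -0.0 into +0.0, so the byte patterns agree afterwards.
normalized = neg + 0.0
normalized_equal = normalized.tobytes() == pos.tobytes()
print(normalized_equal)  # True
```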

Despite the caveats, one might choose to use in1d_index anyway for the sake of speed (for arrays with lengths in the low thousands):

def ismember_rows(a, b): 
    # http://stackoverflow.com/a/22705773/190597 (ashg) 
    return np.nonzero(np.all(b == a[:,np.newaxis], axis=2))[1] 

In [41]: a2 = np.tile(a,(2000,1)) 
In [42]: b2 = np.tile(b,(2000,1)) 

In [46]: %timeit in1d_index(a2, b2) 
100 loops, best of 3: 8.49 ms per loop 

In [47]: %timeit ismember_rows(a2, b2) 
1 loops, best of 3: 5.55 s per loop

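So in1d_index is ~650x faster. Note again, however, that the comparison is not exactly apples-to-apples: in1d_index returns the indices in increasing order, while ismember_rows returns them in the order in which the rows of a show up in b.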
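If the indices are needed in the order of the rows of a, one possible vectorized approach is to give every distinct row an integer id with np.unique(..., axis=0) and then look up the first position of each id in b. This is only a sketch and not part of the answer above; the function name ismember_rows_ordered and the -1 marker for rows without a match are assumptions made here.

```python
import numpy as np

def ismember_rows_ordered(a, b):
    """For each row of a, return the index of its first match in b, or -1."""
    a = np.asarray(a)
    b = np.asarray(b)
    stacked = np.concatenate([b, a], axis=0)
    # Give every distinct row an integer id; inverse maps each stacked row to its id.
    unique_rows, inverse = np.unique(stacked, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # the shape of inverse differs between NumPy versions
    ids_b, ids_a = inverse[:len(b)], inverse[len(b):]
    # First position in b at which each id occurs.
    present_ids, first_in_b = np.unique(ids_b, return_index=True)
    lookup = np.full(len(unique_rows), -1, dtype=np.intp)
    lookup[present_ids] = first_in_b
    return lookup[ids_a]

a = np.array([[4, 6], [2, 6], [5, 2]])
b = np.array([[1, 7], [1, 8], [2, 6], [2, 1], [2, 4],
              [4, 6], [4, 7], [5, 9], [5, 2], [5, 1]])

idx = ismember_rows_ordered(a, b)
print(idx)  # [5 2 8]
```

The work is dominated by sorting the stacked rows, so memory use grows with len(a) + len(b) rather than with their product.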

Source

2014-03-27 22:06:25 unutbu

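Thanks for these suggestions. I won't have contiguous arrays, but it's possible that I could sort first (assuming I can also get the sorted indices, maybe with argsort?), and then do what you suggest. – claudiaann1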
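The argsort idea in this comment can be sketched as follows. This is an illustration rather than anything posted in the thread: the function name ismember_rows_sorted is hypothetical, the arrays are assumed to have exactly two integer columns, the second column is assumed to be non-negative, and the combined key is assumed not to overflow int64. Rows without a match are marked with -1.

```python
import numpy as np

def ismember_rows_sorted(a, b):
    """Index of the first matching row in b for each row of a (-1 if none).

    Assumes two integer columns, a non-negative second column, and no
    int64 overflow when the two columns are combined into one key.
    """
    a = np.asarray(a, dtype=np.int64)
    b = np.asarray(b, dtype=np.int64)
    base = int(max(a[:, 1].max(), b[:, 1].max())) + 1
    key_a = a[:, 0] * base + a[:, 1]   # one scalar key per row
    key_b = b[:, 0] * base + b[:, 1]
    order = np.argsort(key_b, kind="stable")   # sorted position -> original index in b
    sorted_b = key_b[order]
    pos = np.searchsorted(sorted_b, key_a)     # leftmost candidate position
    pos = np.minimum(pos, len(sorted_b) - 1)   # keep positions in range
    found = sorted_b[pos] == key_a
    return np.where(found, order[pos], -1)

a = np.array([[4, 6], [2, 6], [5, 2]])
b = np.array([[1, 7], [1, 8], [2, 6], [2, 1], [2, 4],
              [4, 6], [4, 7], [5, 9], [5, 2], [5, 1]])

sorted_idx = ismember_rows_sorted(a, b)
print(sorted_idx)  # [5 2 8]
```

Because the sort is stable and searchsorted returns the leftmost position, duplicated rows in b resolve to their first occurrence.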

You might want to see how 'in1d_index' performs without doing anything special first. 'arr = np.ascontiguousarray(arr)' will make the input contiguous. – unutbu
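A small check of what np.ascontiguousarray does with a non-contiguous input may help here. The arrays below are made up for illustration only.

```python
import numpy as np

big = np.arange(12).reshape(3, 4)
cols = big[:, ::2]  # every other column: a non-contiguous view of big

was_contiguous = bool(cols.flags["C_CONTIGUOUS"])
fixed = np.ascontiguousarray(cols)
is_contiguous = bool(fixed.flags["C_CONTIGUOUS"])
shares = bool(np.shares_memory(cols, fixed))

print(was_contiguous)  # False
print(is_contiguous)   # True
print(shares)          # False: the data was copied into a new buffer
```

So a non-contiguous input does not need to be sorted or rearranged by hand; the cost is one extra copy of the data.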

import numpy as np 
def ismember_rows(a, b): 
    '''Equivalent of 'ismember' from Matlab 
    a.shape = (nRows_a, nCol) 
    b.shape = (nRows_b, nCol) 
    return the idx where b[idx] == a 
    ''' 
    return np.nonzero(np.all(b == a[:,np.newaxis], axis=2))[1] 

a = np.array([[4, 6],[2, 6],[5, 2]]) 
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]]) 
idx = ismember_rows(a, b) 
print(idx)
print(np.all(b[idx] == a))

prints

[5 2 8]
True

I used broadcasting.

------------------------ [Update] ------------------------------

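def ismember(a, b): 
    # Caution: each column is tested independently, so a row of b such as [4, 2]
    # is reported even though [4, 2] is not a row of a.
    return np.flatnonzero(np.in1d(b[:,0], a[:,0]) & np.in1d(b[:,1], a[:,1])) 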

a = np.array([[4, 6],[2, 6],[5, 2]]) 
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]]) 
a2 = np.tile(a,(2000,1)) 
b2 = np.tile(b,(2000,1)) 

%timeit in1d_index(a2, b2) 
# 100 loops, best of 3: 8.74 ms per loop 
%timeit ismember(a2, b2) 
# 100 loops, best of 3: 8.5 ms per loop 

np.all(in1d_index(a2, b2) == ismember(a2, b2)) 
# True

As unutbu said, the indices are returned in increasing order.
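One more thing to watch for with the updated ismember: it tests the two columns separately, so it can report a row of b whose values each occur somewhere in a without the row itself occurring in a. The sketch below reproduces the function with np.isin (used here in place of np.in1d, which newer NumPy releases deprecate) and compares it with a true row-wise test on a small made-up b.

```python
import numpy as np

def ismember_columnwise(a, b):
    # Same logic as the updated ismember: each column is checked on its own.
    return np.flatnonzero(np.isin(b[:, 0], a[:, 0]) & np.isin(b[:, 1], a[:, 1]))

a = np.array([[4, 6], [2, 6], [5, 2]])
b_small = np.array([[4, 6], [4, 2], [5, 2]])  # [4, 2] is not a row of a

columnwise = ismember_columnwise(a, b_small)
# Row-wise reference: a row of b matches only if all of its columns match the same row of a.
rowwise = np.flatnonzero((b_small[:, None, :] == a[None, :, :]).all(axis=2).any(axis=1))

print(columnwise)  # [0 1 2]
print(rowwise)     # [0 2]
```

On the b from the question both tests happen to agree, which is why the np.all check in the update returns True.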

Source

2014-03-28 06:32:26 ashg

I tried running this, and unfortunately it killed my kernel. I have very large arrays... – claudiaann1

Yes, broadcasting is a memory killer. Sorry. – ashg
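A rough estimate shows why. The comparison b == a[:, np.newaxis] builds a boolean array of shape (len(a), len(b), n_cols) at one byte per element. The helper name broadcast_mask_bytes below is hypothetical; the sizes are the ones mentioned in this thread.

```python
import numpy as np

def broadcast_mask_bytes(n_a, n_b, n_cols):
    """Bytes needed for the boolean result of b == a[:, np.newaxis]."""
    return n_a * n_b * n_cols  # np.bool_ takes one byte per element

# Sanity check against NumPy on the small example from the question.
a = np.array([[4, 6], [2, 6], [5, 2]])
b = np.array([[1, 7], [1, 8], [2, 6], [2, 1], [2, 4],
              [4, 6], [4, 7], [5, 9], [5, 2], [5, 1]])
small_mask = b == a[:, np.newaxis]
print(small_mask.shape, small_mask.nbytes)  # (3, 10, 2) 60

timing_mb = broadcast_mask_bytes(6000, 20000, 2) / 1e6        # tiled arrays from the timings
full_tb = broadcast_mask_bytes(750_000, 1_000_000, 2) / 1e12  # sizes given by the asker
print(timing_mb)  # 240.0 (MB)
print(full_tb)    # 1.5 (TB)
```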

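This function first converts the multi-column elements of each row into a single-column array, after which numpy.in1d can be used to find the desired answer. Try the code below: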

import numpy as np 

def ismemberRow(A,B): 
    ''' 
    Find which rows of A can also be found in B.
    The function first turns the multiple columns of each row into a single-column array of strings, then numpy.in1d can be used

    Input: m x n numpy array (A), and p x q array (B) 
    Output: boolean numpy array of length m, True for the rows of A that can also be found in B 
    ''' 

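    # unicode=True so that the '-' separator can be concatenated with str columns on Python 3
    sa = np.chararray((A.shape[0],1), unicode=True) 
    sa[:] = '-' 
    sb = np.chararray((B.shape[0],1), unicode=True) 
    sb[:] = '-' 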

    ba = (A).astype(str)  # np.str was removed in NumPy 1.24; the builtin str is equivalent
    sa2 = np.expand_dims(ba[:,0],axis=1) + sa + np.expand_dims(ba[:,1],axis=1) 
    na = A.shape[1] - 2  

    for i in range(0,na): 
        sa2 = sa2 + sa + np.expand_dims(ba[:,i+2],axis=1) 

    bb = (B).astype(str)  # np.str was removed in NumPy 1.24; the builtin str is equivalent
    sb2 = np.expand_dims(bb[:,0],axis=1) + sb + np.expand_dims(bb[:,1],axis=1) 
    nb = B.shape[1] - 2  

    for i in range(0,nb): 
        sb2 = sb2 + sb + np.expand_dims(bb[:,i+2],axis=1) 

    return np.in1d(sa2,sb2) 

A = np.array([[1, 3, 4],[2, 4, 3],[7, 4, 3],[1, 1, 1],[1, 3, 4],[5, 3, 4],[1, 1, 1],[2, 4, 3]]) 

B = np.array([[1, 3, 4],[1, 1, 1]]) 

d = ismemberRow(A,B) 

print(A[np.where(d)[0],:])

#results: 
#[[1 3 4] 
# [1 1 1] 
# [1 3 4] 
# [1 1 1]]
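The '-' separator in the joined strings matters. The small plain-Python illustration below (made-up rows, not from the answer) shows that without a separator two different rows can collapse to the same key.

```python
rows = [(1, 23), (12, 3)]

# Without a separator both rows become the same string.
no_sep = ["".join(map(str, r)) for r in rows]
# With a separator the keys stay distinct.
with_sep = ["-".join(map(str, r)) for r in rows]

print(no_sep)    # ['123', '123']
print(with_sep)  # ['1-23', '12-3']
```

Converting every row to a string is convenient but adds overhead on arrays with around a million rows, so it is worth timing against the other approaches in this thread.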

Source

2017-06-13 23:05:24
