2D numpy array search (equivalent to Matlab's intersect 'rows' option)

I have two 4-column numpy arrays (2D), cap and usp, each with 100 (float) rows. Consider a subset of 3 columns in each array (e.g. capind=cap[:,:3]):

There are many rows in common between the two arrays.
Each row tuple ("triplet") is unique within each array.

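I am looking for an efficient way to identify the three-value (row) subsets common to the two arrays while somehow retaining the 4th column from both arrays for further processing. In essence, I am looking for a good way to do the same thing as Matlab's intersect function with the 'rows' option (i.e. [c, ia, ib]=intersect(capind, uspind, 'rows');), which returns the indices of the matching rows, so that it is then easy to get the matching triplets along with the 4th-column values from the original arrays (matchcap=cap[ia,:]).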

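My current approach is based on similar questions on the forums, since I could not find a good match for my problem. However, this approach seems somewhat inefficient given my goal (and I have not fully solved my problem yet):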

The arrays look like this:

cap=array([[ 2.50000000e+01, 1.27000000e+02, 1.00000000e+00, 
     9.81997200e-06], 
    [ 2.60000000e+01, 1.27000000e+02, 1.00000000e+00, 
     9.14296800e+00], 
    [ 2.70000000e+01, 1.27000000e+02, 1.00000000e+00, 
     2.30137100e-04], 
    ..., 
    [ 6.10000000e+01, 1.80000000e+02, 1.06000000e+02, 
     8.44939900e-03], 
    [ 6.20000000e+01, 1.80000000e+02, 1.06000000e+02, 
     4.77729100e-03], 
    [ 6.30000000e+01, 1.80000000e+02, 1.06000000e+02, 
     1.40343500e-03]]) 

usp=array([[ 4.10000000e+01, 1.31000000e+02, 1.00000000e+00, 
     5.24197200e-06], 
    [ 4.20000000e+01, 1.31000000e+02, 1.00000000e+00, 
     8.39178800e-04], 
    [ 4.30000000e+01, 1.31000000e+02, 1.00000000e+00, 
     1.20279900e+01], 
    ..., 
    [ 4.70000000e+01, 1.80000000e+02, 1.06000000e+02, 
     2.48667700e-02], 
    [ 4.80000000e+01, 1.80000000e+02, 1.06000000e+02, 
     4.23304600e-03], 
    [ 4.90000000e+01, 1.80000000e+02, 1.06000000e+02, 
     1.02051300e-03]])

I then convert each 4-column array (usp and cap) into a three-column array (capind and uspind, shown below as integers for ease of viewing).

capind=array([[ 25, 127, 1], 
    [ 26, 127, 1], 
    [ 27, 127, 1], 
    ..., 
    [ 61, 180, 106], 
    [ 62, 180, 106], 
    [ 63, 180, 106]]) 
uspind=array([[ 41, 131, 1], 
    [ 42, 131, 1], 
    [ 43, 131, 1], 
    ..., 
    [ 47, 180, 106], 
    [ 48, 180, 106], 
    [ 49, 180, 106]])

Using set operations gives me the matching triplets: carray=np.array([x for x in set(tuple(x) for x in capind) & set(tuple(x) for x in uspind)]).

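This seems to work well for finding the row values common to the uspind and capind arrays. I now need to get the 4th-column values from the matching rows (i.e. compare carray with the first three columns of the original source arrays, cap and usp, and somehow grab the values from the 4th column).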

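Is there a better way to achieve this? Otherwise, any help on the best way to retrieve the 4th-column values from the source arrays would be greatly appreciated.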

Source

2014-06-10 ith140

Try using dictionaries.

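# Map each (col0, col1, col2) triplet to its fourth-column value
capind = {tuple(row[:3]): row[3] for row in cap}
uspind = {tuple(row[:3]): row[3] for row in usp}

# Triplets present in both arrays (on Python 2, use viewkeys() instead of keys())
keys = capind.keys() & uspind.keys()
for key in keys:
    # capind[key] and uspind[key] are the fourth columns
    print(key, capind[key], uspind[key])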

Source

2014-06-10 15:51:58 nneonneo

This is nearly there, with one small correction: capind = {tuple(row[:3]):row[3] for row in cap} uspind = {tuple(row[:3]):row[3] for row in usp} – ith140

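I would like to keep the array structure, because I do not want to iterate over dictionaries. I need to do some array arithmetic later on the elements common to cap and usp. – ith140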

You can always turn them back into arrays afterwards... – nneonneo
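One way to read the suggestion above, as a rough sketch rather than the answerer's own code: if the dictionaries store row indices instead of fourth-column values, the common keys yield index arrays comparable to Matlab's ia and ib, and ordinary array indexing takes over from there. The helper name row_index_map and the small sample arrays are made up for illustration.

```python
import numpy as np

def row_index_map(arr, ncols=3):
    """Map each row's first `ncols` values (as a tuple) to its row index."""
    return {tuple(row[:ncols]): i for i, row in enumerate(arr)}

# Small made-up arrays standing in for cap and usp
cap = np.array([[25., 127., 1., 0.1],
                [26., 127., 1., 0.2],
                [27., 127., 1., 0.3]])
usp = np.array([[27., 127., 1., 0.7],
                [41., 131., 1., 0.8],
                [25., 127., 1., 0.9]])

cap_idx = row_index_map(cap)
usp_idx = row_index_map(usp)

# Common triplets, sorted so the result order is deterministic
keys = sorted(cap_idx.keys() & usp_idx.keys())
ia = np.array([cap_idx[k] for k in keys], dtype=int)
ib = np.array([usp_idx[k] for k in keys], dtype=int)

matchcap = cap[ia, :]  # matching rows of cap, fourth column included
matchusp = usp[ib, :]  # the same triplets, taken from usp
```

The dictionaries are only used to build ia and ib; any later arithmetic on matchcap[:, 3] and matchusp[:, 3] stays vectorized.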

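Assuming you already know that the rows are unique within each matrix and that there are rows in common, here is a solution. The basic idea is to concatenate the two arrays, sort the result so that identical rows end up next to each other, and then take the difference between consecutive rows. If two rows are the same, the first three values of the difference should be zero (or close to zero).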

[ORIGINAL]

import numpy as np

## Concatenate the matrices together
cu = np.concatenate((cap, usp), axis=0)
print(cu)

## Sort whole rows lexicographically by the first three columns
## (cu.sort(axis=0) would sort each column independently and scramble the rows)
cu = cu[np.lexsort((cu[:, 2], cu[:, 1], cu[:, 0]))]
print(cu)

## Do a forward difference from row to row
cu_diff = np.diff(cu, n=1, axis=0)

## Now calculate the sum of the first three columns
## as it should be zero (or near zero)
cu_diff_s = np.sum(np.abs(cu_diff[:, :-1]), axis=1)

## Find the indices where it is zero
## Change this to be <= eps if you are using float numbers
indices = np.flatnonzero(cu_diff_s == 0)
print(indices)

## And here are the rows...
print(cu[indices, :])

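I made up a dataset based on the example above, and it seems to work. There may be faster ways to do this, but this way you do not have to loop over anything. (I don't like loops :-)).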

[UPDATED]

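OK. So I added two columns to each matrix. The second-to-last column is a code: 1 for cap and 2 for usp. The last column is simply the index into the original matrix.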

import numpy as np

## Store more info in the array
## The first 4 columns are the initial data
## The fifth column is a code of 1 or 2 (ie cap or usp)
## The sixth column is the index into the original matrix

cap_code = np.column_stack((np.ones(cap.shape[0]), np.arange(cap.shape[0])))
cap_info = np.concatenate((cap, cap_code), axis=1)

usp_code = np.column_stack((2 * np.ones(usp.shape[0]), np.arange(usp.shape[0])))
usp_info = np.concatenate((usp, usp_code), axis=1)

## Concatenate the matrices together
cu = np.concatenate((cap_info, usp_info), axis=0)
print(cu)

## Sort whole rows by the first three columns, then by the code column,
## so that a cap row always precedes its matching usp row
## (cu.sort(axis=0) would sort each column independently and scramble the rows)
cu = cu[np.lexsort((cu[:, 4], cu[:, 2], cu[:, 1], cu[:, 0]))]
print(cu)

## Do a forward difference from row to row
cu_diff = np.diff(cu, n=1, axis=0)

## Now calculate the sum of the first three columns
## as it should be zero (or near zero)
cu_diff_s = np.sum(np.abs(cu_diff[:, :3]), axis=1)

## Find the indices where it is zero
## Change this to be <= eps if you are using float numbers
indices = np.flatnonzero(cu_diff_s == 0)
print(indices)

## And here are the rows...
print(cu[indices, :])      # rows that came from cap (code 1)
print(cu[indices + 1, :])  # the matching rows from usp (code 2)

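It seems to work on my made-up data. It is getting a bit convoluted, though, so I don't think I would want to pursue this direction any further.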

Good luck!

Source

2014-06-10 17:54:27 brechmos

I think you should loop if it makes the code faster. NumPy usually saves you from having to loop, but not always. – nneonneo

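@nneonneo. Of course. The point is that the underlying code (which has to loop at some level) is almost always faster than a Python loop. List comprehensions may be slightly different because they have been optimized. – brechmos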

This is close, but I need to know which values came from which array. Once I do this, I lose that information. – ith140
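A minimal sketch of how the concatenate-and-sort idea can keep track of which array each row came from, assuming the rows are unique within each array and that matching triplets are exactly equal (no floating-point tolerance). Sorting an index array with np.lexsort, rather than the rows themselves, preserves the original positions. The function name intersect_rows and the sample data are illustrative only.

```python
import numpy as np

def intersect_rows(a, b):
    """Return (common_rows, ia, ib) such that a[ia] == b[ib] == common_rows.

    Assumes rows are unique within `a` and within `b`.
    """
    stacked = np.vstack((a, b))
    # lexsort treats the LAST key as the primary one, so reverse the columns.
    # The sort is stable, so on a tie the row from `a` comes first.
    order = np.lexsort(stacked.T[::-1])
    srt = stacked[order]
    # True where a sorted row equals the row that follows it
    same = np.all(srt[1:] == srt[:-1], axis=1)
    ia = order[:-1][same]            # positions in `a`
    ib = order[1:][same] - len(a)    # positions in `b`
    return a[ia], ia, ib

# Small made-up arrays standing in for cap and usp
cap = np.array([[25., 127., 1., 0.1],
                [26., 127., 1., 0.2],
                [27., 127., 1., 0.3]])
usp = np.array([[27., 127., 1., 0.7],
                [41., 131., 1., 0.8],
                [25., 127., 1., 0.9]])

c, ia, ib = intersect_rows(cap[:, :3], usp[:, :3])
matchcap = cap[ia, :]
matchusp = usp[ib, :]
```

As with Matlab's 'rows' option, the common rows come back in sorted order, and ia and ib index into the original, unsorted arrays.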

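A numpy equivalent of the Matlab call that identifies the matching rows is given below. It returns a boolean array that is 1 at the indices of matching rows, counting only the first occurrence of each (unique, non-repeated rows):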

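import numpy as np

def find_rows_in_array(arr, rows):
    '''
    Boolean mask over arr: True where a row of arr is the first
    occurrence of some row in rows.
    '''
    # tmp[i, j] is 1 when arr[i] equals rows[j] in every column
    tmp = np.prod(np.swapaxes(
        arr[:, :, None], 1, 2) == rows, axis=2)
    # cumsum * tmp == 1 keeps only the first match for each rows[j]
    return np.sum(np.cumsum(tmp, axis=0) * tmp == 1,
                  axis=1) > 0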

The above flags only the first match of each row. If you want every matching row flagged, then:

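import numpy as np

def find_rows_in_array(arr, rows):
    '''
    Boolean mask over arr: True where a row of arr equals
    any row in rows.
    '''
    # tmp[i, j] is 1 when arr[i] equals rows[j] in every column
    tmp = np.prod(np.swapaxes(
        arr[:, :, None], 1, 2) == rows, axis=2)
    return np.sum(tmp, axis=1) > 0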

This one is faster. You can swap the arrays passed as inputs to find the corresponding indices for each array. Enjoy :D
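The same broadcasting comparison can also produce paired indices directly, which may be closer to what the question asks for. The following is a sketch under the assumption that rows are unique within each array, so every match appears exactly once. The function name matching_row_pairs and the sample arrays are made up for illustration.

```python
import numpy as np

def matching_row_pairs(a, b):
    """Return (ia, ib) such that a[ia[k]] equals b[ib[k]] for every k."""
    # eq[i, j] is True when row i of `a` equals row j of `b` in all columns.
    # Note: this builds a len(a) x len(b) x ncols boolean array in memory.
    eq = (a[:, None, :] == b[None, :, :]).all(axis=2)
    ia, ib = np.nonzero(eq)
    return ia, ib

# Small made-up arrays standing in for cap and usp
cap = np.array([[25., 127., 1., 0.1],
                [26., 127., 1., 0.2],
                [27., 127., 1., 0.3]])
usp = np.array([[27., 127., 1., 0.7],
                [41., 131., 1., 0.8],
                [25., 127., 1., 0.9]])

ia, ib = matching_row_pairs(cap[:, :3], usp[:, :3])
matchcap = cap[ia, :]
matchusp = usp[ib, :]
```

For arrays of a few hundred rows the intermediate array is small. For much larger inputs a sort-based approach is likely to use far less memory.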

Source

2016-11-21 16:31:00

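The numpy_indexed package (disclaimer: I am its author) contains all the functionality you need, implemented in an efficient manner (i.e. fully vectorized, so no slow Python-level loops):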

import numpy_indexed as npi 
c = npi.intersection(capind, uspind) 
ia = npi.indices(capind, c) 
ib = npi.indices(uspind, c)

Depending on how you value brevity versus performance, you might prefer:

import numpy_indexed as npi 
a = npi.as_index(capind) 
b = npi.as_index(uspind) 
c = npi.intersection(a, b) 
ia = npi.indices(a, c) 
ib = npi.indices(b, c)

Source

2016-11-21 18:38:03
