Sort a numpy array based on data from another array

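I have two arrays, data and result. result contains the same rows as data, but with an extra column and in a different order. I want to rearrange result so that its rows are in the same order as the rows of data, keeping the associated value in the last column together with the rest of its row when the rows are reordered.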

data = np.array([[0,1,0,0],[1,0,0,0],[0,1,1,0],[0,1,0,1]]) 
result = np.array([[0,1,1,0,1],[1,0,0,0,0],[0,1,0,0,1],[0,1,0,1,0]]) 

# this is what the final sorted array should look like: 
''' 
array([[0, 1, 0, 0, 1], 
     [1, 0, 0, 0, 0], 
     [0, 1, 1, 0, 1], 
     [0, 1, 0, 1, 0]]) 
'''

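I tried using argsort to get the order that sorts data and then applying that order to result, but argsort seems to sort the elements within each row, whereas I want the sort to treat each row of data[:,:4] as a whole.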

ind = np.argsort(data) 
indind =np.argsort(ind) 
ind 
array([[0, 2, 3, 1], 
    [1, 2, 3, 0], 
    [0, 3, 1, 2], 
    [0, 2, 1, 3]])

What is a good way to do this kind of row-wise sort?

Source

2016-04-10 ROBOTPWNS

Is the extra column always the last one? – Deusdeorum

Just to clarify what you are doing: with the index list [2,1,0,3] I can reorder the rows of result like this:

In [37]: result[[2,1,0,3],:] 
Out[37]: 
array([[0, 1, 0, 0, 1], 
     [1, 0, 0, 0, 0], 
     [0, 1, 1, 0, 1], 
     [0, 1, 0, 1, 0]]) 

In [38]: result[[2,1,0,3],:4]==data 
Out[38]: 
array([[ True, True, True, True], 
     [ True, True, True, True], 
     [ True, True, True, True], 
     [ True, True, True, True]], dtype=bool)

I don't see how argsort or sort would help to come up with this index order.

With np.lexsort I can put the rows of both arrays into the same order:

In [54]: data[np.lexsort(data.T),:] 
Out[54]: 
array([[1, 0, 0, 0], 
     [0, 1, 0, 0], 
     [0, 1, 1, 0], 
     [0, 1, 0, 1]]) 

In [55]: result[np.lexsort(result[:,:-1].T),:] 
Out[55]: 
array([[1, 0, 0, 0, 0], 
     [0, 1, 0, 0, 1], 
     [0, 1, 1, 0, 1], 
     [0, 1, 0, 1, 0]])

I found by trial and error that I need to use the transpose. We need to check the lexsort docs to understand why.
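As a hedged note on why the transpose is needed: np.lexsort takes a sequence of keys, each with one entry per item being sorted, and treats the last key as the primary one. data.T supplies one key per column, so rows are ordered by the last column first, with earlier columns breaking ties. A minimal sketch using the data array from the question (the names order and explicit are just for illustration):

```python
import numpy as np

data = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 1]])

# Each row of data.T is one key of length n_rows; the LAST key is the primary one.
order = np.lexsort(data.T)

# The same call with the keys spelled out: column 3 is primary, column 0 breaks the last ties.
explicit = np.lexsort((data[:, 0], data[:, 1], data[:, 2], data[:, 3]))

print(order)
print(data[order])
```

Without the transpose, lexsort would treat each row as a key and return an ordering of the columns instead of the rows.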

A bit more trial and error produces:

In [66]: i=np.lexsort(data.T) 
In [67]: j=np.lexsort(result[:,:-1].T) 
In [68]: j[i] 
Out[68]: array([2, 1, 0, 3], dtype=int64) 

In [69]: result[j[i],:] 
Out[69]: 
array([[0, 1, 0, 0, 1], 
     [1, 0, 0, 0, 0], 
     [0, 1, 1, 0, 1], 
     [0, 1, 0, 1, 0]])

This is a tentative solution. It needs to be tested on other samples, and it needs an explanation.
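One possible explanation, offered as a sketch rather than a verified general rule: data[i] and result[j] are both in the same sorted order, so the result row that matches data row k is j[rank[k]], where rank is the inverse permutation of i (np.argsort(i)). In the example above i is [1, 0, 2, 3], which is its own inverse, so j[i] happens to give the right answer. The helper name reorder_like and the sample values below are illustrative, and the sketch assumes every row of data appears exactly once in result.

```python
import numpy as np

def reorder_like(data, result):
    """Reorder rows of result (which has one extra last column) to follow data."""
    i = np.lexsort(data.T)              # data[i] is in sorted order
    j = np.lexsort(result[:, :-1].T)    # result[j] is in the same sorted order
    rank = np.argsort(i)                # rank[k] = position of data row k in sorted order
    return result[j[rank]]

# Illustrative values (not from the thread), chosen so that i is not its own inverse.
data = np.array([[5, 2], [6, 0], [7, 1]])
result = np.array([[7, 1, 70], [5, 2, 50], [6, 0, 60]])

out = reorder_like(data, result)
print(out)
```

On this sample, result[j[i]] leaves result in its original order, while result[j[np.argsort(i)]] lines the rows up with data.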

Source

2016-04-10 21:47:10 hpaulj

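Approach #1

Here's one approach that treats each row as an indexing tuple and then finds the matching indices between data and result using the linear-index equivalents of those rows. Those indices represent the new order of the rows and, when used to index into result, give us the desired output. The implementation would look like this -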

# Slice out from result everything except the last column  
r = result[:,:-1]  

# Get linear indices equivalent of each row from r and data 
ID1 = np.ravel_multi_index(r.T,r.max(0)+1) 
ID2 = np.ravel_multi_index(data.T,r.max(0)+1) 

# For each row of data (ID2), find the matching row of result (ID1)
# and use those positions to index into result
out = result[np.where(ID2[:,None] == ID1)[1]]

Approach #2

If all rows from data are guaranteed to be in result, you can use another approach based on just argsort, like so -

# Slice out from result everything except the last column  
r = result[:,:-1]  

# Get linear indices equivalent of each row from r and data 
ID1 = np.ravel_multi_index(r.T,r.max(0)+1) 
ID2 = np.ravel_multi_index(data.T,r.max(0)+1) 

sortidx_ID1 = ID1.argsort()
sortidx_ID2 = ID2.argsort()
# sortidx_ID2.argsort() is the rank of each data row; pick the result row of equal rank
out = result[sortidx_ID1[sortidx_ID2.argsort()]]

Sample run for a slightly more generic case -

In [37]: data 
Out[37]: 
array([[ 3, 2, 1, 5], 
     [ 4, 9, 2, 4], 
     [ 7, 3, 9, 11], 
     [ 5, 9, 4, 4]]) 

In [38]: result 
Out[38]: 
array([[ 7, 3, 9, 11, 55], 
     [ 4, 9, 2, 4, 8], 
     [ 3, 2, 1, 5, 7], 
     [ 5, 9, 4, 4, 88]]) 

In [39]: r = result[:,:-1] 
    ...: ID1 = np.ravel_multi_index(r.T,r.max(0)+1) 
    ...: ID2 = np.ravel_multi_index(data.T,r.max(0)+1) 
    ...: 

In [40]: result[np.where(ID2[:,None] == ID1)[1]] # Approach 1
Out[40]: 
array([[ 3, 2, 1, 5, 7], 
     [ 4, 9, 2, 4, 8], 
     [ 7, 3, 9, 11, 55], 
     [ 5, 9, 4, 4, 88]]) 

In [41]: sortidx_ID1 = ID1.argsort() # Approach 2 
    ...: sortidx_ID2 = ID2.argsort() 
    ...: 

In [42]: result[sortidx_ID1[sortidx_ID2.argsort()]]
Out[42]: 
array([[ 3, 2, 1, 5, 7], 
     [ 4, 9, 2, 4, 8], 
     [ 7, 3, 9, 11, 55], 
     [ 5, 9, 4, 4, 88]])

Source

2016-04-10 22:04:01 Divakar

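This answer works for small datasets like the example given above, but when I use a larger example (a 5172x32 dataset), it gives me the error "ValueError: too many dimensions passed to ravel_multi_index". How should I solve this? – ROBOTPWNS

@ROBOTPWNS Compute ID1 and ID2 like this instead and see if it works: 'ID1 = r.dot(r.max(0)+1); ID2 = data.dot(r.max(0)+1)'? – Divakar

@ROBOTPWNS So, did that work for you? – Divakar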
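For wide rows such as the 5172x32 case, one alternative that avoids ravel_multi_index altogether is to let np.unique with axis=0 assign an integer ID to each distinct row. This is a swapped-in technique, not the method from the answer, and the helper name match_rows is hypothetical. Dot-product IDs like the ones suggested in the comment are not guaranteed to be unique: with weights (3, 3), the rows (1, 0) and (0, 1) both map to 3. A minimal sketch, assuming every row of data occurs in result and a NumPy version that supports the axis argument of np.unique:

```python
import numpy as np

def match_rows(data, result):
    """Reorder rows of result (extra last column) to follow data, for any column count."""
    r = result[:, :-1]
    n = len(data)
    # Give every distinct row across both arrays one integer ID.
    _, inv = np.unique(np.vstack([data, r]), axis=0, return_inverse=True)
    inv = inv.ravel()                    # guard against versions returning a 2-D inverse
    id_data, id_r = inv[:n], inv[n:]
    sort_r = np.argsort(id_r, kind="stable")
    # Locate each data ID inside the sorted result IDs.
    pos = np.searchsorted(id_r[sort_r], id_data)
    return result[sort_r[pos]]

# Illustrative random data with 40 columns (more than the question's 32).
rng = np.random.default_rng(0)
data = rng.integers(0, 5, size=(50, 40))
perm = rng.permutation(50)
result = np.column_stack([data[perm], np.arange(50)])

out = match_rows(data, result)
print(np.array_equal(out[:, :-1], data))
```

If a row of data is missing from result, the searchsorted lookup silently returns a wrong position or an out-of-range index, so a real implementation would want an explicit check.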

The numpy_indexed package (disclaimer: I am its author) can be used to solve these kinds of problems efficiently and elegantly:

import numpy_indexed as npi 
result[npi.indices(result[:, :-1], data)]

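npi.indices is essentially a vectorized equivalent of list.index; so for each element (row) in data, we get the position of that same row in result, ignoring the last column.

Note that this solution works for any number of columns and is fully vectorized (i.e., no Python loops anywhere).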
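To make the list.index analogy concrete, here is a plain-Python reference that does the same lookup with a dictionary. It is not how numpy_indexed is implemented and it is not vectorized; the function name row_indices is hypothetical, and it is only meant as something small to check results against.

```python
import numpy as np

def row_indices(haystack, needles):
    """For each row of needles, return the index of the first equal row in haystack."""
    lookup = {}
    for pos, row in enumerate(haystack.tolist()):
        lookup.setdefault(tuple(row), pos)   # keep the first occurrence, like list.index
    # A KeyError here means a row of needles is missing from haystack.
    return np.array([lookup[tuple(row)] for row in needles.tolist()])

data = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 1]])
result = np.array([[0, 1, 1, 0, 1], [1, 0, 0, 0, 0], [0, 1, 0, 0, 1], [0, 1, 0, 1, 0]])

idx = row_indices(result[:, :-1], data)
out = result[idx]
print(idx)
print(out)
```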

Source

2016-04-11 06:17:12
