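Index array of unique values

I start with an array a containing N unique values (product(a.shape) >= N).
I need to find the array b that holds, at the position of each element of a, the index 0 .. N-1 of that element in the (sorted) list of unique values of a.

As an example: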

import numpy as np 
np.random.seed(42) 
a = np.random.choice([0.1,1.3,7,9.4], size=(4,3)) 
print(a)

This prints a as

[[ 7. 9.4 0.1] 
[ 7. 7. 9.4] 
[ 0.1 0.1 7. ] 
[ 1.3 7. 7. ]]

The unique values are [0.1, 1.3, 7.0, 9.4], so the desired result b would be

[[2 3 0] 
[2 2 3] 
[0 0 2] 
[1 2 2]]

(For example, the value at a[0,0] is 7.; 7. has index 2; hence b[0,0] == 2.)

Since numpy does not have an index function, I can do this with a loop. Either by looping over the input array, like this:

u = np.unique(a).tolist() 
af = a.flatten() 
b = np.empty(len(af), dtype=int) 
for i in range(len(af)): 
    b[i] = u.index(af[i]) 
b = b.reshape(a.shape) 
print(b)

Or by looping over the unique values, as follows:

u = np.unique(a) 
b = np.empty(a.shape, dtype=int) 
for i in range(len(u)): 
    b[np.where(a == u[i])] = i 
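print(b)

I assume the second method, looping over the unique values, is already more efficient than the first whenever not all values in a are distinct; but it still involves this loop and is rather inefficient compared to in-place operations.

So my question is: what is the most efficient way to obtain the array b filled with the indices of the unique values of a?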

2017-03-13 ImportanceOfBeingErnest

You can use np.unique with its optional argument return_inverse -

np.unique(a, return_inverse=1)[1].reshape(a.shape)

Sample run -

In [308]: a 
Out[308]: 
array([[ 7. , 9.4, 0.1], 
     [ 7. , 7. , 9.4], 
     [ 0.1, 0.1, 7. ], 
     [ 1.3, 7. , 7. ]]) 

In [309]: np.unique(a, return_inverse=1)[1].reshape(a.shape) 
Out[309]: 
array([[2, 3, 0], 
     [2, 2, 3], 
     [0, 0, 2], 
     [1, 2, 2]])
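
As a quick sanity check (a minimal sketch, not part of the original answer), indexing the sorted unique values with the inverse array should rebuild a. The array literal below just retypes the sample values shown above; the names u, inv and roundtrip_ok are chosen here for illustration.

```python
import numpy as np

# Sample array, retyped from the printed output above
a = np.array([[7.0, 9.4, 0.1],
              [7.0, 7.0, 9.4],
              [0.1, 0.1, 7.0],
              [1.3, 7.0, 7.0]])

# u: sorted unique values; inv: index into u for every element of a
u, inv = np.unique(a, return_inverse=True)
b = inv.reshape(a.shape)

# Looking up u at the positions stored in b reproduces the input
roundtrip_ok = np.array_equal(u[b], a)

print(u)
print(b)
print(roundtrip_ok)
```

The reshape is kept so that b has the shape of a regardless of whether the installed NumPy version returns the inverse flattened or already shaped.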

Going through the source code of np.unique, which looks efficient to me, but trimming out the non-essential parts, we end up with another solution, like so -

def unique_return_inverse(a): 
    ar = a.flatten()  
    perm = ar.argsort() 
    aux = ar[perm] 
    flag = np.concatenate(([True], aux[1:] != aux[:-1])) 
    iflag = np.cumsum(flag) - 1 
    inv_idx = np.empty(ar.shape, dtype=np.intp) 
    inv_idx[perm] = iflag 
    return inv_idx
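
To see what each step of this function does, here is a small trace on a four-element input. It is only an illustration: the input values are made up for this sketch, and the statements are the same ones as in the function above, run one at a time.

```python
import numpy as np

# Tiny illustrative input (hypothetical values, with 7.0 repeated)
ar = np.array([7.0, 9.4, 0.1, 7.0])

perm = ar.argsort()            # positions that would sort ar
aux = ar[perm]                 # sorted values: [0.1, 7.0, 7.0, 9.4]

# True wherever a new distinct value starts in the sorted array
flag = np.concatenate(([True], aux[1:] != aux[:-1]))

# Running count of distinct values seen so far, shifted to start at 0
iflag = np.cumsum(flag) - 1    # [0, 1, 1, 2]

# Scatter those ranks back to the original positions
inv_idx = np.empty(ar.shape, dtype=np.intp)
inv_idx[perm] = iflag

print(inv_idx)                 # [1 2 0 1]
```

The order in which argsort places the two equal 7.0 entries does not matter here, because both receive the same rank.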

Timings -

In [444]: a= np.random.randint(0,1000,(1000,400)) 

In [445]: np.allclose(np.unique(a, return_inverse=1)[1],unique_return_inverse(a)) 
Out[445]: True 

In [446]: %timeit np.unique(a, return_inverse=1)[1] 
10 loops, best of 3: 30.4 ms per loop 

In [447]: %timeit unique_return_inverse(a) 
10 loops, best of 3: 29.5 ms per loop

Not a huge improvement over the built-in.
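
For comparison, one more vectorized route that is not from the answer above: since np.unique returns its values sorted, np.searchsorted can look up the position of every element of a in that sorted list. This is only a sketch under that assumption, with no claim about speed or memory use; it has not been timed here.

```python
import numpy as np

# Sample array, retyped from the question
a = np.array([[7.0, 9.4, 0.1],
              [7.0, 7.0, 9.4],
              [0.1, 0.1, 7.0],
              [1.3, 7.0, 7.0]])

u = np.unique(a)               # sorted unique values
# Every element of a occurs in u, so the left insertion point is its index
b_search = np.searchsorted(u, a)

# Reference result from the built-in inverse
b_ref = np.unique(a, return_inverse=True)[1].reshape(a.shape)

print(b_search)
print(np.array_equal(b_search, b_ref))
```

np.searchsorted returns an array with the same shape as its second argument, so no reshape is needed in this variant.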

2017-03-13 13:11:58 Divakar

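Wow, this is already a very good candidate for being accepted. However, it returns (creates) two arrays of the size of 'a', right? So there might be a more efficient solution in terms of memory?! – ImportanceOfBeingErnest

@ImportanceOfBeingErnest Added another very, very marginal improvement. – Divakar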
