Get per-row unique counts ~and~ unique values using numpy

I am trying to get the equivalent of np.unique, but with an 'axis=1' option.

a = np.array([[8, 8, 8, 5, 8], 
     [8, 2, 0, 8, 8], 
     [4, 5, 4, 2, 4], 
     [4, 6, 5, 2, 6]])

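I am looking for the value with the highest count in each row, saved as a 1-D vector. Basically, "which value is most common in each row".

The correct answer for this example is [8, 8, 4, 6].

Right now I do something like this: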

import numpy as np

y = np.zeros(len(a))

for i in range(len(a)):  # range works on Python 2 and 3 (the original used xrange)
    u, cnt = np.unique(a[i, :], return_counts=True)
    # pick the value from 'u' that is seen the most.
    y[i] = u[np.argmax(cnt)]

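This gives the expected result, but iterating over thousands of rows in Python is slow. I am looking for a fully vectorized approach.

I found the "unique row elements" post, but it does not quite do what I want (either I am not clever enough to munge it into the required form, or it is not directly applicable).

Thanks in advance for any help.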

2016-06-16 Phil Glau

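Note that unique cannot be vectorized in the way you want: each row may have a different number of unique elements, so the return value would have to be a ragged array, which is not an option in NumPy. – Jaime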
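To make the point about differing lengths concrete, here is a small check on the array from the question. It is only an illustration; the names per_row and lengths are made up for this sketch.

```python
import numpy as np

a = np.array([[8, 8, 8, 5, 8],
              [8, 2, 0, 8, 8],
              [4, 5, 4, 2, 4],
              [4, 6, 5, 2, 6]])

# Unique values of each row, collected in a plain Python list
# because the per-row results do not share a common length.
per_row = [np.unique(row) for row in a]

# Number of distinct values found in each row.
lengths = [len(u) for u in per_row]
print(lengths)  # [2, 3, 3, 4]
```

Since the lengths differ, the per-row results cannot be stacked into one rectangular array.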

One option is to use scipy.stats.mode:

In [36]: from scipy.stats import mode 

In [37]: a 
Out[37]: 
array([[8, 8, 8, 5, 8], 
     [8, 2, 0, 8, 8], 
     [4, 5, 4, 2, 4], 
     [4, 6, 5, 2, 6]]) 

In [38]: vals, counts = mode(a, axis=1) 

In [39]: vals 
Out[39]: 
array([[8], 
     [8], 
     [4], 
     [6]]) 

In [40]: counts 
Out[40]: 
array([[4], 
     [3], 
     [3], 
     [2]])

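However, mode is implemented in Python using numpy, and depending on the distribution of values in the input, it may not be faster than your solution. You can find the implementation in https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py (at the time I wrote this, it was here: https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L372).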
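To illustrate why the distribution of values can matter, here is a minimal sketch of an approach that loops over the distinct values instead of over the rows. This is not scipy's code, and rowwise_mode_by_value is a hypothetical name used only here. The Python-level loop runs once per distinct value in the whole array, so it is cheap with a handful of labels and gets more expensive as the number of distinct values grows.

```python
import numpy as np

def rowwise_mode_by_value(a):
    """Most common value per row; ties go to the smallest value."""
    # Distinct values over the whole array, in ascending order.
    values = np.unique(a)
    # One pass per distinct value: count its occurrences in every row.
    counts = np.stack([(a == v).sum(axis=1) for v in values], axis=1)
    # For each row, pick the value whose count is largest.
    return values[np.argmax(counts, axis=1)]

a = np.array([[8, 8, 8, 5, 8],
              [8, 2, 0, 8, 8],
              [4, 5, 4, 2, 4],
              [4, 6, 5, 2, 6]])

result_by_value = rowwise_mode_by_value(a)
print(result_by_value)
```

On the example array this should reproduce the answer given in the question.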

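The essential part of the function depends only on numpy, so if it works for you but you do not want to depend on scipy, you can copy the function into your own project. Just be sure to follow the terms of the BSD license that scipy uses.

Using the numpy_indexed package (disclaimer: I am its author):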

import numpy_indexed as npi 
r = np.indices(a.shape)[0] 
(ua, ur), c = npi.unique((a.flatten(), r.flatten()), return_count=True) 
u, i = npi.group_by(ur).argmax(c) 
y = ua[i]

That is, we first find the unique counts of the values in 'a' paired with their row indices.
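For anyone who would rather not add a dependency, the pairing idea can also be sketched in plain NumPy. This is a rewrite under assumptions, not the numpy_indexed implementation: rowwise_mode_pairs is a made-up name, and the sketch relies on np.unique(..., axis=0), which needs NumPy 1.13 or newer. Ties go to the smallest value.

```python
import numpy as np

def rowwise_mode_pairs(a):
    """Most common value per row, by counting (row, value) pairs."""
    # Row index of every element, same shape as a.
    rows = np.indices(a.shape)[0]
    # One (row, value) pair per element.
    pairs = np.stack([rows.ravel(), a.ravel()], axis=1)
    # Distinct pairs and how often each occurs.
    uniq, counts = np.unique(pairs, axis=0, return_counts=True)
    ur, ua = uniq[:, 0], uniq[:, 1]
    # Sort by row first, then by descending count within each row.
    order = np.lexsort((-counts, ur))
    sorted_rows = ur[order]
    # The first entry of each row group holds that row's largest count.
    first = np.r_[True, sorted_rows[1:] != sorted_rows[:-1]]
    return ua[order][first]

a = np.array([[8, 8, 8, 5, 8],
              [8, 2, 0, 8, 8],
              [4, 5, 4, 2, 4],
              [4, 6, 5, 2, 6]])

result_pairs = rowwise_mode_pairs(a)
print(result_pairs)
```

There is no Python-level loop here, so the work is dominated by sorting the pairs.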

2016-06-16 01:54:52

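Nice! Yes, much faster. It does depend on the number of possible values in the rows. The CIFAR-10 labels I am using contain only 10 possible values, so there are only 10 loop iterations. If the rows contain many more distinct values, it may not scale.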

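A fully vectorized solution can be implemented using the numpy_indexed package: count each (value, row index) pair, then find the pair with the maximum count within the group formed by each row index.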

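With only 10 possible values used in 'a', I am not sure this is any faster than the currently accepted answer, but the time complexity of this approach is not a function of the number of distinct values used in 'a', so it should scale better to datasets containing more labels.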
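For contrast, here is a sketch of a counting approach whose cost does grow with the number of labels. It is an illustration only, under the assumption that 'a' holds non-negative integers; rowwise_mode_bincount is a hypothetical name. The intermediate count table has one column per possible value, so its size is the number of rows times (a.max() + 1).

```python
import numpy as np

def rowwise_mode_bincount(a):
    """Most common value per row; assumes non-negative integers."""
    n_rows = a.shape[0]
    # Number of bins needed per row (values 0 .. a.max()).
    k = int(a.max()) + 1
    # Shift each row into its own block of k bins.
    shifted = a + k * np.arange(n_rows)[:, None]
    # Count everything in one call, then view as (row, value) table.
    counts = np.bincount(shifted.ravel(), minlength=n_rows * k)
    counts = counts.reshape(n_rows, k)
    # The column index is the value itself; ties go to the smallest.
    return counts.argmax(axis=1)

a = np.array([[8, 8, 8, 5, 8],
              [8, 2, 0, 8, 8],
              [4, 5, 4, 2, 4],
              [4, 6, 5, 2, 6]])

result_bincount = rowwise_mode_bincount(a)
print(result_bincount)
```

With 10 labels the table is small, but with a large value range it can dominate memory use, which is the scaling concern raised above.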

2016-06-16 05:30:38
