Filter based on maximum value

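I have a numpy array that holds 4-dimensional vectors in the format (x, y, z, w).

The array holds N such vectors. In my data, (x, y, z) is a spatial location and w holds a particular measurement taken at that location. There can be multiple measurements (stored as floats) associated with the same (x, y, z) position.

What I would like to do is filter the array so that I get a new array containing the maximum measurement for each (x, y, z) position.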

So if my data looks like this:

x, y, z, w1 
x, y, z, w2 
x, y, z, w3

where w1 is greater than w2 and w3, the filtered data would be:

x, y, z, w1

More concretely, say I have data like this:

[[ 0.7732126 0.48649481 0.29771819 0.91622924] 
[ 0.7732126 0.48649481 0.29771819 1.91622924] 
[ 0.58294263 0.32025559 0.6925856 0.0524125 ] 
[ 0.58294263 0.32025559 0.6925856 0.05 ] 
[ 0.58294263 0.32025559 0.6925856 1.7 ] 
[ 0.3239913 0.7786444 0.41692853 0.10467392] 
[ 0.12080023 0.74853649 0.15356663 0.4505753 ] 
[ 0.13536096 0.60319054 0.82018125 0.10445047] 
[ 0.1877724 0.96060999 0.39697999 0.59078612]]

This should return

[[ 0.7732126 0.48649481 0.29771819 1.91622924] 
[ 0.58294263 0.32025559 0.6925856 1.7 ] 
[ 0.3239913 0.7786444 0.41692853 0.10467392] 
[ 0.12080023 0.74853649 0.15356663 0.4505753 ] 
[ 0.13536096 0.60319054 0.82018125 0.10445047] 
[ 0.1877724 0.96060999 0.39697999 0.59078612]]


2015-08-17 Luca

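Will the entries for the same (x, y, z) position always be contiguous, as in your sample data, or can they be scattered? How many entries will you actually have? – jme

They might be scattered, unfortunately. There will never be more than 4 of them. Fortunately, performance is not critical for this. – Luca

For reference: this is an operation known as "grouping" (see http://pandas.pydata.org/pandas-docs/stable/groupby.html). You would group by the first three columns and then apply the max function to the groups. This is easy with a library like pandas (http://pandas.pydata.org/). –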

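This is convoluted, but it is probably as good as you are going to get using numpy only...

First we use lexsort to put all entries with the same coordinates next to each other. With a being your sample array: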

>>> perm = np.lexsort(a[:, 3::-1].T) 
>>> a[perm] 
array([[ 0.12080023, 0.74853649, 0.15356663, 0.4505753 ], 
     [ 0.13536096, 0.60319054, 0.82018125, 0.10445047], 
     [ 0.1877724 , 0.96060999, 0.39697999, 0.59078612], 
     [ 0.3239913 , 0.7786444 , 0.41692853, 0.10467392], 
     [ 0.58294263, 0.32025559, 0.6925856 , 0.05  ], 
     [ 0.58294263, 0.32025559, 0.6925856 , 0.0524125 ], 
     [ 0.58294263, 0.32025559, 0.6925856 , 1.7  ], 
     [ 0.7732126 , 0.48649481, 0.29771819, 0.91622924], 
     [ 0.7732126 , 0.48649481, 0.29771819, 1.91622924]])

Note that, because the columns are passed in reversed order, we sort by x, breaking ties with y, then z, then w.
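The key ordering of lexsort is easy to get backwards, so here is a minimal sketch on a tiny made-up array (pts and order are illustrative names, not part of the answer). It only shows that np.lexsort treats its last key as the primary one, which is why the columns are reversed before sorting.

```python
import numpy as np

# Tiny made-up array: two columns, with a tie in column 0.
pts = np.array([[2, 1],
                [1, 9],
                [1, 3]])

# np.lexsort uses the LAST key as the primary one, so passing the
# columns in reversed order sorts by column 0 first, then by column 1.
order = np.lexsort(pts[:, ::-1].T)
print(pts[order])
```

The rows with 1 in the first column come first, ordered among themselves by the second column.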

Because it is the maximum we are looking for, we just need to grab the last entry in every group, which is pretty straightforward:

>>> a_sorted = a[perm] 
>>> # a row closes its group when any coordinate differs from the next row's 
>>> last = np.concatenate((np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1), 
...                        [True])) 
>>> a_unique_max = a_sorted[last] 
>>> a_unique_max 
array([[ 0.12080023, 0.74853649, 0.15356663, 0.4505753 ], 
     [ 0.13536096, 0.60319054, 0.82018125, 0.10445047], 
     [ 0.1877724 , 0.96060999, 0.39697999, 0.59078612], 
     [ 0.3239913 , 0.7786444 , 0.41692853, 0.10467392], 
     [ 0.58294263, 0.32025559, 0.6925856 , 1.7  ], 
     [ 0.7732126 , 0.48649481, 0.29771819, 1.91622924]])

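If you would rather not have the output sorted, but keep the rows in the order they appeared in the original array, you can also get that with the help of perm: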

>>> a_unique_max[np.argsort(perm[last])] 
array([[ 0.7732126 , 0.48649481, 0.29771819, 1.91622924], 
     [ 0.58294263, 0.32025559, 0.6925856 , 1.7  ], 
     [ 0.3239913 , 0.7786444 , 0.41692853, 0.10467392], 
     [ 0.12080023, 0.74853649, 0.15356663, 0.4505753 ], 
     [ 0.13536096, 0.60319054, 0.82018125, 0.10445047], 
     [ 0.1877724 , 0.96060999, 0.39697999, 0.59078612]])
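Put together, the steps above can be sketched as one function. This is only an illustration under a few assumptions: the name max_per_position and the keep_order flag are made up here, the input has at least one row, and coordinates are compared for exact equality. It compares neighbouring rows with np.any, so two positions that share only some of their coordinates are still treated as distinct.

```python
import numpy as np

def max_per_position(a, keep_order=False):
    """Return one row per unique (x, y, z), carrying the largest w."""
    a = np.asarray(a)
    # Sort by x, then y, then z, then w (lexsort's last key is the primary one).
    perm = np.lexsort(a[:, 3::-1].T)
    a_sorted = a[perm]
    # A row closes its group when ANY coordinate differs from the next row's.
    differs = np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1)
    last = np.concatenate((differs, [True]))
    result = a_sorted[last]
    if keep_order:
        # perm[last] holds the original row index of each kept row.
        result = result[np.argsort(perm[last])]
    return result

# Made-up data; (1, 1, 1) and (1, 2, 1) share x and z but are different positions.
a = np.array([[1., 1., 1., 0.5],
              [2., 1., 1., 0.3],
              [1., 1., 1., 0.9],
              [1., 2., 1., 0.1]])
print(max_per_position(a))
print(max_per_position(a, keep_order=True))
```

With keep_order=True the rows follow the original position of each kept (maximum) row, not the first appearance of its group.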

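This only works for the maximum, which comes as a by-product of the sorting. If you are after a different function, say the product of all entries with the same coordinates, you could do something like: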

>>> # a row opens a group when any coordinate differs from the previous row's 
>>> first = np.concatenate(([True], 
...                         np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1))) 
>>> a_unique_prods = np.multiply.reduceat(a_sorted, np.nonzero(first)[0])

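You will then have to play around a little with these results to assemble your return array: reduceat here multiplies every column, coordinates included, so only the last column of a_unique_prods is the product you want.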
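One way to do that assembly, as a sketch on made-up data rather than the answer's own code, is to reduce the measurement column alone and stack it next to the first row's coordinates of each group. The names starts and prods are introduced here for illustration.

```python
import numpy as np

# Made-up data: two measurements at (1, 1, 1), one at (2, 1, 1).
a = np.array([[1., 1., 1., 2.],
              [2., 1., 1., 3.],
              [1., 1., 1., 5.]])

# Only the coordinates matter for grouping here, so sort on x, then y, then z.
a_sorted = a[np.lexsort(a[:, 2::-1].T)]
# True where a row starts a new (x, y, z) group.
first = np.concatenate(([True],
                        np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1)))
starts = np.nonzero(first)[0]
# Reduce the measurement column alone: one product per group.
prods = np.multiply.reduceat(a_sorted[:, 3], starts)
a_unique_prods = np.column_stack((a_sorted[first, :3], prods))
print(a_unique_prods)
```

Swapping np.multiply for another binary ufunc such as np.add or np.maximum gives the corresponding per-group reduction.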


2015-08-17 17:18:46 Jaime

-1

You can use logical indexing.

I will use random data as an example:

>>> myarr = np.random.random((6, 4)) 
>>> print(myarr) 
[[ 0.7732126 0.48649481 0.29771819 0.91622924] 
[ 0.58294263 0.32025559 0.6925856 0.0524125 ] 
[ 0.3239913 0.7786444 0.41692853 0.10467392] 
[ 0.12080023 0.74853649 0.15356663 0.4505753 ] 
[ 0.13536096 0.60319054 0.82018125 0.10445047] 
[ 0.1877724 0.96060999 0.39697999 0.59078612]]

To get the row or rows where the last column is greatest, do this:

>>> greatest = myarr[myarr[:, 3]==myarr[:, 3].max()] 
>>> print(greatest) 
[[ 0.7732126 0.48649481 0.29771819 0.91622924]]

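What this does is take the last column of myarr, find the maximum of that column, find all elements of that column equal to the maximum, and then get the corresponding rows.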
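A small sketch with made-up values shows what the mask does when several rows tie for the overall maximum. It is only meant to illustrate the boolean indexing; note that it looks at the whole column at once and does not group rows by position.

```python
import numpy as np

# Made-up data: rows 0 and 2 tie for the largest value in the last column.
myarr = np.array([[0.1, 0.2, 0.3, 0.9],
                  [0.4, 0.5, 0.6, 0.2],
                  [0.7, 0.8, 0.9, 0.9]])

# Boolean mask: True for every row whose last column equals the overall maximum.
mask = myarr[:, 3] == myarr[:, 3].max()
greatest = myarr[mask]
print(greatest)
```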


2015-08-17 14:30:49 TheBlackCat

This is not the behaviour I am looking for. I have edited the question to hopefully make the problem clearer. – Luca

-1

You can use np.argmax

x[np.argmax(x[:,3]),:]

>>> x = np.random.random((5,4)) 
>>> x 
array([[ 0.25461146, 0.35671081, 0.54856798, 0.2027313 ], 
     [ 0.17079029, 0.66970362, 0.06533572, 0.31704254], 
     [ 0.4577928 , 0.69022073, 0.57128696, 0.93995176], 
     [ 0.29708841, 0.96324181, 0.78859008, 0.25433235], 
     [ 0.58739451, 0.17961551, 0.67993786, 0.73725493]]) 
>>> x[np.argmax(x[:,3]),:] 
array([ 0.4577928 , 0.69022073, 0.57128696, 0.93995176])


2015-08-17 14:31:25 asiviero

This is not the behaviour I am looking for. I have edited the question to hopefully make the problem clearer. – Luca

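I see you have already got the pointer to pandas in the comments. FWIW, here is how you can get the desired behaviour, assuming you don't care about the final sort order, since groupby changes it.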

In [14]: arr 
Out[14]: 
array([[ 0.7732126 , 0.48649481, 0.29771819, 0.91622924], 
     [ 0.7732126 , 0.48649481, 0.29771819, 1.91622924], 
     [ 0.58294263, 0.32025559, 0.6925856 , 0.0524125 ], 
     [ 0.58294263, 0.32025559, 0.6925856 , 0.05  ], 
     [ 0.58294263, 0.32025559, 0.6925856 , 1.7  ], 
     [ 0.3239913 , 0.7786444 , 0.41692853, 0.10467392], 
     [ 0.12080023, 0.74853649, 0.15356663, 0.4505753 ], 
     [ 0.13536096, 0.60319054, 0.82018125, 0.10445047], 
     [ 0.1877724 , 0.96060999, 0.39697999, 0.59078612]]) 

In [15]: import pandas as pd 

In [16]: pd.DataFrame(arr) 
Out[16]: 
      0   1   2   3 
0 0.773213 0.486495 0.297718 0.916229 
1 0.773213 0.486495 0.297718 1.916229 
2 0.582943 0.320256 0.692586 0.052413 
3 0.582943 0.320256 0.692586 0.050000 
4 0.582943 0.320256 0.692586 1.700000 
5 0.323991 0.778644 0.416929 0.104674 
6 0.120800 0.748536 0.153567 0.450575 
7 0.135361 0.603191 0.820181 0.104450 
8 0.187772 0.960610 0.396980 0.590786 

In [17]: pd.DataFrame(arr).groupby([0,1,2]).max().reset_index() 
Out[17]: 
      0   1   2   3 
0 0.120800 0.748536 0.153567 0.450575 
1 0.135361 0.603191 0.820181 0.104450 
2 0.187772 0.960610 0.396980 0.590786 
3 0.323991 0.778644 0.416929 0.104674 
4 0.582943 0.320256 0.692586 1.700000 
5 0.773213 0.486495 0.297718 1.916229
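If the order does matter, a possible variation (a sketch, not part of the session above) is to pass sort=False to groupby, which keeps the groups in order of first appearance instead of sorting the keys. The column names x, y, z, w and the shortened sample values are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Shortened, made-up version of the sample data.
arr = np.array([[0.7, 0.4, 0.2, 0.9],
                [0.7, 0.4, 0.2, 1.9],
                [0.5, 0.3, 0.6, 0.05],
                [0.1, 0.7, 0.1, 0.4],
                [0.5, 0.3, 0.6, 1.7]])

df = pd.DataFrame(arr, columns=["x", "y", "z", "w"])
# sort=False keeps groups in order of first appearance instead of sorting keys.
result = df.groupby(["x", "y", "z"], sort=False)["w"].max().reset_index()
print(result.to_numpy())
```

Calling to_numpy() at the end turns the result back into a plain array with the same column layout as the input.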


2015-08-18 03:30:39

Thanks. Very nice solution. I will look into this in detail. – Luca

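You can start by lex-sorting the input array so that entries with identical first three elements come in succession. Then create another 2D array to hold the last-column entries, such that the elements corresponding to each duplicate triplet go into the same row. Next, find the max along axis=1 of this 2D array to get the final max output for each unique triplet. Here's the implementation, assuming A as the input array -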

# Lex sort A 
sortedA = A[np.lexsort(A[:,:-1].T)] 

# Mask of start of unique first three columns from A 
start_unqA = np.append(True,~np.all(np.diff(sortedA[:,:-1],axis=0)==0,axis=1)) 

# Counts of unique first three columns from A 
counts = np.bincount(start_unqA.cumsum()-1) 
mask = np.arange(counts.max()) < counts[:,None] 

# Group A's last column into rows based on uniqueness from first three columns 
grpA = np.empty(mask.shape) 
grpA.fill(np.nan) 
grpA[mask] = sortedA[:,-1] 

# Concatenate unique first three columns from A and 
# corresponding max values for each such unique triplet 
out = np.column_stack((sortedA[start_unqA,:-1],np.nanmax(grpA,axis=1)))

Sample run -

In [75]: A 
Out[75]: 
array([[ 1, 1, 1, 96], 
     [ 1, 2, 2, 48], 
     [ 2, 1, 2, 33], 
     [ 1, 1, 1, 24], 
     [ 1, 1, 1, 94], 
     [ 2, 2, 2, 5], 
     [ 2, 1, 1, 17], 
     [ 2, 2, 2, 62]]) 

In [76]: sortedA 
Out[76]: 
array([[ 1, 1, 1, 96], 
     [ 1, 1, 1, 24], 
     [ 1, 1, 1, 94], 
     [ 2, 1, 1, 17], 
     [ 2, 1, 2, 33], 
     [ 1, 2, 2, 48], 
     [ 2, 2, 2, 5], 
     [ 2, 2, 2, 62]]) 

In [77]: out 
Out[77]: 
array([[ 1., 1., 1., 96.], 
     [ 2., 1., 1., 17.], 
     [ 2., 1., 2., 33.], 
     [ 1., 2., 2., 48.], 
     [ 2., 2., 2., 62.]])
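As a possible shortcut, and a different technique from the NaN-padded grid above, the per-group maximum can also be taken directly with np.maximum.reduceat on the sorted last column. The sketch below reuses the sample A from the run above; group_max is a name introduced here for illustration.

```python
import numpy as np

A = np.array([[1, 1, 1, 96],
              [1, 2, 2, 48],
              [2, 1, 2, 33],
              [1, 1, 1, 24],
              [1, 1, 1, 94],
              [2, 2, 2, 5],
              [2, 1, 1, 17],
              [2, 2, 2, 62]])

# Lex sort A on its first three columns.
sortedA = A[np.lexsort(A[:, :-1].T)]
# True where a row's first three columns differ from the previous row's.
start_unqA = np.append(True, np.any(np.diff(sortedA[:, :-1], axis=0) != 0, axis=1))
# Max of the last column over each run of identical triplets.
group_max = np.maximum.reduceat(sortedA[:, -1], np.flatnonzero(start_unqA))
out = np.column_stack((sortedA[start_unqA, :-1], group_max))
print(out)
```

Because no NaN padding is involved, the result keeps the integer dtype of A instead of being converted to floats.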


2015-08-18 06:44:25 Divakar
