Compare elements 3 rows at a time

I have a NumPy array like the following:

[[3.4, 87] 
[5.5, 11] 
[22, 3] 
[4, 9.8] 
[41, 11.22] 
[32, 7.6]]

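and I want to:

Compare the elements in the 2nd column, 3 rows at a time
Delete the row with the largest value in column 2 within each group of 3 rows

For example, in the first 3 rows, the 3 values in the 2nd column are 87, 11 and 3, and I want to keep the rows containing 11 and 3.

The output NumPy array I expect is: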

[[5.5, 11] 
[22, 3] 
[4, 9.8] 
[32, 7.6]]

I am new to NumPy arrays; please advise me on how to achieve this.

Source

2016-10-29 Heinz

import numpy as np 
x = np.array([[3.4, 87], 
       [5.5, 11], 
       [22, 3], 
       [4, 9.8], 
       [41, 11.22], 
       [32, 7.6]]) 

y = x.reshape(-1,3,2) 
idx = y[..., 1].argmax(axis=1) 
mask = np.arange(3)[None, :] != idx[:, None] 
y = y[mask] 
print(y) 
# This might be helpful for the deleted part of your question 
# y = y.reshape(-1,2,2) 
# z = y[...,1]/y[...,1].sum(axis=1, keepdims=True)  # keepdims so each group is divided by its own sum
# result = np.dstack([y, z[...,None]])

yields

[[ 5.5 11. ] 
[ 22. 3. ] 
[ 4. 9.8] 
[ 32. 7.6]]

"Grouping by 3" can be done in NumPy by reshaping the array to create a new axis of length 3, provided the original number of rows is divisible by 3:

In [92]: y = x.reshape(-1,3,2); y 
Out[92]: 
array([[[ 3.4 , 87. ], 
     [ 5.5 , 11. ], 
     [ 22. , 3. ]], 

     [[ 4. , 9.8 ], 
     [ 41. , 11.22], 
     [ 32. , 7.6 ]]]) 

In [93]: y.shape 
Out[93]: (2, 3, 2) 
      | | | 
      | | o--- 2 columns in each group 
      | o------ 3 rows in each group 
      o--------- 2 groups
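
The divisibility requirement can be made explicit before reshaping. The following is only a sketch: `group_rows` is a hypothetical helper name (it is not part of NumPy), and raising `ValueError` is just one possible way to handle a row count that does not split evenly.

```python
import numpy as np

def group_rows(x, size=3):
    """Reshape a 2D array into groups of `size` consecutive rows.

    Hypothetical helper for illustration only.
    """
    n_rows, n_cols = x.shape
    if n_rows % size != 0:
        # reshape would fail anyway; this gives a clearer message
        raise ValueError(
            f"{n_rows} rows cannot be split evenly into groups of {size}"
        )
    return x.reshape(-1, size, n_cols)

x = np.arange(12.0).reshape(6, 2)   # 6 rows, 2 columns
y = group_rows(x)
print(y.shape)                      # (2, 3, 2)
```

If the row count is not a multiple of 3, the leftover rows would have to be handled separately (dropped, padded, or processed as a shorter group); which of these is appropriate depends on the data.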

For each group, we can select the second column and find the row with the maximum value:

In [94]: idx = y[..., 1].argmax(axis=1); idx 
Out[94]: array([0, 1])

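array([0, 1]) indicates that in the first group the row at index 0 contains the maximum (i.e. 87), while in the second group the row at index 1 contains the maximum (i.e. 11.22).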
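
As a quick check of what `idx` refers to, the rows it points at can be pulled out with integer indexing. This is a small sketch that reuses the same `x` and `y` as above; the name `max_rows` is introduced here only for illustration.

```python
import numpy as np

x = np.array([[3.4, 87],
              [5.5, 11],
              [22, 3],
              [4, 9.8],
              [41, 11.22],
              [32, 7.6]])

y = x.reshape(-1, 3, 2)
idx = y[..., 1].argmax(axis=1)      # position of the max within each group

# Pair each group number with the in-group position of its maximum
max_rows = y[np.arange(len(idx)), idx]
print(max_rows)                     # the rows [3.4, 87.] and [41., 11.22]
```

These are exactly the rows that the mask in the next step excludes.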

Next, we can build a 2D boolean selection mask that is True wherever a row does not contain the maximum value:

In [95]: mask = np.arange(3)[None, :] != idx[:, None]; mask 
Out[95]: 
array([[False, True, True], 
     [ True, False, True]], dtype=bool) 

In [96]: mask.shape 
Out[96]: (2, 3)

mask has shape (2, 3). y has shape (2, 3, 2). If mask is used to index y, as in y[mask], then the mask is aligned with the first two axes of y, and all values where mask is True are returned:

In [98]: y[mask] 
Out[98]: 
array([[ 5.5, 11. ], 
     [ 22. , 3. ], 
     [ 4. , 9.8], 
     [ 32. , 7.6]]) 

In [99]: y[mask].shape 
Out[99]: (4, 2)
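
Putting the three steps together, the whole procedure can be wrapped in one function. This is a sketch rather than a definitive implementation: the function name `drop_group_max` and its `size` and `col` parameters are assumptions added for illustration, and the row count is assumed to be divisible by `size`.

```python
import numpy as np

def drop_group_max(x, size=3, col=1):
    """Drop, from each group of `size` rows, the row whose value
    in column `col` is largest. Hypothetical helper."""
    y = x.reshape(-1, size, x.shape[1])              # (groups, size, columns)
    idx = y[..., col].argmax(axis=1)                 # max position per group
    mask = np.arange(size)[None, :] != idx[:, None]  # True for rows to keep
    return y[mask]                                   # (groups * (size - 1), columns)

x = np.array([[3.4, 87],
              [5.5, 11],
              [22, 3],
              [4, 9.8],
              [41, 11.22],
              [32, 7.6]])

result = drop_group_max(x)
print(result)
```

Note that argmax returns the first occurrence of the maximum, so if a group contains ties for the largest value, only the first of the tied rows is dropped.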

By the way, the same calculation can be done with Pandas like this:

import numpy as np 
import pandas as pd 
x = np.array([[3.4, 87], 
       [5.5, 11], 
       [22, 3], 
       [4, 9.8], 
       [41, 11.22], 
       [32, 7.6]]) 

df = pd.DataFrame(x) 
idx = df.groupby(df.index // 3)[1].idxmax() 
# drop the row with the maximum value in each group 
df = df.drop(idx.values, axis=0)

which yields a DataFrame containing the remaining rows (original index labels 1, 2, 3 and 5).

You may find the Pandas syntax easier to use, but for the calculation above, NumPy is faster.
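
One way to check the speed difference on your own machine is a rough timing with timeit. This is only a sketch: the array size (3000 rows), the random seed, the repetition count, and the function names `with_numpy` and `with_pandas` are all arbitrary choices made here, and the actual timings will vary with hardware and library versions.

```python
import timeit
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)      # fixed seed, arbitrary
x = rng.random((3000, 2))           # 1000 groups of 3 rows

def with_numpy(x):
    y = x.reshape(-1, 3, 2)
    idx = y[..., 1].argmax(axis=1)
    mask = np.arange(3)[None, :] != idx[:, None]
    return y[mask]

def with_pandas(x):
    df = pd.DataFrame(x)
    idx = df.groupby(df.index // 3)[1].idxmax()
    return df.drop(idx.values, axis=0)

# Both approaches should keep the same rows in the same order
same = np.array_equal(with_numpy(x), with_pandas(x).to_numpy())
print("same result:", same)

print("NumPy :", timeit.timeit(lambda: with_numpy(x), number=20))
print("Pandas:", timeit.timeit(lambda: with_pandas(x), number=20))
```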

Source

2016-10-29 11:26:15 unutbu

Thank you for the working answer and the detailed explanation; I think I need some time to fully understand it. – Heinz
