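Return the n largest/smallest values in a pandas DataFrame where many rows contain the same value

I'm wondering how to return the rows containing the n smallest values in a DataFrame df that looks like this: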

id   xx   count
1    A    1
2    B    1
3    C    3
4    D    2
5    E    3
6    F    10
7    G    11
8    H    17

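Say I want to find the rows containing the 3 smallest counts (in this case, the 3 smallest counts are 1, 2 and 3). So I'd expect the answer to look like this: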

id   xx   count
1    A    1
2    B    1
4    D    2
3    C    3
5    E    3

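If I just sort the DataFrame by count and use df.nsmallest(3, 'count'), it only returns the first three rows of the desired DataFrame. But I want all rows that contain the 3 smallest counts. Is there an easier way to do this in pandas? Thanks in advance!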
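For reference, here is a minimal sketch that rebuilds the example frame from the table above and shows the behaviour described. The variable name top3_rows is only illustrative.

```python
import pandas as pd

# Example data copied from the table in the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "xx": list("ABCDEFGH"),
    "count": [1, 1, 3, 2, 3, 10, 11, 17],
})

# nsmallest(3, ...) returns exactly three rows, so the rows with count == 3
# are missing even though 3 is one of the three smallest distinct counts.
top3_rows = df.nsmallest(3, "count")
print(top3_rows)
```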

Source

2017-02-28 Gingerbread

You can first use drop_duplicates with nsmallest to find the values, then boolean indexing with isin:

s = df['count'].drop_duplicates().nsmallest(3)
print(s)
0    1
3    2
2    3
Name: count, dtype: int64

print(df[df['count'].isin(s)])
   id xx  count
0   1  A      1
1   2  B      1
2   3  C      3
3   4  D      2
4   5  E      3
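The same idea can be wrapped in a small function that also covers the "largest" case from the title. This is only a sketch: rows_with_n_extreme is a made-up helper name, not a pandas API, and it assumes the column holds values that nsmallest/nlargest can order.

```python
import pandas as pd

def rows_with_n_extreme(df, column, n, largest=False):
    """Return all rows whose `column` value is among the n smallest
    (or n largest) distinct values. Hypothetical helper, not part of pandas."""
    distinct = df[column].drop_duplicates()
    picked = distinct.nlargest(n) if largest else distinct.nsmallest(n)
    # isin keeps every row that matches one of the picked values, ties included.
    return df[df[column].isin(picked)]

# Example data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "xx": list("ABCDEFGH"),
    "count": [1, 1, 3, 2, 3, 10, 11, 17],
})

smallest_rows = rows_with_n_extreme(df, "count", 3)
largest_rows = rows_with_n_extreme(df, "count", 3, largest=True)
print(smallest_rows)
print(largest_rows)
```

Note that the result keeps the original row order. If you want it ordered by count, as in the question's expected output, chain .sort_values('count') on the result.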

Another solution with unique, sorting with numpy.sort (because unique returns a NumPy array) and selecting the first 3 values:

import numpy as np

arr = np.sort(df['count'].unique())[:3]
print(arr)
[1 2 3]

print(df[df['count'].isin(arr)])
   id xx  count
0   1  A      1
1   2  B      1
2   3  C      3
3   4  D      2
4   5  E      3
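As a small variation on the above (my own sketch, not part of the answer): numpy.unique returns the distinct values already sorted in ascending order, so the separate sort can be skipped, and slicing from the other end gives the largest values. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "xx": list("ABCDEFGH"),
    "count": [1, 1, 3, 2, 3, 10, 11, 17],
})

# np.unique returns the distinct values sorted in ascending order.
distinct_sorted = np.unique(df["count"].to_numpy())

smallest_vals = distinct_sorted[:3]   # three smallest distinct counts
largest_vals = distinct_sorted[-3:]   # three largest distinct counts

print(df[df["count"].isin(smallest_vals)])
print(df[df["count"].isin(largest_vals)])
```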

Source

2017-02-28 19:44:35 jezrael

You are amazing, @jezrael! – Gingerbread

Another solution, using the rank() method:

In [43]: df[df['count'].rank(method='dense') <= 3]
Out[43]:
   id xx  count
0   1  A      1
1   2  B      1
2   3  C      3
3   4  D      2
4   5  E      3
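To see why this works, it may help to look at the ranks themselves: with method='dense', equal values share a rank and the ranks increase by 1 between distinct values, so "rank <= 3" means "one of the 3 smallest distinct values". The sketch below also assumes that passing ascending=False is an acceptable way to get the largest values instead; the variable names are illustrative.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "xx": list("ABCDEFGH"),
    "count": [1, 1, 3, 2, 3, 10, 11, 17],
})

# Dense ranks: ties share a rank, and there are no gaps between ranks.
dense_rank = df["count"].rank(method="dense")
print(dense_rank.tolist())

# Rows holding the 3 smallest distinct counts.
smallest_rows = df[dense_rank <= 3]

# Ranking in descending order selects the 3 largest distinct counts instead.
largest_rows = df[df["count"].rank(method="dense", ascending=False) <= 3]
print(largest_rows)
```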

Source

2017-02-28 20:23:22 MaxU
