Get the `n` largest or smallest values from a column of an R data frame

I have a large data frame. I want to find the row indices of the `n` lowest elements of a column. For example, consider the data frame df below:

col_1 col_2 col_3
    1     2     3
   -1     2    21
    2     3     1

So func(dataframe = df, column_name = col_1, n=2) would return

[1,2] #index of the rows

Note: I would like to avoid sorting the column.

Source

2016-09-23 random_28

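Your question is not super clear, but you could probably use `rank`. – lmo

Yes, I can use rank. But does rank use sorting internally? –

I'm not sure you can do this without some kind of sort. –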

This uses sorting, but here is one way.

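set.seed(1)   # make the random data reproducible
nr = 100      # number of rows
nc = 10       # number of columns
n  = 5        # how many of the lowest values to keep
ixCol = 1     # index of the column to rank by
input = matrix(runif(nr*nc),nrow = nr,ncol=nc)
input[head(order(input[,ixCol]),n),]   # rows holding the n lowest values of column ixCol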
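The same idea can be applied to the data frame from the question to get the row indices directly. This is only a sketch: `idx_min` and `idx_max` are illustrative names, and `decreasing = TRUE` is used to cover the "largest" case from the title.

```r
# Data frame from the question
df <- data.frame(col_1 = c(1, -1, 2),
                 col_2 = c(2, 2, 3),
                 col_3 = c(3, 21, 1))
n <- 2

# order() returns row indices sorted by value; head() keeps the first n
idx_min <- head(order(df$col_1), n)                      # rows with the n lowest values
idx_max <- head(order(df$col_1, decreasing = TRUE), n)   # rows with the n highest values

print(idx_min)
print(idx_max)
```

Note that the indices come back ordered by value, not by row position, so wrap them in `sort()` if the original row order matters.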

Source

2016-09-23 12:08:54

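An interesting question. I can think of (at least) four methods, all using base R. For simplicity, I just create a vector rather than using a data frame. If it works for a vector, then applying it to a data frame is just a matter of subsetting.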

First, some dummy data

x = runif(1e6)

Now the four methods (in order of speed, fastest first)

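## Using partial sorting: only position n+1 is guaranteed to be in its sorted place
f = function(n){ 
    cut_off = sort(x, partial=n+1)[n+1] 
    x[x < cut_off]   # values below the cut-off, in their original order
} 

## Using a faster method of sorting; but doesn't work with partial 
g = function(n){ 
    cut_off = sort(x, method="radix")[n+1] 
    x[x < cut_off] 
} 

## Ordering: full sort of the indices, then keep the first n
h = function(n) x[order(x)[1:n]] 

## Ranking: keep the elements whose rank is between 1 and n
i = function(n) x[rank(x) %in% 1:n]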
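The functions above only return the lowest values. A minimal sketch of the partial-sorting idea for the `n` largest values follows; it assumes there are no ties at the cut-off and that `n` is smaller than `length(x)`. The names `k` and `largest` are illustrative.

```r
set.seed(42)
x <- runif(1000)
n <- 5

# The k-th smallest value, with k = length(x) - n, sits just below the n largest
k <- length(x) - n
cut_off <- sort(x, partial = k)[k]

# Keep everything strictly above the cut-off (returned in original order)
largest <- x[x > cut_off]
print(largest)
```

As with `f`, the result is not sorted; apply `sort(largest, decreasing = TRUE)` if an ordered result is needed.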

The timings indicate that careful (partial) sorting seems to be optimal.

R> microbenchmark::microbenchmark(f(n), g(n), h(n),i(n), times = 4) 
Unit: milliseconds 
 expr    min     lq   mean median     uq    max neval cld
 f(n)  112.8  116.0  122.1  122.6  128.1  130.2     4 a
 g(n)  372.6  379.1  442.6  386.1  506.1  625.6     4 b
 h(n) 1162.3 1196.0 1222.0 1238.4 1248.0 1248.8     4 c
 i(n) 1414.9 1437.9 1489.1 1484.4 1540.3 1572.6     4 d

To work with a data frame, you would have something like this:

cut_off = sort(df$col, partial=n+1)[n+1] 
df[df$col < cut_off,]
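Since the question asks for row indices rather than the rows themselves, the snippet above can be wrapped as follows. This is a sketch, not the answerer's code: `lowest_n_rows` is a hypothetical helper, and it uses `partial = n` with `<=` instead of `n + 1` with `<`, so that `n` equal to the number of rows also works.

```r
# Hypothetical helper: row indices of the n lowest values in one column
lowest_n_rows <- function(df, column_name, n) {
  x <- df[[column_name]]
  stopifnot(n >= 1, n <= length(x))
  # Partial sort puts the n-th smallest value at position n
  cut_off <- sort(x, partial = n)[n]
  # which() turns the logical mask into integer row indices
  which(x <= cut_off)
}

df <- data.frame(col_1 = c(1, -1, 2),
                 col_2 = c(2, 2, 3),
                 col_3 = c(3, 21, 1))
result <- lowest_n_rows(df, "col_1", 2)
print(result)
```

If several rows tie with the cut-off value, this version returns more than `n` indices.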

Source

2016-09-23 12:13:17 csgillespie

Using dplyr and (for easier-to-read code) magrittr:

data(iris) # use iris dataset 

library(dplyr); library(magrittr) # load packages 

iris %>% 
    filter(Sepal.Length %in% sort(Sepal.Length)[1:3])

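This outputs the rows with the 3 lowest Sepal.Length values, without sorting the data frame. In this case there are ties, so it outputs four rows.

To get the corresponding row names, you can use something like this: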

rownames(subset(iris, 
      Sepal.Length %in% sort(Sepal.Length)[1:3]))
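`rownames()` returns character strings. If integer row indices are wanted, as in the question, a base R variant using `which()` might look like this (a sketch; `idx` is an illustrative name):

```r
data(iris)  # built-in dataset

# Integer positions of the rows holding the 3 lowest Sepal.Length values
idx <- which(iris$Sepal.Length %in% sort(iris$Sepal.Length)[1:3])
print(idx)

# The matching rows, still in their original order
print(iris[idx, ])
```

Because of the ties mentioned above, `idx` contains four entries rather than three.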

Source

2016-09-23 12:17:03
