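Extract unique rows with the maximum value in another column of an R data frame

I have this data frame called mydf. The Sample column contains duplicated samples. For each unique sample, I want to extract the row with the maximum total_reads and get the result shown below.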

mydf<-structure(list(Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", 
"AOGC-02-0191", "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194" 
), total_reads = c(27392583, 19206920, 34462563, 53669483, 24731988, 
43419826, 68151814), Lane = c("4", "5", "4", "4;5", "5", "4", 
"4;5")), .Names = c("Sample", "total_reads", "Lane"), row.names = c("166", 
"169", "170", "171", "173", "174", "175"), class = "data.frame")

Result

Sample  total_reads Lane 
AOGC-02-0188 27392583 4 
AOGC-02-0191 53669483 4;5 
AOGC-02-0194 68151814 4;5

Source

2016-05-22 MAPK

Possible duplicate of "Aggregate a dataframe on a given column and display another column" (http://stackoverflow.com/questions/6289538/aggregate-a-dataframe-on-a-given-column-and-display-another-column) – Bulat

You can aggregate and then merge:

merge(aggregate(total_reads ~ Sample, mydf, max), mydf) 
#  Sample total_reads Lane 
#1 AOGC-02-0188 27392583 4 
#2 AOGC-02-0191 53669483 4;5 
#3 AOGC-02-0194 68151814 4;5
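
To see why this works, it may help to run the two steps separately. The following is a minimal sketch that rebuilds mydf from the question (without the original row names, which do not matter here); the names max_reads and result are only illustrative.

```r
# Rebuild the question's data (row names omitted; they do not affect the result)
mydf <- data.frame(
  Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", "AOGC-02-0191",
             "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"),
  total_reads = c(27392583, 19206920, 34462563, 53669483,
                  24731988, 43419826, 68151814),
  Lane = c("4", "5", "4", "4;5", "5", "4", "4;5"),
  stringsAsFactors = FALSE
)

# Step 1: one row per Sample, holding only the maximum total_reads
max_reads <- aggregate(total_reads ~ Sample, data = mydf, FUN = max)

# Step 2: merge() joins on the columns both data frames share
# (Sample and total_reads), which brings Lane back for the matching rows
result <- merge(max_reads, mydf)
print(result)
```

One thing to keep in mind: if two rows of the same sample tie for the maximum, the merge returns both of them. Should that matter, something like result[!duplicated(result$Sample), ] would keep only the first row per sample.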

Source

2016-05-22 08:02:12 Sotos

Using the dplyr package, you can do it like this:

library(dplyr)

mydf %>%
    group_by(Sample) %>% # for each unique sample
    arrange(-total_reads) %>% # order by total_reads DESC
    slice(1) # select the first row, i.e. with highest total_reads
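
A variant of the same idea, assuming dplyr 1.0.0 or later is installed, is slice_max(), which combines the ordering and the row selection in one step. This is only a sketch; it rebuilds mydf from the question (without the original row names), and top_rows is an illustrative name.

```r
library(dplyr)

# Rebuild the question's data (row names omitted)
mydf <- data.frame(
  Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", "AOGC-02-0191",
             "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"),
  total_reads = c(27392583, 19206920, 34462563, 53669483,
                  24731988, 43419826, 68151814),
  Lane = c("4", "5", "4", "4;5", "5", "4", "4;5"),
  stringsAsFactors = FALSE
)

top_rows <- mydf %>%
  group_by(Sample) %>%
  # with_ties = FALSE keeps exactly one row per sample even if the maximum is tied
  slice_max(total_reads, n = 1, with_ties = FALSE) %>%
  ungroup()  # drop the grouping so later steps see a plain tibble

print(top_rows)
```

Leaving with_ties at its default of TRUE would instead return every row that shares the maximum, which may or may not be what you want.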

Source

2016-05-22 08:01:55 Jasper

We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(mydf)), order by 'total_reads' in descending order, and, grouped by 'Sample', take the first observation with head.

library(data.table) 
setDT(mydf)[order(-total_reads), head(.SD, 1) , Sample]
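
Note that setDT() converts mydf by reference, so mydf itself becomes a data.table. If that is not wanted, a possible alternative is to work on a copy and pick the rows by index. The sketch below rebuilds mydf from the question (without the original row names); dt, idx and top_dt are illustrative names.

```r
library(data.table)

# Rebuild the question's data (row names omitted)
mydf <- data.frame(
  Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", "AOGC-02-0191",
             "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"),
  total_reads = c(27392583, 19206920, 34462563, 53669483,
                  24731988, 43419826, 68151814),
  Lane = c("4", "5", "4", "4;5", "5", "4", "4;5"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so mydf stays a plain data.frame
dt <- as.data.table(mydf)

# Within each Sample, .I holds the row numbers in dt, so
# .I[which.max(total_reads)] is the row number of that sample's maximum
idx <- dt[, .I[which.max(total_reads)], by = Sample]$V1

# Subset dt by those row numbers
top_dt <- dt[idx]
print(top_dt)
```

Because which.max() returns only the first maximum, this keeps a single row per sample even when the maximum is tied.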

Source

2016-05-22 10:37:55 akrun
