2016-05-22 67 views
2

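Extracting unique rows with the maximum value in another column of an R data frame

I have a data frame called mydf whose Sample column contains duplicated samples. For each unique sample, I want to extract the row with the maximum total_reads and get the result shown after the data.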

mydf<-structure(list(Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", 
"AOGC-02-0191", "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194" 
), total_reads = c(27392583, 19206920, 34462563, 53669483, 24731988, 
43419826, 68151814), Lane = c("4", "5", "4", "4;5", "5", "4", 
"4;5")), .Names = c("Sample", "total_reads", "Lane"), row.names = c("166", 
"169", "170", "171", "173", "174", "175"), class = "data.frame") 

Result

Sample  total_reads Lane 
AOGC-02-0188 27392583 4 
AOGC-02-0191 53669483 4;5 
AOGC-02-0194 68151814 4;5 

Possible duplicate of [Aggregate a dataframe on a given column and display another column](http://stackoverflow.com/questions/6289538/aggregate-a-dataframe-on-a-given-column-and-display-another-column) – Bulat

Answers

4

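You can use aggregate to find the maximum total_reads per Sample, then merge the result back onto mydf to recover the matching Lane: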

merge(aggregate(total_reads ~ Sample, mydf, max), mydf) 
#  Sample total_reads Lane 
#1 AOGC-02-0188 27392583 4 
#2 AOGC-02-0191 53669483 4;5 
#3 AOGC-02-0194 68151814 4;5 
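One thing to keep in mind with the merge approach is that if two rows of the same Sample tie for the maximum total_reads, both rows come back. If exactly one row per Sample is needed, a base R sketch along the following lines may help. The names sorted and top_rows are only illustrative, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Same data as in the question, rebuilt compactly so this runs standalone.
mydf <- data.frame(
  Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", "AOGC-02-0191",
             "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"),
  total_reads = c(27392583, 19206920, 34462563, 53669483,
                  24731988, 43419826, 68151814),
  Lane = c("4", "5", "4", "4;5", "5", "4", "4;5"),
  stringsAsFactors = FALSE
)

# Sort by Sample, then by total_reads in descending order within each Sample.
sorted <- mydf[order(mydf$Sample, -mydf$total_reads), ]

# duplicated() flags every repeat of a Sample after its first occurrence,
# so negating it keeps only the top row of each group.
top_rows <- sorted[!duplicated(sorted$Sample), ]
rownames(top_rows) <- NULL  # illustrative: reset row names for tidy printing

print(top_rows)
```

With ties, this keeps whichever tied row sorts first, so add further keys to order() if the choice matters.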
1

With the dplyr package, you can do it like this:

library(dplyr) 

mydf %>% 
    group_by(Sample) %>% # for each unique sample 
    arrange(desc(total_reads)) %>% # order by total_reads, descending 
    slice(1) # select the first row per group, i.e. the one with the highest total_reads 
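As a possible alternative, assuming dplyr 1.0.0 or later is installed, slice_max() picks the top row per group without a separate sorting step. This is only a sketch: top_rows is an illustrative name, and with_ties = FALSE is my choice here to guarantee a single row per Sample.

```r
library(dplyr)

# Same data as in the question, rebuilt compactly so this runs standalone.
mydf <- data.frame(
  Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", "AOGC-02-0191",
             "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"),
  total_reads = c(27392583, 19206920, 34462563, 53669483,
                  24731988, 43419826, 68151814),
  Lane = c("4", "5", "4", "4;5", "5", "4", "4;5"),
  stringsAsFactors = FALSE
)

top_rows <- mydf %>%
  group_by(Sample) %>%
  # Keep the single row with the largest total_reads in each group;
  # with_ties = FALSE returns one row even if the maximum is shared.
  slice_max(total_reads, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top_rows)
```

Leaving with_ties at its default of TRUE would instead return every row that shares the maximum.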
2

We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(mydf)), order 'total_reads' in descending order and, grouped by 'Sample', take the first observation with head.

library(data.table) 
setDT(mydf)[order(-total_reads), head(.SD, 1) , Sample] 
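Note that setDT() converts mydf by reference, so mydf itself becomes a data.table afterwards. If that side effect is unwanted, a variant along these lines might work instead. It is a sketch under my own assumptions: it copies the data with as.data.table() and uses which.max() per group rather than ordering first, and dt and top_rows are illustrative names.

```r
library(data.table)

# Same data as in the question, rebuilt compactly so this runs standalone.
mydf <- data.frame(
  Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", "AOGC-02-0191",
             "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"),
  total_reads = c(27392583, 19206920, 34462563, 53669483,
                  24731988, 43419826, 68151814),
  Lane = c("4", "5", "4", "4;5", "5", "4", "4;5"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so the original data.frame is left untouched.
dt <- as.data.table(mydf)

# For each Sample, which.max() returns the position of the first maximum,
# and .SD[...] selects that row from the group's subset of the data.
top_rows <- dt[, .SD[which.max(total_reads)], by = Sample]

print(top_rows)
```

Groups come back in order of first appearance, and which.max() returns only the first maximum, so ties yield a single row per Sample.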