2017-04-18 142 views
-1

Aggregate fields by character

Hi, I would like to aggregate several columns.

d <- structure(list(Gene = structure(1:3, .Label = c("k141_20041_1", 
"k141_27047_2", "k141_70_3"), class = "factor"), phylum = structure(c(1L, 
1L, 1L), .Label = "Firmicutes", class = "factor"), class = structure(c(1L, 
1L, 1L), .Label = "Bacillales", class = "factor"), order = structure(c(1L, 
1L, 1L), .Label = "Bacilli", class = "factor"), family = structure(c(1L, 
1L, 1L), .Label = "Bacillaceae", class = "factor"), genus = structure(c(1L, 
1L, 1L), .Label = "Bacillus", class = "factor"), species = structure(c(1L, 
1L, 2L), .Label = c("Bacillus subtilis", "unknown"), class = "factor"), 
    SampleA = c(0, 0, 0), SampleB = c(0, 0, 0), SampleCtrl = c(3.98888888888889, 
    11.5555555555556, 3.35978835978836)), .Names = c("Gene", 
"phylum", "class", "order", "family", "genus", "species", "SampleA", 
"SampleB", "SampleCtrl"), row.names = c(21918L, 40410L, 40857L 
), class = "data.frame") 

This is the input data frame to aggregate:

Gene         phylum     class      order   family      genus    species           SampleA SampleB SampleCtrl
k141_20041_1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0       3.99
k141_27047_2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0      11.56
k141_70_3    Firmicutes Bacillales Bacilli Bacillaceae Bacillus unknown                 0       0       3.36

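What I would like to end up with is a single row for each group, with all the columns. In this case it would look like this (we can drop the Gene column).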

phylum     class      order   family      genus    species           SampleA SampleB SampleCtrl
Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0      15.54
Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus Unknown        0       0       3.36

Note that this is a very simplified example. My original data frame has 20 samples and more than 500 species.

Answers

0

Here is a dplyr solution:

library(dplyr)

# Group by the taxonomy columns, then sum every numeric (sample) column
d %>%
  group_by(phylum, class, order, family, genus, species) %>%
  summarise_if(is.numeric, sum)

giving:

Groups: phylum, class, order, family, genus [?]

      phylum      class   order      family    genus           species SampleA SampleB SampleCtrl
      <fctr>     <fctr>  <fctr>      <fctr>   <fctr>            <fctr>   <dbl>   <dbl>      <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0   15.54444
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0    3.35979
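
For readers on a newer dplyr: `summarise_if()` is superseded by `across()` as of dplyr 1.0.0. The following is only a sketch of the same idea in that style, assuming dplyr 1.0.0 or later. The small data frame is a cut-down, rounded stand-in for the question's data, not the real table, and `res` is just an illustrative name.

```r
library(dplyr)

# Cut-down stand-in for the question's data (values rounded, fewer columns)
d <- data.frame(
  Gene       = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  genus      = c("Bacillus", "Bacillus", "Bacillus"),
  species    = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA    = c(0, 0, 0),
  SampleB    = c(0, 0, 0),
  SampleCtrl = c(3.99, 11.56, 3.36),
  stringsAsFactors = FALSE
)

res <- d %>%
  select(-Gene) %>%                                   # drop the identifier column
  group_by(across(where(is.character))) %>%           # group by every text column
  summarise(across(where(is.numeric), sum),           # sum every numeric column
            .groups = "drop")

print(res)
```

Neither the grouping columns nor the sample columns are listed by name here, so the same call should carry over to a table with 20 or more sample columns, provided the grouping columns are character (use `is.factor` instead if they are factors, as in the original `dput` output).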

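I want to aggregate all the sample columns (not just the SampleCtrl column). My question was not clear. This example only aggregates the SampleCtrl column. I have more than 20 columns; do I have to list all of them in this case? – david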

@david I edited my answer. Now SampleA and SampleB are summarised as well, because they are also numeric.

Works perfectly. Thank you very much – david

0

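Assuming that the sample columns are numeric and the others are not, and that the desired aggregation is to sum each sample column grouped by the other columns (except Gene):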

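# Indices of the numeric (sample) columns
j <- which(sapply(d, is.numeric))
# Sum the sample columns, grouped by all remaining columns except Gene (column 1)
aggregate(d[j], d[-c(1, j)], sum)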

giving:

      phylum      class   order      family    genus           species SampleA SampleB SampleCtrl
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0  15.544444
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0   3.359788

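Another possibility, if the sample columns all have Sample in their names and no other column does, is to use this instead of the first line above: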

j <- grep("Sample", names(d)) 

or, if neither of the above assumptions holds but we know that the sample columns are the last 3 columns, then:

j <- seq(to = ncol(d), length = 3) 
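
To make the three ways of picking the sample columns concrete, here is a small base R sketch that checks they select the same columns and then runs the aggregation. The data frame is a cut-down, rounded stand-in for the question's data, and the names `j_numeric`, `j_name`, `j_last` and `agg` are only illustrative.

```r
# Cut-down stand-in for the question's data (values rounded, fewer columns)
d <- data.frame(
  Gene       = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  genus      = c("Bacillus", "Bacillus", "Bacillus"),
  species    = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA    = c(0, 0, 0),
  SampleB    = c(0, 0, 0),
  SampleCtrl = c(3.99, 11.56, 3.36),
  stringsAsFactors = FALSE
)

j_numeric <- unname(which(sapply(d, is.numeric)))  # select by column type
j_name    <- grep("Sample", names(d))              # select by column name
j_last    <- seq(to = ncol(d), length.out = 3)     # select by position (last 3)

# On this example all three selections pick the same columns
stopifnot(all(j_numeric == j_name), all(j_name == j_last))

# Sum the sample columns, grouped by every other column except Gene (column 1)
agg <- aggregate(d[j_numeric], d[-c(1, j_numeric)], sum)
print(agg)
```

Which selection is safest depends on the real data. Selecting by type breaks if a grouping column happens to be numeric, selecting by name breaks if the samples do not share a common pattern, and selecting by position breaks if the column order changes.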

Update: Fixed; added two alternatives.

Thanks, it works too! All the sample columns are indeed numeric. They have different names, but thank you for your answer – david