How to count the distinct values of a column per group in a data frame

I have the following data frame:

testdf <- structure(list(gene = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L), .Label = c("Actc1", "Cbx1"), class = "factor"), 
    p1 = structure(c(5L, 1L, 2L, 3L, 4L, 1L, 1L, 1L, 1L, 1L), .Label = c("BoneMarrow", 
    "Liver", "Pulmonary", "Umbilical", "Vertebral"), class = "factor"), 
    p2 = structure(c(1L, 1L, 1L, 1L, 1L, 5L, 2L, 3L, 4L, 1L), .Label = c("Adipose", 
    "Liver", "Pulmonary", "Umbilical", "Vertebral"), class = "factor")), .Names = c("gene", 
"p1", "p2"), class = "data.frame", row.names = c(NA, -10L)) 

testdf
#>     gene         p1        p2
#> 1   Cbx1  Vertebral   Adipose
#> 2   Cbx1 BoneMarrow   Adipose
#> 3   Cbx1      Liver   Adipose
#> 4   Cbx1  Pulmonary   Adipose
#> 5   Cbx1  Umbilical   Adipose
#> 6  Actc1 BoneMarrow Vertebral
#> 7  Actc1 BoneMarrow     Liver
#> 8  Actc1 BoneMarrow Pulmonary
#> 9  Actc1 BoneMarrow Umbilical
#> 10 Actc1 BoneMarrow   Adipose

What I want to do is group by gene and count the number of distinct values in p1, giving a result like this:

Cbx1 5 #Vertebral, Bone Marrow, Liver, Pulmonary, Umbilical 
Actc1 1 #Bone Marrow

I tried this, but it doesn't give me what I want:

testdf %>% group_by(gene) %>% mutate(n=n())


2017-08-03 scamander

You can use n_distinct to count the unique values:

library(dplyr)
testdf %>% group_by(gene) %>% summarise(n = n_distinct(p1))

# A tibble: 2 x 2
#     gene     n
#   <fctr> <int>
# 1  Actc1     1
# 2   Cbx1     5
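
For anyone wondering why the mutate(n = n()) attempt in the question falls short: n() returns the number of rows in each group, not the number of distinct values. Below is a minimal sketch of the difference, assuming dplyr is installed. The data frame is a compact rebuild of the question's testdf with character columns instead of factors, and the names rows_per_gene and distinct_per_gene are only illustrative.

```r
library(dplyr)

# Compact rebuild of the question's data (same rows, character columns).
testdf <- data.frame(
  gene = rep(c("Cbx1", "Actc1"), each = 5),
  p1 = c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
         rep("BoneMarrow", 5)),
  p2 = c(rep("Adipose", 5),
         "Vertebral", "Liver", "Pulmonary", "Umbilical", "Adipose")
)

# n() is the group size: both genes have 5 rows, so both get 5.
rows_per_gene <- testdf %>%
  group_by(gene) %>%
  summarise(n = n())

# n_distinct() counts the unique p1 values within each gene.
distinct_per_gene <- testdf %>%
  group_by(gene) %>%
  summarise(n = n_distinct(p1))

print(rows_per_gene)
print(distinct_per_gene)
```

Note also that mutate keeps all ten rows and only appends a column, whereas summarise collapses each group to a single row, which is the shape the question asks for.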


2017-08-03 03:51:20 Psidom

Alternatively, use aggregate:

aggregate(p1 ~ gene, testdf, function(x) length(unique(x))) 

#    gene p1
# 1 Actc1  1
# 2  Cbx1  5
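
The desired output in the question also lists the distinct values next to each count. As a rough sketch of one way to get both with base R, the same aggregate call can be run a second time with paste and the two results merged. The column names p1_n and p1_values come from the suffixes chosen here and are purely illustrative; the data frame is a compact rebuild of the question's testdf.

```r
# Compact rebuild of the question's data (same rows, character columns).
testdf <- data.frame(
  gene = rep(c("Cbx1", "Actc1"), each = 5),
  p1 = c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
         rep("BoneMarrow", 5)),
  p2 = c(rep("Adipose", 5),
         "Vertebral", "Liver", "Pulmonary", "Umbilical", "Adipose")
)

# Number of distinct p1 values per gene.
counts <- aggregate(p1 ~ gene, testdf, function(x) length(unique(x)))

# The distinct p1 values themselves, collapsed into one string per gene.
values <- aggregate(p1 ~ gene, testdf,
                    function(x) paste(unique(x), collapse = ", "))

# Both results have columns gene and p1; the suffixes disambiguate p1.
result <- merge(counts, values, by = "gene", suffixes = c("_n", "_values"))
print(result)
```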


2017-08-03 04:01:44

You can also use tapply:

with(testdf, tapply(p1, gene, function(x) length(unique(x))))
# Actc1  Cbx1 
#     1     5 
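
tapply returns a named one-dimensional array rather than a data frame. If a two-column table like the one in the question is wanted, a small sketch of the conversion could look like this (the names counts and result are illustrative, and the data frame is a compact rebuild of the question's testdf):

```r
# Compact rebuild of the question's data (same rows, character columns).
testdf <- data.frame(
  gene = rep(c("Cbx1", "Actc1"), each = 5),
  p1 = c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
         rep("BoneMarrow", 5)),
  p2 = c(rep("Adipose", 5),
         "Vertebral", "Liver", "Pulmonary", "Umbilical", "Adipose")
)

# Named array: one distinct-value count per gene.
counts <- with(testdf, tapply(p1, gene, function(x) length(unique(x))))

# Move the names into a column and drop the array attributes.
result <- data.frame(gene = names(counts), n = as.vector(counts))
print(result)
```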


2017-08-03 05:03:18 Onyambu
