2017-08-03

How to count the distinct contents of a column per group in a data frame

I have the following data frame:


testdf <- structure(list(gene = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L), .Label = c("Actc1", "Cbx1"), class = "factor"), 
    p1 = structure(c(5L, 1L, 2L, 3L, 4L, 1L, 1L, 1L, 1L, 1L), .Label = c("BoneMarrow", 
    "Liver", "Pulmonary", "Umbilical", "Vertebral"), class = "factor"), 
    p2 = structure(c(1L, 1L, 1L, 1L, 1L, 5L, 2L, 3L, 4L, 1L), .Label = c("Adipose", 
    "Liver", "Pulmonary", "Umbilical", "Vertebral"), class = "factor")), .Names = c("gene", 
"p1", "p2"), class = "data.frame", row.names = c(NA, -10L)) 

testdf 
#>     gene         p1        p2
#> 1   Cbx1  Vertebral   Adipose
#> 2   Cbx1 BoneMarrow   Adipose
#> 3   Cbx1      Liver   Adipose
#> 4   Cbx1  Pulmonary   Adipose
#> 5   Cbx1  Umbilical   Adipose
#> 6  Actc1 BoneMarrow Vertebral
#> 7  Actc1 BoneMarrow     Liver
#> 8  Actc1 BoneMarrow Pulmonary
#> 9  Actc1 BoneMarrow Umbilical
#> 10 Actc1 BoneMarrow   Adipose

What I want to do is group by gene and count how many distinct values of p1 each gene has, giving a result like this:

Cbx1 5 #Vertebral, Bone Marrow, Liver, Pulmonary, Umbilical 
Actc1 1 #Bone Marrow 

I tried this, but it doesn't give me what I want:

testdf %>% group_by(gene) %>% mutate(n=n()) 

Answers


You can use n_distinct to count the unique values:

library(dplyr)
testdf %>% group_by(gene) %>% summarise(n = n_distinct(p1))

# A tibble: 2 x 2
#   gene       n
#   <fctr> <int>
# 1 Actc1      1
# 2 Cbx1       5
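
To see why the attempt in the question falls short, it may help to compare n() and n_distinct() side by side. The sketch below is only an illustration: df is a hypothetical stand-in that mirrors the gene and p1 columns of testdf, and rows_per_gene and distinct_per_gene are names made up for this example.

```r
library(dplyr)

# Hypothetical stand-in for testdf: same gene/p1 pattern as in the question
df <- data.frame(
  gene = rep(c("Cbx1", "Actc1"), each = 5),
  p1   = c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
           rep("BoneMarrow", 5)),
  stringsAsFactors = FALSE
)

# n() counts the rows in each group, so both genes get 5
rows_per_gene <- df %>% group_by(gene) %>% summarise(n = n())

# n_distinct() counts the unique p1 values in each group
distinct_per_gene <- df %>% group_by(gene) %>% summarise(n = n_distinct(p1))

print(rows_per_gene)
print(distinct_per_gene)
```

Note also that mutate() keeps all 10 rows and only adds a column, whereas summarise() collapses each group to a single row, which is the shape the question asks for.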

Alternatively, use aggregate:

aggregate(p1 ~ gene, testdf, function(x) length(unique(x))) 

# gene p1 
#1 Actc1 1 
#2 Cbx1 5 

You can also use tapply:

with(testdf,tapply(p1,gene,function(x)length(unique(x)))) 
# Actc1  Cbx1 
#     1     5 
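
One more base R variant, offered as a sketch and not taken from the answers above: drop duplicate gene/p1 pairs first, then tabulate what is left. Here df is a hypothetical stand-in for the gene and p1 columns of testdf, and gene_p1_pairs and counts are names invented for the example.

```r
# Hypothetical stand-in for testdf: same gene/p1 pattern as in the question
df <- data.frame(
  gene = rep(c("Cbx1", "Actc1"), each = 5),
  p1   = c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
           rep("BoneMarrow", 5)),
  stringsAsFactors = FALSE
)

# Keep one row per distinct (gene, p1) combination
gene_p1_pairs <- unique(df[c("gene", "p1")])

# Each remaining row is one distinct p1 value, so counting rows per gene
# gives the number of distinct p1 values per gene
counts <- table(gene_p1_pairs$gene)
print(counts)
```

This goes through the same "unique, then count" steps as the length(unique(x)) answers, just applied to the whole data frame at once.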