other=data.frame(name=c("a","b","a","c","d"),result=c("Y","N","Y","Y","N"))
How can I use spread from tidyr (or some other function) on this data frame to get the counts of Y and N as column headings, like this:
name Y N
a 2 0
b 0 1
Thanks.
There are several ways to do this:
1) With the dplyr library, you can simply group and count into the required format:
library(dplyr)
other %>% group_by(name) %>% summarise(N = sum(result == 'N'), Y = sum(result == 'Y'))
Source: local data frame [4 x 3]
name N Y
<fctr> <int> <int>
1 a 0 2
2 b 1 0
3 c 0 1
4 d 1 0
2) You can use table combined with tidyr's spread, as follows:
library(tidyr)
spread(as.data.frame(table(other)), result, Freq)
name N Y
1 a 0 2
2 b 1 0
3 c 0 1
4 d 1 0
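If you would rather not depend on tidyr at all, here is a base-R sketch (an addition, not one of the original answers). The key point is that as.data.frame() on a table produces the long Var/Freq layout, which is why spread() is needed above, whereas as.data.frame.matrix() keeps the wide layout directly. The last step moves the row names into a name column and orders the columns as Y, N to match the question.

```r
# Same toy data as in the question
other <- data.frame(name = c("a", "b", "a", "c", "d"),
                    result = c("Y", "N", "Y", "Y", "N"))

# Two-way contingency table: rows are names, columns are results
tab <- table(other)

# as.data.frame.matrix() keeps the wide shape (rows = names, columns = N, Y)
wide <- as.data.frame.matrix(tab)

# Turn the row names into a proper column and put Y before N
wide <- data.frame(name = rownames(wide), Y = wide$Y, N = wide$N)
wide
```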
3) You can use a combination of dplyr and tidyr, as follows:
library(dplyr)
library(tidyr)
spread(count(other, name, result), result, n, fill = 0)
Source: local data frame [4 x 3]
Groups: name [4]
name N Y
<fctr> <dbl> <dbl>
1 a 0 2
2 b 1 0
3 c 0 1
4 d 1 0
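In current tidyr, spread() is superseded by pivot_wider(). A minimal sketch of approach 3 rewritten with it, assuming tidyr >= 1.1.0 (where values_fill accepts a plain scalar) and dplyr for count() and select(); this is an illustration, not part of the original answer:

```r
library(dplyr)
library(tidyr)

other <- data.frame(name = c("a", "b", "a", "c", "d"),
                    result = c("Y", "N", "Y", "Y", "N"))

wide <- other %>%
  count(name, result) %>%               # one row per (name, result) pair, count in n
  pivot_wider(names_from = result,      # Y and N become column names
              values_from = n,
              values_fill = 0) %>%      # combinations that never occur become 0
  select(name, Y, N)                    # column order requested in the question
wide
```

Unlike the spread(count(...)) output above, the result is not left grouped, because count() on an ungrouped data frame returns an ungrouped tibble.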
Here is another option, using dcast from data.table:
library(data.table)
dcast(setDT(other), name~result, length)
# name N Y
#1: a 0 2
#2: b 1 0
#3: c 0 1
#4: d 1 0
Although table(other) would be a compact option (from @mtoto's comment), for large datasets it may be more efficient to use dcast. Some benchmarks are given below:
library(dplyr)
library(tidyr)
library(data.table)
set.seed(24)
other1 <- data.frame(name = sample(letters, 1e6, replace=TRUE),
result = sample(c("Y", "N"), 1e6, replace=TRUE), stringsAsFactors=FALSE)
other2 <- copy(other1)
gopala1 <- function() other1 %>%
group_by(name) %>%
summarise(N = sum(result == 'N'), Y = sum(result == 'Y'))
gopala2 <- function() spread(as.data.frame(table(other1)), result, Freq)
gopala3 <- function() spread(count(other1, name, result), result, n, fill = 0)
akrun <- function() dcast(as.data.table(other2), name~result, length)
library(microbenchmark)
microbenchmark(gopala1(), gopala2(), gopala3(),
akrun(), unit='relative', times = 20L)
# expr min lq mean median uq max neval
# gopala1() 2.710561 2.331915 2.142183 2.325167 2.134399 1.513725 20
# gopala2() 2.859464 2.564126 2.531130 2.683804 2.720833 1.982760 20
# gopala3() 2.345062 2.076400 1.953136 2.027599 1.882079 1.947759 20
# akrun() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 20
Looking for 'table(other)'? – mtoto