2016-05-21
3

Tidyr: how to spread into counts of occurrences

I have a data frame like this:

other=data.frame(name=c("a","b","a","c","d"),result=c("Y","N","Y","Y","N")) 

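How can I use the spread function in tidyr, or some other function, to get the counts of the result values Y and N as column headers, like this: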

name  Y N 
a   2 0 
b   0 1 

Thanks

+2

Looking for `table(other)`? – mtoto
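The suggestion in this comment can be sketched in base R as follows. This is a minimal illustration on the question's toy data; the names `counts` and `wide` are introduced here for the example and are not part of the original thread.

```r
# Same toy data as in the question.
other <- data.frame(name = c("a", "b", "a", "c", "d"),
                    result = c("Y", "N", "Y", "Y", "N"))

# Cross-tabulate name against result; each cell is an occurrence count.
counts <- table(other)
print(counts)

# Optional: turn the contingency table into a plain data frame with
# one row per name and one column per result value.
wide <- as.data.frame.matrix(counts)
wide <- cbind(name = rownames(wide), wide)
rownames(wide) <- NULL
print(wide)
```

The second step is only needed if a data frame, rather than a `table` object, is wanted downstream.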

Answers

11

Here are a few of the many ways to do it:

1) With the dplyr library, you can simply group and count into the required format:

library(dplyr) 
other %>% group_by(name) %>% summarise(N = sum(result == 'N'), Y = sum(result == 'Y')) 
Source: local data frame [4 x 3] 

    name  N  Y 
    <fctr> <int> <int> 
1  a  0  2 
2  b  1  0 
3  c  0  1 
4  d  1  0 

2) You can use a combination of table and tidyr's spread as follows:

library(tidyr) 
spread(as.data.frame(table(other)), result, Freq) 
    name N Y 
1 a 0 2 
2 b 1 0 
3 c 0 1 
4 d 1 0 

3) You can use a combination of dplyr and tidyr as follows:

library(dplyr) 
library(tidyr) 
spread(count(other, name, result), result, n, fill = 0) 
Source: local data frame [4 x 3] 
Groups: name [4] 

    name  N  Y 
    <fctr> <dbl> <dbl> 
1  a  0  2 
2  b  1  0 
3  c  0  1 
4  d  1  0 
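In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. The sketch below is a possible equivalent of option 3, assuming such a tidyr version is installed; it is not taken from the original answer, and the variable name `wide` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question.
other <- data.frame(name = c("a", "b", "a", "c", "d"),
                    result = c("Y", "N", "Y", "Y", "N"))

wide <- other %>%
  count(name, result) %>%                      # one row per (name, result) pair
  pivot_wider(names_from = result,             # Y / N become column names
              values_from = n,                 # cells hold the counts
              values_fill = list(n = 0L))      # absent combinations become 0
print(wide)
```

Column order follows the first appearance of each result value unless `names_sort = TRUE` is passed.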

5

Here is another option using dcast from data.table:

library(data.table) 
dcast(setDT(other), name~result, length) 
# name N Y 
#1: a 0 2 
#2: b 1 0 
#3: c 0 1 
#4: d 1 0 
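The same call can be written with its arguments named, which may make the counting step easier to follow. This is a sketch rather than the answerer's code: it uses `as.data.table()` instead of `setDT()` so that the original data frame is not converted in place, and it passes `value.var` explicitly so that dcast does not have to guess it.

```r
library(data.table)

# Same toy data as in the question.
other <- data.frame(name = c("a", "b", "a", "c", "d"),
                    result = c("Y", "N", "Y", "Y", "N"))

# as.data.table() makes a copy; setDT() would modify `other` by reference.
dt <- as.data.table(other)

# One row per name, one column per result value. Each cell is the number
# of input rows in that (name, result) group. Empty groups come out as 0
# because length() of a zero-length vector is 0.
wide <- dcast(dt, name ~ result,
              fun.aggregate = length,
              value.var = "result")
print(wide)
```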

Although table(other) would be a compact option (from @mtoto's comment), dcast may be more efficient for large datasets. Some benchmarks are given below:

set.seed(24) 
other1 <- data.frame(name = sample(letters, 1e6, replace=TRUE), 
    result = sample(c("Y", "N"), 1e6, replace=TRUE), stringsAsFactors=FALSE) 

other2 <- copy(other1) 

gopala1 <- function() other1 %>% 
          group_by(name) %>% 
          summarise(N = sum(result == 'N'), Y = sum(result == 'Y')) 
gopala2 <- function() spread(as.data.frame(table(other1)), result, Freq) 
gopala3 <- function() spread(count(other1, name, result), result, n, fill = 0) 
akrun <- function() dcast(as.data.table(other2), name~result, length) 


library(microbenchmark) 
microbenchmark(gopala1(), gopala2(), gopala3(), 
        akrun(), unit='relative', times = 20L) 
#       expr      min       lq     mean   median       uq      max neval
#  gopala1() 2.710561 2.331915 2.142183 2.325167 2.134399 1.513725    20
#  gopala2() 2.859464 2.564126 2.531130 2.683804 2.720833 1.982760    20
#  gopala3() 2.345062 2.076400 1.953136 2.027599 1.882079 1.947759    20
#    akrun() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    20