2016-05-22 64 views
0

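COUNTIF equivalent in dplyr summarise

I have a data frame listing the total number of students (Stu) per group (ID) and the number of students taking part in an activity (Sub):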

 ID Stu Sub 
    (int) (int) (int) 
1 101 80 NA 
2 102 130 NA 
3 103 10 NA 
4 104 210 20 
5 105 180 NA 
6 106 150 NA 

I want to know how many groups in each size band (>400, >200, >100, >0) are taking part in an activity (Sub > 0) and how many are not (Sub is NA):

output <- structure(list(ID = c(101L, 102L, 103L, 104L, 105L, 106L), 
         Stu = c(80L, 130L, 10L, 210L, 180L, 150L), 
         Sub = c(NA,NA, NA, 20L, NA, NA)), 
        .Names = c("ID", "Stu", "Sub"), 
        class = c("tbl_df", "data.frame"), 
        row.names = c(NA, -6L)) 

temp <- output %>% 
mutate(Stu = ifelse(Stu >= 400, 400, 
     ifelse(Stu >= 200, 200, 
      ifelse(Stu >= 100, 100, 0 
       )))) %>% 
group_by(Stu) %>% 
summarise(entries = length(!is.na(Sub)), 
      noentries = length(is.na(Sub))) 

The result should be:

Stu entries noentries 
    (dbl) (int)  (int) 
1  0  0   2 
2 100  0   3 
3 200  1   0 

but I get:

Stu entries noentries 
    (dbl) (int)  (int) 
1  0  2   2 
2 100  3   3 
3 200  1   1 

How can I make the length function inside summarise behave like COUNTIF?

+0

Something is wrong in your last ifelse –

+0

Sorry, I missed a 0, it should work now – pluke

+0

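'sum' is the right solution, as described below. For clarity, length returns the length of the vector it is given. In this case, regardless of the TRUE/FALSE values, length returns the number of items in each group. – Gopala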

Answers

1

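summarise expects a single value per group. length does return one, but it counts every element whether it is TRUE or FALSE; sum counts only the TRUE values, so using sum instead of length does the job: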

library(dplyr)

output %>% 
    mutate(Stu = ifelse(Stu >= 400, 400, 
         ifelse(Stu >= 200, 200, 
          ifelse(Stu >= 100, 100, 0 
          )))) %>% 
    group_by(Stu) %>% 
    summarise(entries = sum(!is.na(Sub)), 
      noentries = sum(is.na(Sub))) 

Source: local data frame [3 x 3] 

Stu entries noentries 
(dbl) (int)  (int) 
1  0  0   2 
2 100  0   3 
3 200  1   0 
+0

Ah yes, I forgot that is.na returns a logical vector, which can be summed – pluke
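
To make the difference concrete, here is a minimal base R sketch on a stand-alone vector. The vector `sub` is a made-up example, not a column taken from the data above.

```r
# Hypothetical Sub values for one group: one entry, two missing
sub <- c(NA, 20L, NA)

flags <- !is.na(sub)       # logical vector: FALSE TRUE FALSE

n_all  <- length(flags)    # counts every element, TRUE or FALSE
n_true <- sum(flags)       # TRUE is treated as 1, FALSE as 0
n_na   <- sum(is.na(sub))  # the COUNTIF-style count of missing values

print(c(n_all = n_all, n_true = n_true, n_na = n_na))
```

Here length gives 3 while the two sums give 1 and 2, which is the same pattern as the wrong and the expected tables in the question.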

1

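Another option is to group by both Stu and Sub, but to do that we first need to recode the values of Sub and Stu so they match the groupings we want in the output. We also use cut, rather than nested ifelse, to set the break points for Stu: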

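library(dplyr)
library(reshape2) 

output %>% 
    group_by(Sub = ifelse(is.na(Sub), "No Entries", "Entries"), 
      # right = FALSE gives [0,100), [100,200), ... so the bands match the >= tests in the question
      Stu = cut(Stu, c(0, 100, 200, 400, Inf), labels = c(0, 100, 200, 400), right = FALSE)) %>% 
    tally() %>% 
    dcast(Stu ~ Sub, value.var = "n", fill = 0) 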
 Stu Entries No Entries 
1  0  0   2 
2 100  0   3 
3 200  1   0 
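
One thing to watch when swapping nested ifelse for cut: by default cut closes its intervals on the right, so a group of exactly 100 students falls in the 0 band, unlike the `>=` tests in the question. The sketch below compares the two settings in base R; the `stu` values are made-up boundary cases, not rows from the data.

```r
# Hypothetical group sizes sitting on the band boundaries
stu    <- c(10, 100, 200, 400)
breaks <- c(0, 100, 200, 400, Inf)
labels <- c(0, 100, 200, 400)

# Default: intervals (0,100], (100,200], (200,400], (400,Inf]
right_closed <- cut(stu, breaks, labels = labels)

# right = FALSE: intervals [0,100), [100,200), [200,400), [400,Inf)
left_closed <- cut(stu, breaks, labels = labels, right = FALSE)

print(data.frame(stu, right_closed, left_closed))
```

With the default, 100, 200 and 400 are each assigned to the band below; with `right = FALSE` they get the band that the `>=` comparisons would give. The six groups in the question do not sit on a boundary, so both settings produce the same table for them.
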
3

Following the same idea as @eipi10, but getting to the point faster with count() instead of group_by() %>% tally(), and showing that tidyr::spread can mimic reshape2::dcast:

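output %>% 
    count(Sub = ifelse(is.na(Sub), 'No Entries', 'Entries'), 
     # right = FALSE makes the bands [0,100), [100,200), ... like the >= tests in the question
     Stu = cut(Stu, c(0, 100, 200, 400, Inf), labels = c(0, 100, 200, 400), right = FALSE)) %>% 
    tidyr::spread(Sub, n, fill = 0)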
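
As a cross-check that needs no packages, the same count-by-band idea can be sketched in base R with table(). This is only an illustration, not part of either answer; the names `band`, `status` and `counts` are introduced here for the example.

```r
# Same data as in the question, as a plain data.frame
output <- data.frame(
  ID  = c(101L, 102L, 103L, 104L, 105L, 106L),
  Stu = c(80L, 130L, 10L, 210L, 180L, 150L),
  Sub = c(NA, NA, NA, 20L, NA, NA)
)

# Size band per group, left-closed so it matches the >= tests
band <- cut(output$Stu, c(0, 100, 200, 400, Inf),
            labels = c(0, 100, 200, 400), right = FALSE)

# Participation flag per group
status <- ifelse(is.na(output$Sub), "No Entries", "Entries")

# Cross-tabulate: one row per band, one column per status
counts <- table(band, status)
print(counts)
```

Unlike the dplyr versions, table() keeps the empty 400 band as a row of zeros, because it reports every level of the factor that cut returns.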