2016-09-27 73 views

dplyr and tidyr: spread and summarise specific columns

I am struggling to use dplyr and tidyr to take a data frame of this form:

myDf <- data.frame(id = c(1,1,1,1,2,2), 
        event = c('a','b','a','b','a','b'), 
        a_property = c(1,NA,2, NA, 3, NA), 
        b_property = c(NA,2,NA, 3, NA, 4)) 

> myDf
id event a_property b_property
 1     a          1         NA
 1     b         NA          2
 1     a          2         NA
 1     b         NA          3
 2     a          3         NA
 2     b         NA          4

and convert it into the following desired format:

id count_event_a count_event_b sum_property_a sum_property_b
 1             2             2              3              5
 2             1             1              5              4

Do it in two steps. First reshape, as in this question: http://stackoverflow.com/questions/20620492/reshape-long-to-wide-with-multiple-groupings, then use summarise() to get the counts and sums. – MrFlick

Answers

A slightly more general approach:

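library(dplyr)
library(tidyr)

myDf %>% 
    # long format: one row per id, event and property column
    gather(key, value, -id, -event) %>% 
    filter(!is.na(value)) %>% 
    group_by(id, event) %>% 
    summarise(count = n(), 
      sum = sum(value)) %>% 
    # stack count and sum, then build names such as count_a and sum_b
    gather(key, value, -id, -event) %>% 
    unite(measure, key, event) %>% 
    spread(measure, value) 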
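
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshape-then-summarise idea can be sketched with those functions as follows. This is only a sketch, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (needed for the `.groups` argument); the name `result` is introduced here for illustration. As with the pipeline above, the columns come out as count_a, count_b, sum_a and sum_b rather than the exact names in the question.

```r
library(dplyr)
library(tidyr)

myDf <- data.frame(id = c(1, 1, 1, 1, 2, 2),
                   event = c("a", "b", "a", "b", "a", "b"),
                   a_property = c(1, NA, 2, NA, 3, NA),
                   b_property = c(NA, 2, NA, 3, NA, 4))

result <- myDf %>%
  # long format, dropping the NA cells in the same step
  pivot_longer(c(a_property, b_property),
               names_to = "key", values_to = "value",
               values_drop_na = TRUE) %>%
  group_by(id, event) %>%
  summarise(count = n(), sum = sum(value), .groups = "drop") %>%
  # one column per (statistic, event) pair: count_a, count_b, sum_a, sum_b
  pivot_wider(names_from = event, values_from = c(count, sum))

print(result)
```

Because pivot_wider() accepts several `values_from` columns at once, the second gather() and the unite() step are no longer needed.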
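
A more direct approach that names each output column explicitly:

library(dplyr)

myDf %>% 
    group_by(id) %>% 
    # count the non-NA entries and sum each property column per id
    summarise(count_event_a = sum(!is.na(a_property)), 
      count_event_b = sum(!is.na(b_property)), 
      sum_property_a = sum(a_property, na.rm = TRUE), 
      sum_property_b = sum(b_property, na.rm = TRUE)) %>% 
    ungroup() 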

There is a typo in your expected output: sum_property_a for id 2 is 3, not 5. The result should be:

# A tibble: 2 × 5
     id count_event_a count_event_b sum_property_a sum_property_b
  <dbl>         <int>         <int>          <dbl>          <dbl>
1     1             2             2              3              5
2     2             1             1              3              4
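
If there are many property columns, writing each summary by hand gets tedious. A minimal sketch, assuming dplyr 1.0.0 or later, applies the same two summaries to every column whose name ends in `_property` using across(). The name `result_across` is introduced here for illustration, and the output columns follow across()'s default naming (a_property_count, a_property_sum, and so on) rather than the names requested in the question.

```r
library(dplyr)

myDf <- data.frame(id = c(1, 1, 1, 1, 2, 2),
                   event = c("a", "b", "a", "b", "a", "b"),
                   a_property = c(1, NA, 2, NA, 3, NA),
                   b_property = c(NA, 2, NA, 3, NA, 4))

result_across <- myDf %>%
  group_by(id) %>%
  # for every *_property column: count non-NA values and sum them
  summarise(across(ends_with("_property"),
                   list(count = ~ sum(!is.na(.x)),
                        sum = ~ sum(.x, na.rm = TRUE))))

print(result_across)
```

This relies on each event having its own property column that is NA for other events, as in the example data.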