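I have the following problem with dplyr:
When I use mutate() on a numeric column after group_by(), the command fails if one of the groups contains only a single value and that value is NaN.
If every group in the column contains a number, the result is correctly classified as dbl. But as soon as one group consists of nothing but a single NaN, dplyr types that group's result as lgl while all the other groups are dbl, and the mutate fails.
My first (and more general) question is: is there a way to tell dplyr to always give a column a particular type when using group_by()?
Second, can someone help me with a hack for the problem shown in the MWE below?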
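The type flip described above can be reproduced without dplyr. The following is a minimal base R sketch, assuming the Winsorise step is written with ifelse() as in the MWE further down; the variable names are made up for illustration.

```r
# Base R only: ifelse() builds its result from the values it actually selects.
x_ok  <- c(5, 1)   # an ordinary group of numeric values
x_nan <- NaN       # a group whose only value is NaN

res_ok  <- ifelse(x_ok > 3, 3, x_ok)     # test is TRUE/FALSE, so numeric values are picked
res_nan <- ifelse(x_nan > 3, 3, x_nan)   # test is NA, so neither branch is picked

typeof(res_ok)    # type of the ordinary group's result
typeof(res_nan)   # type of the NaN-only group's result
```

When the test is entirely NA, ifelse() returns a logical NA, which is why that one group ends up as lgl while the others are dbl.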
library(dplyr)
# ERROR: this produces the column type error described above:
df <- data_frame(a = c(rep(LETTERS[1:2],4),"C"),g = c(rep(LETTERS[5:7],3)), x = c(7, 8,3, 5, 9, 2, 4, 7,8)) %>% tbl_df()
df <- df %>% group_by(a) %>% mutate_each(funs(sd(., na.rm=TRUE)),x)
df <- df %>% mutate(Winsorise = ifelse(x>2,2,x))
# NO ERROR (as no groups have single entry with NaN):
df2 <- data_frame(a = c(rep(LETTERS[1:2],4),"C"),g = c(rep(LETTERS[5:7],3)), x = c(7, 8,3, 5, 9, 2, 4, 7,8)) %>% tbl_df()
df2 <- df2 %>% group_by(a) %>% mutate_each(funs(sd(., na.rm=TRUE)),x)
# Update the Group for the row with an NA - Works
df2[9,1] <- "A"
df2 <- df2 %>% mutate(Winsorise = ifelse(x>3,3,x))
# REASON FOR ERROR: for a group whose only member is NaN, ifelse() returns lgl, although we want the Winsorise column to be dbl:
df3 <- data_frame(g = "A",x = NaN)
df3 <- df3 %>% mutate(Winsorise = ifelse(x>3,3,x))
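
One possible hack, offered as a sketch rather than a definitive fix: replace ifelse() with an expression that always returns a double, such as pmin(). The helper name cap_at and the small data frame are hypothetical and not part of dplyr or the MWE above. Wrapping the original call as as.numeric(ifelse(x > 3, 3, x)) should have the same effect, and dplyr::if_else() is stricter about types than ifelse().

```r
# Hypothetical helper (not part of dplyr): cap values at `cap`, always returning a double.
cap_at <- function(x, cap) {
  # pmin() compares element-wise; with double inputs the result stays double,
  # even when x contains nothing but NA/NaN
  pmin(as.numeric(x), cap)
}

single <- cap_at(NaN, 3)          # the one-member group from the question
many   <- cap_at(c(7, 2, 5), 3)   # an ordinary group

typeof(single)
typeof(many)

# The same idea inside a grouped mutate(); runs only if dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  df_demo <- data.frame(a = c("A", "A", "B", "B", "C"), x = c(7, 8, 3, 5, NaN))
  out <- dplyr::mutate(dplyr::group_by(df_demo, a), Winsorise = cap_at(x, 3))
  print(out)
}
```

Because every group now yields a double, the per-group results have a consistent type regardless of how many members a group has.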
Brilliant answer! It works great now. I agree that, ideally, there should be a way to tell dplyr to always define a column the same way when using group_by(). – Nick