Why does a nested ifelse in dplyr 0.5.0 mutate produce incorrect results?
Consider the following data frame:
(tmp_df <-
structure(list(class = c(0L, 0L, 1L, 1L, 2L, 2L), logi = c(TRUE,
FALSE, TRUE, FALSE, TRUE, FALSE), val = c(1, 1, 1, 1, 1, 1),
taken = c(1.00684931506849, 0.993197278911565, 1.025, 0.975609756097561,
1.00826446280992, 0.991803278688525)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L), .Names = c("class",
"logi", "val", "taken")))
which produces:
Source: local data frame [6 x 4]
class logi val taken
<int> <lgl> <dbl> <dbl>
1 0 TRUE 1 1.0068493
2 0 FALSE 1 0.9931973
3 1 TRUE 1 1.0250000
4 1 FALSE 1 0.9756098
5 2 TRUE 1 1.0082645
6 2 FALSE 1 0.9918033
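I want to group by class and, if a group contains exactly two members, subtract from val either 1 (if logi == FALSE) or the minimum value of taken within that group (otherwise). If a group does not contain two members, I subtract zero from val.
Using the dplyr package, the above can be expressed as: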
tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(n() != 2, 0,
                          ifelse(logi, min(taken), 1)),
         not_taken = val - taken_2)
However, this produces an incorrect result, in which the second ifelse appears to always resolve to its first condition:
Source: local data frame [6 x 6]
Groups: class [3]
class logi val taken taken_2 not_taken
<int> <lgl> <dbl> <dbl> <dbl> <dbl>
1 0 TRUE 1 1.0068493 0.9931973 0.006802721
2 0 FALSE 1 0.9931973 0.9931973 0.006802721
3 1 TRUE 1 1.0250000 0.9756098 0.024390244
4 1 FALSE 1 0.9756098 0.9756098 0.024390244
5 2 TRUE 1 1.0082645 0.9918033 0.008196721
6 2 FALSE 1 0.9918033 0.9918033 0.008196721
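One possible explanation, offered as a sketch rather than a confirmed diagnosis of dplyr 0.5.0: base R's ifelse() returns a value with the same length as its test argument. Here n() != 2 is a single value, so the outer ifelse() would return only the first element of the inner result, which mutate then recycles across the group. The base-R snippet below uses the values from the class == 0 group; length(logi) stands in for n(), which is an assumption made so the example runs without dplyr.

```r
# Values copied from the first group (class == 0) of tmp_df
logi  <- c(TRUE, FALSE)
taken <- c(1.00684931506849, 0.993197278911565)

# Inner ifelse: the test has length 2, so the result has length 2
inner <- ifelse(logi, min(taken), 1)

# Outer ifelse: the test has length 1 (stand-in for n() != 2),
# so the result has length 1 and only inner[1] is kept
outer <- ifelse(length(logi) != 2, 0, inner)

length(inner)  # 2
length(outer)  # 1
outer          # equals min(taken), i.e. the value for the logi == TRUE row
```

If that single value is recycled over both rows of the group, it would give the same taken_2 in every row, which is consistent with the output above.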
If we drop the first ifelse statement, the correct result is produced:
tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(logi, min(taken), 1),
         not_taken = val - taken_2)
which produces:
Source: local data frame [6 x 6]
Groups: class [3]
class logi val taken taken_2 not_taken
<int> <lgl> <dbl> <dbl> <dbl> <dbl>
1 0 TRUE 1 1.0068493 0.9931973 0.006802721
2 0 FALSE 1 0.9931973 1.0000000 0.000000000 # correct!
3 1 TRUE 1 1.0250000 0.9756098 0.024390244
4 1 FALSE 1 0.9756098 1.0000000 0.000000000 # correct!
5 2 TRUE 1 1.0082645 0.9918033 0.008196721
6 2 FALSE 1 0.9918033 1.0000000 0.000000000 # correct!
Checking other code snippets that successfully do similar things suggests that the problem is isolated to the combination of mutate and nested ifelse:
tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(n() != 3, 0,
                          ifelse(logi, min(taken), 1)),
         not_taken = val - taken_2)

tmp_df_2 <-
  tmp_df %>%
  filter(row_number() <= 2)

(tmp_df_2$taken_2 <-
   ifelse(c(0, 0), 0,
          ifelse(tmp_df_2$logi, min(tmp_df_2$taken), 1)))
## but the following does not work (checks problem is not to do with grouping)
# tmp_df_2 %>%
# mutate(taken_2 = ifelse(n() != 2, 0,
# ifelse(logi, min(taken), 1)),
# not_taken = val - taken_2)
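A hedged observation on why the base-R version with c(0, 0) behaves as expected: its test has one element per row, whereas n() != 2 has a single element. A minimal sketch of making the group-level test as long as the group is shown below. It uses plain vectors with the values of the first two rows, and n_rows is a hypothetical stand-in for n().

```r
# Values copied from the first two rows of tmp_df
logi  <- c(TRUE, FALSE)
taken <- c(1.00684931506849, 0.993197278911565)

n_rows <- length(logi)  # hypothetical stand-in for n() inside mutate

# Repeat the single group-level test once per row so that
# ifelse() returns one value per row
test_full <- rep(n_rows != 2, n_rows)

taken_2 <- ifelse(test_full, 0,
                  ifelse(logi, min(taken), 1))
taken_2  # first element is min(taken), second is 1
```

Inside a grouped mutate, the analogous expression would presumably be rep(n() != 2, n()), though this has not been verified against dplyr 0.5.0.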
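Why does this happen, and how can I get the expected behavior? One workaround is to split the nested ifelse logic across several columns within the same mutate call: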
tmp_df %>%
  group_by(class) %>%
  mutate(taken_2 = ifelse(n() != 2, 0, 1),
         taken_3 = taken_2 * ifelse(logi, min(taken), 1),
         not_taken = val - taken_3)
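Another possible approach, sketched here under assumptions rather than as a confirmed fix: because the group-size check is a single TRUE/FALSE per group, it can be handled with a scalar if/else, leaving the vectorised ifelse() for the row-level check only. The helper taken_2_for_group is a hypothetical name introduced for illustration, and the grouping is done in base R with split() so the example runs without dplyr.

```r
# Hypothetical helper: scalar `if` for the group-size check,
# vectorised ifelse() for the row-level check
taken_2_for_group <- function(logi, taken) {
  if (length(logi) != 2) {
    rep(0, length(logi))          # group is not a pair: subtract zero
  } else {
    ifelse(logi, min(taken), 1)   # one value per row
  }
}

# Columns copied from tmp_df
class <- c(0L, 0L, 1L, 1L, 2L, 2L)
logi  <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
val   <- c(1, 1, 1, 1, 1, 1)
taken <- c(1.00684931506849, 0.993197278911565, 1.025,
           0.975609756097561, 1.00826446280992, 0.991803278688525)

# Apply the helper group by group (base-R stand-in for group_by + mutate)
taken_2 <- numeric(length(class))
for (i in split(seq_along(class), class)) {
  taken_2[i] <- taken_2_for_group(logi[i], taken[i])
}
not_taken <- val - taken_2
```

With the data above, taken_2 alternates between the group minimum of taken and 1, matching the output marked "correct!" earlier. Inside a grouped mutate the helper could presumably be called as taken_2 = taken_2_for_group(logi, taken), but that has not been tested against dplyr 0.5.0.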
Others have reported a similar problem with nested ifelse, but I don't know whether it has the same root cause: ifelse using dplyr results in NAs for some records
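Thanks for the suggestion. Before these functions became available, `ifelse` was the only way to do a conditional mutate. – Alex
Indeed. But now that `if_else` is available, you should use it, and be grateful that it is so picky! – Hugh
Also, as I commented to @Weihuang: how do you know what length the result of the first `ifelse` is? – Alex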