2016-08-19 123 views
0

Merging 2 variables into 1?

Suppose my raw data looks like this:

df <- data.frame(id = 1:10, 
       V = LETTERS[1:10], 
       Treatment1 = c(rep(1,3), rep(0,7)), 
       Treatment2 = c(rep(0,3), rep(1,3), rep(0,4))) 

I want to merge Treatment1 and Treatment2 into a single new variable that takes one of 3 values: Treatment1, Treatment2, or Control. This is the data frame I want to end up with:

df2 <- data.frame(id = 1:10, 
        V = LETTERS[1:10], 
        Treatment = c(rep("Treatment1",3), 
           rep("Treatment2",3), 
           rep("Control",4))) 

Right now I am using this code to do it:

library(dplyr) 
df$Treatment <- ifelse(test = df$Treatment1==1, yes = "Treatment1", 
         no = ifelse(test = df$Treatment2==1, 
            yes = "Treatment2", no = "Control")) 

df2 <- df %>% select(-Treatment1, -Treatment2) 

Is there a better way?

+2

As far as I can see, this question has nothing to do with tidyr or dplyr.

Answers

3

One approach that ends up being quite readable and extensible is to create a lookup table and merge it with your existing data, as follows:

# Each row maps one combination of the two indicator columns to its label
df2 <- data.frame(Treatment1 = c(1,0,0), 
        Treatment2 = c(0,1,0), 
        Treatment = c("Treatment1", "Treatment2", "Control"))
merge(df, df2, all.x=TRUE) # Setting all.x ensures rows of df aren't dropped if there isn't a match 

#    Treatment1 Treatment2 id V  Treatment
# 1           0          0  7 G    Control
# 2           0          0  8 H    Control
# 3           0          0  9 I    Control
# 4           0          0 10 J    Control
# 5           0          1  4 D Treatment2
# 6           0          1  5 E Treatment2
# 7           0          1  6 F Treatment2
# 8           1          0  1 A Treatment1
# 9           1          0  2 B Treatment1
# 10          1          0  3 C Treatment1
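
One thing to watch with the merge approach above: merge() sorts its result by the key columns, so the rows do not come back in id order. Below is a minimal sketch of restoring the original order and dropping the two indicator columns. The names lookup, merged and result are hypothetical and chosen only for this illustration; they are not from the answer.

```r
df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1, 3), rep(0, 7)),
                 Treatment2 = c(rep(0, 3), rep(1, 3), rep(0, 4)))

# Lookup table (hypothetical name): one row per valid combination of the flags
lookup <- data.frame(Treatment1 = c(1, 0, 0),
                     Treatment2 = c(0, 1, 0),
                     Treatment  = c("Treatment1", "Treatment2", "Control"),
                     stringsAsFactors = FALSE)

# Joins on the shared columns Treatment1 and Treatment2;
# all.x = TRUE keeps rows of df that have no match (Treatment becomes NA)
merged <- merge(df, lookup, all.x = TRUE)

# Restore the original row order and keep only the columns of interest
result <- merged[order(merged$id), c("id", "V", "Treatment")]
rownames(result) <- NULL
result
```

Adding a further treatment arm then only means adding a column to df and a row to the lookup table, which is where this approach is easier to extend than nested ifelse calls.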

2

We can do this without any ifelse:

df$Treatment <- with(df, c("Control", "Treatment1", "Treatment2")[(Treatment1 + 
           2*Treatment2)+1]) 
df$Treatment 
#[1] "Treatment1" "Treatment1" "Treatment1" "Treatment2" "Treatment2" 
#[6] "Treatment2" "Control" "Control" "Control" "Control" 
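
The indexing above works because Treatment1 contributes 1 and Treatment2 contributes 2, so each row gets a code of 0, 1 or 2 that selects a label after adding 1. It assumes the two flags are never both set: a row with both equal to 1 would produce index 4 and silently yield NA. A sketch with that assumption made explicit follows; the names labels and code are introduced here only for illustration.

```r
df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1, 3), rep(0, 7)),
                 Treatment2 = c(rep(0, 3), rep(1, 3), rep(0, 4)))

labels <- c("Control", "Treatment1", "Treatment2")

# 0 = neither flag set, 1 = Treatment1 only, 2 = Treatment2 only
code <- with(df, Treatment1 + 2 * Treatment2)

# A code of 3 would mean both flags are set; fail loudly instead of returning NA
stopifnot(all(code %in% 0:2))

# R vectors are 1-indexed, hence the + 1
df$Treatment <- labels[code + 1]
df$Treatment
```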

Or another option is pmax:

c("Control", "Treatment1", "Treatment2")[do.call(pmax, df[3:4]*col(df[3:4]))+1] 
#[1] "Treatment1" "Treatment1" "Treatment1" "Treatment2" "Treatment2" 
#[6] "Treatment2" "Control" "Control" "Control" "Control" 

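If we need to compare against 'df2', paste the 3rd and 4th columns of 'df' together, set the names of the unique elements of 'Treatment' in 'df2' to the unique elements of 'v1' (in the example they are in the same order), and use that named vector to replace the values.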

v1 <- do.call(paste0, df[3:4]) 
unname(setNames(as.character(unique(df2$Treatment)), c("10", "01", "00"))[v1]) 
#[1] "Treatment1" "Treatment1" "Treatment1" "Treatment2" "Treatment2" 
#[6] "Treatment2" "Control" "Control" "Control" "Control" 

Note: none of these methods use any packages, and they should be efficient.

2

dplyr::case_when is a nice alternative to nested ifelses:

library(dplyr) 

df %>% mutate(Treatment = case_when(.$Treatment1 == 1 ~ 'Treatment1', 
            .$Treatment2 == 1 ~ 'Treatment2', 
            TRUE ~ 'Control')) %>% 
    select(-Treatment1, -Treatment2) 
##    id V  Treatment
## 1   1 A Treatment1
## 2   2 B Treatment1
## 3   3 C Treatment1
## 4   4 D Treatment2
## 5   5 E Treatment2
## 6   6 F Treatment2
## 7   7 G    Control
## 8   8 H    Control
## 9   9 I    Control
## 10 10 J    Control

Because it is still new and a bit experimental, case_when requires the .$ notation inside mutate for now, but it looks like that will change before too long.
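
For readers on a more recent dplyr, here is a sketch of the same pipeline without the .$ prefix. It assumes a dplyr release in which case_when evaluates bare column names inside mutate; if your installed version is older, the .$ form in the answer is the one to use.

```r
library(dplyr)

df <- data.frame(id = 1:10,
                 V = LETTERS[1:10],
                 Treatment1 = c(rep(1, 3), rep(0, 7)),
                 Treatment2 = c(rep(0, 3), rep(1, 3), rep(0, 4)))

# Conditions are checked in order; the first one that is TRUE decides the label
df2 <- df %>%
  mutate(Treatment = case_when(Treatment1 == 1 ~ "Treatment1",
                               Treatment2 == 1 ~ "Treatment2",
                               TRUE ~ "Control")) %>%  # fallback when no flag is set
  select(-Treatment1, -Treatment2)

df2
```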