2017-03-08
library(tidyr) 
library(dplyr) 
library(tidyverse) 

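Below is the code for a simple data frame. I have some messy data in which the categories of an exported factor are spread across separate columns, and I want to unite those columns with tidyr by referring to their similar column names.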

Client<-c("Client1","Client2","Client3","Client4","Client5") 
Sex_M<-c("Male","NA","Male","NA","Male") 
Sex_F<-c(" ","Female"," ","Female"," ") 
Satisfaction_Satisfied<-c("Satisfied"," "," ","Satisfied","Satisfied") 
Satisfaction_VerySatisfied<-c(" ","VerySatisfied","VerySatisfied"," "," ") 
CommunicationType_Email<-c("Email"," "," ","Email","Email") 
CommunicationType_Phone<-c(" ","Phone ","Phone "," "," ") 
# tibble() replaces the deprecated data_frame()
DF<-tibble(Client,Sex_M,Sex_F,Satisfaction_Satisfied,Satisfaction_VerySatisfied,CommunicationType_Email,CommunicationType_Phone) 

I want to use tidyr's unite to recombine these categories into single columns.

DF<-DF%>%unite(Sat,Satisfaction_Satisfied,Satisfaction_VerySatisfied,sep=" ")%>% 
unite(Sex,Sex_M,Sex_F,sep=" ") 

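However, I have to write several unite lines, which I feel violates the rule of three, so there must be an easier way, especially since my real data contains dozens of columns that need merging. Is there a way to call unite once but somehow refer to matching column names, so that all columns with similar names (for example, those containing "Sex" for "Sex_M" and "Sex_F", and "CommunicationType" for "CommunicationType_Email" and "CommunicationType_Phone") are combined as in the code above?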

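I also thought about writing a function that lets me pass in the column names, but that is too hard for me because it involves complicated standard evaluation.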
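A helper of the kind described above could look roughly like the following. This is a minimal sketch, not a definitive solution: the name `unite_by_prefix` is hypothetical, and it assumes tidyr 1.0.0 or later (for `na.rm` in `unite`), that every column to merge is named `Prefix_Suffix`, and that blanks and the literal string "NA" both mean "missing".

```r
library(dplyr)
library(tidyr)

# Sample data from the question
DF <- tibble(
  Client = c("Client1", "Client2", "Client3", "Client4", "Client5"),
  Sex_M = c("Male", "NA", "Male", "NA", "Male"),
  Sex_F = c(" ", "Female", " ", "Female", " "),
  Satisfaction_Satisfied = c("Satisfied", " ", " ", "Satisfied", "Satisfied"),
  Satisfaction_VerySatisfied = c(" ", "VerySatisfied", "VerySatisfied", " ", " "),
  CommunicationType_Email = c("Email", " ", " ", "Email", "Email"),
  CommunicationType_Phone = c(" ", "Phone ", "Phone ", " ", " ")
)

# Hypothetical helper: unite every group of columns sharing a prefix.
# !!p unquotes the string so it becomes the name of the new column.
unite_by_prefix <- function(data, prefixes) {
  for (p in prefixes) {
    data <- unite(data, !!p, starts_with(paste0(p, "_")),
                  sep = "_", na.rm = TRUE)
  }
  data
}

# Assumption: blanks and the string "NA" are treated as missing values
clean <- DF %>%
  mutate(across(-Client, function(x) {
    v <- trimws(x)
    replace(v, v %in% c("", "NA"), NA_character_)
  }))

# Derive the prefixes from the column names: Sex, Satisfaction, CommunicationType
prefixes <- unique(sub("_.*$", "", setdiff(names(DF), "Client")))

result <- unite_by_prefix(clean, prefixes)
```

Because the prefixes are computed from the column names, the same call should scale to dozens of column groups without adding a unite line per group.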


'DF %>% unite(Sat, contains("Sat"))'? – Nate


'DF %>% unite(Sat, matches("^Sat"))' – akrun

Answers


We can use unite with a select helper:

library(tidyverse) 
DF %>% 
    unite(Sat, matches("^Sat")) 

For multiple groups of columns, perhaps:

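gather(DF, Var, Val, -Client, na.rm = TRUE) %>% 
     separate(Var, into = c("Var1", "Var2")) %>%   # "Sex_M" -> Var1 = "Sex", Var2 = "M"
     mutate(Val = trimws(Val)) %>%                 # the sample data pads blanks with spaces
     group_by(Client, Var1) %>% 
     # drop missing, empty and literal "NA" entries before pasting
     summarise(Val = paste(Val[!(is.na(Val) | Val %in% c("", "NA"))], collapse = "_")) %>% 
     spread(Var1, Val) 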
#   Client  CommunicationType Satisfaction  Sex
#*  <chr>   <chr>             <chr>         <chr>
#1  Client1 Email             Satisfied     Male
#2  Client2 Phone             VerySatisfied Female
#3  Client3 Phone             VerySatisfied Male
#4  Client4 Email             Satisfied     Female
#5  Client5 Email             Satisfied     Male
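
The same reshape-and-collapse idea can also be written with `pivot_longer` and `pivot_wider`, which newer tidyr versions recommend over `gather` and `spread`. The following is only a sketch under assumptions: it needs tidyr 1.1.0 or later (for a function passed to `values_fn`), it treats blanks and the literal string "NA" as missing, and it relies on each column name containing exactly one underscore.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
DF <- tibble(
  Client = c("Client1", "Client2", "Client3", "Client4", "Client5"),
  Sex_M = c("Male", "NA", "Male", "NA", "Male"),
  Sex_F = c(" ", "Female", " ", "Female", " "),
  Satisfaction_Satisfied = c("Satisfied", " ", " ", "Satisfied", "Satisfied"),
  Satisfaction_VerySatisfied = c(" ", "VerySatisfied", "VerySatisfied", " ", " "),
  CommunicationType_Email = c("Email", " ", " ", "Email", "Email"),
  CommunicationType_Phone = c(" ", "Phone ", "Phone ", " ", " ")
)

result <- DF %>%
  # Split "Sex_M" into Var1 = "Sex" and Var2 = "M" while going long
  pivot_longer(-Client, names_to = c("Var1", "Var2"), names_sep = "_",
               values_to = "Val") %>%
  mutate(Val = trimws(Val)) %>%
  # Assumption: blanks and the string "NA" mean "no value"
  filter(!(is.na(Val) | Val %in% c("", "NA"))) %>%
  # One column per prefix; paste together whatever values remain
  pivot_wider(id_cols = Client, names_from = Var1, values_from = Val,
              values_fn = function(v) paste(v, collapse = "_"))
```

Setting `id_cols = Client` keeps `Var2` from being treated as an identifier, so each client ends up on a single row.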

Thanks, the answer for the multiple-column case works great! – Mike


Something like this? It should help if you have many columns.

result<-with(new.env(),{ 
    Client<-c("Client1","Client2","Client3","Client4","Client5") 
    Sex_M<-c("Male","NA","Male","NA","Male") 
    Sex_F<-c(" ","Female"," ","Female"," ") 
    Satisfaction_Satisfied<-c("Satisfied"," "," ","Satisfied","Satisfied") 
    Satisfaction_VerySatisfied<-c(" ","VerySatisfied","VerySatisfied"," "," ") 
    CommunicationType_Email<-c("Email"," "," ","Email","Email") 
    CommunicationType_Phone<-c(" ","Phone ","Phone "," "," ") 
    x<-ls() 
    categories<-unique(sub("(.*)_(.*)", "\\1", x)) 
    df<-setNames(data.frame(lapply(x, function(y) get(y))), x) 
    for(nm in categories){ 
    df<-unite_(df, nm, x[contains(vars = x, match = nm)]) 
    } 
    return(df) 
}) 

   Client CommunicationType   Satisfaction       Sex
1 Client1            Email_     Satisfied_     _Male
2 Client2            _Phone _VerySatisfied Female_NA
3 Client3            _Phone _VerySatisfied     _Male
4 Client4            Email_     Satisfied_ Female_NA
5 Client5            Email_     Satisfied_     _Male