2017-04-21 140 views

Generating columns with unique values for dummy data

library(data.table)

MainID <- c('A1', 'A1', 'B2', 'C1', 'C1', 'C1', 'D2', 'D2')
HouseholdID <- c('Ab1', 'Ab1', 'cb2', 'Ca2', 'cb2', 'cb3', 'Da1', 'db2')
relation <- c('Spouse', 'Spouse', 'Child', 'Spouse', 'Child', 'Mother', 'Brother', 'Spouse')

df <- data.table(MainID, HouseholdID, relation)
head(df)

   MainID HouseholdID relation
1:     A1         Ab1   Spouse
2:     A1         Ab1   Spouse
3:     B2         cb2    Child
4:     C1         Ca2   Spouse
5:     C1         cb2    Child
6:     C1         cb3   Mother

I need to reshape this data from long to wide format, with one row per MainID, like this:

Expected result

MainID  Household1  Relation1  Household2  Relation2  Household3  Relation3
A1      Ab1         Spouse     NA          NA         NA          NA
B2      cb2         Child      NA          NA         NA          NA
C1      Ca2         Spouse     cb2         Child      cb3         Mother
D2      Da1         Brother    db2         Spouse     NA          NA

What is the best way to do this using dplyr, reshape, tidyverse, or any other method/package?

Answer

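Since you are already using "data.table", you can take just the unique rows, add an indicator variable that numbers the rows within each MainID, and then dcast to wide format: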

library(data.table) 
dcast(unique(df)[, ind := rowid(MainID)], 
     MainID ~ ind, value.var = c("HouseholdID", "relation")) 
#    MainID HouseholdID_1 HouseholdID_2 HouseholdID_3 relation_1 relation_2 relation_3
# 1:     A1           Ab1            NA            NA     Spouse         NA         NA
# 2:     B2           cb2            NA            NA      Child         NA         NA
# 3:     C1           Ca2           cb2           cb3     Spouse      Child     Mother
# 4:     D2           Da1           db2            NA    Brother     Spouse         NA
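
The column names and order in that result differ a little from the layout asked for in the question. As a rough sketch (not part of the original answer), the columns could be interleaved and renamed afterwards with setcolorder() and setnames(). The names `wide`, `n`, `idx`, `old`, and `new` are only illustrative, and the code assumes dcast produced one HouseholdID_ column and one relation_ column per index:

```r
library(data.table)

# Sample data from the question
df <- data.table(
  MainID      = c("A1", "A1", "B2", "C1", "C1", "C1", "D2", "D2"),
  HouseholdID = c("Ab1", "Ab1", "cb2", "Ca2", "cb2", "cb3", "Da1", "db2"),
  relation    = c("Spouse", "Spouse", "Child", "Spouse", "Child", "Mother", "Brother", "Spouse")
)

# Same reshape as in the answer: drop duplicate rows, number the rows
# within each MainID, then cast to wide format
wide <- dcast(unique(df)[, ind := rowid(MainID)],
              MainID ~ ind, value.var = c("HouseholdID", "relation"))

# Number of household/relation pairs (assumption: every column except
# MainID belongs to exactly one such pair)
n <- (ncol(wide) - 1L) %/% 2L
idx <- seq_len(n)

# rbind() + as.vector() interleaves the two name vectors:
# HouseholdID_1, relation_1, HouseholdID_2, relation_2, ...
old <- c("MainID", as.vector(rbind(paste0("HouseholdID_", idx), paste0("relation_", idx))))
new <- c("MainID", as.vector(rbind(paste0("Household", idx), paste0("Relation", idx))))

setcolorder(wide, old)    # reorder columns by reference
setnames(wide, old, new)  # rename to the headers used in the question

# Columns are now MainID, Household1, Relation1, Household2, Relation2, ...
names(wide)
```

Both setcolorder() and setnames() modify `wide` by reference, so no reassignment is needed.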