2017-08-27
0

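I want to do a join-style merge that assigns multiple IDs to each name. I have a dataset df and a corresponding lookup table of IDs, ID: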

ID <- data.frame(Alphabet = c("A", "A","A","B", "B", "C"), 
      Value = c(101,102, 103,201,202,301)) 

df <- data.frame(Name = c("A", "A","B", "C")) 

I want to merge/assign the IDs to df and get a data frame that looks like this:

Name ID1 ID2 ID3 
A  101 102 103 
A  101 102 103 
B  201 202 
C  301 

Answers

1

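I would solve this by preparing a list that contains the rows of the final data frame and then binding them together with rbind. The only trick is to compute the maximum row length and pad the shorter rows with NAs accordingly. This should work: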

ID <- data.frame(Alphabet = c("A", "A","A","B", "B", "C"), 
       Value = c(101,102, 103,201,202,301)) 

df <- data.frame(Name = c("A", "A","B", "C")) 


tmp <- lapply(df$Name, (function(id){ 
    ID[ID$Alphabet == id, ]$Value 
})) 
max.el <- max(sapply(tmp, length)) 
out.df <- do.call(rbind, lapply(tmp, (function(el){ 
    len.na <- max.el - length(el) 
    c(el, rep(NA, len.na)) 
}))) 

print(out.df, na.print = "") 

This is the result:

 [,1] [,2] [,3] 
[1,] 101 102 103 
[2,] 101 102 103 
[3,] 201 202  
[4,] 301  

If displaying the NAs is not a problem, then:

colnames(out.df) <- paste("ID", c(1:max.el), sep = "") 
out.df <- cbind(df, out.df) 
out.df 

  Name ID1 ID2 ID3
1    A 101 102 103
2    A 101 102 103
3    B 201 202  NA
4    C 301  NA  NA
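
The same pad-and-bind idea can be written a little more compactly in base R. The following is only a sketch, not the answerer's code: it assumes the ID and df objects from the question, and the names vals, max_len, padded, rows and wide are made up for illustration. It pads each lookup vector once per letter (rather than once per row of df) and then picks rows by name.

```r
ID <- data.frame(Alphabet = c("A", "A", "A", "B", "B", "C"),
                 Value = c(101, 102, 103, 201, 202, 301))
df <- data.frame(Name = c("A", "A", "B", "C"))

# Group the values by letter: a named list with one numeric vector per letter
vals <- split(ID$Value, ID$Alphabet)
max_len <- max(lengths(vals))

# `length<-` extends a vector to the requested length, filling with NA
padded <- do.call(rbind, lapply(vals, `length<-`, max_len))
colnames(padded) <- paste0("ID", seq_len(max_len))

# Pick one row per entry of df$Name; as.character() also covers factor columns
rows <- padded[as.character(df$Name), , drop = FALSE]
rownames(rows) <- NULL  # drop the duplicated letter row names before binding

wide <- cbind(df, rows)
wide
```

Because the padding happens per letter, repeated names in df reuse the same padded row instead of recomputing it.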
2

Try this? Using NA is better than an empty string for marking missing values.

If you really want '' instead of NA, just use outdf[is.na(outdf)] = ''

library(dplyr) 
ID=ID%>%group_by(Alphabet)%>%mutate(ID=row_number()) 
library(reshape2) 
DF=as.data.frame(acast(ID, Alphabet~ID, value.var="Value")) 
DF$Name=row.names(DF) 
merge(df,DF,by='Name') 


  Name   1   2   3
1    A 101 102 103
2    A 101 102 103
3    B 201 202  NA
4    C 301  NA  NA
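
The NA-to-empty-string replacement mentioned at the top of this answer has a side effect worth seeing. The sketch below uses a small hand-built data frame (out and out_chr are hypothetical names, not objects from the answer) to show that columns which held an NA become character after the replacement.

```r
# A small stand-in for the merged result, built by hand for illustration
out <- data.frame(Name = c("A", "B", "C"),
                  ID1 = c(101, 201, 301),
                  ID2 = c(102, 202, NA),
                  ID3 = c(103, NA, NA))

out_chr <- out
out_chr[is.na(out_chr)] <- ""  # replace every NA cell with an empty string

# Columns that held an NA are coerced to character; ID1 had none and stays numeric
sapply(out_chr, class)
```

This is one reason to keep NA while computing and switch to '' only for the final display.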

Or use tidyr (recommended, since you are working with a data.frame):

library(dplyr) 
library(tidyr) 
ID=ID%>%group_by(Alphabet)%>%mutate(id=row_number()) 
DF=spread(ID, id,Value) 
merge(df,DF,by.x='Name',by.y='Alphabet') 

  Name   1   2   3
1    A 101 102 103
2    A 101 102 103
3    B 201 202  NA
4    C 301  NA  NA
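
On current tidyr, spread() is superseded by pivot_wider(). As a sketch only, assuming tidyr 1.0.0 or later and dplyr are installed, the same reshape could look like the following. This is a swapped-in variant rather than the answerer's code, the names wide and result are hypothetical, and left_join() is used here in place of merge() so that the row order of df is kept.

```r
library(dplyr)
library(tidyr)

ID <- data.frame(Alphabet = c("A", "A", "A", "B", "B", "C"),
                 Value = c(101, 102, 103, 201, 202, 301))
df <- data.frame(Name = c("A", "A", "B", "C"))

wide <- ID %>%
  group_by(Alphabet) %>%
  mutate(id = row_number()) %>%   # running index within each letter
  ungroup() %>%
  # one column per index; names_prefix yields ID1, ID2, ID3
  pivot_wider(names_from = id, values_from = Value, names_prefix = "ID")

# Repeat the wide rows for every matching Name in df
result <- left_join(df, wide, by = c("Name" = "Alphabet"))
result
```

The names_prefix argument also gives the ID1, ID2, ID3 column names asked for in the question, which the spread() version leaves as 1, 2, 3.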
0

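For the sake of completeness, here is another solution that uses dcast() from the data.table package to reshape from long to wide format, followed by a right join: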

library(data.table) 
# coerce to data.table 
setDT(ID)[ 
    # reshape from long to wide, thereby creating column names 
    , dcast(.SD, Alphabet ~ rowid(Alphabet, prefix = "ID"))][ 
    # rename column 
    , setnames(.SD, "Alphabet", "Name")][ 
     # right join with df to repeat rows 
     setDT(df), on = "Name"] 
Name ID1 ID2 ID3 
1: A 101 102 103 
2: A 101 102 103 
3: B 201 202 NA 
4: C 301 NA NA 

In case NA must not be shown, the output needs to be converted to type character:

setDT(ID)[, dcast(.SD, Alphabet ~ rowid(Alphabet, prefix = "ID"), as.character, fill = "")][ 
    , setnames(.SD, "Alphabet", "Name")][ 
     setDT(df), on = "Name"] 
Name ID1 ID2 ID3 
1: A 101 102 103 
2: A 101 102 103 
3: B 201 202  
4: C 301