R - Merge and update a main dataset

So, I have two datasets representing old and current addresses.

> main 
idspace id x y move 
    198 1238 33 4 stay 
    641 1236 36 12 move 
    1515 1237 30 28 move 

> move 
idspace id x y move 
     4 1236 4 1 move

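What I need is to merge the new data (move) into the old data (main) and update main in the same step.

I wonder whether this can be done in one operation?

The update is based on id, which is the personal identifier.

idspace, x and y are the location identifiers.

So, I need the output to be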

> main 
    idspace id x y move 
     198 1238 33 4 stay 
     4 1236 4 1 move # this one is updated 
     1515 1237 30 28 move

I don't know how I can do this.

Something like

merge(main, move, by = c('id'), all = T, suffixes = c('old', 'new'))

However, this is wrong, because I would then have to do a lot of operations by hand.

Any solutions?

Data

> dput(main) 
structure(list(idspace = structure(c(2L, 3L, 1L), .Label = c("1515", 
"198", "641"), class = "factor"), id = structure(c(3L, 1L, 2L 
), .Label = c("1236", "1237", "1238"), class = "factor"), x = structure(c(2L, 
3L, 1L), .Label = c("30", "33", "36"), class = "factor"), y = structure(c(3L, 
1L, 2L), .Label = c("12", "28", "4"), class = "factor"), move =  structure(c(2L, 
1L, 1L), .Label = c("move", "stay"), class = "factor")), .Names = c("idspace", 
"id", "x", "y", "move"), row.names = c(NA, -3L), class = "data.frame") 

> dput(move) 
structure(list(idspace = structure(1L, .Label = "4", class = "factor"), 
id = structure(1L, .Label = "1236", class = "factor"), x = structure(1L, .Label = "4", class = "factor"), 
    y = structure(1L, .Label = "1", class = "factor"), move = structure(1L, .Label = "move", class = "factor")), .Names = c("idspace", 
"id", "x", "y", "move"), row.names = c(NA, -1L), class = "data.frame")

Source

2016-08-22 giacomo

I think this is a dup, as the `tmp <- rbind(move, main); tmp[!duplicated(tmp$id), ]` logic works just fine, assuming there are no other requirements here. – thelatemail

@thelatemail I was thinking of using `sqldf`, but I don't know the API well enough to answer. –

@TimBiegeleisen - maybe `sqldf("select coalesce(b.idspace, a.idspace) as idspace, coalesce(b.id, a.id) as id, coalesce(b.x, a.x) as x, coalesce(b.y, a.y) as y, coalesce(b.move, a.move) as move from main a left join move b on a.id = b.id")` - ugly, but it does work. – thelatemail
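The `rbind` + `duplicated` idea from the first comment can be sketched in base R as follows. This is only a minimal sketch: it assumes the columns are plain character vectors (the `dput` output above stores them as factors) and that `id` is unique within each table. The data frames below simply re-create the example data.

```r
# Toy copies of the example data. Character columns are an assumption;
# the original dput() output stores every column as a factor.
main <- data.frame(
  idspace = c("198", "641", "1515"),
  id      = c("1238", "1236", "1237"),
  x       = c("33", "36", "30"),
  y       = c("4", "12", "28"),
  move    = c("stay", "move", "move"),
  stringsAsFactors = FALSE
)
move <- data.frame(
  idspace = "4", id = "1236", x = "4", y = "1", move = "move",
  stringsAsFactors = FALSE
)

# Stack the new rows on top, then keep only the first occurrence of each id,
# so a row from 'move' wins over the row with the same id in 'main'.
tmp     <- rbind(move, main)
updated <- tmp[!duplicated(tmp$id), ]
rownames(updated) <- NULL
updated
```

Note that the updated rows end up at the top of the result, so the original row order of `main` is not preserved with this approach.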

Using the join + update feature of data.table:

require(data.table) # v1.9.6+ 
setDT(main) # convert data.frames to data.tables by reference 
setDT(move) 

main[move, on=c("id", "move"), # extract the row number in 'main' where 'move' matches 
     c("idspace", "x", "y") := .(i.idspace, i.x, i.y)] # update cols of 'main' with 
                 # values from 'i' = 'move' for 
                 # those matching rows 


main 
# idspace id x y move 
# 1:  198 1238 33 4 stay 
# 2:  4 1236 4 1 move 
# 3: 1515 1237 30 28 move

This updates main in place.

Source

2016-08-22 01:22:04 Arun

This is awesome! Is there by any chance a `dplyr` routine? – giacomo

Asking the main data.table developer for a dplyr solution... hmm... – nrussell

OK, OK, sorry ;) - still a great solution! – giacomo

Here is a dplyr solution:

# If you want both old and new 
dplyr::full_join(main, move) 

# If you want both old and new with a suffix column 
main$suffix <- "old" 
move$suffix <- "new" 
dplyr::full_join(main, move) 

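# If you want new only (start from the original main and move, without the suffix column)
new <- dplyr::left_join(main, move, by = "id") # could also use %>%
hit <- !is.na(new$move.y)                      # rows of main that have a match in move
# refer to columns by name rather than by position;
# factor columns should be converted to character first, or unseen levels become NA
main[hit, c("idspace", "x", "y")] <- new[hit, c("idspace.y", "x.y", "y.y")]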

Source

2016-08-22 01:51:44

I think I found a very simple way to solve this:

main = as.matrix(main) 
move = as.matrix(move) 

main[main[,'id'] %in% move[,'id'], ] <- move

It matches on id, keeps the ids in order, and only changes the matching rows. It seems to work on the whole dataset.

Source

2016-08-22 10:06:51 giacomo

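Note that in this case there is no way to know which `main$id` matches which `move$id`. You are assuming that the matches come in the same order as the rows in `move`. – Arun

You are completely right. However, it seems to work. I also tried `main[main[,'id'] %in% move[,'id'], c('idspace','x','y','move')] <- move[which(move[,'id'] %in% main[,'id']), c('idspace','x','y','move')]`, which also updates. In the latter case, the `id` is matched. Thanks again for your patience and attention! – giacomo

`%in%` returns a logical vector. Subsetting with it always preserves the order of the input data. Try a more complex example. For instance, if the 1st and 3rd entries of `main$id` match the 3rd and 1st entries of `move$id`, then rows 1 and 3 of `move` are assigned to rows 1 and 3 of `main`. That is wrong. – Arun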
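To make the ordering problem from the comment above concrete, here is a small base R sketch with made-up data (the ids and values are hypothetical, not taken from the question). It shows how `%in%` misassigns values when `move` is ordered differently from `main`, and how `match()`, which returns the position of each match, pairs the rows correctly.

```r
# Hypothetical data: two ids of 'main' appear in 'move', but in reverse order
main <- data.frame(id = c("a", "b", "c"), x = c(1, 2, 3),
                   stringsAsFactors = FALSE)
move <- data.frame(id = c("c", "a"), x = c(30, 10),
                   stringsAsFactors = FALSE)

# %in% only says WHETHER a row of 'main' has a match,
# so the values are filled in the row order of 'move'
wrong <- main
wrong[wrong$id %in% move$id, "x"] <- move$x
wrong$x   # 30 2 10: id "a" received the value meant for id "c"

# match() says WHERE each id of 'main' sits in 'move' (NA if absent)
pos   <- match(main$id, move$id)
hit   <- !is.na(pos)
right <- main
right[hit, "x"] <- move$x[pos[hit]]
right$x   # 10 2 30
```

The same `pos`/`hit` pair can be reused to update several columns at once, as long as every id occurs at most once in `move`.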
