2017-08-29 41 views
2
A <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00EF", "00EF", "00FR", "00FR"), 
       Item_B = c(NA, NA, NA, NA, "JAMES RIVER", NA, NA)) 

B <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00FR", "00FR"), 
       Item_B = c("JAMES RIVER", NA, "JAMES RIVER", 
          "RICE MIDSTREAM", "RICE MIDSTREAM")) 

Expected: the missing values filled in using the other data

A <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00EF", "00EF", "00FR", "00FR"), 
       Item_B = c("JAMES RIVER", "JAMES RIVER", "JAMES RIVER", 
         "JAMES RIVER", "JAMES RIVER", "RICE MIDSTREAM", "RICE MIDSTREAM")) 

B <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00FR", "00FR"), 
       Item_B = c("JAMES RIVER", "JAMES RIVER", "JAMES RIVER", 
          "RICE MIDSTREAM", "RICE MIDSTREAM")) 

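I have to fill in Item_B based on other rows that have the same Item_A. For example, the first through fourth observations of Item_B in data set A need to become "JAMES RIVER".

Could you suggest a way to fill in the missing values in R? I have tried many tricks but could not get what I want.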

+0

'zoo::na.locf(A$Item_B, fromLast = TRUE)'? – Jaap

+0

Please add the expected output – Sotos

+0

OK, thanks for the reminder – YXCHEN

Answers

3

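As far as I understand the question, this is not just an exercise in filling missing values within a single column of each data.frame. I believe the values of Item_B that belong to each Item_A need to be filled in with the help of a lookup or mapping table: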

library(data.table) 
# create mapping table from both data.frames 
map <- unique(rbindlist(list(A, B)))[!is.na(Item_B)] 
# or, in case there are additional columns besides Item_A and Item_B 
map <- unique(rbindlist(list(A, B))[!is.na(Item_B), .(Item_A, Item_B)]) 
map 
Item_A   Item_B 
1: 00FF JAMES RIVER 
2: 00EF JAMES RIVER 
3: 00FR RICE MIDSTREAM 
# join and replace 
setDT(A)[map, on = c("Item_A"), Item_B := i.Item_B][] 
Item_A   Item_B 
1: 00FF JAMES RIVER 
2: 00FF JAMES RIVER 
3: 00FF JAMES RIVER 
4: 00FF JAMES RIVER 
5: 00FF JAMES RIVER 
6: 00FR RICE MIDSTREAM 
7: 00FR RICE MIDSTREAM 
setDT(B)[map, on = c("Item_A"), Item_B := i.Item_B][] 
Item_A   Item_B 
1: 00EF JAMES RIVER 
2: 00EF JAMES RIVER 
3: 00EF JAMES RIVER 
4: 00FR RICE MIDSTREAM 
5: 00FR RICE MIDSTREAM 

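During the join, there are two columns named Item_B: one from the first data.table, A (or B, respectively), and another from the second data.table, map. To distinguish them, the prefix i. indicates that i.Item_B should be taken from map.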
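
For readers without data.table, the same lookup-and-replace idea can be sketched in base R. This is only an illustration, not the answer's method: it swaps the update join for match(), the helper name fill_from_map is made up, and it assumes each Item_A maps to exactly one Item_B. The data is A and B as posted in the question, and unlike the join above it only touches rows where Item_B is NA.

```r
# Data as posted in the question (strings kept as character, not factor)
A <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00EF", "00EF", "00FR", "00FR"),
                Item_B = c(NA, NA, NA, NA, "JAMES RIVER", NA, NA),
                stringsAsFactors = FALSE)
B <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00FR", "00FR"),
                Item_B = c("JAMES RIVER", NA, "JAMES RIVER",
                           "RICE MIDSTREAM", "RICE MIDSTREAM"),
                stringsAsFactors = FALSE)

# Mapping table: every distinct, non-missing (Item_A, Item_B) pair from both inputs
both <- rbind(A, B)
map <- unique(both[!is.na(both$Item_B), ])

# Hypothetical helper: fill only the NA entries of Item_B from the mapping table
fill_from_map <- function(df, map) {
  idx <- match(df$Item_A, map$Item_A)   # row of map for each Item_A (NA if absent)
  missing <- is.na(df$Item_B)
  df$Item_B[missing] <- map$Item_B[idx[missing]]
  df
}

A_filled <- fill_from_map(A, map)
B_filled <- fill_from_map(B, map)
```

Because match() returns only the first hit, an Item_A with two different names in map would silently get the first one, so checking map for duplicated Item_A values beforehand is probably worthwhile.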

+0

Sorry, why doesn't your code work? – YXCHEN

+0

Hi Uwe, I like your code. But I tried it, and the NAs in data sets A and B are still the same. It does not produce the expected output as you wrote. – YXCHEN

+0

@YXCHEN Please try again, I have corrected the code. Apparently, I was too eager to polish the code before posting. – Uwe

1

You could try fill from the tidyr library:

library(tidyr) 
A %>% 
    tidyr::fill(Item_B, .direction = "down") %>% 
    tidyr::fill(Item_B, .direction = "up") 

    Item_A  Item_B 
1 00FF JAMES RIVER 
2 00FF JAMES RIVER 
3 00FF JAMES RIVER 
4 00FF JAMES RIVER 
5 00FF JAMES RIVER 
6 00FR JAMES RIVER 
7 00FR JAMES RIVER 
+2

Hi cephalopod, thanks. But "00FR" should be "RICE MIDSTREAM", not "JAMES RIVER" – YXCHEN
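
The comment above points at the underlying problem: an ungrouped fill carries "JAMES RIVER" across the boundary into the "00FR" rows, and A alone holds no name for "00FR" at all. One possible variant, sketched here under assumptions, is to stack A and B, fill within each Item_A group, and then split the result again. It assumes dplyr and tidyr 1.0.0 or later (for .direction = "downup") are installed; the source column and the *_filled names are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Data as posted in the question
A <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00EF", "00EF", "00FR", "00FR"),
                Item_B = c(NA, NA, NA, NA, "JAMES RIVER", NA, NA),
                stringsAsFactors = FALSE)
B <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00FR", "00FR"),
                Item_B = c("JAMES RIVER", NA, "JAMES RIVER",
                           "RICE MIDSTREAM", "RICE MIDSTREAM"),
                stringsAsFactors = FALSE)

# Stack both inputs, remembering which rows came from where
filled <- bind_rows(A = A, B = B, .id = "source") %>%
  group_by(Item_A) %>%                       # fill() then works within each Item_A
  fill(Item_B, .direction = "downup") %>%    # down first, then up for leading NAs
  ungroup()

# Split back into the two original data sets
A_filled <- filled %>% filter(source == "A") %>% select(-source)
B_filled <- filled %>% filter(source == "B") %>% select(-source)
```

This only gives a sensible result if every Item_A has a single name across both data sets; with conflicting names, the value that gets filled in depends on row order.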

2

You could try creating a dictionary data frame.

library(dplyr) 
dictionnary <- bind_rows(A,B) %>% 
      filter(!is.na(Item_B)) %>% 
      distinct 
find_name <- function(id){ 
    name <- dictionnary[["Item_B"]][which(dictionnary[["Item_A"]]==id)] 
    return(name) 
} 
test_id <- c("00EF","00EF","00EF","00FR","00FR") 
new_names <- sapply(test_id ,find_name) 

Then you can declare your data frames based on your input:

New_A <- data.frame(Item_A=c("00FF","00FF","00FF","00FF","00FF","00FR","00FR"), 
       Item_B=sapply(c("00FF","00FF","00FF","00FF","00FF","00FR","00FR"),find_name)) 

New_B <- data.frame(Item_A=c("00EF","00EF","00EF","00FR","00FR"), 
       Item_B=sapply(c("00EF","00EF","00EF","00FR","00FR"),find_name)) 
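
A lookup like find_name() quietly relies on every id having exactly one dictionary entry: an id with no entry gives a zero-length result and an id with several entries gives more than one name, in which case sapply() can return a list instead of a character vector. A minimal base R sketch of a more defensive variant follows. The names lookup, clean, ambiguous and ids, the id "00XX" and the value "RICE UPSTREAM" are all invented for illustration and do not come from the question.

```r
# Dictionary with one deliberately conflicting entry (made up for illustration)
dictionary <- data.frame(
  Item_A = c("00EF", "00FR", "00FR"),
  Item_B = c("JAMES RIVER", "RICE MIDSTREAM", "RICE UPSTREAM"),
  stringsAsFactors = FALSE
)

# Ids that map to more than one name; a lookup on these would be ambiguous
ambiguous <- unique(dictionary$Item_A[duplicated(dictionary$Item_A)])

# Keep only the unambiguous entries and turn them into a named vector
clean <- dictionary[!dictionary$Item_A %in% ambiguous, ]
lookup <- setNames(clean$Item_B, clean$Item_A)

# Vectorised lookup: ids without a usable entry come back as NA
ids <- c("00EF", "00FR", "00XX")
new_names <- unname(lookup[ids])
```

Indexing a named vector always returns one element per id, so the result can go straight into data.frame() and any id that could not be resolved shows up as NA.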
+0

Hi 巴卡拉, thanks. But when I used the code I got an error: Error in lapply(X = X, FUN = FUN, ...): object 'new_names' not found – YXCHEN

+0

Thanks, I have fixed the problem ;) –

+0

Thanks for your help – YXCHEN

0

@YXCHEN update:

lookup_df <- unique(rbindlist(list(A, B)))[!is.na(Item_B)] 

left_join(A %>% select(Item_A), lookup_df) 
left_join(B %>% select(Item_A), lookup_df) 
+0

Hi cephalopod, thanks. You are better at coding than I am. – YXCHEN

+0

But what if I want to handle many different "x".... – YXCHEN

+1

@cephalopod Your answer mixes functions from 'data.table' and 'dplyr' without mentioning the packages. – Uwe