2016-04-03 64 views
1

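Fill NAs from another data frame, using two id variables

I have read many questions similar to this one, but none has an answer that fits my case. I apologize if this is redundant; I just can't see it.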

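I have a primary dataset and a backup dataset. Wherever primary has an NA, I want to look in backup, and if there is a value there with a matching full.place.name and Year, I want to replace the NA with that value.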

primary

Year Firearm.Homicide Firearm.Suicide Firearm.Unintentional full.place.name 
2010    0   <NA>      0 Adair County, KY 
2010    10    19     <NA> Adams County, CO 

backup

Year Firearm.Homicide Firearm.Suicide Firearm.Unintentional full.place.name 
2010    NA    1      1 Adair County, KY 
2010    NA    NA      0 Adams County, CO 

What I want is

Year Firearm.Homicide Firearm.Suicide Firearm.Unintentional full.place.name 
2010    0    1      0 Adair County, KY 
2010    10    19      0 Adams County, CO 

I have tried

library(data.table) 
setDT(primary); setDT(backup) 
primary[is.na(primary$Firearm.Homicide), primary$Firearm.Homicide := backup[backup, primary$Firearm.Homicide, on=c("Year", "full.place.name")]] 

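However, this ended up adding five columns and did not produce any of the right values. I also tried ifelse statements and FillIn, and never got close. Here are five rows of data: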

primary<-structure(list(Year = c(2010, 2010, 2010, 2010, 2010), 
       Firearm.Homicide = c("0","10", "4", "3", NA), Firearm.Suicide = c(NA,"19", "5", "6", 
       NA), Firearm.Unintentional = c("0", NA, NA, "0", "0"), full.place.name = c("Adair County, KY", 
       "Adams County, CO", "Adams County, MS", "Adams County, PA", "Adams County, WI" 
      )), .Names = c("Year", "Firearm.Homicide", "Firearm.Suicide", 
       "Firearm.Unintentional", "full.place.name"), row.names = c(NA, 
       5L), class = "data.frame") 

backup<-structure(list(Year = c(2010, 2010, 2010, 2010, 2010), Firearm.Homicide = c(NA, 
      NA, 4, 3, 3), Firearm.Suicide = c(1, NA, NA, NA, NA), Firearm.Unintentional = c(1, 
      0, 1, NA, NA), full.place.name = c("Adair County, KY", "Adams County, CO", 
      "Adams County, MS", "Adams County, PA", "Adams County, WI")), .Names = c("Year", 
      "Firearm.Homicide", "Firearm.Suicide", "Firearm.Unintentional", 
      "full.place.name"), row.names = c(NA, 5L), class = "data.frame") 

I would really appreciate any help!

Answers

2

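If the two data frames always have the same structure as shown, there is a straightforward solution. Provided the rows of the two tables already correspond to each other (same keys in the same order), you can do:

primary[is.na(primary)] <- backup[is.na(primary)]

Here is one way to sort the data with the dplyr package, assuming your key columns are "Year" and "full.place.name".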

library(dplyr)
primary <- arrange(primary, Year, full.place.name) %>%
  select(Year, Firearm.Homicide, Firearm.Suicide, Firearm.Unintentional, full.place.name)
backup <- arrange(backup, Year, full.place.name) %>%
  select(Year, Firearm.Homicide, Firearm.Suicide, Firearm.Unintentional, full.place.name)

It may not be the best way to do this, but it is easy to understand.
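The sort-then-fill idea can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses base R `order()` in place of dplyr's `arrange()` so that it runs without extra packages, it uses two rows from the question's tables stored in different orders, and it assumes both data frames contain exactly the same set of keys. It also converts the character count columns to numeric before filling, which the answer above does not do.

```r
# Two rows from the question's tables, deliberately stored in different orders.
primary <- data.frame(
  Year = c(2010, 2010),
  Firearm.Homicide = c("10", "0"),
  Firearm.Suicide = c("19", NA),
  Firearm.Unintentional = c(NA, "0"),
  full.place.name = c("Adams County, CO", "Adair County, KY"),
  stringsAsFactors = FALSE
)
backup <- data.frame(
  Year = c(2010, 2010),
  Firearm.Homicide = c(NA_real_, NA_real_),
  Firearm.Suicide = c(1, NA),
  Firearm.Unintentional = c(1, 0),
  full.place.name = c("Adair County, KY", "Adams County, CO"),
  stringsAsFactors = FALSE
)

# Sort both by the key columns so that row i of one matches row i of the other.
primary <- primary[order(primary$Year, primary$full.place.name), ]
backup  <- backup[order(backup$Year, backup$full.place.name), ]

# Positional filling is only safe if the keys now line up exactly.
stopifnot(
  identical(primary$Year, backup$Year),
  identical(primary$full.place.name, backup$full.place.name)
)

# Convert the character count columns to numeric, then fill NAs column by column.
cols <- grep("^Firearm", names(primary), value = TRUE)
for (col in cols) {
  primary[[col]] <- as.numeric(primary[[col]])
  missing <- is.na(primary[[col]])
  primary[[col]][missing] <- backup[[col]][missing]
}

primary
```

The `stopifnot()` check is the important design choice here: if the two tables do not share the same keys, sorting alone will silently pair the wrong rows, so it seems safer to fail loudly.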

+0

They are not mapped to each other right now. How can I do that? – user5457414

+0

You can first sort both data frames by the key columns, whatever they are. I guess here they would be "Year" and "full.place.name"? – Psidom

0

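One option with data.table is to use set. The "Firearm" columns in primary are of class character, while the corresponding columns in backup are numeric. So we need to change the class of those columns in primary to numeric, and then replace the NA values in the "Firearm" columns of primary with the corresponding values from backup.

After joining with on, we can loop over the "Firearm" columns, convert each column to numeric, replace its NA values with the corresponding values from the matching "i." column (the copy that came from backup), and finally remove the "i." column by assigning it to NULL.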

#joining step 
dt <- setDT(primary)[backup, on = c("Year", "full.place.name")] 
#identify the Firearm columns with `grep` 
nm1 <- grep("^Firearm", names(primary), value=TRUE) 
#create a corresponding "i." column names vector from nm1 
nm2 <- paste0("i.", nm1) 
#loop through the columns 
for(j in seq_along(nm1)){ 
    #convert the Firearm columns from primary to `numeric` 
    set(dt, i = NULL, j= nm1[j], value = as.numeric(dt[[nm1[j]]])) 
    #replace the NA with corresponding values from "i" columns 
    set(dt, i = which(is.na(dt[[nm1[j]]])), j = nm1[j], 
     value = dt[[nm2[j]]][is.na(dt[[nm1[j]]])]) 
    #remove the i columns by assigning it to NULL 
    set(dt, i = NULL, j= nm2[j], value = NULL) 
} 


dt 
# Year Firearm.Homicide Firearm.Suicide Firearm.Unintentional full.place.name 
#1: 2010    0    1      0 Adair County, KY 
#2: 2010    10    19      0 Adams County, CO 
#3: 2010    4    5      1 Adams County, MS 
#4: 2010    3    6      0 Adams County, PA 
#5: 2010    3    NA      0 Adams County, WI 
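For comparison, the same join-then-fill logic can be sketched in base R with a key lookup instead of a data.table join. This is an alternative illustration, not the answer's method: it assumes the key pairs are unique in backup, the tab separator used to build the lookup key is an arbitrary choice, and the backup rows are shuffled on purpose to show that row order does not matter. The names `key_primary`, `key_backup`, `idx`, and `filled` are introduced here for illustration only.

```r
# The five rows from the question.
places <- c("Adair County, KY", "Adams County, CO", "Adams County, MS",
            "Adams County, PA", "Adams County, WI")
primary <- data.frame(
  Year = rep(2010, 5),
  Firearm.Homicide = c("0", "10", "4", "3", NA),
  Firearm.Suicide = c(NA, "19", "5", "6", NA),
  Firearm.Unintentional = c("0", NA, NA, "0", "0"),
  full.place.name = places,
  stringsAsFactors = FALSE
)
backup <- data.frame(
  Year = rep(2010, 5),
  Firearm.Homicide = c(NA, NA, 4, 3, 3),
  Firearm.Suicide = c(1, NA, NA, NA, NA),
  Firearm.Unintentional = c(1, 0, 1, NA, NA),
  full.place.name = places,
  stringsAsFactors = FALSE
)

# Shuffle the backup rows to show that the lookup does not depend on row order.
backup <- backup[c(5, 3, 1, 4, 2), ]

# Build one lookup key per row from the two id columns.
key_primary <- paste(primary$Year, primary$full.place.name, sep = "\t")
key_backup  <- paste(backup$Year, backup$full.place.name, sep = "\t")

# For each primary row, the position of the matching backup row (NA if none).
idx <- match(key_primary, key_backup)

filled <- primary
for (col in grep("^Firearm", names(primary), value = TRUE)) {
  own   <- as.numeric(primary[[col]])  # character counts converted to numeric
  other <- backup[[col]][idx]          # backup values aligned to primary rows
  filled[[col]] <- ifelse(is.na(own), other, own)
}

filled
```

A primary row with no counterpart in backup gets `NA` from `match()`, so its missing values simply stay `NA` rather than being filled from the wrong row.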
0

Assuming your datasets are sorted the same way and all the names are the same (as in your example), then

primary[is.na(primary)] <- backup[is.na(primary)] 
primary 
# Year Firearm.Homicide Firearm.Suicide Firearm.Unintentional full.place.name 
#1 2010    0    1      0 Adair County, KY 
#2 2010    10    19      0 Adams County, CO 
#3 2010    4    5      1 Adams County, MS 
#4 2010    3    6      0 Adams County, PA 
#5 2010    3   <NA>      0 Adams County, WI 