Replacing NAs with repeated random samples using data.table in R

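I am trying to replace each NA with a random sample from its corresponding group. For example, in row 2 the NA belongs to 'France', with Age '20-30' and Time '30-40'. So I want to draw a random sample from the Response column of all the other 'France', '20-30', '30-40' observations.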

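I have the code below, which mostly works, but every NA in a group is replaced with the same random sample. For example, if there is more than one NA for 'France', '20-30', '30-40', their R2 values are all identical.

I would like each NA to be sampled independently, but data.table seems to evaluate the expression once for the whole group, so I can't get this to work. Any ideas?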

DT <- data.table(mydf, key = "Country,Age,Time") 
DT[, R2 := ifelse(is.na(Response), sample(na.omit(Response), 1), 
        Response), by = key(DT)] 
DT 
# Index Country Age Time Response R2 
# 1:  5 France 20-30 30-40  1 1 
# 2:  6 France 20-30 30-40  NA 2 
# 3:  7 France 20-30 30-40  2 2 
# 4:  1 Germany 20-30 15-20  1 1 
# 5:  2 Germany 20-30 15-20  NA 1 
# 6:  3 Germany 20-30 15-20  1 1 
# 7:  4 Germany 20-30 15-20  0 0

where mydf is

mydf <- structure(list(Index = 1:7, Country = c("Germany", "Germany", 
"Germany", "Germany", "France", "France", "France"), Age = c("20-30", 
"20-30", "20-30", "20-30", "20-30", "20-30", "20-30"), Time = c("15-20", 
"15-20", "15-20", "15-20", "30-40", "30-40", "30-40"), Response = c(1L, 
NA, 1L, 0L, 1L, NA, 2L)), .Names = c("Index", "Country", "Age", 
"Time", "Response"), class = "data.frame", row.names = c(NA, -7L))

Source

2014-02-09 user3154267

I would do it this way:

DT[, is_na := is.na(Response)] 
nas <- DT[, sample(Response[!is_na], sum(is_na), TRUE) , 
      by=list(Country, Age, Time)]$V1 
DT[, R2 := Response][(is_na), R2 := nas]
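The same idea can also be written as a small per-group helper. This is only a sketch, not the answerer's code: `fill_na_sample` is a hypothetical name, and the data is rebuilt here so the block runs on its own. It samples positions with `sample.int()` rather than values, because `sample(x, ...)` draws from `1:x` when `x` is a single number greater than 1, which could happen in a group with only one non-NA Response.

```r
library(data.table)

# Same data as mydf in the question, rebuilt so this block is self-contained
mydf <- data.frame(
  Index    = 1:7,
  Country  = c("Germany", "Germany", "Germany", "Germany",
               "France", "France", "France"),
  Age      = "20-30",
  Time     = c("15-20", "15-20", "15-20", "15-20",
               "30-40", "30-40", "30-40"),
  Response = c(1L, NA, 1L, 0L, 1L, NA, 2L),
  stringsAsFactors = FALSE
)
DT <- as.data.table(mydf)

# Hypothetical helper: fill each NA in x with an independent draw
# from the non-NA values of x; non-NA entries are left untouched
fill_na_sample <- function(x) {
  miss <- is.na(x)
  pool <- x[!miss]
  if (any(miss) && length(pool) > 0L) {
    # Sample positions, not values: sample(5L, 1) would draw from 1:5
    x[miss] <- pool[sample.int(length(pool), sum(miss), replace = TRUE)]
  }
  x
}

set.seed(42)  # arbitrary seed, only for reproducibility
DT[, R2 := fill_na_sample(Response), by = .(Country, Age, Time)]
DT
```

Because the helper returns a vector as long as its group, the assignment does not depend on the table being keyed or on the row order within groups. A group with no non-NA values is left as NA.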

Source

2014-02-09 19:30:31 Arun

set.seed(1234) 
require(data.table) 
DT <- data.table(mydf, key = "Country,Age,Time")

Step 1

DT[, R2 := sample(na.omit(Response), length(Response), replace = TRUE), 
    by = key(DT)] 

DT 

# Index Country Age Time Response R2 
# 1:  5 France 20-30 30-40  1 1 
# 2:  6 France 20-30 30-40  NA 2 
# 3:  7 France 20-30 30-40  2 2 
# 4:  1 Germany 20-30 15-20  1 1 
# 5:  2 Germany 20-30 15-20  NA 0 
# 6:  3 Germany 20-30 15-20  1 1 
# 7:  4 Germany 20-30 15-20  0 1

EDIT

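Step 2

In step 1, you sample within each group (via by = ...) and get a value of R2 for every row. In step 2, you overwrite R2 with the Response values that are not NA.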

DT[!is.na(Response), R2 := Response] 

DT 

# Index Country Age Time Response R2 
# 1:  5 France 20-30 30-40  1 1 
# 2:  6 France 20-30 30-40  NA 2 
# 3:  7 France 20-30 30-40  2 2 
# 4:  1 Germany 20-30 15-20  1 1 
# 5:  2 Germany 20-30 15-20  NA 0 
# 6:  3 Germany 20-30 15-20  1 1 
# 7:  4 Germany 20-30 15-20  0 0
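The two steps can also be folded into a single grouped expression. The sketch below is an assumption-laden variant rather than part of the answer: it uses `fifelse()`, which needs data.table 1.12.4 or later, and it assumes every group has at least one non-NA Response. Unlike the `ifelse()` call in the question, which receives a single draw and recycles it, this passes one draw per row, so each NA gets its own value.

```r
library(data.table)

# Same data as in the question, rebuilt so this block is self-contained
DT <- data.table(
  Index    = 1:7,
  Country  = c(rep("Germany", 4), rep("France", 3)),
  Age      = "20-30",
  Time     = c(rep("15-20", 4), rep("30-40", 3)),
  Response = c(1L, NA, 1L, 0L, 1L, NA, 2L),
  key      = c("Country", "Age", "Time")
)

set.seed(1234)
DT[, R2 := {
  pool <- Response[!is.na(Response)]
  # One draw per row of the group (.N in total); indexing through
  # sample.int() avoids sample() treating a single number x as 1:x
  draws <- pool[sample.int(length(pool), .N, replace = TRUE)]
  # Keep the observed Response where present, use the draw only for NAs
  fifelse(is.na(Response), draws, Response)
}, by = key(DT)]
DT
```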

Source

2014-02-09 18:30:17 marbel

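I'm not sure, but I think the random sampling should only replace the NA entries. For example, the last value of R2 should still be 0; only the NA can become 0 or 1. – Arun

This can't be correct because, as Arun points out, the last value, in row 7, has changed. – user3154267

OK, this samples within each group; you could do that and then overwrite R2 with the non-NA values from Response. I've edited the answer. Hope this helps! – marbel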
