Using merge in R to fill NAs in data

I have a data frame a that is missing information for some units. I later collected the missing data and created another data frame b.

I usually fill in the missing data with the following code:

# for each row of b, overwrite var1 in the rows of a that share its uid
for (loop.b in seq_len(nrow(b))) {
    a[a[, "uid"] == b[loop.b, "uid"], "var1"] <- b[loop.b, "var1"]
}

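This works fine for me, but what if b has a large number of rows? The explicit loop would then make the process slow. Is there a better way to do this kind of "missing data replacement"?

Thanks.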


2011-03-07 lokheart

Take a look at the 'norm' package and its 'prelim.norm' function. 'Hmisc' has good imputation functions, not to mention 'mi'... The CRAN package list is a good place to start. – aL3xa 2011-03-07 07:15:49

Oh, and by the way, get rid of that nasty loop... =) – aL3xa 2011-03-07 07:34:17

I may be dense, but could you help me out by posting a small reproducible example? – 2011-03-07 07:42:53

I think you want match, but it's hard to guess what your data looks like.

## a's var1 has some missing values 
a <- data.frame(var1 = c(1, NA, 4.5, NA, 6.5), uid = 5:1) 
## b knows all about them 
b <- data.frame(var1 = c(2.3, 8.9), uid = c(2, 4)) 

## find the indexes in a$uid that match b$uid 
ind <- match(b$uid, a$uid) 

## those indexes can now be filled directly with b$var1
a$var1[ind] <- b$var1

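This approach relies on the uids in a being unique (as the name suggests they are): match returns only the first matching position, so if a uid were duplicated in a, only its first occurrence would be filled.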
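If b might contain a uid that does not occur in a, match returns NA for it, and an NA index would break the assignment. A minimal sketch of a guard, assuming the same a as above and a hypothetical extra row in b (uid 99, not part of the original example):

```r
## same a as above; the uid 99 row in b is a hypothetical addition
a <- data.frame(var1 = c(1, NA, 4.5, NA, 6.5), uid = 5:1)
b <- data.frame(var1 = c(2.3, 8.9, 7.7), uid = c(2, 4, 99))

## positions in a$uid for each b$uid; NA where b$uid is absent from a
ind <- match(b$uid, a$uid)

## keep only the rows of b that actually found a partner in a
found <- !is.na(ind)
a$var1[ind[found]] <- b$var1[found]
```

Rows of b without a partner in a are simply skipped here; whether that is the right policy depends on your data.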


2011-03-07 09:04:09 mdsumner

Suppose the following two data frames resemble what you describe:

R> a <- data.frame(uid=1:10,var1=c(1:3,NA,5:7,NA,9:10))
R> a
   uid var1
1    1    1
2    2    2
3    3    3
4    4   NA
5    5    5
6    6    6
7    7    7
8    8   NA
9    9    9
10  10   10

R> b <- data.frame(uid=c(8,4),var1=c(74,82))
R> b
  uid var1
1   8   74
2   4   82

Then you can simply use the following command (it works here because each uid in a equals its row number):

R> a[b$uid,"var1"] <- b$var1

Which gives:

R> a
   uid var1
1    1    1
2    2    2
3    3    3
4    4   82
5    5    5
6    6    6
7    7    7
8    8   74
9    9    9
10  10   10
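The one-liner above indexes rows of a by position, so it depends on uid matching the row number. When that does not hold, one option is to use the uids as row names and index by name instead. A minimal sketch, assuming a hypothetical a whose uids are neither ordered nor contiguous:

```r
## hypothetical a: uid is not the row number
a <- data.frame(uid = c(10, 4, 7, 8), var1 = c(10, NA, 7, NA))
b <- data.frame(uid = c(8, 4), var1 = c(74, 82))

## use the uids as row names so rows can be addressed by name
rownames(a) <- a$uid

## character indices select rows by name, not by position
a[as.character(b$uid), "var1"] <- b$var1
```

Note that this assumes every uid in b already exists in a; assigning to a row name that is not present would append a new row rather than fail.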


2011-03-07 09:04:31 juba

This works:

# position in b$uid of each a$uid, NA where there is no match
ind = match(a$uid, b$uid)
# 'ind' holds row indexes into b plus NAs; drop the NAs on both sides
a[!is.na(ind),"var1"] = b[ind[!is.na(ind)],"var1"]
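Since the question title mentions merge, here is a minimal sketch of a merge-based variant, using the a and b from the earlier example. It is one possible approach rather than anything the answers above prescribe; the ".b" suffix and the name m are arbitrary choices made for illustration. Unlike the match versions, it only fills entries of a that are NA and leaves existing values alone.

```r
a <- data.frame(uid = 1:10, var1 = c(1:3, NA, 5:7, NA, 9:10))
b <- data.frame(uid = c(8, 4), var1 = c(74, 82))

## left join: keep every row of a, bring b's var1 along as var1.b
m <- merge(a, b, by = "uid", all.x = TRUE, suffixes = c("", ".b"))

## fill only the entries that are missing in a
fill <- is.na(m$var1)
m$var1[fill] <- m$var1.b[fill]

## drop the helper column
m$var1.b <- NULL
```

By default merge sorts the result on the key column, so the row order of m may differ from that of a when a is not already sorted by uid.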


2013-06-27 22:28:58
