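R: change rows based on other rows

I have a large data set with many repeated values in one column, while the remaining columns hold values that are unique to each of them. I would like to fill in the missing values: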

id <- rep(1:3, 3:1) 
name <- c("sam", "sam", "", "mike", "", "tom") 
df<- data.frame(id, name) 

id name 
1 sam 
1 sam 
1  
2 mike 
2  
3 tom


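Due to the nature of the original data, both the id and name fields are factors (~2000 unique id values across 45000 rows). I would like to fill in the missing name values based on the other rows.

I have tried unique() and duplicated(), but had difficulty with the replacement. I would prefer to use the base packages, if possible.

Thanks!

Source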

2013-10-14 ano

Using na.locf, as Ananda Mahto suggests, is a good solution. If you want to stay in base R, you can do this:

> udf<-unique(df) 
> udf<-udf[udf$name != "",] 
> df$name<-udf$name[match(df$id,udf$id)] 
> df 
  id name
1  1  sam
2  1  sam
3  1  sam
4  2 mike
5  2 mike
6  3  tom

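Edit: If you have a lot of data, match will be inefficient. In that case, if you can guarantee that the id column of df is sorted, then findInterval is a better option. Note that findInterval needs udf$id to be in increasing order (which holds when df is sorted by id) and only returns the right row if every id in df also appears in udf: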

df$name<-udf$name[findInterval(df$id,udf$id)]

In fact, even if id is not sorted, I would suggest sorting first and then using findInterval.

Source
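A minimal sketch of the sort-then-findInterval idea, using the example data from the question with the rows deliberately shuffled. It assumes numeric ids and that every id has at least one non-empty name. Only the lookup table is sorted here, since findInterval only requires its second argument to be non-decreasing; the row shuffle is an illustration, not part of the original answer.

```r
# Rebuild the example data from the question (as character, not factor)
id <- rep(1:3, 3:1)
name <- c("sam", "sam", "", "mike", "", "tom")
df <- data.frame(id, name, stringsAsFactors = FALSE)

# Shuffle the rows to simulate unsorted input (illustrative only)
df <- df[c(6, 3, 1, 5, 2, 4), ]

# Lookup table: one non-empty name per id, sorted by id
udf <- unique(df[df$name != "", ])
udf <- udf[order(udf$id), ]

# findInterval() returns, for each df$id, the position of the largest
# udf$id that is <= it. This is an exact match only because every id
# in df is assumed to be present in udf.
df$name <- udf$name[findInterval(df$id, udf$id)]
df
```

If some id had no non-empty name at all, findInterval would silently return the name of the preceding id, so that case is worth checking before relying on this approach.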

2013-10-14 17:09:03 mrip

Thank you very much! – ano

You can try na.locf from the "zoo" package:

library(zoo) 
df$name[df$name == ""] <- NA 
na.locf(df) 
#   id name
# 1  1  sam
# 2  1  sam
# 3  1  sam
# 4  2 mike
# 5  2 mike
# 6  3  tom

Sticking with base R, you could also try aggregate and merge:

merge(df, aggregate(as.character(name) ~ id, df, function(x) unique(x[x != ""]))) 
#   id name as.character(name)
# 1  1  sam                sam
# 2  1  sam                sam
# 3  1                     sam
# 4  2 mike               mike
# 5  2                    mike
# 6  3  tom                tom

The next step would be to drop the original "name" column and rename the newly created column.

Source
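The drop-and-rename step might look like the following sketch. The names `lookup`, `merged` and `name_filled` are made up for illustration and do not appear in the original answer.

```r
# Rebuild the example data from the question
id <- rep(1:3, 3:1)
name <- c("sam", "sam", "", "mike", "", "tom")
df <- data.frame(id, name, stringsAsFactors = FALSE)

# One non-empty name per id, as in the aggregate() call above
lookup <- aggregate(as.character(name) ~ id, df,
                    function(x) unique(x[x != ""]))

# Give the helper column a plain name instead of "as.character(name)"
names(lookup)[2] <- "name_filled"

merged <- merge(df, lookup)                      # joins on the shared "id" column
merged$name <- as.character(merged$name_filled)  # overwrite the original column
merged$name_filled <- NULL                       # drop the helper column
merged
```

Note that merge() sorts the result by the key column by default, so the row order can differ from the input if the data was not already ordered by id.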

2013-10-14 16:55:51 A5C1D2H2I1M1N2O1R2T1

You can try using the ave function:

df$name = ave(df$name, df$id,FUN = function(x) unique(x[x!=""]))
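The one-liner above relies on each id having exactly one distinct non-empty name. A slightly more defensive sketch, assuming name is a character column, takes the first non-empty value per group and falls back to NA when a group has none. The helper `first_non_empty` and the extra id 4 row are hypothetical additions for illustration.

```r
# Example data from the question, plus an id (4) with no known name
id <- c(1, 1, 1, 2, 2, 3, 4)
name <- c("sam", "sam", "", "mike", "", "tom", "")
df <- data.frame(id, name, stringsAsFactors = FALSE)

# Hypothetical helper: first non-empty value in a group, or NA if none exists
first_non_empty <- function(x) {
  filled <- x[!is.na(x) & x != ""]
  if (length(filled) > 0) filled[1] else NA_character_
}

# ave() applies the helper per id and recycles the single result
# across all rows of that group
df$name <- ave(df$name, df$id, FUN = first_non_empty)
df
```

Returning NA for an unknown group keeps those rows easy to find afterwards with is.na(df$name).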

Source

2013-10-14 18:33:14 amit
