Replacing missing values in R with the mean or mode

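I have a large database with mixed data types (numeric, character, factor, ordered factor) that contains missing values. I am trying to write a for loop that replaces each missing value with the mean of its column if the column is numeric, or with the mode if the column is character/factor.

This is what I have so far: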

#fake array: 
age<- c(5,8,10,12,NA) 
a <- factor(c("aa", "bb", NA, "cc", "cc")) 
b <- c("banana", "apple", "pear", "grape", NA) 
df_test <- data.frame(age=age, a=a, b=b) 
df_test$b <- as.character(df_test$b) 

for (var in 1:ncol(df_test)) { 
    if (class(df_test[,var])=="numeric") { 
     df_test[is.na(df_test[,var]) <- mean(df_test[,var], na.rm = TRUE) 
} else if (class(df_test[,var]=="character") { 
     Mode(df_test$var[is.na(df_test$var)], na.rm = TRUE) 
} 
}

where the Mode function is:

Mode <- function (x, na.rm) { 
    xtab <- table(x) 
    xmode <- names(which(xtab == max(xtab))) 
    if (length(xmode) > 1) 
     xmode <- ">1 mode" 
    return(xmode) 
}

It seems to simply ignore the statements, although it does not give any error... I also tried to handle the first part with an index:

## create an index of missing values 
index <- which(is.na(df_test)[,1], arr.ind = TRUE) 
## calculate the row means and "duplicate" them to assign to appropriate cells 
df_test[index] <- colMeans(df_test, na.rm = TRUE) [index["column",]]

But I get this error: "Error in colMeans(df_test, na.rm = TRUE) : 'x' must be numeric"

Does anyone have an idea how to solve this?

Thank you very much for your help! -f

2011-10-11 user971102

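Please declare cross-posts. – mdsumner

This code has several syntax errors that prevent it from running. – joran

If you simply fix the obvious errors, it works as expected: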

Mode <- function (x, na.rm) { 
    xtab <- table(x) 
    xmode <- names(which(xtab == max(xtab))) 
    if (length(xmode) > 1) xmode <- ">1 mode" 
    return(xmode) 
} 

# fake array: 
age <- c(5, 8, 10, 12, NA) 
a <- factor(c("aa", "bb", NA, "cc", "cc")) 
b <- c("banana", "apple", "pear", "grape", NA) 
df_test <- data.frame(age=age, a=a, b=b) 
df_test$b <- as.character(df_test$b) 

print(df_test) 

#   age    a      b
# 1   5   aa banana
# 2   8   bb  apple
# 3  10 <NA>   pear
# 4  12   cc  grape
# 5  NA   cc   <NA>

for (var in 1:ncol(df_test)) { 
    if (class(df_test[,var])=="numeric") { 
     df_test[is.na(df_test[,var]),var] <- mean(df_test[,var], na.rm = TRUE) 
    } else if (class(df_test[,var]) %in% c("character", "factor")) { 
     df_test[is.na(df_test[,var]),var] <- Mode(df_test[,var], na.rm = TRUE) 
    } 
} 

print(df_test) 

#     age  a       b
# 1  5.00 aa  banana
# 2  8.00 bb   apple
# 3 10.00 cc    pear
# 4 12.00 cc   grape
# 5  8.75 cc >1 mode
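
A possible variation on the loop above, offered only as a sketch: testing class(x) == "numeric" skips integer columns, and Mode() writes the placeholder ">1 mode" when there is a tie. The helpers mode_value() and impute_column() below are hypothetical names, not part of the code above. Breaking ties by taking the first value in table() order is an assumption, so adjust it if another rule suits your data better.

```r
# Hypothetical helper: most frequent non-NA value of x, returned as a string.
# Assumption: ties are broken by taking the first mode in table() order.
mode_value <- function(x) {
  counts <- table(x)                 # table() drops NA by default
  names(counts)[which.max(counts)]
}

# Hypothetical helper: fill the NAs of one column according to its type.
impute_column <- function(x) {
  if (all(is.na(x))) return(x)       # nothing to estimate from
  if (is.numeric(x)) {               # TRUE for both double and integer
    x[is.na(x)] <- mean(x, na.rm = TRUE)
  } else if (is.character(x) || is.factor(x)) {
    x[is.na(x)] <- mode_value(x)     # the mode is always an existing level
  }
  x
}

df_test <- data.frame(
  age = c(5, 8, 10, 12, NA),
  a   = factor(c("aa", "bb", NA, "cc", "cc")),
  b   = c("banana", "apple", "pear", "grape", NA),
  stringsAsFactors = FALSE
)

# df_filled[] <- ... keeps the data frame structure while replacing columns.
df_filled <- df_test
df_filled[] <- lapply(df_test, impute_column)
print(df_filled)
```

With this tie rule, the missing entry in b is filled with a real value from the column instead of the ">1 mode" marker. Whether that is preferable depends on how you want ties to show up downstream.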

I suggest using an editor with syntax highlighting and bracket matching, which makes this kind of syntax error much easier to find.
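
As for the colMeans error in the question: colMeans() converts a data frame to a matrix first, and a data frame that mixes numeric and character/factor columns becomes a character matrix, hence "'x' must be numeric". A minimal sketch of the index-based idea, restricted to the numeric columns, could look like the following. The variable names num_cols, col_means, m and idx are only illustrative.

```r
df_test <- data.frame(
  age = c(5, 8, 10, 12, NA),
  a   = factor(c("aa", "bb", NA, "cc", "cc")),
  b   = c("banana", "apple", "pear", "grape", NA),
  stringsAsFactors = FALSE
)

# Flag the numeric columns and compute their means only.
num_cols  <- vapply(df_test, is.numeric, logical(1))
col_means <- colMeans(df_test[, num_cols, drop = FALSE], na.rm = TRUE)

# Work on a numeric matrix so that (row, col) matrix indexing is well defined.
m   <- as.matrix(df_test[, num_cols, drop = FALSE])
idx <- which(is.na(m), arr.ind = TRUE)   # one row per missing cell

# Each missing cell receives the mean of its own column.
m[idx] <- col_means[idx[, "col"]]

# Write the filled numeric columns back; other columns are untouched.
df_test[num_cols] <- as.data.frame(m)
print(df_test)
```

This only covers the numeric part. The character and factor columns still need a mode-based replacement such as the one in the loop.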

2011-10-11 23:25:10 pete

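Pete, thank you very much! I tried rewriting it with 'gedit' as you suggested, and it is indeed much better! I was blindly trying to follow the brackets... Thanks also for your corrections; there were a lot of mistakes... I still have a lot to learn. Thank you! – user971102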
