2017-09-13
2

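I have a data frame containing a mix of numeric and factor variables, and I want to replace all outliers in all columns of the data frame with NA.

I am trying to replace them with NA recursively, but I get the following error: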

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric 
The outliers in question are values more than 3 standard deviations from the mean (3 × SD).

The code I used is:

name = factor(c("A","B","NA","D","E","NA","G","H","H")) 
height = c(120,NA,150,170,NA,146,132,210,NA) 
age = c(10,20,0,30,40,50,60,NA,130) 
mark = c(100,0.5,100,50,90,100,NA,50,210) 
data = data.frame(name=name,mark=mark,age=age,height=height) 
data 
data[is.na(data)] <- 77777 
data.scale <- scale(data) 
data.scale[ abs(data.scale) > 3 ] <- NA 
data <- data.scale 

Any suggestions on how to make this work?

+1

Including a [reproducible example](http://stackoverflow.com/questions/5963269) will make it easier for others to help you. – Jaap

+2

If you are talking about outliers, then your variable should not be a factor –

+1

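You are applying math to a data frame, and that only works when the data frame contains nothing but numeric values. Use 'data = data.frame(mark = mark, age = age, height = height)', without the 'name' column. Run the rest of the code, then add the line 'data <- cbind(name, data)' at the end. – Smich7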
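The suggestion in the comment above can be sketched roughly as follows. This is only an illustration under a couple of assumptions: it skips the question's 'data[is.na(data)] <- 77777' step (filling the gaps with a large number would distort the column means and SDs, and scale() already ignores NA when computing them), and it names the re-attached column explicitly in cbind().

```r
name   <- factor(c("A", "B", "NA", "D", "E", "NA", "G", "H", "H"))
height <- c(120, NA, 150, 170, NA, 146, 132, 210, NA)
age    <- c(10, 20, 0, 30, 40, 50, 60, NA, 130)
mark   <- c(100, 0.5, 100, 50, 90, 100, NA, 50, 210)

# numeric columns only, so scale() no longer fails with "'x' must be numeric"
data <- data.frame(mark = mark, age = age, height = height)

# scale() returns a matrix of z-scores; existing NAs stay NA
data.scale <- scale(data)

# blank out anything more than 3 SD from the column mean, in either direction
data.scale[abs(data.scale) > 3] <- NA

# re-attach the factor column (assumption: named explicitly rather than relying on deparsing)
data <- cbind(name = name, as.data.frame(data.scale))
```

With the sample data nothing is actually replaced, because no value lies more than 3 SD from its column mean.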

Answers

1

Here is one approach:

library(dplyr) 

# take note of order for column names 
data.names <- colnames(data) 

# scale all numeric columns 
data.numeric <- select_if(data, is.numeric) %>% # subset of numeric columns 
    mutate_all(scale)        # perform scale separately for each column 
data.numeric[abs(data.numeric) > 3] <- NA   # set values beyond 3 SD in either direction to NA (none in this example) 

# combine results with subset data frame of non-numeric columns 
data <- data.frame(select_if(data, function(x) !is.numeric(x)), 
        data.numeric) 

# restore columns to original order 
data <- data[, data.names] 

> data
  name        mark         age     height
1    A  0.20461856 -0.80009469 -1.0844636
2    B -1.43232992 -0.55391171         NA
3   NA  0.20461856 -1.04627767 -0.1459855
4    D -0.61796862 -0.30772873  0.4796666
5    E  0.04010112 -0.06154575         NA
6   NA  0.20461856  0.18463724 -0.2711159
7    G          NA  0.43082022 -0.7090723
8    H -0.61796862          NA  1.7309707
9    H  2.01431035  2.15410109         NA

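Note: with this approach, the non-numeric variables (character, factor, etc.) end up ahead of the numeric variables in the combined data frame. The last step therefore restores the original column order (if that matters to you).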
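If the reordering step is a nuisance, a variant using across() (available in dplyr 1.0.0 and later) leaves the columns where they are. This is a sketch, not part of the answer above: 'mask_outliers' and its 'k' parameter are hypothetical names made up for illustration, and unlike the code above the helper keeps the original values and only sets the outliers to NA instead of returning z-scores.

```r
library(dplyr)

# hypothetical helper: set values more than k SD from the mean to NA,
# leaving all other values (and existing NAs) untouched
mask_outliers <- function(x, k = 3) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  x[!is.na(z) & abs(z) > k] <- NA
  x
}

data <- data.frame(
  name   = factor(c("A", "B", "NA", "D", "E", "NA", "G", "H", "H")),
  mark   = c(100, 0.5, 100, 50, 90, 100, NA, 50, 210),
  age    = c(10, 20, 0, 30, 40, 50, 60, NA, 130),
  height = c(120, NA, 150, 170, NA, 146, 132, 210, NA)
)

# apply the helper to numeric columns only; 'name' is passed through unchanged
cleaned <- data %>%
  mutate(across(where(is.numeric), mask_outliers))

# same idea with a stricter threshold of 2 SD
cleaned_k2 <- data %>%
  mutate(across(where(is.numeric), function(x) mask_outliers(x, k = 2)))
```

With the default threshold of 3 nothing changes for this sample data. Going by the z-scores printed above, a threshold of 2 would blank out mark 210 and age 130 in row 9.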