2014-09-30 20 views
-1

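Hello, I wrote a function that imputes the NAs in each column with that column's average (the code below uses the median). The function goes through the columns one by one and imputes the value: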

df1<-data.frame(c=(1:5), d=(11:15), f=c(1,NA, 2:4), e=c(1,0,1,0,1), g=c(1,NA,2,36,7)) 

reemp<-function (tbl) { 
    var_incom<-colnames(tbl)[ !complete.cases(t(tbl))] 
    for (col in var_incom) { 
    tbl$col[is.na(tbl$col)] <-median(tbl$col, na.rm=TRUE)} 
    return(tbl)} 


reemp(df1) 

But I get warning messages and no result:

Warning messages: 
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' 
2: In is.na(tbl$col) : 
    is.na() applied to non-(list or vector) of type 'NULL' 
3: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' 
4: In is.na(tbl$col) : 
    is.na() applied to non-(list or vector) of type 'NULL' 
+2

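The problem is the (non-)evaluation of col when you use '$' (see ?"$" or ?"[" for details), i.e., R is looking for a column named "col" rather than "f" and "g". But you can replace the fourth line of your function with 'tbl[[col]][is.na(tbl[[col]])] <- median(tbl[[col]], na.rm = TRUE)'. – rawr 2014-09-30 18:12:50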
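
To illustrate the point made in this comment, here is a minimal sketch showing that `$` takes its argument literally while `[[` evaluates the variable. The small data frame `d` and the variable `col` are made up for this illustration only.

```r
# Hypothetical two-column data frame, for illustration only
d <- data.frame(f = c(1, NA, 3), g = c(NA, 5, 6))
col <- "f"

by_dollar  <- d$col      # looks for a column literally named "col"
by_bracket <- d[[col]]   # evaluates col, so this selects column "f"

print(is.null(by_dollar))   # TRUE: there is no column called "col"
print(by_bracket)

# Assigning through [[ ]] therefore fills the intended column
d[[col]][is.na(d[[col]])] <- median(d[[col]], na.rm = TRUE)
print(d$f)
```

Because `d$col` is NULL, `is.na(d$col)` is what produces the "applied to non-(list or vector) of type 'NULL'" warnings shown in the question.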

Answers

1

Try:

df1[] <- lapply(df1, function(x) replace(x, is.na(x), median(x, na.rm=TRUE))) 
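
As a hedged illustration of why this one-liner works: `replace(x, is.na(x), value)` returns a copy of `x` with the NA positions filled in, and assigning into `df1[]` keeps the result a data frame rather than a plain list. The sketch below rebuilds the question's `df1`; the name `f_filled` is introduced only for this example.

```r
df1 <- data.frame(c = 1:5, d = 11:15, f = c(1, NA, 2:4),
                  e = c(1, 0, 1, 0, 1), g = c(1, NA, 2, 36, 7))

# replace() on a single vector: the NA position gets the median of the rest
f_filled <- replace(df1$f, is.na(df1$f), median(df1$f, na.rm = TRUE))
print(f_filled)

# The empty brackets keep df1 a data.frame; a plain `df1 <- lapply(...)`
# would turn it into a list
df1[] <- lapply(df1, function(x) replace(x, is.na(x), median(x, na.rm = TRUE)))
print(class(df1))
print(df1)
```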

If you have many columns, it may be more efficient to process only the columns that contain at least one NA:

nm1 <- names(df1)[unlist(lapply(df1, anyNA))] 
#or nm1 <- names(df1)[colSums(is.na(df1))>0] 

df1[nm1] <- lapply(df1[nm1], function(x) replace(x, is.na(x), median(x,na.rm=TRUE))) 

Alternatively, compute the column medians with the matrixStats package and assign them by column index:

library(matrixStats) 
df1[is.na(df1)] <- colMedians(as.matrix(df1), 
       na.rm=TRUE)[which(is.na(df1), arr.ind=TRUE)[,2]] 
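
The indexing in the snippet above can be hard to read, so here is a sketch of the same idea in base R only. It uses `apply(m, 2, median, na.rm = TRUE)` as a stand-in for `colMedians` (my substitution, so the example runs without the matrixStats package); the names `m`, `col_meds` and `na_pos` are introduced for illustration. `which(..., arr.ind = TRUE)` returns the row and column of each NA, and the column index picks the matching median.

```r
df1 <- data.frame(c = 1:5, d = 11:15, f = c(1, NA, 2:4),
                  e = c(1, 0, 1, 0, 1), g = c(1, NA, 2, 36, 7))

m <- as.matrix(df1)
# One median per column; stand-in for matrixStats::colMedians()
col_meds <- apply(m, 2, median, na.rm = TRUE)

# Row/column position of every NA, listed in column-major order
na_pos <- which(is.na(df1), arr.ind = TRUE)
print(na_pos)

# For each NA, look up the median of the column it sits in
df1[is.na(df1)] <- unname(col_meds[na_pos[, "col"]])
print(df1)
```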
1

I replaced tbl$col with tbl[,col] and it worked.

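reemp <- function(tbl) {
    # names of the columns that contain at least one NA
    var_incom <- colnames(tbl)[!complete.cases(t(tbl))]
    for (col in var_incom) {
        # index by the value of col; tbl$col would look for a column named "col"
        tbl[, col][is.na(tbl[, col])] <- median(tbl[, col], na.rm = TRUE)
    }
    return(tbl)
}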
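
A possible variant of the function above, offered only as a sketch: it uses `[[` as suggested in the comment on the question, and takes the summary function as an argument so the same loop can impute with either the median or the mean. The names `reemp2` and `fun` are hypothetical and not part of the original posts.

```r
# reemp2 and its argument fun are hypothetical names used only in this sketch
reemp2 <- function(tbl, fun = median) {
    # columns that contain at least one NA
    var_incom <- names(tbl)[vapply(tbl, anyNA, logical(1))]
    for (col in var_incom) {
        is_na <- is.na(tbl[[col]])
        # fill the missing entries with the summary of the observed ones
        tbl[[col]][is_na] <- fun(tbl[[col]], na.rm = TRUE)
    }
    tbl
}

df1 <- data.frame(c = 1:5, d = 11:15, f = c(1, NA, 2:4),
                  e = c(1, 0, 1, 0, 1), g = c(1, NA, 2, 36, 7))

res_median <- reemp2(df1)         # impute with column medians
res_mean   <- reemp2(df1, mean)   # impute with column means
print(res_median)
print(res_mean)
```

Any function that accepts `na.rm = TRUE` can be passed as `fun`; the input data frame itself is left unchanged because R copies it inside the function.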
0

The following should work:

df1
  c  d  f e  g
1 1 11  1 1  1
2 2 12 NA 0 NA
3 3 13  2 1  2
4 4 14  3 0 36
5 5 15  4 1  7

meds = sapply(df1, median, na.rm=TRUE)
meds
   c    d    f    e    g
 3.0 13.0  2.5  1.0  4.5

for(i in seq_along(df1)) {
    vect = df1[,i]
    vect[is.na(vect)] = meds[i]
    df1[,i] = vect
}
df1
  c  d   f e    g
1 1 11 1.0 1  1.0
2 2 12 2.5 0  4.5
3 3 13 2.0 1  2.0
4 4 14 3.0 0 36.0
5 5 15 4.0 1  7.0