Cars MPG
Ford 12
Toyota 20
Honda 18
Ford 15
Ford 17
Toyota 24
Ford NA
Ford NA
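I want to replace the NA values with the mean MPG for Ford. How can I replace missing values with the mean of a given category in that column in R?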
To replace the missing values with the group mean using dplyr:
library(dplyr)
df %>% group_by(Cars) %>% mutate(MPG = ifelse(is.na(MPG), mean(MPG, na.rm = TRUE), MPG))
# A tibble: 8 x 2
# Groups: Cars [3]
Cars MPG
<chr> <dbl>
1 Ford 12.00000
2 Toyota 20.00000
3 Honda 18.00000
4 Ford 15.00000
5 Ford 17.00000
6 Toyota 24.00000
7 Ford 14.66667
8 Ford 14.66667
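Note that the result above is still grouped by Cars (see the "# Groups: Cars [3]" line). A minimal self-contained sketch, assuming the question's data are rebuilt by hand as a data frame named df with MPG stored as numeric, that also drops the grouping with ungroup():

```r
library(dplyr)

# Assumed input: the question's data, rebuilt by hand for illustration
df <- data.frame(
  Cars = c("Ford", "Toyota", "Honda", "Ford", "Ford", "Toyota", "Ford", "Ford"),
  MPG  = c(12, 20, 18, 15, 17, 24, NA, NA)
)

filled <- df %>%
  group_by(Cars) %>%
  # within each Cars group, replace NA with that group's mean of non-NA values
  mutate(MPG = ifelse(is.na(MPG), mean(MPG, na.rm = TRUE), MPG)) %>%
  # drop the grouping so later operations act on the whole table
  ungroup()
```

Whether to call ungroup() depends on what follows; it only matters if later steps should not be computed per group.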
na.aggregate (from the zoo package) replaces NAs with the mean of the non-NA values. It can be used with ave to apply this by Cars:
library(zoo)
transform(DF, MPG = ave(MPG, Cars, FUN = na.aggregate))
giving:
Cars MPG
1 Ford 12.00000
2 Toyota 20.00000
3 Honda 18.00000
4 Ford 15.00000
5 Ford 17.00000
6 Toyota 24.00000
7 Ford 14.66667
8 Ford 14.66667
Note: The input DF in reproducible form is:
Lines <- "
Cars MPG
Ford 12
Toyota 20
Honda 18
Ford 15
Ford 17
Toyota 24
Ford NA
Ford NA"
DF <- read.table(text = Lines, header = TRUE)
Two solutions using replace_na (from tidyr), though both lose the original row order:
library(tidyverse)  # attaches dplyr, tidyr and purrr
df %>% split(.$Cars) %>% map_df(~ replace_na(.x, list(MPG = mean(.x$MPG, na.rm = TRUE))))
df %>% by(.$Cars, function(x) replace_na(x, list(MPG = mean(x$MPG, na.rm = TRUE)))) %>% unclass %>% bind_rows
# Cars MPG
# 1 Ford 12.00000
# 2 Ford 15.00000
# 3 Ford 17.00000
# 4 Ford 14.66667
# 5 Ford 14.66667
# 6 Honda 18.00000
# 7 Toyota 20.00000
# 8 Toyota 24.00000
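If the original row order matters, one possible variation is to call replace_na on the MPG vector inside a grouped mutate instead of splitting the data frame. This is only a sketch: it assumes the data are rebuilt by hand as df with a numeric MPG column, and it relies on tidyr's vector form of replace_na, which takes a single replacement value.

```r
library(dplyr)
library(tidyr)

# Assumed input: the question's data, rebuilt by hand for illustration
df <- data.frame(
  Cars = c("Ford", "Toyota", "Honda", "Ford", "Ford", "Toyota", "Ford", "Ford"),
  MPG  = c(12, 20, 18, 15, 17, 24, NA, NA)
)

ordered <- df %>%
  group_by(Cars) %>%
  # replace_na on a vector fills its NAs with one value: here the group mean
  mutate(MPG = replace_na(MPG, mean(MPG, na.rm = TRUE))) %>%
  ungroup()
```

Because mutate never reorders rows, the Ford rows stay in positions 1, 4, 5, 7 and 8 rather than being collected at the top.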
In base R, you can use ave to perform group-level operations, as shown below.
With a two-statement inner function:
ave(DF$MPG, DF$Cars, FUN=function(x) {x[is.na(x)] <- mean(x, na.rm=TRUE); x})
[1] 12.00000 20.00000 18.00000 15.00000 17.00000 24.00000 14.66667 14.66667
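Here, x[is.na(x)] <- mean(x, na.rm=TRUE) replaces the missing values with the mean of the non-missing values, and the following statement, x, returns the full group vector.
With replace: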
ave(DF$MPG, DF$Cars, FUN=function(x) replace(x, is.na(x), mean(x, na.rm=TRUE)))
[1] 12.00000 20.00000 18.00000 15.00000 17.00000 24.00000 14.66667 14.66667
Of course, to keep the result, assign it back into the data.frame:
DF$MPG <- ave(DF$MPG, DF$Cars, FUN=function(x) replace(x, is.na(x), mean(x, na.rm=TRUE)))
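One edge case the snippets above do not cover: if every value in a group is NA, mean(x, na.rm=TRUE) returns NaN, so that group's NAs would be replaced by NaN. A hedged sketch of a small helper that leaves such groups as NA; the name fill_group_mean, the data frame DF2, and the extra "Kia" row are all made up for illustration:

```r
# Hypothetical helper: fill NAs with the mean of the non-missing values,
# returning the vector unchanged when there is nothing to average
fill_group_mean <- function(x) {
  if (all(is.na(x))) return(x)
  replace(x, is.na(x), mean(x, na.rm = TRUE))
}

# Illustrative data: the question's rows plus one made-up all-NA group
DF2 <- data.frame(
  Cars = c("Ford", "Toyota", "Honda", "Ford", "Ford", "Toyota", "Ford", "Ford", "Kia"),
  MPG  = c(12, 20, 18, 15, 17, 24, NA, NA, NA)
)

# ave applies the helper to MPG within each Cars group, preserving row order
DF2$MPG <- ave(DF2$MPG, DF2$Cars, FUN = fill_group_mean)
```

Leaving the value as NA rather than NaN keeps it visible to later is.na checks while making it clear that no group mean was available.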