2014-10-12

Error message in R after by or split-file processing?

I am running the following code in R:

library("AER") 
data(CPS1985,package="AER") 
by(CPS1985[c("wage","age","experience")],CPS1985["gender"],mean,na.rm=TRUE) 

But whenever I do, I always get the following error message:

by(CPS1985[c("wage","age","experience")],CPS1985["gender"],mean,na.rm=TRUE) 
gender: male 
[1] NA 
gender: female 
[1] NA 
Warning messages: 
1: In mean.default(data[x, , drop = FALSE], ...) : 
    argument is not numeric or logical: returning NA 
2: In mean.default(data[x, , drop = FALSE], ...) : 
    argument is not numeric or logical: returning NA 

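Before running the code I checked that wage, age, and experience are all numeric and that gender is a factor variable. So I am a bit confused as to why I get this error message.

Thanks.

Answers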

0

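You need to use colMeans with by when there is more than one column. by() passes each group's rows to FUN as a data frame, and mean() does not accept a data frame, which is why it returns NA with the warning shown in the question: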

by(CPS1985[, c("wage", "age", "experience")], CPS1985["gender"], 
              FUN=colMeans, na.rm=TRUE) 
#gender: male 
#  wage  age experience 
# 9.994913 35.979239 16.965398 
# ------------------------------------------------------------ 
#gender: female 
#  wage  age experience 
# 7.878857 37.840816 18.832653 
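
To make the cause of the warning concrete, here is a minimal sketch in base R. It assumes a small made-up data frame (the name df and its values are hypothetical, standing in for CPS1985) and shows that by() hands FUN a data frame per group, that mean() returns NA for such input, and that colMeans() averages each column instead.

```r
# Toy data standing in for CPS1985 (values are made up for illustration).
df <- data.frame(
  gender     = factor(c("male", "female", "male", "female")),
  wage       = c(10, 8, 12, 6),
  age        = c(30, 40, 50, 20),
  experience = c(5, 15, NA, 2)
)
cols <- c("wage", "age", "experience")

# by() calls FUN once per group, and each call receives a data frame.
classes <- by(df[cols], df["gender"], function(d) class(d))

# mean() has no data-frame method, so it returns NA (with a warning,
# suppressed here to keep the output short).
bad <- suppressWarnings(mean(df[cols], na.rm = TRUE))

# colMeans() accepts a data frame and averages each column separately.
good <- by(df[cols], df["gender"], colMeans, na.rm = TRUE)
print(good)
```

With the real data, the same pattern applies after replacing df with CPS1985.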

Or you could use summarise_each from dplyr (note that summarise_each() and funs() are deprecated in recent dplyr releases):

library(dplyr) 
CPS1985 %>% 
     group_by(gender) %>% 
     summarise_each(funs(mean=mean(., na.rm=TRUE)), wage, age, experience) 
# Source: local data frame [2 x 4] 

# gender  wage  age experience 
# 1 male 9.994913 35.97924 16.96540 
# 2 female 7.878857 37.84082 18.83265 
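
For newer dplyr versions, the same grouped means can be sketched with across(), which supersedes summarise_each() and funs(). This is an illustrative sketch, not part of the original answer: it assumes dplyr 1.0.0 or later, and the data frame df with its values is made up to stand in for CPS1985.

```r
library(dplyr)

# Toy data standing in for CPS1985 (values are made up for illustration).
df <- data.frame(
  gender     = factor(c("male", "female", "male", "female")),
  wage       = c(10, 8, 12, 6),
  age        = c(30, 40, 50, 20),
  experience = c(5, 15, NA, 2)
)

# across() applies the same function to each selected column within each group.
res <- df %>%
  group_by(gender) %>%
  summarise(across(c(wage, age, experience), ~ mean(.x, na.rm = TRUE)))
print(res)
```

Replacing df with CPS1985 should give the per-gender means for the real data.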
1

A data.table solution:

library(data.table) 
setDT(CPS1985) ## convert data to data table 
## pass the function itself to lapply; extra arguments such as na.rm follow it
CPS1985[, lapply(.SD, mean, na.rm=TRUE), by=gender, .SDcols=c("wage","age","experience")] 
    gender  wage  age experience 
1: female 7.878857 37.84082 18.83265 
2: male 9.994913 35.97924 16.96540