我正在嘗試解決以下問題:爲Pop_Size_Group的每個值查找每個數值列的平均值。我需要找出排除任何非數字變量的有效方法。如何排除dplyr語句中的非數字列
這是我到目前爲止有:
library(dplyr)
df <- tbl_df(Demographics)
df %>%
group_by(Pop_Size_Group) %>%
summarise_each(funs(mean(., na.rm = TRUE)))
的代碼產生這樣的:
> df <- tbl_df(Demographics)
> df %>%
+ group_by(Pop_Size_Group) %>%
+ summarise_each(funs(mean(., na.rm = TRUE)))
# A tibble: 3 × 18
Pop_Size_Group County_name State Region_num Location Square_miles Population Pct_Age18_to_34 Pct_65_or_over
<chr> <lgl> <lgl> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl>
1 Large NA NA 2.492958 NA 1239.3099 847193.0 28.96338 12.06197
2 Medium NA NA 2.465409 NA 861.3711 224348.6 28.30252 12.31572
3 Small NA NA 2.424460 NA 1045.1871 121956.6 28.46906 12.11295
# ... with 9 more variables: Num_physicians <dbl>, Num_hospital_beds <dbl>, Num_serious_crimes <dbl>,
# Pct_High_Sch_grads <dbl>, Pct_Bachelors <dbl>, Pct_below_poverty <dbl>, Pct_unemployed <dbl>,
# Per_cap_income <dbl>, Total_personal_income <dbl>
Warning messages:
1: In mean.default(c("Los_Angeles", "Cook", "Harris", "San_Diego", :
argument is not numeric or logical: returning NA
2: In mean.default(c("Pulaski", "Guilford", "Solano", "York", "Berks", :
argument is not numeric or logical: returning NA
3: In mean.default(c("Bibb", "Onslow", "Jackson", "Schenectady", "Rock_Island", :
argument is not numeric or logical: returning NA
4: In mean.default(c("CA", "IL", "TX", "CA", "CA", "NY", "AZ", "MI", :
argument is not numeric or logical: returning NA
5: In mean.default(c("AR", "NC", "CA", "PA", "PA", "NH", "TN", "FL", :
argument is not numeric or logical: returning NA
6: In mean.default(c("GA", "NC", "MI", "NY", "IL", "OH", "CA", "ME", :
argument is not numeric or logical: returning NA
7: In mean.default(c("West", "East", "West", "West", "West", "East", :
argument is not numeric or logical: returning NA
8: In mean.default(c("West", "East", "West", "East", "East", "East", :
argument is not numeric or logical: returning NA
9: In mean.default(c("East", "East", "East", "East", "East", "East", :
argument is not numeric or logical: returning NA
下面是一瞥輸出(DF)供參考:
> glimpse(df)
Observations: 440
Variables: 18
$ County_name <chr> "Los_Angeles", "Cook", "Harris", "San_Diego", "Orange", "Kings", "Maricopa", "W...
$ State <chr> "CA", "IL", "TX", "CA", "CA", "NY", "AZ", "MI", "FL", "TX", "PA", "WA", "CA", "...
$ Region_num <int> 4, 2, 3, 4, 4, 1, 4, 2, 3, 3, 1, 4, 4, 4, 2, 1, 1, 1, 1, 4, 3, 3, 4, 3, 2, 4, 2...
$ Location <chr> "West", "East", "West", "West", "West", "East", "West", "East", "East", "West",...
$ Square_miles <int> 4060, 946, 1729, 4205, 790, 71, 9204, 614, 1945, 880, 135, 2126, 1291, 20062, 4...
$ Population <int> 8863164, 5105067, 2818199, 2498016, 2410556, 2300664, 2122101, 2111687, 1937094...
$ Pop_Size_Group <chr> "Large", "Large", "Large", "Large", "Large", "Large", "Large", "Large", "Large"...
$ Pct_Age18_to_34 <dbl> 32.1, 29.2, 31.3, 33.5, 32.6, 28.3, 29.2, 27.4, 27.1, 32.6, 29.1, 30.1, 32.6, 3...
$ Pct_65_or_over <dbl> 9.7, 12.4, 7.1, 10.9, 9.2, 12.4, 12.5, 12.5, 13.9, 8.2, 15.2, 11.1, 8.7, 8.8, 1...
$ Num_physicians <int> 23677, 15153, 7553, 5905, 6062, 4861, 4320, 3823, 6274, 4718, 6641, 5280, 4101,...
$ Num_hospital_beds <int> 27700, 21550, 12449, 6179, 6369, 8942, 6104, 9490, 8840, 6934, 10494, 4009, 334...
$ Num_serious_crimes <int> 688936, 436936, 253526, 173821, 144524, 680966, 177593, 193978, 244725, 214258,...
$ Pct_High_Sch_grads <dbl> 70.0, 73.4, 74.9, 81.9, 81.2, 63.7, 81.5, 70.0, 65.0, 77.1, 64.3, 88.2, 82.0, 7...
$ Pct_Bachelors <dbl> 22.3, 22.8, 25.4, 25.3, 27.8, 16.6, 22.1, 13.7, 18.8, 26.3, 15.2, 32.8, 32.6, 1...
$ Pct_below_poverty <dbl> 11.6, 11.1, 12.5, 8.1, 5.2, 19.5, 8.8, 16.9, 14.2, 10.4, 16.1, 5.0, 5.0, 10.3, ...
$ Pct_unemployed <dbl> 8.0, 7.2, 5.7, 6.1, 4.8, 9.5, 4.9, 10.0, 8.7, 6.1, 8.0, 4.6, 5.5, 8.0, 5.5, 7.3...
$ Per_cap_income <int> 20786, 21729, 19517, 19588, 24400, 16803, 18042, 17461, 17823, 21001, 16721, 23...
$ Total_personal_income <int> 184230, 110928, 55003, 48931, 58818, 38658, 38287, 36872, 34525, 38911, 26512, ...
這裏是一個數據鏈接: example data
加上'na.omit()'你dplyr管 –
我無法重現,因爲沒有 '人口' 數據集,但一般情況下,你可以使用'summarise_if(is.numeric,平均(。 ,na.rm = TRUE))' – Mislav
已更新評論,並鏈接到示例數據。爲最初不這樣做而道歉。 – Tommy