2017-07-03 66 views
-2

R - How to get the median and mean of a string field?

I have a table (class) that looks like this:

County Age.Group 
Albany  0-5 
Albany  10-15 
Albany  10-15 
new York 5-10 
new York 5-10 
new York 0-5 
LI   0-5 
LI   0-5 
LI   0-5 

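I need to get the mean and median for the counties, so I know I need to count how many times Albany, new York, and LI appear in the list and then use the mean and median functions. I don't know how to do this, because when I call mean or median I get an error, since the field is not an integer. Please help.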

+3

What is your expected output for Albany? – brittenb

Answers

2

I'm not sure what your exact expected output is. If it isn't the following, this should at least help you move forward a bit.

> # Load data 
> df <- data.frame(County = c("Albany", "Albany", "Albany", "new York", 
+        "new York", "new York", "LI", "LI", "LI"), 
+     Age.Group = c("0-5", "10-15", "10-15", "5-10", "5-10", "0-5", 
+        "0-5", "0-5", "0-5"), stringsAsFactors = FALSE) 
> 
> # Split the age by "-", resulting in a list 
> age_split <- strsplit(df[, 2], split = "-", fixed = TRUE) 
> 
> # Turn numeric and take middle point of group, sapply turns back into vector 
> df$Age.Group.Mean <- sapply(age_split, function(x) mean(as.numeric(x))) 
> 
> # Print df 
> df 
    County Age.Group Age.Group.Mean 
1 Albany  0-5   2.5 
2 Albany  10-15   12.5 
3 Albany  10-15   12.5 
4 new York  5-10   7.5 
5 new York  5-10   7.5 
6 new York  0-5   2.5 
7  LI  0-5   2.5 
8  LI  0-5   2.5 
9  LI  0-5   2.5 
> 
> # Calculate what is needed 
> aggregate(Age.Group.Mean ~ County, data = df, median) 
    County Age.Group.Mean 
1 Albany   12.5 
2  LI   2.5 
3 new York   7.5 
> aggregate(Age.Group.Mean ~ County, data = df, mean) 
    County Age.Group.Mean 
1 Albany  9.166667 
2  LI  2.500000 
3 new York  5.833333 
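
The question can also be read literally: count how many times each county appears, then take the mean and median of those counts. A minimal sketch of that reading, assuming the same data as above (the names `county_counts`, `mean_count` and `median_count` are made up for illustration):

```r
# Same data as in the question
df <- data.frame(
  County = c("Albany", "Albany", "Albany", "new York", "new York",
             "new York", "LI", "LI", "LI"),
  Age.Group = c("0-5", "10-15", "10-15", "5-10", "5-10", "0-5",
                "0-5", "0-5", "0-5"),
  stringsAsFactors = FALSE
)

# Number of rows per county, converted to a plain integer vector
county_counts <- as.integer(table(df$County))

# Mean and median of the per-county counts
mean_count <- mean(county_counts)
median_count <- median(county_counts)
```

With this sample every county appears three times, so both values come out as 3, which is probably not very informative. That is why the approach above works on the age ranges instead.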

0

I'll assume @snoram's interpretation of your expected output is correct. If so, here is a dplyr option:

library(dplyr)

# Pool both endpoints of every age range within each county, then summarise
df2 <- df %>% 
    group_by(County) %>% 
    summarise(
     mean.age = mean(as.numeric(unlist(strsplit(Age.Group, "-")))), 
     median.age = median(as.numeric(unlist(strsplit(Age.Group, "-")))) 
    ) 

> data.frame(df2) 
    County mean.age median.age 
1 Albany 9.166667  10.0 
2  LI 2.500000  2.5 
3 new York 5.833333  5.0 
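
Note that the medians here (10.0 for Albany, 5.0 for new York) differ from the 12.5 and 7.5 in the other answer. It seems this is because the dplyr version pools the raw endpoints of each range, while the other answer first reduces each row to its midpoint. A base R sketch comparing the two, assuming the same data (the helper names `bounds`, `median_mid` and `median_end` are made up for illustration):

```r
df <- data.frame(
  County = c("Albany", "Albany", "Albany", "new York", "new York",
             "new York", "LI", "LI", "LI"),
  Age.Group = c("0-5", "10-15", "10-15", "5-10", "5-10", "0-5",
                "0-5", "0-5", "0-5"),
  stringsAsFactors = FALSE
)

# Split each "lo-hi" label into its two numeric endpoints
bounds <- lapply(strsplit(df$Age.Group, "-", fixed = TRUE), as.numeric)

# Variant 1: median of the per-row midpoints, by county
midpoints <- sapply(bounds, mean)
median_mid <- tapply(midpoints, df$County, median)

# Variant 2: median of all endpoints pooled together, by county
endpoint_county <- rep(df$County, lengths(bounds))
median_end <- tapply(unlist(bounds), endpoint_county, median)

print(median_mid)
print(median_end)
```

The means agree between the two answers because every row contributes exactly two endpoints, so averaging midpoints and averaging pooled endpoints give the same number. The medians do not have that property, so which variant is appropriate depends on what the age groups are meant to represent.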

Data:

df <- structure(list(County = c("Albany", "Albany", "Albany", "new York", 
           "new York", "new York", "LI", "LI", "LI"), Age.Group = c("0-5", 
                         "10-15", "10-15", "5-10", "5-10", "0-5", "0-5", "0-5", "0-5")), .Names = c("County", 
                                            "Age.Group"), row.names = c(NA, -9L), class = "data.frame") 