2013-12-20 103 views
2

Calculate the mean by group in R while skipping the first value of each group

I have a large data frame like this:

groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "E", "E") 
valuevar <- c(1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8, 1, 0.9, 0.9, 1, 1.5) 
myd <- data.frame (groupvar, valuevar) 

    groupvar valuevar 
1   A  1.00 
2   A  0.50 
3   A  0.50 
4   A  0.50 
5   B  1.00 
6   B  0.75 
7   B  0.75 
8   C  1.00 
9   C  0.80 
10  C  0.80 
11  C  0.80 
12  D  1.00 
13  D  0.90 
14  D  0.90 
15  E  1.00 
16  E  1.50 

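I want to calculate the group means, but I want to leave out the first value in each group. Here 1 is the first value in every group. For example, for group "A" the mean should be based on 0.5, 0.5, 0.5, skipping the first value, 1.

This is what I tried: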

meanfun <- function(x)sum(x)-x[1]/ length(x) 
ddply (myd,"groupvar",meanfun) 

Error in FUN(X[[1L]], ...) : 
    only defined on a data frame with all numeric variables 
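
A note on the error, offered as a sketch rather than a definitive diagnosis: ddply hands each group to the function as a data frame, not as a numeric vector, so sum() ends up being called on a data frame that still contains the non-numeric groupvar column. Separately, the expression as written divides only x[1] by length(x) because of operator precedence, and the denominator should be one less than the group size. A corrected vector version might look like this (meanfun_fixed is a name introduced here for illustration and is not part of the original post):

```r
groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "E", "E")
valuevar <- c(1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8, 1, 0.9, 0.9, 1, 1.5)
myd <- data.frame(groupvar, valuevar)

# Hypothetical helper: mean of all values except the first,
# i.e. (sum - first value) / (n - 1), with explicit parentheses.
meanfun_fixed <- function(x) (sum(x) - x[1]) / (length(x) - 1)

# Apply it to the numeric column only, one group at a time.
res <- tapply(myd$valuevar, myd$groupvar, meanfun_fixed)
print(res)
```

Passing the numeric column rather than the whole data frame is what avoids the error; the helper itself assumes every group has at least two rows.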

Answers

5

This may help:

> with(myd, tapply(valuevar, groupvar, function(x) mean(x[-1]))) 
   A    B    C    D    E 
0.50 0.75 0.80 0.90 1.50 

Using aggregate:

> aggregate(valuevar ~ groupvar, FUN=function(x) mean(x[-1]), data=myd) 
    groupvar valuevar 
1  A  0.50 
2  B  0.75 
3  C  0.80 
4  D  0.90 
5  E  1.50 

Using ddply:

> library(plyr) 
> ddply (myd, "groupvar", summarize, MeanVar=mean(valuevar[-1])) 
    groupvar MeanVar 
1  A 0.50 
2  B 0.75 
3  C 0.80 
4  D 0.90 
5  E 1.50 
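
One edge case worth keeping in mind with the x[-1] idiom: a group with a single row leaves nothing to average, and mean() of an empty vector returns NaN. The sketch below uses a small made-up data frame (myd2, with a one-row group "F", is purely illustrative and not from the question) and a hypothetical helper, safe_mean, that returns NA for such groups instead:

```r
# Illustrative data only: group "F" has a single row.
myd2 <- data.frame(groupvar = c("A", "A", "A", "F"),
                   valuevar = c(1, 0.5, 0.5, 1))

# Plain x[-1]: for "F", x[-1] is empty, so mean() gives NaN.
plain <- with(myd2, tapply(valuevar, groupvar, function(x) mean(x[-1])))

# Hypothetical helper: return NA when there is nothing left after dropping the first value.
safe_mean <- function(x) if (length(x) > 1) mean(x[-1]) else NA_real_
guarded <- with(myd2, tapply(valuevar, groupvar, safe_mean))

print(plain)
print(guarded)
```

Whether NaN, NA, or dropping the group entirely is the right behaviour depends on the analysis; the guard just makes the choice explicit.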

0

What I would do is create a new data frame that drops the first row of each group, and then take the means by group.

myd_rmFstElement <- myd[which(duplicated(myd$groupvar)), ] 
myd_means <- aggregate(valuevar ~ groupvar, FUN=mean, myd_rmFstElement) 
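
As a quick check, assuming myd is defined as in the question, the duplicated() approach can be compared with the x[-1] approach. duplicated() is FALSE for the first occurrence of each group label and TRUE afterwards, so it already works as a logical row index and which() is optional. The names rest, by_dup and by_idx are introduced here only for illustration:

```r
groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "E", "E")
valuevar <- c(1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8, 1, 0.9, 0.9, 1, 1.5)
myd <- data.frame(groupvar, valuevar)

# Keep every row except the first one of each group.
rest <- myd[duplicated(myd$groupvar), ]

# Means from the filtered data frame ...
by_dup <- aggregate(valuevar ~ groupvar, data = rest, FUN = mean)
# ... and means from dropping the first element inside each group.
by_idx <- aggregate(valuevar ~ groupvar, data = myd, FUN = function(x) mean(x[-1]))

# Both approaches should agree on this data.
same <- isTRUE(all.equal(by_dup$valuevar, by_idx$valuevar))
print(same)
```

One difference to be aware of: a group with only one row disappears from the filtered data frame altogether, whereas the x[-1] approach keeps the group and reports NaN for it.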

1

You can split the data by groupvar and apply the mean function to each piece.

groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "E", "E") 
valuevar <- c(1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8, 1, 0.9, 0.9, 1, 1.5) 
myd <- data.frame (groupvar, valuevar) 

lapply(split(myd, f=myd[, "groupvar"]), function(x) mean(x[-1,2]))
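
The lapply() call returns a list with one element per group. If a named numeric vector is more convenient, a variant along these lines should work; it splits only the numeric column and uses vapply(), which checks that each group yields exactly one number (group_means is a name introduced here for illustration):

```r
groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "E", "E")
valuevar <- c(1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8, 1, 0.9, 0.9, 1, 1.5)
myd <- data.frame(groupvar, valuevar)

# Split the numeric column by group, drop the first value of each piece,
# and collect the means into a named numeric vector.
group_means <- vapply(split(myd$valuevar, myd$groupvar),
                      function(x) mean(x[-1]),
                      numeric(1))
print(group_means)
```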