2014-01-28

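Calculating the ratio of consecutive values by group in R

I want to calculate the ratio between consecutive values within each group. For differences this is easy with diff: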

mdata <- data.frame(group = c("A","A","A","B","B","C","C"), x = c(2,3,5,6,3,7,6)) 
mdata$diff <- unlist(by(mdata$x, mdata$group, function(x){c(NA, diff(x))})) 
mdata 

  group x diff
1     A 2   NA
2     A 3    1
3     A 5    2
4     B 6   NA
5     B 3   -3
6     C 7   NA
7     C 6   -1

Is there an equivalent function for computing ratios? The desired output would be:

  group x     ratio
1     A 2        NA
2     A 3 1.5000000
3     A 5 1.6666667
4     B 6        NA
5     B 3 0.5000000
6     C 7        NA
7     C 6 0.8571429

Answers


Try dplyr:

install.packages("dplyr")  # the package name must be quoted
library(dplyr)
mdata <- data.frame(group = c("A","A","A","B","B","C","C"), x = c(2,3,5,6,3,7,6)) 
mdata <- group_by(mdata, group) 
mutate(mdata, ratio = x/lag(x)) 

# Source: local data frame [7 x 3] 
# Groups: group 

#   group x     ratio
# 1     A 2        NA
# 2     A 3 1.5000000
# 3     A 5 1.6666667
# 4     B 6        NA
# 5     B 3 0.5000000
# 6     C 7        NA
# 7     C 6 0.8571429

Your difference calculation then simplifies to:

mutate(mdata, diff = x - lag(x)) 

# Source: local data frame [7 x 3] 
# Groups: group 

#   group x diff
# 1     A 2   NA
# 2     A 3    1
# 3     A 5    2
# 4     B 6   NA
# 5     B 3   -3
# 6     C 7   NA
# 7     C 6   -1

Perfect use case for lag() :) – hadley


Using by:

do.call(rbind, by(mdata, mdata$group, function(dat) { 
    dat$ratio <- dat$x/c(NA, head(dat$x, -1)) 
    dat 
    })) 

#     group x     ratio
# A.1     A 2        NA
# A.2     A 3 1.5000000
# A.3     A 5 1.6666667
# B.4     B 6        NA
# B.5     B 3 0.5000000
# C.6     C 7        NA
# C.7     C 6 0.8571429
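
The key step in the function above is c(NA, head(dat$x, -1)), which shifts the values down by one position so that each element lines up with its predecessor. A small sketch on the values of group A alone (the variable names x and prev are only for illustration):

```r
x <- c(2, 3, 5)              # the x values of group A

prev <- c(NA, head(x, -1))   # drop the last value, pad the front with NA
prev                         # NA 2 3

x / prev                     # NA 1.500000 1.666667
```

by then applies this same calculation to each group's rows separately, and do.call(rbind, ...) stacks the pieces back together.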

Another option, using transform and ave:

transform(mdata, 
      ratio=ave(x, group, FUN=function(y) c(NA, tail(y, -1)/head(y, -1)))) 
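
Since no output is shown for this version, here is a rough check of it against the desired output from the question. The helper name ratio_to_previous is made up for illustration and is not part of any package. The sketch also runs on a reordered copy of the data, because ave writes each result back into its original row, so the rows do not have to be sorted by group.

```r
mdata <- data.frame(group = c("A","A","A","B","B","C","C"),
                    x = c(2,3,5,6,3,7,6))

# Hypothetical helper: each value divided by the previous value in the same group
ratio_to_previous <- function(y) c(NA, tail(y, -1) / head(y, -1))

res <- transform(mdata, ratio = ave(x, group, FUN = ratio_to_previous))

# Compare with the desired output from the question
expected <- c(NA, 3/2, 5/3, NA, 3/6, NA, 6/7)
stopifnot(isTRUE(all.equal(res$ratio, expected)))

# Interleave the groups: ave() still pairs each value with the previous one
# from the same group and returns results in the original row order
shuffled <- mdata[c(4, 1, 6, 2, 5, 3, 7), ]
shuffled$ratio <- ave(shuffled$x, shuffled$group, FUN = ratio_to_previous)
shuffled
```

Note that "previous" here means the previous row of the same group in the current row order, so the data should still be ordered sensibly within each group.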

The same idea, using data.table:

library(data.table) 
dt = as.data.table(mdata) 

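# shift() is data.table's own lag; a bare lag() resolves to stats::lag unless dplyr is attached
dt[, ratio := x / shift(x), by = group]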
dt 
#    group x     ratio
# 1:     A 2        NA
# 2:     A 3 1.5000000
# 3:     A 5 1.6666667
# 4:     B 6        NA
# 5:     B 3 0.5000000
# 6:     C 7        NA
# 7:     C 6 0.8571429