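dplyr and intervals: counting observations and summing data without loops
I would like to use the R dplyr package to solve the following two interval-related problems without using loops:
- I want to count the observations that fall in each interval (for both absolute and relative interval endpoints)
- I want to sum the observed values in each interval (for both absolute and relative interval endpoints)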
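The interval endpoints come from the columns df_abs$interval and df_rel$interval. Each interval is open on the left and closed on the right, for example:
- Interval: (-Inf, -60]
- Interval: (-60, -30]
- Interval: (-30, 0]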
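To make the right-closed convention concrete, here is a minimal sketch using base R's cut() with right = TRUE. The vectors example_breaks and example_values are made up for illustration and are not part of my data; note that a value equal to an endpoint (-60) lands in the interval that ends there.

```r
# Hypothetical endpoints and values, chosen only to illustrate the convention.
example_breaks <- c(-Inf, -60, -30, 0)
example_values <- c(-81.0, -60, -14.3)

# right = TRUE makes every interval open on the left and closed on the right.
bins <- cut(example_values, breaks = example_breaks, right = TRUE)

# -81 and -60 both fall in the first interval, -14.3 in the third.
print(data.frame(value = example_values, interval = bins))
```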
The data frames holding the data and the interval endpoints look like this:
library(dplyr)
# ----------{ data and interval ----------
df_data <- data.frame(varA = NA,
                      varB = NA,
                      varC = c(-81.0, -14.3, 29.6, 42.7, 46.4, 57.7, 15.3, 256.3, 20.3, -25.1, -23.1, -17.5))
df_abs <- data.frame(interval = c(-Inf, -60, -30, 0, 30, 60, 100, 200, Inf),
                     count = NA,
                     sum = NA)
df_rel <- data.frame(interval = c(0, 5, 15, 50, 75, 95, 100),
                     count = NA,
                     sum = NA)
# ---------- data and interval }----------
# ----------{ calculation ----------
# absolute data frame
# Row 1 only holds the lowest endpoint, so it covers no observations.
df_abs$count[1] <- 0
df_abs$sum[1] <- 0
for (i in seq_len(nrow(df_abs) - 1)) {
  # observations in (interval[i], interval[i + 1]]
  in_interval <- df_abs$interval[i] < df_data$varC & df_data$varC <= df_abs$interval[i + 1]
  # count observations in the interval
  df_abs$count[i + 1] <- sum(in_interval)
  # sum observations in the interval
  df_abs$sum[i + 1] <- sum(df_data$varC[in_interval])
}
# relative data frame
df_data_arranged <- df_data %>%
  arrange(varC) %>%
  # cumulative share of observations, in percent
  mutate(observationPercent = row_number() * 100 / n())
# Row 1 only holds the lowest endpoint, so it covers no observations.
df_rel$count[1] <- 0
df_rel$sum[1] <- 0
for (i in seq_len(nrow(df_rel) - 1)) {
  # observations whose percentile lies in (interval[i], interval[i + 1]]
  in_interval <- df_rel$interval[i] < df_data_arranged$observationPercent & df_data_arranged$observationPercent <= df_rel$interval[i + 1]
  # count observations in the interval
  df_rel$count[i + 1] <- sum(in_interval)
  # sum observations in the interval
  df_rel$sum[i + 1] <- sum(df_data_arranged$varC[in_interval])
}
# ---------- calculation }----------
The result should look like this:
df_abs <- data.frame(interval = c(-Inf, -60, -30, 0, 30, 60, 100, 200, Inf),
                     count = c(0, 1, 0, 4, 3, 3, 0, 0, 1),
                     sum = c(0, -81, 0, -80, 65.2, 146.8, 0, 0, 256.3))
df_rel <- data.frame(interval = c(0, 5, 15, 50, 75, 95, 100),
                     count = c(0, 0, 1, 5, 3, 2, 1),
                     sum = c(0, 0, -81, -64.7, 92.6, 104.1, 256.3))
As far as I understand the dplyr package, there should be a fairly short and direct solution to each of these two problems that does not require loops.