How can I calculate the mean of a variable between two dates? A reproducible data frame is below.
year <- c(1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,
1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,
1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,
1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997)
month <- c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC")
station <- c("A","A","A","A","A","A","A","A","A","A","A","A",
"B","B","B","B","B","B","B","B","B","B","B","B")
# random values: no seed is set, so the numbers differ on every run
concentration <- round(runif(48,20,40),1)
df <- data.frame(year,month,station,concentration)
id <- c(1,2,3,4)
station1996 <- c("A","A","B","B")
station1997 <- c("B","A","A","B")
start <- c("06/01/1996","07/01/1996","07/01/1996","08/01/1996")
end <- c("04/01/1997","04/01/1997","04/01/1997","05/01/1997")
participant <- data.frame(id,station1996,station1997,start,end)
participant$start <- as.Date(participant$start, format = "%m/%d/%Y")
participant$end <- as.Date(participant$end, format = "%m/%d/%Y")
So I have two data sets, as follows:
df
year month station concentration
1 1996 JAN A 24.4
2 1996 FEB A 37.0
3 1996 MAR A 39.5
4 1996 APR A 28.0
...
45 1997 SEP B 37.7
46 1997 OCT B 35.2
47 1997 NOV B 26.8
48 1997 DEC B 40.0
participant
id station1996 station1997 start end
1 1 A B 1996-06-01 1997-04-01
2 2 A A 1996-07-01 1997-04-01
3 3 B A 1996-07-01 1997-04-01
4 4 B B 1996-08-01 1997-05-01
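For each ID, I want to calculate the mean concentration between the start and end dates (by month). Note that the station may change between years.
For example, for id = 1, I want to calculate the mean concentration from June 1996 to April 1997. This should be based on the concentrations at station A from June 1996 to December 1996 and at station B from January 1997 to April 1997.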
Can anyone help?
Thanks a lot.
Step 1: convert 'start' and 'end' to 'Date' or 'POSIXct' format, and combine 'year' and 'month' into a new column of the same format. – MichaelChirico
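The step suggested above can be sketched in base R as follows. This is only a minimal sketch, not necessarily what the commenter had in mind: the `date` column, the helper `mean_for`, the `mean_conc` column and the call to `set.seed(1)` are assumptions added for illustration, and the end month is treated as inclusive, as in the id = 1 example from the question.

```r
# Rebuild the question's data in compact form.
set.seed(1)  # assumption: added only so the random values repeat between runs
df <- data.frame(
  year = rep(c(1996, 1997), each = 24),
  month = rep(toupper(month.abb), times = 4),
  station = rep(rep(c("A", "B"), each = 12), times = 2),
  concentration = round(runif(48, 20, 40), 1),
  stringsAsFactors = FALSE
)
participant <- data.frame(
  id = 1:4,
  station1996 = c("A", "A", "B", "B"),
  station1997 = c("B", "A", "A", "B"),
  start = as.Date(c("06/01/1996", "07/01/1996", "07/01/1996", "08/01/1996"),
                  format = "%m/%d/%Y"),
  end = as.Date(c("04/01/1997", "04/01/1997", "04/01/1997", "05/01/1997"),
                format = "%m/%d/%Y"),
  stringsAsFactors = FALSE
)

# Step 1: give df a Date column set to the first day of each month.
# match() against month.abb avoids locale-dependent parsing of "JAN", "FEB", ...
month_num <- match(df$month, toupper(month.abb))
df$date <- as.Date(sprintf("%d-%02d-01", df$year, month_num))

# Hypothetical helper: mean concentration for participant row i, using the
# 1996 station for 1996 rows and the 1997 station for 1997 rows.
mean_for <- function(i) {
  p <- participant[i, ]
  in_window <- df$date >= p$start & df$date <= p$end  # end month inclusive
  wanted <- ifelse(df$year == 1996, p$station1996, p$station1997)
  mean(df$concentration[in_window & df$station == wanted])
}

participant$mean_conc <- sapply(seq_len(nrow(participant)), mean_for)
participant
```

Looping over participants keeps the logic close to the wording of the question. With only two station columns the `ifelse()` is enough, but more years would call for reshaping `participant` to long format first.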
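You could also convert them to strings of the form "1997-10". Then you can do something like 'mean(concentration[date >= start & date <= end])'.
'library(zoo); as.yearmon(participant$start)' etc. could also be very handy in this case, if you don't want to deal with the slightly clunky POSIXct format. – thelatemail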
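A rough sketch of the string-key idea, in base R only. Zero-padded "YYYY-MM" strings sort in calendar order, so they can be compared directly. `zoo::as.yearmon` would give a proper year-month class instead, but it is not used here so the example needs no extra packages. The names `ym`, `start_ym`, `end_ym`, `long`, `joined`, `kept` and `result` are assumptions made for illustration, as is `set.seed(1)`, and the end month is treated as inclusive.

```r
# Rebuild the question's data in compact form.
set.seed(1)  # assumption: added only so the random values repeat between runs
df <- data.frame(
  year = rep(c(1996, 1997), each = 24),
  month = rep(toupper(month.abb), times = 4),
  station = rep(rep(c("A", "B"), each = 12), times = 2),
  concentration = round(runif(48, 20, 40), 1),
  stringsAsFactors = FALSE
)
participant <- data.frame(
  id = 1:4,
  station1996 = c("A", "A", "B", "B"),
  station1997 = c("B", "A", "A", "B"),
  start = as.Date(c("06/01/1996", "07/01/1996", "07/01/1996", "08/01/1996"),
                  format = "%m/%d/%Y"),
  end = as.Date(c("04/01/1997", "04/01/1997", "04/01/1997", "05/01/1997"),
                format = "%m/%d/%Y"),
  stringsAsFactors = FALSE
)

# Year-month keys such as "1996-06"; the zero padding is what makes
# plain string comparison agree with calendar order.
df$ym <- sprintf("%d-%02d", df$year, match(df$month, toupper(month.abb)))
participant$start_ym <- format(participant$start, "%Y-%m")
participant$end_ym <- format(participant$end, "%Y-%m")

# Long format: one row per participant and year, holding that year's station.
long <- rbind(
  data.frame(id = participant$id, year = 1996,
             station = participant$station1996, stringsAsFactors = FALSE),
  data.frame(id = participant$id, year = 1997,
             station = participant$station1997, stringsAsFactors = FALSE)
)
long <- merge(long, participant[, c("id", "start_ym", "end_ym")], by = "id")

# Attach the monthly concentrations for each participant's station and year,
# then keep only the months inside the window (end month inclusive).
joined <- merge(long, df, by = c("year", "station"))
kept <- joined[joined$ym >= joined$start_ym & joined$ym <= joined$end_ym, ]

# One mean per participant.
result <- aggregate(concentration ~ id, data = kept, FUN = mean)
result
```

The join-based version avoids an explicit loop and extends to more years by adding rows to `long`, at the cost of building a larger intermediate table.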