2014-03-29 63 views
0

I have data like this and want to get the average by month for certain months in R:

 logMosqm2 Date 
[1,] -4.0000000 10296 
[2,] -4.0000000 10313 
[3,] -4.0000000 10342 
[4,] -4.0000000 10388 
[5,] -0.9592633 10526 
[6,] -4.0000000 10572 

The data has only four columns (similar to the first one) and many rows.

I would like to get the monthly average of logMosqm2 (and the other similar variables), but only for the months in the years 2004, 2005 and 2006.

About the dates: these come from

yearfish <- cbind(logMosqm2, Date) 

Using Date on its own gives

"1998-03-11" "1998-03-28" "1998-04-26" "1998-06-11" "1998-10-27" 

The Date values in the table are the number of days since January 1, 1970.
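As a minimal sketch of that relationship, the day counts from the table can be turned back into calendar dates with base R's as.Date and an explicit origin. The vector name `days` is only illustrative.

```r
# Day counts copied from the Date column shown above
days <- c(10296, 10313, 10342, 10388, 10526, 10572)

# Interpret the numbers as days elapsed since 1970-01-01
dates <- as.Date(days, origin = "1970-01-01")

format(dates)
# The first five match the dates quoted above:
# "1998-03-11" "1998-03-28" "1998-04-26" "1998-06-11" "1998-10-27"
```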

Edit: I found a function in the zoo package, so I now have a variable yearmon:

[1] "Mar 1998" "Mar 1998" "Apr 1998" "Jun 1998" 
[5] "Oct 1998" "Dec 1998" "Apr 1999" "Nov 1999" 
[9] "Feb 2000" "Feb 2000" "Mar 2000" "Apr 2000" 
[13] "May 2000" "Jun 2000" "Oct 2000" "Dec 2000" 
[17] "Mar 2001" "Jun 2001" "Sep 2001" "Jan 2002" 
[21] "Jun 2002" "Dec 2002" "Apr 2003" "Jun 2003" 
[25] "Jan 2004" "Mar 2004" "Apr 2004" "May 2004" 
[29] "Jun 2004" "Jun 2004" "Jul 2004" "Jul 2004" 
[33] "Jul 2004" "Aug 2004" "Aug 2004" "Aug 2004" 
[37] "Aug 2004" "Aug 2004" "Aug 2004" "Sep 2004" 
[41] "Sep 2004" "Sep 2004" "Sep 2004" "Sep 2004" 
[45] "Sep 2004" "Sep 2004" "Sep 2004" "May 2005" 
[49] "May 2005" "May 2005" "May 2005" "May 2005" 
[53] "May 2005" "Jun 2005" "Jun 2005" "Jun 2005" 
[57] "Jul 2005" "Jul 2005" "Jul 2005" "Aug 2005" 
[61] "Aug 2005" "Aug 2005" "Sep 2005" "Sep 2005" 
[65] "Sep 2005" "May 2006" "May 2006" "May 2006" 
[69] "Jun 2006" "Jun 2006" "Jun 2006" "Jul 2006" 
[73] "Jul 2006" "Sep 2006" "Sep 2006" "Apr 2007" 
[77] "May 2007" "Jul 2007" "Sep 2007" "Jan 2008" 
[81] "Mar 2008" "May 2008" 
+0

What date format is that? –

+0

The data originally came from a .csv file with dates in the format 10/14/1993. I read them into R with read.csv. These are (I think) unformatted dates. –

+0

I'm still confused. So the '10296' above is '10/2/96'? –

Answers

1

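Using the test data set camm.recent defined at the end, we create a grouping variable g that initially consists of sequence numbers. For any yearmon between 2004 and 2006 that repeats an earlier one, we replace its sequence number with NA and then use na.locf to fill in the sequence number of the first occurrence of that yearmon. aggregate is then used to compute the mean within each group.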

# this uses the test data set defined at the end 
library(zoo) 

# create an appropriate grouping variable, g 
ym <- camm.recent$yearmon 
between <- ym >= "Jan 2004" & ym <= "Dec 2006" 
g <- replace(seq_along(ym), duplicated(ym) & between, NA) 
g <- na.locf(g) 

# aggregate by g 
aggregate(camm.recent, data.frame(g), mean)[-1] # -1 removes g from output 

Here is the code that creates the test data set:

library(zoo) 
ym <- c("Mar 1998", "Mar 1998", "Apr 1998", "Jun 1998", 
"Oct 1998", "Dec 1998", "Apr 1999", "Nov 1999", 
"Feb 2000", "Feb 2000", "Mar 2000", "Apr 2000", 
"May 2000", "Jun 2000", "Oct 2000", "Dec 2000", 
"Mar 2001", "Jun 2001", "Sep 2001", "Jan 2002", 
"Jun 2002", "Dec 2002", "Apr 2003", "Jun 2003", 
"Jan 2004", "Mar 2004", "Apr 2004", "May 2004", 
"Jun 2004", "Jun 2004", "Jul 2004", "Jul 2004", 
"Jul 2004", "Aug 2004", "Aug 2004", "Aug 2004", 
"Aug 2004", "Aug 2004", "Aug 2004", "Sep 2004", 
"Sep 2004", "Sep 2004", "Sep 2004", "Sep 2004", 
"Sep 2004", "Sep 2004", "Sep 2004", "May 2005", 
"May 2005", "May 2005", "May 2005", "May 2005", 
"May 2005", "Jun 2005", "Jun 2005", "Jun 2005", 
"Jul 2005", "Jul 2005", "Jul 2005", "Aug 2005", 
"Aug 2005", "Aug 2005", "Sep 2005", "Sep 2005", 
"Sep 2005", "May 2006", "May 2006", "May 2006", 
"Jun 2006", "Jun 2006", "Jun 2006", "Jul 2006", 
"Jul 2006", "Sep 2006", "Sep 2006", "Apr 2007", 
"May 2007", "Jul 2007", "Sep 2007", "Jan 2008", 
"Mar 2008", "May 2008") 
camm.recent <- data.frame(logMosqm2 = seq_along(ym), 
    Date = as.Date(as.yearmon(ym)), 
    yearmon = as.yearmon(ym)) 
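The grouping step above can also be sketched in base R without zoo. This is an alternative under stated assumptions, not the answer's method: the month labels are hypothetical "YYYY-MM" strings (so plain string comparison orders them correctly), and match() stands in for the replace/na.locf pair by returning the index of each label's first occurrence.

```r
# Hypothetical month labels and values, for illustration only
ym <- c("2003-06", "2003-06", "2004-06", "2004-06", "2004-07", "2007-04", "2007-04")
x  <- c(1, 2, 3, 5, 7, 8, 9)

# Rows whose month falls inside the 2004-2006 window
between <- ym >= "2004-01" & ym <= "2006-12"

# Inside the window: index of the first row with the same month.
# Outside the window: the row's own index, so it stays a group of one.
g <- ifelse(between, match(ym, ym), seq_along(ym))

# Mean of x within each group
res <- tapply(x, g, mean)
res
```

Only the two "2004-06" rows share a group here; the repeated months outside the window are left unaveraged, as in the answer above.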
2

A better example is needed. All of these dates are in 1998:

tapply(inp[ , 'logMosqm2'], format(as.Date(inp[ , 'Date'], origin=as.Date("1970-01-01")),format="%Y"),mean) 
    1998 
-3.493211 

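It now appears that what you want is what you would get by changing the format to "%Y-%m", so that 2003-01 is treated as different from 2004-01. (That is not what you wrote.)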

But here is how to do it by month; once we have a sensible example, we can apply the restriction on years:

tapply(inp[ , 'logMosqm2'], format(as.Date(inp[ , 'Date'], origin=as.Date("1970-01-01")),format="%m"),mean) 
     03   04   06   10   12 
-4.0000000 -4.0000000 -4.0000000 -0.9592633 -4.0000000 
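A rough sketch of how the year restriction might be combined with a "%Y-%m" grouping. The data frame `inp2` and its values are made up for illustration, since the sample above only covers 1998.

```r
# Hypothetical data: a measurement and day counts since 1970-01-01
inp2 <- data.frame(
  logMosqm2 = c(-4, -2, -1, -3, -5, -2),
  Date      = c(12000, 12600, 12610, 12640, 13000, 13700)
)

d  <- as.Date(inp2$Date, origin = "1970-01-01")
yr <- as.integer(format(d, "%Y"))

# Keep only rows from 2004 through 2006
keep <- yr >= 2004 & yr <= 2006

# Average within each year-month, so 2004-08 and 2005-08 stay separate
res <- tapply(inp2$logMosqm2[keep], format(d[keep], "%Y-%m"), mean)
res
# 2004-07 2004-08 2005-08
#    -1.5    -3.0    -5.0
```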

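rd.txt is a function I wrote before the text argument was added to scan, so that I could quickly input examples from posters, such as yourself, who have not yet learned the virtues of posting examples with dput: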

rd.txt <-  function (txt, header = TRUE, ...) 
{  rd <- read.table(textConnection(txt), header = header, ...) 
    closeAllConnections() 
    rd} 

inp <- data.matrix(rd.txt(" logMosqm2 Date 
    -4.0000000 10296 
    -4.0000000 10313 
    -4.0000000 10342 
    -4.0000000 10388 
    -0.9592633 10526 
    -4.0000000 10572")) 
+0

What is inp? How can I do this on a data frame? Thanks –

+0

Well, my 'inp' is a matrix, because you showed an R matrix, but I think this code should also work with a data.frame, since the indexing is by column name. –

1

Using data.table, with an example:

#Example 
library(data.table) 
set.seed(123) 
all_dates <- as.numeric(seq(as.Date("2004/01/01"), as.Date("2008/12/31"), "day")) 
dates <- sample(all_dates, 1000, replace = TRUE) 
dt <- data.table(x = rnorm(1000), dates = dates) 
dt 
#    x dates 
# 1: -0.6018928 12943 
# 2: -0.9936986 13858 
# 3: 1.0267851 13165 
# 4: 0.7510613 14031 
# 5: -1.5091665 14136 

dt[, dates := as.Date(dates, origin = "1970-01-01")] 
dt 
#    x  dates 
# 1: -0.6018928 2005-06-09 
# 2: -0.9936986 2007-12-11 
# 3: 1.0267851 2006-01-17 
# 4: 0.7510613 2008-06-01 
# 5: -1.5091665 2008-09-14 

#Relevant Code 
dt[, c("month", "year") := list(month(dates), year(dates))] 
dt[year %in% c(2004, 2005, 2006), mean(x), by = month] 
# month   V1 
# 1:  6 -0.044292743 
# 2:  1 0.078148206 
# 3:  3 0.062165254 
# 4:  8 -0.149267201 
# 5: 10 -0.024994773 
# 6:  4 0.159856357 
# 7: 11 -0.028046083 
# 8:  7 0.019404375 
# 9:  9 0.117634410 
#10: 12 0.074059451 
#11:  2 -0.001347801 
#12:  5 0.096914779 
1

Build a data frame with year and month columns, subset the years you want, and then use tapply on the month factor to get the means:

d = data.frame(x=1:9,y=c(2001,2001,2001,2002,2002,2002,2003,2003,2003), 
    month=rep(c('a','b','c'),3)) 
ds = subset(d,y>2001) 
tapply(ds$x,ds$month,mean) 

The data looks like this:

x y month 
1 1 2001 a 
2 2 2001 b 
3 3 2001 c 
4 4 2002 a 
5 5 2002 b 
6 6 2002 c 
7 7 2003 a 
8 8 2003 b 
9 9 2003 c 

The output is:

a b c 
5.5 6.5 7.5 
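A similar result can be sketched with the formula interface of aggregate, which returns a data frame rather than a named vector. This is just a variation on the same toy data, not a different method.

```r
# Same toy data as above: x values, a year column y, and a month label
d <- data.frame(x = 1:9,
                y = rep(2001:2003, each = 3),
                month = rep(c("a", "b", "c"), 3))

# Subset the years of interest, then average x within each month
res <- aggregate(x ~ month, data = subset(d, y > 2001), FUN = mean)
res
#   month   x
# 1     a 5.5
# 2     b 6.5
# 3     c 7.5
```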
+0

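Don't use $ indexing in the subset argument. The whole point of subset is that column names are resolved in the environment of the first argument. –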

+0

Oops. My mistake. Thanks. – beroe

0

Thanks, everyone. I solved it with your help. Here is what I did:

library(zoo) 
camm.recent$yearmon <- as.yearmon(camm.recent$Date) 

camm.recent$avglogMosqm2[camm.recent$yearmon > "Dec 2003" & 
          camm.recent$yearmon < "Jan 2007"] <- 
    subset(tapply(camm.recent$logMosqm2, camm.recent$yearmon, 
        mean), camm.recent$yearmon > "Dec 2003" & 
        camm.recent$yearmon < "Jan 2007") 
camm.recent$avglogMosqm2[camm.recent$yearmon < "Jan 2004"] <- 
    camm.recent$logMosqm2[camm.recent$yearmon < "Jan 2004"] 

camm.recent$avglogMosqm2[camm.recent$yearmon > "Dec 2006"] <- 
    camm.recent$logMosqm2[camm.recent$yearmon > "Dec 2006"] 

camm.recent$newdate[camm.recent$yearmon > "Dec 2003" & 
          camm.recent$yearmon < "Jan 2007"] <- 
    subset(tapply(camm.recent$Date, camm.recent$yearmon, 
       mean), camm.recent$yearmon > "Dec 2003" & 
      camm.recent$yearmon < "Jan 2007") 
camm.recent$newdate[camm.recent$yearmon < "Jan 2004"] <- 
    camm.recent$Date[camm.recent$yearmon < "Jan 2004"] 

camm.recent$newdate[camm.recent$yearmon > "Dec 2006"] <- 
    camm.recent$Date[camm.recent$yearmon > "Dec 2006"] 
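For comparison, a per-row version of the same idea can be sketched in base R with ave(), which returns one group mean per row and so keeps the result the same length as the data. This is only a sketch on made-up data, assuming the goal is "monthly mean inside 2004-2006, original value elsewhere"; the data frame `df` and its values are hypothetical.

```r
# Hypothetical data standing in for camm.recent
df <- data.frame(
  Date = as.Date(c("2003-06-01", "2004-06-03", "2004-06-20",
                   "2005-05-02", "2007-04-10")),
  logMosqm2 = c(-4, -2, -1, -3, -5)
)

# Year-month key and a flag for rows inside the 2004-2006 window
key       <- format(df$Date, "%Y-%m")
yr        <- as.integer(format(df$Date, "%Y"))
in_window <- yr %in% 2004:2006

# ave() gives each row the mean of its year-month group (default FUN is mean);
# rows outside the window keep their original value
df$avglogMosqm2 <- ifelse(in_window, ave(df$logMosqm2, key), df$logMosqm2)
df$avglogMosqm2
# -4.0 -1.5 -1.5 -3.0 -5.0
```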