2012-10-18 23 views

Avoiding loops when aggregating time series

I understand why vectorised functions are better than for loops.

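But there are some problems for which I cannot see a vectorised, functional-programming solution. One of them is summing monthly data to get quarterly data. Any suggestions for replacing this code?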

month <- 1:100 
A422072L <- c(rep(NA, 4), rnorm(96, 100, 5)) + 2 * month 
A422070J <- c(NA, NA, rnorm(96, 100, 5), NA, NA) + 2 * month 
Au.approvals <- data.frame(month=month, A422072L=A422072L, A422070J=A422070J) 

Au.approvals$trend.sum.A422072L.qtr <- NA 
Au.approvals$sa.sum.A422070J.qtr <- NA 
for(i in seq_len(nrow(Au.approvals))) 
{ 
    if(i < 3) next 
    if(all(!is.na(Au.approvals$A422072L[(i-2):i]))) 
     Au.approvals$trend.sum.A422072L.qtr[i] <- sum(Au.approvals$A422072L[(i-2):i]) 
    if(all(!is.na(Au.approvals$A422070J[(i-2):i]))) 
     Au.approvals$sa.sum.A422070J.qtr[i] <- sum(Au.approvals$A422070J[(i-2):i]) 
} 

print(Au.approvals) 

There is now enough data to run this as an example.


Please provide a reproducible example. You might want to look at 'ddply', 'aggregate', 'ave', etc.

Answers


Let's create some fake time series data:

time_dat = data.frame(t = 1:100, value = runif(100)) 

To get a rolling sum, take a look at rollapply from the zoo package:

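library(zoo)
# fill = NA pads the positions where a full 10-value window is not available
# (fill = TRUE would pad with TRUE, which is coerced to 1 and looks like data)
time_dat = transform(time_dat,
        roll_value = rollapply(value, 10, sum, fill = NA))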

Here I assume that the coarser resolution (e.g. quarterly) is 10 times coarser than the fine resolution.
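For the monthly-to-quarterly case in the question, the window would be 3 rather than 10, and it would end at the current month. As a rough sketch of the same idea without any extra package, base R's stats::filter with sides = 1 can compute a trailing sum. The helper name rolling_sum is hypothetical and only for illustration; this is an alternative to rollapply, not what the answer above uses.

```r
# Hypothetical helper: trailing rolling sum over `width` observations, base R only.
# sides = 1 means each result uses the current value and the (width - 1) before it.
# Any window that contains an NA, or that is incomplete at the start, yields NA.
rolling_sum <- function(x, width = 3) {
  as.numeric(stats::filter(x, rep(1, width), sides = 1))
}

# Data shaped like the question's first series: four leading NAs, then values
set.seed(1)
month <- 1:100
x <- c(rep(NA, 4), rnorm(96, 100, 5)) + 2 * month

qtr <- rolling_sum(x, 3)
head(data.frame(month = month, x = x, qtr = qtr), 8)
```

The first non-missing result appears at month 7, the first month whose three-month window contains no NA, which matches the behaviour of the loop in the question.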


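Original answer, for non-rolling aggregation:

I like to use the plyr package for this, but ave, aggregate, and data.table are also good options. For large datasets, data.table is very fast. But back to some plyr magic:

First create an extra column that specifies the coarser time frequency, i.e. which quarter each observation belongs to: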

time_dat[["coarse_t"]] = rep(1:10, each = 10)
head(time_dat)
#   t     value coarse_t
# 1 1 0.9045097        1
# 2 2 0.4174182        1
# 3 3 0.5638139        1
# 4 4 0.8228698        1
# 5 5 0.7059027        1
# 6 6 0.5285386        1

Now we can aggregate time_dat to the coarser time frequency:

library(plyr)
# one row per coarse period, holding the sum of the values in that period
time_dat_coarse = ddply(time_dat, .(coarse_t), summarise, sum_value = sum(value))
time_dat_coarse
#    coarse_t sum_value
# 1         1  6.097348
# 2         2  4.834720
# 3         3  3.988809
# 4         4  4.170656
# 5         5  4.538269
# 6         6  6.198716
# 7         7  4.399282
# 8         8  5.507384
# 9         9  6.089072
# 10       10  4.663287
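Since aggregate is mentioned above as an alternative, here is a minimal base R sketch of the same grouping step for true quarters of three months each. The quarter column and the 12-row example data are assumptions made for illustration, not part of the original answer.

```r
# Twelve months of fake data, so there are exactly four quarters
set.seed(42)
time_dat <- data.frame(t = 1:12, value = runif(12))

# Integer division maps months 1-3 to quarter 1, months 4-6 to quarter 2, and so on
time_dat$quarter <- (time_dat$t - 1) %/% 3 + 1

# One row per quarter, holding the sum of the three monthly values
time_dat_qtr <- aggregate(value ~ quarter, data = time_dat, FUN = sum)
time_dat_qtr
```

Unlike the rolling sum, this gives one value per quarter rather than one value per month.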


Thanks, but what I really want is a quarterly sum that rolls forward month by month.


@MarktheGraph Take a look at 'rollapply'; see the extension to my answer.


Thanks for the help. I ended up with the following:
Au.approvals <- transform(Au.approvals, trend = rollapply(A422072L, 3, sum, fill = NA, align = "right"))
Au.approvals <- transform(Au.approvals, sa = rollapply(A422070J, 3, sum, fill = NA, align = "right"))


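Paul's answer is great, but I just wanted to add that the chron package has many excellent operations for classifying dates and times, which can be paired with plyr for aggregation: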

library("chron") 
# chron uses chron-specific object representation. 
# If a different representation is needed, a conversion is necessary 
# e.g. if a$date is a chron date object, I would use as.POSIXct(a$date) to get a POSIXct representation

# create chron date objects and values 
a<-data.frame(date=as.chron(Sys.Date() + 1:1000), value = 1:100*runif(100,0,1)) 

# cuts dates into 15 intervals 
a$interval1<-cut(a$date,15) 
# cuts dates into 10 number of intervals using a label you define 
a$interval2<-cut(a$date,10,paste("group",1:10)) 
# cuts dates into weeks 
a$weeks<-cut(a$date,"weeks",start.on.monday=FALSE) 
# cuts dates into months 
a$months<-cut(a$date,"months") 
# cuts dates into years 
a$years<-cut(a$date,"years") 
# classifies each date by its day of the week
# (weekdays() accepts a chron dates object; day.of.week() expects month, day, year)
a$day_of_week<-weekdays(a$date)

# creating a chron time object 
b<-data.frame(day_time=as.chron(Sys.time()+1:1000*100), value = 1:100*runif(100,0,1)) 
# cuts times into days - note: uses first time period as the start 
b$day<-cut(b$day_time,"days") 
# truncates time to 5 minute interval 
b$min_5<-trunc(b$day_time, "00:05:00") 
# truncates time to 1 hour intervals 
b$hour1<-trunc(b$day_time, "01:00:00") 
# truncates datetime to 1 hour and 2 second intervals 
b$days_3<-trunc(b$day_time, "01:00:02") 

I use chron a lot because it makes aggregating over time much easier.
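If installing chron is not an option, a similar cut-then-aggregate pattern can be sketched with base R's Date class. This is an analogue under that assumption, not the chron API shown above; the daily data frame and its column names are made up for illustration.

```r
# One value per day for the leap year 2012 (366 days)
dates <- seq(as.Date("2012-01-01"), by = "day", length.out = 366)
daily <- data.frame(date = dates, value = seq_along(dates))

# cut() on a Date labels each row with the first day of its month
daily$month <- cut(daily$date, "month")
# quarters() returns "Q1" to "Q4"
daily$quarter <- quarters(daily$date)

# Sum the daily values within each month
monthly <- aggregate(value ~ month, data = daily, FUN = sum)
head(monthly, 3)
```

The resulting factor columns can be passed to ddply or aggregate as grouping variables in the same way as the chron-based columns.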

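For extra fun, the zoo and xts packages provide many functions that are well suited to all sorts of aggregation beyond day-level detail. Their documentation is huge and it can be hard to find what you want, but almost everything you could want is in there. Some highlights: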

library("zoo") 
library("xts") 
?rollapply 
?rollsum 
?rollmean 
?rollmedian 
?rollmax 
?yearmon 
?yearqtr 
?apply.daily 
?apply.weekly 
?apply.monthly 
?apply.quarterly 
?apply.yearly 
?to.minutes 
?to.minutes3 
?to.minutes5 
?to.minutes10 
?to.minutes15 
?to.minutes30 
?to.hourly 
?to.daily 
?to.weekly 
?to.monthly 
?to.quarterly 
?to.yearly 
?to.period