2014-07-08 159 views
-1

Aggregating daily data to the weekly level in R

I have a large dataset that looks like the reproducible sample data below.

Interval value 
1 2012-06-10 552 
2 2012-06-11 4850 
3 2012-06-12 4642 
4 2012-06-13 4132 
5 2012-06-14 4190 
6 2012-06-15 4186 
7 2012-06-16 1139 
8 2012-06-17 490 
9 2012-06-18 5156 
10 2012-06-19 4430 
11 2012-06-20 4447 
12 2012-06-21 4256 
13 2012-06-22 3856 
14 2012-06-23 1163 
15 2012-06-24 564 
16 2012-06-25 4866 
17 2012-06-26 4421 
18 2012-06-27 4206 
19 2012-06-28 4272 
20 2012-06-29 3993 
21 2012-06-30 1211 
22 2012-07-01 698 
23 2012-07-02 5770 
24 2012-07-03 5103 
25 2012-07-04 775 
26 2012-07-05 5140 
27 2012-07-06 4868 
28 2012-07-07 1225 
29 2012-07-08 671 
30 2012-07-09 5726 
31 2012-07-10 5176 

I would like to aggregate this data to the weekly level to get output similar to the following:

Interval   value 
1 Week 2, June 2012 *aggregate value for day 10 to day 14 of June 2012* 
2 Week 3, June 2012 *aggregate value for day 15 to day 21 of June 2012* 
3 Week 4, June 2012 *aggregate value for day 22 to day 28 of June 2012* 
4 Week 5, June 2012 *aggregate value for day 29 to day 30 of June 2012* 
5 Week 1, July 2012 *aggregate value for day 1 to day 7 of July 2012* 
6 Week 2, July 2012 *aggregate value for day 8 to day 10 of July 2012* 

How can I do this easily, without writing long code?

+0

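You used the [xts] tag, but it doesn't look like you have an xts object. You're right, though, that xts is probably the easiest way. Have you searched? Look at 'to.weekly', 'apply.weekly', 'period.apply' and search SO. – GSee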

Answers

2

If you use week from lubridate, you only get five weeks to pass to by. Assuming dat is your data,

> library(lubridate) 
> do.call(rbind, by(dat$value, week(dat$Interval), summary)) 
# Min. 1st Qu. Median Mean 3rd Qu. Max. 
# 24 552 4146 4188 3759 4529 4850 
# 25 490 2498 4256 3396 4438 5156 
# 26 564 2578 4206 3355 4346 4866 
# 27 698  993 4868 3366 5122 5770 
# 28 671 1086 3200 3200 5314 5726 

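This shows a summary for weeks 24 through 28 of the year. Similarly, since you said you want to "aggregate" the values, we can get the weekly means with aggregate: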

> aggregate(value~week(Interval), data = dat, mean) 
# week(Interval) value 
# 1    24 3758.667 
# 2    25 3396.286 
# 3    26 3355.000 
# 4    27 3366.429 
# 5    28 3199.500 
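
For readers without lubridate, here is a minimal base R sketch of the same grouping. It assumes the rule "day of year %/% 7 + 1", which is what reproduces the means printed above on the sample data; lubridate's week() may assign days to groups differently depending on the installed version, so check the grouping before relying on it. The variable names (day_of_year, weekly_mean, weekly_sum) are only illustrative.

```r
# Self-contained copy of the question's sample data (31 daily values).
dat <- data.frame(
  Interval = seq(as.Date("2012-06-10"), by = "day", length.out = 31),
  value = c(552, 4850, 4642, 4132, 4190, 4186, 1139,
            490, 5156, 4430, 4447, 4256, 3856, 1163,
            564, 4866, 4421, 4206, 4272, 3993, 1211,
            698, 5770, 5103, 775, 5140, 4868, 1225,
            671, 5726, 5176)
)

# Day of the year (1 to 366), computed with base R only.
day_of_year <- as.integer(format(dat$Interval, "%j"))

# Assumed rule: a new group starts every 7 days, counted from the start of the year.
dat$week <- day_of_year %/% 7 + 1

# Weekly means (as in the answer above) and weekly sums.
weekly_mean <- aggregate(value ~ week, data = dat, FUN = mean)
weekly_sum  <- aggregate(value ~ week, data = dat, FUN = sum)

print(weekly_mean)
print(weekly_sum)
```

Swapping FUN lets the same call return sums, medians, or any other per-week statistic.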
0

By "aggregate", do you mean taking their sum? Say your data frame is d; assuming d$Interval is of class Date, you can try

# if d$Interval is not of class Date, convert it first:
# d$Interval <- as.Date(d$Interval)

# days 1-7 of a month are week 1, days 8-14 are week 2, and so on
formatdate <- function(date)
    paste0("Week ", (as.numeric(format(date, "%d")) - 1) %/% 7 + 1,
           ", ", format(date, "%b %Y"))
# change "sum" to your required function
aggregate(d$value, by = list(formatdate(d$Interval)), sum)
#            Group.1     x
# 1 Week 1, Jul 2012 23579
# 2 Week 2, Jul 2012 11573
# 3 Week 2, Jun 2012 18366
# 4 Week 3, Jun 2012 24104
# 5 Week 4, Jun 2012 23348
# 6 Week 5, Jun 2012  5204
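
Note that aggregate sorts character labels alphabetically, so the July weeks come out before the June ones. A minimal sketch of one way to keep chronological order is to turn the labels into a factor whose levels follow first appearance. This assumes the data frame is already sorted by date and that days 1-7 of a month count as week 1, as in the question's desired output; week_label is a hypothetical helper name.

```r
# Self-contained copy of the question's sample data (31 daily values).
d <- data.frame(
  Interval = seq(as.Date("2012-06-10"), by = "day", length.out = 31),
  value = c(552, 4850, 4642, 4132, 4190, 4186, 1139,
            490, 5156, 4430, 4447, 4256, 3856, 1163,
            564, 4866, 4421, 4206, 4272, 3993, 1211,
            698, 5770, 5103, 775, 5140, 4868, 1225,
            671, 5726, 5176)
)

# Hypothetical helper: days 1-7 are week 1, days 8-14 are week 2, and so on.
week_label <- function(date) {
  week_in_month <- (as.integer(format(date, "%d")) - 1) %/% 7 + 1
  paste0("Week ", week_in_month, ", ", format(date, "%B %Y"))
}

labels <- week_label(d$Interval)

# Factor levels in order of first appearance, so the result stays chronological
# (assumes d is sorted by Interval).
groups <- factor(labels, levels = unique(labels))

weekly <- aggregate(d$value, by = list(Interval = groups), FUN = sum)
names(weekly)[2] <- "value"
print(weekly)
```

The month names printed by format() depend on the session locale.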
10

If you mean the sum of "value" by week, I think the easiest way to do it is to convert the data into an xts object, as GSee suggested:

library(xts)
# the date column is named "Interval" (capital I) in the sample data
data <- as.xts(data$value, order.by = as.Date(data$Interval))
weekly <- apply.weekly(data, sum)

      [,1] 
2012-06-10 552 
2012-06-17 23629 
2012-06-24 23872 
2012-07-01 23667 
2012-07-08 23552 
2012-07-10 10902 

I leave the formatting of the output as an exercise for you :-)
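
One possible way to approach that formatting step, as a rough sketch rather than the answerer's own solution: pull the index and values back out of the xts result into a plain data frame. The column names week_ending and value are assumptions made for illustration, and the xts() constructor is used here instead of as.xts() only to keep the block self-contained.

```r
library(xts)

# Self-contained copy of the question's sample data (31 daily values).
dates <- seq(as.Date("2012-06-10"), by = "day", length.out = 31)
values <- c(552, 4850, 4642, 4132, 4190, 4186, 1139,
            490, 5156, 4430, 4447, 4256, 3856, 1163,
            564, 4866, 4421, 4206, 4272, 3993, 1211,
            698, 5770, 5103, 775, 5140, 4868, 1225,
            671, 5726, 5176)

x <- xts(values, order.by = dates)

# Weekly sums; each row is stamped with the last observation of its week.
weekly <- apply.weekly(x, sum)

# Back to a plain data frame with hypothetical column names.
weekly_df <- data.frame(
  week_ending = index(weekly),
  value = as.numeric(coredata(weekly))
)
print(weekly_df)
```

The last row is stamped 2012-07-10 because that is the final observation, not a completed week, so a partial week at either end of the data shows up as a smaller total.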

+0

How do I switch to a ts() object in order to use forecast and decompose? – gmeroni

+0

Use the "as" method: 'as.ts(data)' – hvollmeier

1

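If you are working with data frames, you can do this easily with the tidyquant package. Use the tq_transmute function, which applies a mutation and returns a new data frame. Select the "value" column and apply the xts function apply.weekly. The additional argument FUN = sum aggregates the values by week.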


library(tidyquant) 

df 
#> # A tibble: 31 x 2 
#>  Interval value 
#>  <date> <int> 
#> 1 2012-06-10 552 
#> 2 2012-06-11 4850 
#> 3 2012-06-12 4642 
#> 4 2012-06-13 4132 
#> 5 2012-06-14 4190 
#> 6 2012-06-15 4186 
#> 7 2012-06-16 1139 
#> 8 2012-06-17 490 
#> 9 2012-06-18 5156 
#> 10 2012-06-19 4430 
#> # ... with 21 more rows 

df %>% 
    tq_transmute(select  = value, 
       mutate_fun = apply.weekly, 
       FUN  = sum) 
#> # A tibble: 6 x 2 
#>  Interval value 
#>  <date> <int> 
#> 1 2012-06-10 552 
#> 2 2012-06-17 23629 
#> 3 2012-06-24 23872 
#> 4 2012-07-01 23667 
#> 5 2012-07-08 23552 
#> 6 2012-07-10 10902