2013-04-23 31 views
3

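How to aggregate 30-second granular data into 5-minute data

I have CPU data at 30-second granularity, as shown below. I would like to aggregate it into 5-minute and 10-minute averages.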

dput(head(res,50)) 
structure(list(DATE = structure(c(1362114023, 1362114053, 1362114083, 
1362114113, 1362114143, 1362114150, 1362114173, 1362114180, 1362114203, 
1362114210, 1362114233, 1362114240, 1362114263, 1362114270, 1362114293, 
1362114300, 1362114330, 1362114360, 1362114390, 1362114420, 1362114450, 
1362114480, 1362114510, 1362114540, 1362114570, 1362114600, 1362114630, 
1362114660, 1362114690, 1362114720, 1362114750, 1362114780, 1362114810, 
1362114840, 1362114870, 1362114900, 1362114930, 1362114960, 1362114990, 
1362115020, 1362115050, 1362115080, 1362115111, 1362115141, 1362115171, 
1362115201, 1362115231, 1362115261, 1362115291, 1362115321), class = c("POSIXct", 
"POSIXt"), tzone = ""), CPU = c(30L, 29L, 28L, 29L, 27L, 10L, 
25L, 11L, 23L, 9L, 22L, 8L, 22L, 7L, 19L, 7L, 7L, 8L, 6L, 7L, 
6L, 7L, 8L, 8L, 7L, 6L, 8L, 8L, 9L, 8L, 9L, 10L, 9L, 8L, 8L, 
6L, 8L, 7L, 9L, 10L, 11L, 11L, 9L, 9L, 8L, 9L, 11L, 8L, 6L, 8L 
)), .Names = c("DATE", "CPU"), row.names = c(132611L, 132612L, 
132613L, 132614L, 132615L, 131428L, 132616L, 131429L, 132617L, 
131430L, 132618L, 131431L, 132619L, 131432L, 132620L, 131433L, 
131434L, 131435L, 131436L, 131437L, 131438L, 131439L, 131440L, 
131441L, 131442L, 131443L, 131444L, 131445L, 131446L, 131447L, 
131448L, 131449L, 131450L, 131451L, 131452L, 131453L, 131454L, 
131455L, 131456L, 131457L, 131458L, 131459L, 131460L, 131461L, 
131462L, 131463L, 131464L, 131465L, 131466L, 131467L), class = "data.frame") 

Any ideas on how I can aggregate my granular data?

Answers

6

Versions of this question have been asked and answered a bunch of times on Stack Overflow, but it keeps getting asked. Here is an answer that will hopefully serve most people's needs:

First, use a package that handles irregular time series. It makes this much easier. I like xts.

library(xts) 

mydata <- structure(list(DATE = structure(c(1362114023, 1362114053, 1362114083, 
1362114113, 1362114143, 1362114150, 1362114173, 1362114180, 1362114203, 
1362114210, 1362114233, 1362114240, 1362114263, 1362114270, 1362114293, 
1362114300, 1362114330, 1362114360, 1362114390, 1362114420, 1362114450, 
1362114480, 1362114510, 1362114540, 1362114570, 1362114600, 1362114630, 
1362114660, 1362114690, 1362114720, 1362114750, 1362114780, 1362114810, 
1362114840, 1362114870, 1362114900, 1362114930, 1362114960, 1362114990, 
1362115020, 1362115050, 1362115080, 1362115111, 1362115141, 1362115171, 
1362115201, 1362115231, 1362115261, 1362115291, 1362115321), class = c("POSIXct", 
"POSIXt"), tzone = ""), CPU = c(30L, 29L, 28L, 29L, 27L, 10L, 
25L, 11L, 23L, 9L, 22L, 8L, 22L, 7L, 19L, 7L, 7L, 8L, 6L, 7L, 
6L, 7L, 8L, 8L, 7L, 6L, 8L, 8L, 9L, 8L, 9L, 10L, 9L, 8L, 8L, 
6L, 8L, 7L, 9L, 10L, 11L, 11L, 9L, 9L, 8L, 9L, 11L, 8L, 6L, 8L 
)), .Names = c("DATE", "CPU"), row.names = c(132611L, 132612L, 
132613L, 132614L, 132615L, 131428L, 132616L, 131429L, 132617L, 
131430L, 132618L, 131431L, 132619L, 131432L, 132620L, 131433L, 
131434L, 131435L, 131436L, 131437L, 131438L, 131439L, 131440L, 
131441L, 131442L, 131443L, 131444L, 131445L, 131446L, 131447L, 
131448L, 131449L, 131450L, 131451L, 131452L, 131453L, 131454L, 
131455L, 131456L, 131457L, 131458L, 131459L, 131460L, 131461L, 
131462L, 131463L, 131464L, 131465L, 131466L, 131467L), class = "data.frame") 

mydata.xts <- xts(mydata$CPU, order.by = mydata$DATE) 

Then, by adapting the period.apply infrastructure, it is easy to aggregate to different windows on the fly:

apply.periodly <- function (x, FUN, period, k=1, ...) 
{ 
    if (!require("xts")) { 
    stop("Need 'xts'") 
    } 
    ep <- endpoints(x, on=period, k=k) 
    period.apply(x, ep, FUN, ...) 
} 

Now, create the aggregates.

mydata.10m <- apply.periodly(x = mydata.xts, FUN = mean, period = "minutes", k = 10) 
mydata.5m <- apply.periodly(x = mydata.xts, FUN = mean, period = "minutes", k = 5) 

Note that each output timestamp is the last input timestamp in its aggregation window.

mydata.10m 
        [,1] 
2013-03-01 00:09:30 14.80 
2013-03-01 00:19:31 8.55 
2013-03-01 00:22:01 8.40 

mydata.5m 
         [,1] 
2013-03-01 00:04:53 19.93333 
2013-03-01 00:09:30 7.10000 
2013-03-01 00:14:30 8.30000 
2013-03-01 00:19:31 8.80000 
2013-03-01 00:22:01 8.40000 
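
To see where those labels come from, it can help to look at what endpoints returns. The following is a minimal sketch on five readings taken from the question; the variable names and the explicit UTC time zone are assumptions made so the example is reproducible. endpoints gives the row index at which each window ends (with a leading 0), and period.apply labels each result with the timestamp of that row.

```r
library(xts)

# Five readings from the question, as epoch seconds (UTC assumed here)
stamps <- as.POSIXct(c(1362114023, 1362114053, 1362114300,
                       1362114330, 1362114600),
                     origin = "1970-01-01", tz = "UTC")
x <- xts(c(30, 29, 7, 7, 6), order.by = stamps)

# Row positions that close each 5-minute window, preceded by 0
ep <- endpoints(x, on = "minutes", k = 5)

# Mean per window; the index of each result is the last timestamp
# that fell inside that window
windows <- period.apply(x, ep, mean)
```

Here the first two rows share a window, the next two share another, and the last row sits alone, so `windows` has three rows indexed by the second, fourth and fifth timestamps.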

However, you can round your timestamps up or down:

# Shift the index back by one window, then round up: a net round-down
align.time.down <- function(x, n) {
    index(x) <- index(x) - n
    align.time(x, n)
}

mydata.10m <- align.time(mydata.10m, 10*60) 
mydata.10m 
#      [,1] 
# 2013-03-01 00:10:00 14.80 
# 2013-03-01 00:20:00 8.55 
# 2013-03-01 00:30:00 8.40 

mydata.5m <- align.time.down(mydata.5m, 5*60) 
mydata.5m 
#       [,1] 
# 2013-03-01 00:00:00 19.93333 
# 2013-03-01 00:05:00 7.10000 
# 2013-03-01 00:10:00 8.30000 
# 2013-03-01 00:15:00 8.80000 
# 2013-03-01 00:20:00 8.40000 
+0

This is great. One quick question: how do I convert mydata.10m back to a data frame? I tried data.frame(mydata.10m), but it did not work. – user1471980 2013-04-23 19:08:33

+0

'fortify.zoo' from the 'zoo' package (which is loaded with 'xts') will do that for you. – Noah 2013-04-23 19:58:52
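
A minimal sketch of the conversion discussed in the comments above, assuming xts (and therefore zoo) is installed. The three values are copied from the aligned 10-minute output; the UTC time zone and the column names DATE and CPU in the second option are illustrative assumptions.

```r
library(xts)  # attaches zoo as well

# Small illustrative series (UTC is assumed for reproducibility)
stamps <- as.POSIXct(c("2013-03-01 00:10:00", "2013-03-01 00:20:00",
                       "2013-03-01 00:30:00"), tz = "UTC")
agg <- xts(c(14.80, 8.55, 8.40), order.by = stamps)

# Option 1: fortify.zoo() returns a data frame with the index as a column
df1 <- fortify.zoo(agg)

# Option 2: build the data frame by hand from the index and the core data,
# which lets you choose the column names yourself
df2 <- data.frame(DATE = index(agg), CPU = as.numeric(coredata(agg)))
```

data.frame(mydata.10m) does produce a data frame, but the timestamps end up as row names rather than as a column, which is usually why it appears not to work.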

0

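Which interval do you want to aggregate over, and which timestamp should each aggregate be reported under? For example, do you want to aggregate 00:00 - 04:59 or 00:01 - 05:00, and report at the start or at the end of the period?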

To aggregate from x:00 to x+4:59 and report at the start of the period, use floor to create a timestamp rounded down to the nearest 5 minutes:

data <- structure(...) 
data$DATE.5mindown <- as.POSIXct(floor(as.numeric(data$DATE)/(5 * 60)) * 
    (5 * 60), origin='1970-01-01') 
aggregate(CPU ~ DATE.5mindown, data, mean) 
#   DATE.5mindown  CPU 
# 1 2013-03-01 00:00:00 19.93333 
# 2 2013-03-01 00:05:00 7.10000 
# 3 2013-03-01 00:10:00 8.30000 
# 4 2013-03-01 00:15:00 8.80000 
# 5 2013-03-01 00:20:00 8.40000 

To aggregate from x:01 to x+5:00 and report at the end of the period, use ceiling to create a timestamp rounded up to the nearest 5 minutes:

data$DATE.5minup <- as.POSIXct(ceiling(as.numeric(data$DATE)/(5 * 60)) * 
    (5 * 60), origin='1970-01-01') 
aggregate(CPU ~ DATE.5minup, data, mean) 
#   DATE.5minup  CPU 
# 1 2013-03-01 00:05:00 19.125000 
# 2 2013-03-01 00:10:00 7.000000 
# 3 2013-03-01 00:15:00 8.300000 
# 4 2013-03-01 00:20:00 9.111111 
# 5 2013-03-01 00:25:00 8.400000
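
The question also asks for 10-minute averages, and the floor approach extends to any window width. The sketch below wraps it in a helper; `aggregate_by_window` and its argument names are hypothetical, the sample is six readings taken from the question, and UTC is assumed so the result does not depend on the local time zone.

```r
# Hypothetical helper: mean of `value` per window of `width_secs` seconds,
# labelled by the start of each window (the floor approach)
aggregate_by_window <- function(time, value, width_secs) {
    bucket <- as.POSIXct(floor(as.numeric(time) / width_secs) * width_secs,
                         origin = "1970-01-01", tz = "UTC")
    aggregate(value ~ bucket, data.frame(bucket = bucket, value = value), mean)
}

# Six readings from the question, as epoch seconds
stamps <- as.POSIXct(c(1362114023, 1362114053, 1362114300,
                       1362114330, 1362114600, 1362114630),
                     origin = "1970-01-01", tz = "UTC")
cpu <- c(30, 29, 7, 7, 6, 8)

five <- aggregate_by_window(stamps, cpu, 5 * 60)   # three windows
ten  <- aggregate_by_window(stamps, cpu, 10 * 60)  # two windows
```

Replacing floor with ceiling inside the helper would give the end-of-period labelling instead.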