2013-07-25

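I have a lot of measurement values, recorded once per minute. Some of the values are the average, minimum and maximum for the given minute. I would like to summarize/aggregate the whole data.frame so that it has one entry per 30 minutes, i.e. aggregate the data frame by time interval, applying a different function to each column.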

str(wgData) 
'data.frame': 115200 obs. of 7 variables:
 $ TIMESTAMP          : POSIXct, format: "2012-11-24 00:00:00" "2012-11-24 00:01:00" "2012-11-24 00:02:00" "2012-11-24 00:03:00" ...
 $ RECORD             : int  11683 11684 11685 11686 11687 11688 11689 11690 11691 11692 ...
 $ TPanel             : num  -0.075 -0.075 -0.075 -0.095 -0.095 -0.095 -0.095 -0.118 -0.118 -0.118 ...
 $ VBattery           : num  13.8 13.8 13.8 13.8 13.8 ...
 $ VBatteryHeating_Avg: num  12.2 12.2 12.2 12.2 12.2 ...
 $ VBatteryHeating_Min: num  12.2 12.2 12.2 12.2 12.2 ...
 $ VBatteryHeating_Max: num  12.2 12.2 12.2 12.2 12.2 ...

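So, for every 30 minutes, I would like to calculate: the TIMESTAMP, the mean of TPanel (the panel temperature), the mean of VBattery, the mean of VBatteryHeating_Avg, the minimum of VBatteryHeating_Min, and the maximum of VBatteryHeating_Max.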

I have had some success by doing

wgData30min <- aggregate(list(TP = wgData$TPanel, VB=wgData$VBatteryHeating_Avg, VB_MIN=wgData$VBatteryHeating_Min, VB_MAX=wgData$VBatteryHeating_Min), 
       list(Timestamp = cut(wgData$TIMESTAMP, "30 min")), 
       mean) 
head(wgData30min) 
            Timestamp         TP       VB   VB_MIN   VB_MAX
1 2012-11-24 00:00:00 -0.1621667 12.15467 12.15333 12.15333
2 2012-11-24 00:30:00 -0.4751667 12.13333 12.13133 12.13133
3 2012-11-24 01:00:00 -0.5647333 12.11167 12.11067 12.11067
4 2012-11-24 01:30:00 -0.4573667 12.09133 12.08967 12.08967
5 2012-11-24 02:00:00 -0.4923667 12.07100 12.07000 12.07000
6 2012-11-24 02:30:00 -0.6469000 12.04933 12.04733 12.04733

...but I was not able to pass a vector of functions to be applied to the individual columns. Any help is appreciated.


Could you include a reproducible example? It makes it easier for people to answer your question. – geotheory


Unfortunately, this is not possible with 'aggregate', i.e. this function does not accept different functions to apply to different columns. –

Answers


I believe your data looks something like this

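# Simulated readings every 10 minutes (600 s), starting at 2012-11-24 00:00 UTC
seconds <- seq(0, 100000, by = 600)
dates <- as.POSIXct(seconds, origin = "2012-11-24", tz = "UTC")
n <- length(seconds)  # 167 readings
# Random values stand in for the real measurements
TPanel <- rnorm(n)
VBatteryHeating_Avg <- rcauchy(n)
VBatteryHeating_Min <- runif(n)
VBatteryHeating_Max <- rexp(n)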

wgData <- data.frame(TIMESTAMP = dates, 
        TPanel = TPanel, 
        VBatteryHeating_Avg = VBatteryHeating_Avg, 
        VBatteryHeating_Min = VBatteryHeating_Min, 
        VBatteryHeating_Max = VBatteryHeating_Max) 

head(wgData) 
##             TIMESTAMP     TPanel VBatteryHeating_Avg VBatteryHeating_Min VBatteryHeating_Max
## 1 2012-11-24 00:00:00  0.4770116          10.2937806          0.80151633           0.8722767
## 2 2012-11-24 00:10:00  0.0304906         -20.7057773          0.32311092           0.7172383
## 3 2012-11-24 00:20:00  1.4875903           0.5749393          0.74020471           0.5857239
## 4 2012-11-24 00:30:00  0.4933884           6.6567398          0.73824231           0.3691020
## 5 2012-11-24 00:40:00 -0.0369843           3.4332840          0.06552402           0.2455765
## 6 2012-11-24 00:50:00  0.7339858          -3.3787044          0.06451802           0.5952835

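Perhaps the best solution is to use plyr. First, use cut as before to create an indicator variable for your 30-minute chunks. Then use ddply to split the data frame by that variable and summarize each chunk.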

wgData$Timestamp30min <- cut(wgData$TIMESTAMP,"30 min") 

library(plyr) 

out <- ddply(wgData, .(Timestamp30min), summarize, 
      TP = mean(TPanel), 
      VB = mean(VBatteryHeating_Avg), 
      VB_min = min(VBatteryHeating_Min), 
      VB_max = max(VBatteryHeating_Max)) 

head(out) 
##        Timestamp30min         TP          VB     VB_min    VB_max
## 1 2012-11-24 00:00:00  0.6650308 -3.27901911 0.32311092 0.8722767
## 2 2012-11-24 00:30:00  0.3967966  2.23710649 0.06451802 0.5952835
## 3 2012-11-24 01:00:00 -0.1326459 -1.20082543 0.50358789 1.0569388
## 4 2012-11-24 01:30:00  0.7845420 -0.07520645 0.14500901 0.9656004
## 5 2012-11-24 02:00:00 -0.4523882  0.40472169 0.24997021 1.4056166
## 6 2012-11-24 02:30:00 -0.2317818  0.61860868 0.64909054 0.2338781

Alternatively, you could call aggregate once for each function (mean, min, and max) and then combine the results, two data frames at a time, using merge.
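A minimal sketch of that aggregate-plus-merge route might look like the following. The small data frame here is hypothetical sample data (12 readings at 10-minute spacing, seeded for repeatability), not the asker's real measurements, and the intermediate names bins, means, mins and maxs are introduced only for illustration.

```r
# Hypothetical sample data: 12 readings, one every 10 minutes
set.seed(1)
n <- 12
wgData <- data.frame(
  TIMESTAMP = as.POSIXct("2012-11-24", tz = "UTC") + 600 * (0:(n - 1)),
  TPanel = rnorm(n),
  VBatteryHeating_Avg = rnorm(n, mean = 12),
  VBatteryHeating_Min = runif(n, 11, 12),
  VBatteryHeating_Max = runif(n, 12, 13)
)

# One grouping factor, shared by all three aggregate() calls
bins <- list(Timestamp = cut(wgData$TIMESTAMP, "30 min"))

# One aggregate() call per summary function
means <- aggregate(list(TP = wgData$TPanel, VB = wgData$VBatteryHeating_Avg),
                   bins, mean)
mins <- aggregate(list(VB_min = wgData$VBatteryHeating_Min), bins, min)
maxs <- aggregate(list(VB_max = wgData$VBatteryHeating_Max), bins, max)

# merge() joins two data frames at a time, so the calls are chained
out <- merge(merge(means, mins, by = "Timestamp"), maxs, by = "Timestamp")
out
```

Because all three calls share the same grouping factor, every intermediate result has one row per 30-minute bin, and merging on Timestamp simply lines the columns up side by side.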


Works like a charm. Thank you very much! – marsl