2016-05-14

Getting the maximum value for each day from hourly data

Suppose I have the following data:

time <- seq(ISOdate(2007, 7, 1, 0), ISOdate(2008, 4, 5, 23), by = "1 hour") # hourly timestamps
y <- rnorm(n = length(time)) # one random value per hour

year <- as.numeric(format(time, "%Y")) # year number as numeric

month <- as.numeric(format(time, "%m")) # month number as numeric

day <- as.numeric(format(time, "%d")) # day number as numeric

hour <- as.numeric(format(time, "%H")) # hour number as numeric

dat <- data.frame(year = year, month = month, day = day, hour = hour, y = y)

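For each day there are 24 values of y, one per hour (0 to 23). I need to find the maximum y for each day. For example, for the date "2007-10-05" there are 24 values of y, one for each hour from 0 to 23, and I need the maximum value for that day. There are 280 days from "2007-07-01" to "2008-04-05" inclusive, so I should end up with 280 daily maximum values of y.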

How can I do this?

Answers

3

Using dplyr

library(dplyr) 
dyp1 <- dat %>% 
     group_by(year, month, day) %>% 
     summarise(y=max(y)) 

Using data.table

library(data.table) 
setDT(dat)[, .(y=max(y)), by = .(year, month, day)] 

Using base R

aggregate(y ~ year+month+day, dat, max) 
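As a variation on the base R approach, the three grouping columns can be replaced by a single Date column. The following is only a sketch: the names `dat2` and `daily_max` are made up for illustration, and `set.seed()` is added just so the random data is reproducible.

```r
# Rebuild the example data so the sketch is self-contained
set.seed(1)  # assumption: fixed seed, not part of the original question
time <- seq(ISOdate(2007, 7, 1, 0), ISOdate(2008, 4, 5, 23), by = "1 hour")
y <- rnorm(length(time))

# One Date value per hourly observation (ISOdate() produces GMT timestamps)
dat2 <- data.frame(date = as.Date(time), y = y)

# Maximum of y within each calendar day
daily_max <- aggregate(y ~ date, data = dat2, FUN = max)

nrow(daily_max)  # one row per day from 2007-07-01 to 2008-04-05 inclusive
head(daily_max)
```

Grouping on one Date column keeps the result easy to plot or merge, while the year/month/day version above matches the columns already present in `dat`.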
2

Using sqldf

library(sqldf) 
sqldf("select year, month, day, 
     max(y) as y 
     from dat 
     group by year, month, day") 

Another option is to order by y and select the first value in each group

library(data.table) 
setDT(dat)[order(-y), .(y= y[1L]), by = .(year, month, day)] 

Or with dplyr

library(dplyr) 
dat %>% 
    group_by(year, month, day) %>% 
    arrange(desc(y)) %>% 
    summarise(y = first(y)) 
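The same order-then-take-first idea can be sketched in base R. This version rebuilds the data with a single `date` column (an assumption made for brevity, not the layout used in the question), and the names `sorted` and `top` are hypothetical. Because whole rows are kept, the hour at which each daily maximum occurred comes along for free.

```r
# Self-contained example data; the seed is only there for reproducibility
set.seed(1)
time <- seq(ISOdate(2007, 7, 1, 0), ISOdate(2008, 4, 5, 23), by = "1 hour")
dat <- data.frame(date = as.Date(time),
                  hour = as.numeric(format(time, "%H")),
                  y    = rnorm(length(time)))

# Sort by day, then by y in decreasing order within each day
sorted <- dat[order(dat$date, -dat$y), ]

# Keep the first row of each day, which is the row holding that day's maximum
top <- sorted[!duplicated(sorted$date), ]

head(top)
```

Compared with `max()`, this costs a full sort, so it is mainly worth it when other columns of the maximum row are needed.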
1

Apply cut directly to the time and y vectors:

tapply(y, INDEX = cut(time, breaks = "day"), max)

Or using the dplyr library:

library(dplyr) 
df <- data.frame(time, y)
df %>%
    group_by(day = cut(time, breaks = "day")) %>%
    summarise(y = max(y))
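To see why grouping on `cut(time, breaks = "day")` works, it can help to inspect the factor it returns. The sketch below uses only base R; the variable names `day` and `daily` are chosen for illustration.

```r
# Same hourly timestamps as in the question
time <- seq(ISOdate(2007, 7, 1, 0), ISOdate(2008, 4, 5, 23), by = "1 hour")
set.seed(1)  # assumption: fixed seed so the values are reproducible
y <- rnorm(length(time))

# cut() on date-times with breaks = "day" returns a factor with one level per calendar day
day <- cut(time, breaks = "day")

nlevels(day)           # 280 days in this range
head(levels(day), 3)   # labels of the first three days
all(table(day) == 24)  # every day holds exactly 24 hourly observations

# The factor can then drive any grouped computation, for example tapply()
daily <- tapply(y, day, max)
length(daily)
```

Since the grouping factor is built straight from `time`, this approach does not need the separate year, month and day columns at all.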