2017-10-12 92 views

Organizing data with maximum and minimum values in R

I have a table (shown as an image in the original post) that is generated by the following code:

id <- c("1","2","1","2","1","1") 
status <- c("open","open","closed","closed","open","closed") 
date <- c("11-10-2017 15:10","10-10-2017 12:10","12-10-2017 22:10","13-10-2017 06:30","13-10-2017 09:30","13-10-2017 10:30") 
data <- data.frame(id,status,date) 
hour <- data.frame(do.call('rbind', strsplit(as.character(data$date),' ',fixed=TRUE))) 
hour <- hour[,2] 
hour <- as.POSIXlt(hour, format = "%H:%M") 

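What I want to achieve is to select the earliest open time and the latest closed time for each id. The final result (shown as an image in the original post) would have one row per id with those two times.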

Currently I am using sqldf to solve this problem:

sqldf("select * from (select id, status, date as closeDate, max(hour) as hour from data 
    where status='closed' 
    group by id,status) as a 
    join 
    (select id, status, date as openDate, min(hour) as hour from data 
    where status='open' 
    group by id,status) as b 
    using(id);") 

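Question 1: Is there a simpler way to do this?

Question 2: If I give max(hour) any name other than hour, the result is no longer in date-time format but comes back as a series of numbers such as 1507864200 and 1507807800. How can I keep the time format while giving the columns different names?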
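On Question 2, one plausible explanation (an assumption, not something confirmed in this thread) is that sqldf's default class-restoring heuristic only reapplies a date-time class when the output column name matches an input column name, so a renamed column comes back as raw seconds since 1970-01-01. A minimal base R sketch of converting such numbers back is shown here; the `tz = "UTC"` choice and the variable names are assumptions, since the time zone of the original session is not stated.

```r
# Epoch seconds like the ones quoted in Question 2
secs <- c(1507864200, 1507807800)

# Interpret the numbers as seconds since 1970-01-01.
# tz = "UTC" is an assumption; use the time zone of your own session.
times <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Show the restored date-times in a readable form
print(format(times, "%Y-%m-%d %H:%M", tz = "UTC"))
```

Applying the same conversion to the renamed columns of the sqldf result should restore a date-time display under that assumption.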


Did you mean for `hour` to become a column in your data? Perhaps you forgot a `data$hour <- hour` line? – Gregor

Answers


Using the plyr package

(For some reason, as shown here, you have to convert hour to class POSIXct with as.POSIXct; otherwise you get an error message):

#add hour to data.frame: 
data$hour <- as.POSIXct(hour) 
library(plyr) 
ddply(data, .(id), summarize, open=min(hour[status=="open"]), 
    closed=max(hour[status=="closed"]))
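For comparison, here is a minimal base R sketch that needs no extra packages. It differs from the code above in one respect that is an assumption on my part: it compares the full date-time strings rather than only the time of day. The column name `when` and the helper data frames are hypothetical names chosen for illustration.

```r
# Rebuild the example data from the question
data <- data.frame(
  id     = c("1", "2", "1", "2", "1", "1"),
  status = c("open", "open", "closed", "closed", "open", "closed"),
  date   = c("11-10-2017 15:10", "10-10-2017 12:10", "12-10-2017 22:10",
             "13-10-2017 06:30", "13-10-2017 09:30", "13-10-2017 10:30"),
  stringsAsFactors = FALSE
)

# Assumption: parse the full timestamp (day-month-year hour:minute) in UTC
data$when <- as.POSIXct(data$date, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Earliest "open" row per id: sort ascending, keep the first row of each id
open_rows  <- data[data$status == "open", ]
open_rows  <- open_rows[order(open_rows$when), ]
first_open <- open_rows[!duplicated(open_rows$id), c("id", "when")]
names(first_open)[2] <- "open"

# Latest "closed" row per id: sort descending, keep the first row of each id
closed_rows <- data[data$status == "closed", ]
closed_rows <- closed_rows[order(closed_rows$when, decreasing = TRUE), ]
last_closed <- closed_rows[!duplicated(closed_rows$id), c("id", "when")]
names(last_closed)[2] <- "closed"

# One row per id, with both columns still of class POSIXct
result <- merge(first_open, last_closed, by = "id")
print(result)
```

Because the columns are renamed in R rather than in SQL, they keep their date-time class regardless of the names chosen.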