R: compare values with previous rows

My data looks like this:

Incident.ID.. = c(rep("INCFI0000029582",4), rep("INCFI0000029587",4)) 
date = c("2014-09-25 08:39:45", "2014-09-25 08:39:48", "2014-09-25 08:40:44", "2014-10-10 23:04:00", "2014-09-25 08:33:32", "2014-09-25 08:34:41", "2014-09-25 08:35:24", "2014-10-10 23:04:00") 
status = c("assigned", "in.progress", "resolved", "closed", "assigned", "resolved", "resolved", "closed") 
date.diff = c(3, 56, 1347796, 0, 69, 43, 1348116, 0)
df = data.frame(Incident.ID..,date, status, date.diff, stringsAsFactors = FALSE) 

df 
    Incident.ID..                date      status date.diff
1 INCFI0000029582 2014-09-25 08:39:45    assigned         3
2 INCFI0000029582 2014-09-25 08:39:48 in.progress        56
3 INCFI0000029582 2014-09-25 08:40:44    resolved   1347796
4 INCFI0000029582 2014-10-10 23:04:00      closed         0
5 INCFI0000029587 2014-09-25 08:33:32    assigned        69
6 INCFI0000029587 2014-09-25 08:34:41    resolved        43
7 INCFI0000029587 2014-09-25 08:35:24    resolved   1348116
8 INCFI0000029587 2014-10-10 23:04:00      closed         0

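I would like to pick only the rows with status "resolved" for a given Incident.ID.. when that row is not followed by a row with status "closed" for the same Incident.ID.. (there may be incidents that have only "resolved" rows or only "closed" rows, which is why Incident.ID.. must be the same when comparing).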

For example, in this sample data only this row would be picked:

6 INCFI0000029587 2014-09-25 08:34:41 resolved  43

How could I do this, please?

Source

2015-05-22 ElinaJ

Here is a simple approach using dplyr: group the data by incident ID, then filter (select rows) using the `lead` function to look at the next row:

library(dplyr) 
df %>% 
    group_by(Incident.ID..) %>% 
    filter(status == "resolved" & lead(status) != "closed") # you can add %>% ungroup() if required 
#Source: local data frame [1 x 4] 
#Groups: Incident.ID.. 
# 
#    Incident.ID..                date   status date.diff
#1 INCFI0000029587 2014-09-25 08:34:41 resolved        43
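
For readers without dplyr, the same next-row comparison can be sketched in base R. This is only an illustration and not part of the original answer: the helper names `next_in_group` and `resolved_not_closed` and the `keep_trailing` switch are made up here. The sketch also makes one edge case explicit. When a "resolved" row is the last row of its incident there is no next row, so the shifted value is `NA` and the row is dropped by default. Whether such a row should count as "not followed by closed" depends on what you need.

```r
# Sample data from the question (only the columns needed here)
df <- data.frame(
  Incident.ID.. = c(rep("INCFI0000029582", 4), rep("INCFI0000029587", 4)),
  status = c("assigned", "in.progress", "resolved", "closed",
             "assigned", "resolved", "resolved", "closed"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: value of the following row within the same group,
# NA for the last row of each group
next_in_group <- function(x, g) {
  ave(x, g, FUN = function(v) c(v[-1], NA))
}

# Hypothetical helper: rows that are "resolved" and not followed by "closed".
# keep_trailing = TRUE additionally keeps a "resolved" row that ends its incident.
resolved_not_closed <- function(d, keep_trailing = FALSE) {
  nxt <- next_in_group(d$status, d$Incident.ID..)
  followed_by_other <- !is.na(nxt) & nxt != "closed"
  trailing <- keep_trailing & is.na(nxt)
  d[d$status == "resolved" & (followed_by_other | trailing), ]
}

result <- resolved_not_closed(df)
print(result)
```

With `keep_trailing = FALSE` this follows the behaviour of the dplyr filter above, where a missing next row never satisfies the condition.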

Source

2015-05-22 12:01:24

@ElinaJ, you need to install and load the dplyr package. Try `install.packages("dplyr"); library(dplyr)` –

library(data.table) #using the development version of data.table 
setDT(df)[, .SD[status == "resolved" & shift(status, type = "lead") != "closed"], by = Incident.ID..] 
#     Incident.ID..                date   status date.diff
# 1: INCFI0000029587 2014-09-25 08:34:41 resolved        43

P.S.: Updated based on @David's comment.

Source

2015-05-22 12:04:06 user227710

Why not just "mix" dplyr with data.table and do: `setDT(df)[status == "resolved" & lead(status) != "closed", by = Incident.ID..]`? It doesn't even need the development version – grrgrrbla

@David: Thanks. – user227710
