2017-10-11

Nested conditions in a dplyr ifelse statement

I am using dplyr and ifelse to create a new column based on two conditions, with the data below.

dat <- structure(list(GenIndID = c("BHS_034", "BHS_034", "BHS_068", 
"BHS_068", "BHS_068", "BHS_068", "BHS_068", "BHS_068", "BHS_068", 
"BHS_068", "BHS_068"), IndID = c("BHS_034_A", "BHS_034_A", "BHS_068_A", 
"BHS_068_A", "BHS_068_A", "BHS_068_A", "BHS_068_A", "BHS_068_A", 
"BHS_068_A", "BHS_068_A", "BHS_068_A"), Fate = c("Mort", "Mort", 
"Alive", "Alive", "Alive", "Alive", "Alive", "Alive", "Alive", 
"Alive", "Alive"), Status = c("Alive", "Mort", "Alive", "Alive", 
"MIA", "Alive", "MIA", "Alive", "MIA", "Alive", "Alive"), Type = c("Linked", 
"Linked", "SOB", "SOB", "SOB", "SOB", "SOB", "SOB", "SOB", "SOB", 
"SOB"), SurveyID = c("GYA13-1", "GYA14-1", "GYA13-1", "GYA14-1", 
"GYA14-2", "GYA15-1", "GYA16-1", "GYA16-2", "GYA17-1", "GYA17-3", 
"GYA15-2"), SurveyDt = structure(c(1379570400, 1407477600, 1379570400, 
1407477600, 1409896800, NA, 1462946400, 1474351200, 1495519200, 
1507010400, 1441951200), tzone = "", class = c("POSIXct", "POSIXt" 
))), row.names = c(NA, 11L), .Names = c("GenIndID", "IndID", 
"Fate", "Status", "Type", "SurveyID", "SurveyDt"), class = "data.frame") 

> dat
   GenIndID     IndID  Fate Status   Type SurveyID   SurveyDt
1   BHS_034 BHS_034_A  Mort  Alive Linked  GYA13-1 2013-09-19
2   BHS_034 BHS_034_A  Mort   Mort Linked  GYA14-1 2014-08-08
3   BHS_068 BHS_068_A Alive  Alive    SOB  GYA13-1 2013-09-19
4   BHS_068 BHS_068_A Alive  Alive    SOB  GYA14-1 2014-08-08
5   BHS_068 BHS_068_A Alive    MIA    SOB  GYA14-2 2014-09-05
6   BHS_068 BHS_068_A Alive  Alive    SOB  GYA15-1       <NA>
7   BHS_068 BHS_068_A Alive    MIA    SOB  GYA16-1 2016-05-11
8   BHS_068 BHS_068_A Alive  Alive    SOB  GYA16-2 2016-09-20
9   BHS_068 BHS_068_A Alive    MIA    SOB  GYA17-1 2017-05-23
10  BHS_068 BHS_068_A Alive  Alive    SOB  GYA17-3 2017-10-03
11  BHS_068 BHS_068_A Alive  Alive    SOB  GYA15-2 2015-09-11

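More specifically, grouped by GenIndID, I want a new date field holding the maximum SurveyDt, based on two conditional statements on Type and Fate. In addition, I want the maximum date to be taken only over SurveyDt values where Status == "Alive". My code below produces all NA values, instead of the described date field for every BHS_068 row that meets all the specified conditions.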

I recently came across case_when, which might be a good fit here, but I couldn't implement it correctly.

dat %>% group_by(GenIndID) %>% 
    mutate(NewDat = as.POSIXct(ifelse(Type == "SOB" & Fate == "Alive", max(SurveyDt[Status == "Alive"], na.rm = F), NA), 
          origin='1970-01-01', na.rm=T)) %>% 
    as.data.frame() 

Any advice would be appreciated.
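A likely source of the all-NA result, worth checking: row 6 (GYA15-1) has Status == "Alive" but a missing SurveyDt, and the attempt above calls max() with na.rm = F, so the group maximum for BHS_068 is itself NA (the BHS_034 rows get NA anyway because their Type is "Linked"). The sketch below reproduces this on a standalone vector holding the "Alive" dates of BHS_068; the dates are typed in as UTC purely for simplicity, which is an assumption and not necessarily the timezone of the original data.

```r
# Standalone vector standing in for the "Alive" SurveyDt values of BHS_068
# (rows 3, 4, 6, 8, 10 and 11; row 6 is missing). UTC is assumed for simplicity.
alive_dates <- as.POSIXct(c("2013-09-19", "2014-08-08", NA,
                            "2016-09-20", "2017-10-03", "2015-09-11"),
                          tz = "UTC")

max_keep_na <- max(alive_dates, na.rm = FALSE)  # NA: one input is missing
max_drop_na <- max(alive_dates, na.rm = TRUE)   # latest non-missing date

print(max_keep_na)
print(max_drop_na)
```

Switching to na.rm = TRUE inside max() is what both answers below do.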


Can you show what the desired output should look like, as a table? – Cris

Answers


If you want to stick with dplyr and use case_when, you have to make sure that every case returns a value of the same type.

In this case the value for your TRUE condition is a datetime, so the default value must also be a datetime, which you get by wrapping NA in as.POSIXct.

dat %>% 
    group_by(GenIndID) %>% 
    mutate(NewDat = case_when(Type == "SOB" & Fate == "Alive" ~ max(SurveyDt[Status == "Alive"], na.rm = TRUE), 
          TRUE ~ as.POSIXct(NA, origin = "1970-01-01"))) 

Using ifelse:

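# base ifelse() drops the POSIXct class and returns plain seconds since 1970,
# so the result is converted back to a datetime with as.POSIXct()
dat %>% 
    group_by(GenIndID) %>% 
    mutate(NewDat = as.POSIXct(ifelse(Type == "SOB" & Fate == "Alive", 
                                      max(SurveyDt[Status == "Alive"], na.rm = TRUE), 
                                      NA), 
                               origin = "1970-01-01")) 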
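The reason the ifelse result needs special care with dates: base ifelse() builds its result from the logical test vector, so the POSIXct class of the "yes" value is lost and only the underlying number of seconds remains. The sketch below shows this on two made-up timestamps (UTC assumed for simplicity), plus two ways to keep a datetime: re-wrapping with as.POSIXct(), or using dplyr::if_else(), which preserves the class but expects both branches to have compatible types, hence the typed POSIXct NA.

```r
library(dplyr)

x <- as.POSIXct(c("2017-10-03", "2016-09-20"), tz = "UTC")
keep <- c(TRUE, FALSE)

# Base ifelse(): the POSIXct class is dropped, leaving plain numbers.
res_base <- ifelse(keep, x, NA)
print(class(res_base))

# Option 1: convert the numbers back into datetimes.
res_fixed <- as.POSIXct(res_base, origin = "1970-01-01", tz = "UTC")
print(res_fixed)

# Option 2: dplyr::if_else() keeps the class; the missing value is given
# as a POSIXct NA so that both branches share the same type.
res_dplyr <- if_else(keep, x, as.POSIXct(NA, tz = "UTC"))
print(res_dplyr)
```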

I'm not tied to 'case_when' if the same result can be had with ifelse, since I'm more familiar with that syntax. –


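For 'case_when', does 'TRUE ~ as.POSIXct(NA, origin = "1970-01-01")' provide the else part of 'ifelse', i.e. the value used when the specified conditions are not met? I couldn't work this out from the help file (with my abilities...). –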
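On the question of the else part: case_when() checks its conditions in order, and a final TRUE ~ value matches every element that no earlier condition claimed, so it does play the role of ifelse()'s "no" branch. Without such a catch-all, unmatched elements simply come out as NA. A minimal illustration on a made-up numeric vector:

```r
library(dplyr)

x <- c(1, 5, 10)

# TRUE catches everything not matched by the earlier conditions.
with_default <- case_when(
  x < 3 ~ "small",
  x < 7 ~ "medium",
  TRUE  ~ "large"
)

# Without a catch-all, unmatched elements become NA.
without_default <- case_when(
  x < 3 ~ "small",
  x < 7 ~ "medium"
)

print(with_default)
print(without_default)
```

Recent dplyr releases (1.1.0 and later) also accept a .default argument for the same purpose.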


We can use data.table. After converting to a data.table (setDT(dat)), specify the logical comparison as i, group by 'GenIndID', and assign to 'NewDat' the max of 'SurveyDt' over the rows where 'Status' is 'Alive'.

library(data.table) 
setDT(dat)[Type == "SOB" & Fate == "Alive", 
     NewDat := max(SurveyDt[Status == "Alive"], na.rm = TRUE), GenIndID]
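To see what the i / by / := combination does in isolation: i restricts which rows are touched, by splits those rows into groups, and := adds the column by reference, so rows excluded by i end up with NA in the new column. The small table below (columns g, keep and v) is hypothetical and only meant to show that behaviour:

```r
library(data.table)

# Hypothetical toy table: group g, a filter flag keep, and a value v.
dt <- data.table(g    = c("a", "a", "b", "b"),
                 keep = c(TRUE, TRUE, FALSE, TRUE),
                 v    = c(1, 3, 5, 7))

# Only rows with keep == TRUE are grouped by g and receive max(v);
# row 3 is excluded by i, so its newv is NA.
dt[keep == TRUE, newv := max(v), by = g]

print(dt)
```

Note that in the answer above the max is also taken only over the rows selected by i, which here coincides with all BHS_068 rows.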