2014-06-12

Manipulating a data.table to expand dates

Suppose I have this data.table:

library(data.table)
data <- data.table(
  Event      = c("P_2800 Back", "Holiday_PRE", "Holiday", "Holiday_POST", "P_100 Back",
                 "Holiday_PRE", "Holiday", "Holiday_POST", "P_100 Back"),
  Event_From = c("25/03/2010", "11/04/2010", "12/04/2010", "15/04/2010", "02/05/2010",
                 "11/04/2011", "12/04/2011", "15/04/2011", "02/05/2011"),
  Event_Pre  = c(NA, NA, NA, NA, 1, NA, NA, NA, 1),
  Event_Post = c(NA, NA, NA, NA, 2, NA, NA, NA, 2),
  Event_To   = c("25/03/2010", "11/04/2010", "14/04/2010", "15/04/2010", "02/05/2010",
                 "11/04/2011", "14/04/2011", "15/04/2011", "02/05/2011"),
  Holiday    = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

which returns:

          Event Event_From Event_Pre Event_Post   Event_To Holiday
1:  P_2800 Back 25/03/2010        NA         NA 25/03/2010   FALSE
2:  Holiday_PRE 11/04/2010        NA         NA 11/04/2010   FALSE
3:      Holiday 12/04/2010        NA         NA 14/04/2010    TRUE
4: Holiday_POST 15/04/2010        NA         NA 15/04/2010   FALSE
5:   P_100 Back 02/05/2010         1          2 02/05/2010   FALSE
6:  Holiday_PRE 11/04/2011        NA         NA 11/04/2011   FALSE
7:      Holiday 12/04/2011        NA         NA 14/04/2011    TRUE
8: Holiday_POST 15/04/2011        NA         NA 15/04/2011   FALSE
9:   P_100 Back 02/05/2011         1          2 02/05/2011   FALSE

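I would like to expand the original date columns to include:

  1. The dates between Event_From and Event_To.

  2. The n dates before the date in the Event_From column and the m dates after the date in the Event_To column, where n is the value in the Event_Pre column and m is the value in the Event_Post column. (In this example, for the event P_100 Back, the result should be the dates between 01/05/2010 and 04/05/2010.)

The final result should look as follows: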

Event         Date        Holiday
P_2800 Back   25/03/2010  FALSE
Holiday_PRE   11/04/2010  FALSE
Holiday       12/04/2010  TRUE
Holiday       13/04/2010  TRUE
Holiday       14/04/2010  TRUE
Holiday_POST  15/04/2010  FALSE
P_100 Back    01/05/2010  FALSE
P_100 Back    02/05/2010  FALSE
P_100 Back    03/05/2010  FALSE
P_100 Back    04/05/2010  FALSE
Holiday_PRE   11/04/2011  FALSE
Holiday       12/04/2011  TRUE
Holiday       13/04/2011  TRUE
Holiday       14/04/2011  TRUE
Holiday_POST  15/04/2011  FALSE
P_100 Back    01/05/2011  FALSE
P_100 Back    02/05/2011  FALSE
P_100 Back    03/05/2011  FALSE
P_100 Back    04/05/2011  FALSE

Could you give me some advice on how to manipulate this data.table?

Thanks

Answers

# Let's get rid of those pesky NAs: a missing offset means zero extra days
data[is.na(Event_Pre), Event_Pre := 0]
data[is.na(Event_Post), Event_Post := 0]

# Not much left, construct the final result:
# start Event_Pre days before Event_From, end Event_Post days after Event_To
data[, list(Date = seq(as.Date(Event_From, format = "%d/%m/%Y") - Event_Pre,
                       as.Date(Event_To, format = "%d/%m/%Y") + Event_Post,
                       by = 1),
            Holiday),
     by = list(Event, Event_From)][, !"Event_From", with = FALSE]
#            Event       Date Holiday
#  1:  P_2800 Back 2010-03-25   FALSE
#  2:  Holiday_PRE 2010-04-11   FALSE
#  3:      Holiday 2010-04-12    TRUE
#  4:      Holiday 2010-04-13    TRUE
#  5:      Holiday 2010-04-14    TRUE
#  6: Holiday_POST 2010-04-15   FALSE
#  7:   P_100 Back 2010-05-01   FALSE
#  8:   P_100 Back 2010-05-02   FALSE
#  9:   P_100 Back 2010-05-03   FALSE
# 10:   P_100 Back 2010-05-04   FALSE
# 11:  Holiday_PRE 2011-04-11   FALSE
# 12:      Holiday 2011-04-12    TRUE
# 13:      Holiday 2011-04-13    TRUE
# 14:      Holiday 2011-04-14    TRUE
# 15: Holiday_POST 2011-04-15   FALSE
# 16:   P_100 Back 2011-05-01   FALSE
# 17:   P_100 Back 2011-05-02   FALSE
# 18:   P_100 Back 2011-05-03   FALSE
# 19:   P_100 Back 2011-05-04   FALSE
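
A possible variation, offered only as a sketch and not as part of the original answer: parse the dates once into helper columns and group by a row index, so that each group is exactly one input row even if two rows share both Event and Event_From. The names dt, date_from, date_to, row_id and expanded are made up for this illustration, and the two-row table is a cut-down stand-in for the question's data.

```r
library(data.table)

# Cut-down stand-in for the question's table (hypothetical name `dt`)
dt <- data.table(
  Event      = c("Holiday", "P_100 Back"),
  Event_From = c("12/04/2010", "02/05/2010"),
  Event_Pre  = c(NA, 1),
  Event_Post = c(NA, 2),
  Event_To   = c("14/04/2010", "02/05/2010"),
  Holiday    = c(TRUE, FALSE)
)

# Treat a missing offset as zero days
dt[is.na(Event_Pre),  Event_Pre  := 0]
dt[is.na(Event_Post), Event_Post := 0]

# Parse the dates once, already shifted by the offsets
dt[, date_from := as.Date(Event_From, format = "%d/%m/%Y") - Event_Pre]
dt[, date_to   := as.Date(Event_To,   format = "%d/%m/%Y") + Event_Post]

# One group per input row, so seq() always receives length-1 endpoints;
# Event and Holiday are recycled along the generated dates
expanded <- dt[, list(Event, Date = seq(date_from, date_to, by = "day"), Holiday),
               by = list(row_id = seq_len(nrow(dt)))]

# Drop the helper index by selecting only the wanted columns
expanded <- expanded[, list(Event, Date, Holiday)]
print(expanded)
```

With this input, the Holiday row expands to three dates and the P_100 Back row to the four dates from 01/05/2010 to 04/05/2010, as requested in the question.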
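
Thanks for your work, but if we have repeated events, e.g. the same holiday also occurring in 2011, the code produces the following error: `Error in seq.Date(as.Date(Event_From, format = "%d/%m/%Y") - Event_Post, : 'from' must be of length 1`. I have modified the OP to reflect this issue. – newbie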

@newbie If you understand what is going on above, the change is trivial: you should simply group by `list(Event, Event_From)` instead of just `Event` - see the edit. – eddi
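
To illustrate the grouping point from the comments, here is a minimal sketch, assuming a two-row table in which the same event name occurs in two different years. The names dt, by_event and by_both are hypothetical, and the dates are stored as Date values up front to keep the example short.

```r
library(data.table)

# Two occurrences of the same event name in different years
dt <- data.table(
  Event      = c("Holiday", "Holiday"),
  Event_From = as.Date(c("2010-04-12", "2011-04-12")),
  Event_To   = as.Date(c("2010-04-14", "2011-04-14"))
)

# Grouping by Event alone puts both rows into one group, so seq() is handed
# length-2 endpoints; try() captures the error instead of stopping the script
by_event <- try(
  dt[, list(Date = seq(Event_From, Event_To, by = "day")), by = Event],
  silent = TRUE
)
inherits(by_event, "try-error")

# Adding Event_From to the grouping makes every group a single row again
by_both <- dt[, list(Date = seq(Event_From, Event_To, by = "day")),
              by = list(Event, Event_From)]
nrow(by_both)
```

Grouping by both columns still assumes that no two rows share the same Event and Event_From. If they could, a row index would be a safer grouping key.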