2017-09-14 131 views
-2

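Creating a season variable in R

My season runs from October 1 through March 31 of the following year. How can I create a dummy variable for the season that shows whether a person's exposure falls in or out of season?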

df <- data.frame(ID= c(1:6), 
      Drug = c("A","C","A","A","B","A"), 
      Start = c("01/01/2009","07/10/2010","10/10/2009","03/01/2011","03/01/2012","04/12/2010"), 
      End=c("09/10/2009","04/20/2011","07/20/1010","01/01/2012","04/01/2013","09/30/2011")) 

My expected output:

ID Drug  Start  End Season 
1 1 A 01/01/2009 09/10/2009  1 
2 1 A 01/01/2009 09/10/2009  0 
3 2 C 07/10/2010 04/20/2011  0 
4 2 C 07/10/2010 04/20/2011  1 
5 2 C 07/10/2010 04/20/2011  0 
6 3 A 10/10/2009 07/20/1010  1 
7 3 A 10/10/2009 07/20/1010  0 
8 3 A 10/10/2009 07/20/1010  1 
9 4 B 03/01/2011 01/01/2012  1 
10 4 B 03/01/2011 01/01/2012  0 
11 4 B 03/01/2011 01/01/2012  1 
12 5 A 03/01/2012 04/01/2013  1 
13 5 A 03/01/2012 04/01/2013  0 
14 5 A 03/01/2012 04/01/2013  1 
15 5 A 03/01/2012 04/01/2013  0 
16 6 A 04/12/2010 09/30/2011  0 

ID 1: she starts on 01/01 and ends on 09/10.

[01/01, 03/31] =1 

[03/31,09/10] = 0 

ID 2: she starts on 07/10/10 and ends on 04/20. I check:

[07/10, 10/01] = 0 

[10/01,03/31] = 1 

[03/31, 04/20] = 0 

ID 5: she starts on 03/01 and ends on 04/01.

[03/01, 03/31]= 1 

[03/31, 10/01] = 0 

[10/01, 03/31] = 1 

[03/31, 04/01] = 0 
+3

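I'm not clear on what you're asking. So patient 2 gets three rows with Season 0, 1, 0 because she started out of season, went through a season, and ended out of season? – lebelinoz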

+0

Patient 5 gets four rows because she went through four periods (two seasons and two off-seasons)? – lebelinoz

+0

She starts on 07/10/2010 and ends on 04/20/2011, so I check [07/10, 10/01] = 0, then [10/01, 03/31] = 1, and [03/31, 04/20] = 0 – BIN

Answers

1

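I think I got ExposedIn and ExposedOut right with the code below (note: you need to add 'stringsAsFactors = FALSE' when you create your data frame). However, I didn't have enough time to work out the additional rows for the whole seasons covered in between; I would approach that by adding another column that uses date/time functions to account for the entire treatment period.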

# Parse the MM/DD/YYYY strings into Date objects
df$Start <- as.Date(df$Start, format = '%m/%d/%Y')
df$End <- as.Date(df$End, format = '%m/%d/%Y')
# Season runs October 1 to March 31. Testing the month instead of the
# day of year gives the same answer in leap and non-leap years.
# $mon is 0-based: 9-11 = Oct-Dec, 0-2 = Jan-Mar.
in_season <- function(d) {
  m <- as.POSIXlt(d)$mon
  as.integer(m >= 9 | m <= 2)
}
df$ExposedIn <- in_season(df$Start)  # 1 if treatment starts in season
df$ExposedOut <- in_season(df$End)   # 1 if treatment ends in season
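
One thing worth checking when comparing against day-of-year values: as.POSIXlt()$yday counts from 0 (January 1 is day 0), and every date after February moves up by one in a leap year. A quick check on a few dates (the variable names here are only for illustration):

```r
# $yday is 0-based: January 1 is day 0
jan1       <- as.POSIXlt(as.Date("2009-01-01"))$yday
mar31_2009 <- as.POSIXlt(as.Date("2009-03-31"))$yday
oct1_2009  <- as.POSIXlt(as.Date("2009-10-01"))$yday  # non-leap year
oct1_2012  <- as.POSIXlt(as.Date("2012-10-01"))$yday  # leap year
print(c(jan1, mar31_2009, oct1_2009, oct1_2012))
# 0 89 273 274
```

A day-of-year cutoff for October 1 therefore has to be 273 (274 in leap years) when compared with >=; comparing months instead sidesteps both issues.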

Hope this helps at least a little.
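
For the part left open above (one row per season or off-season period), here is a minimal base-R sketch, not a tested solution. It assumes the dates are already Date objects and that each period should be cut at every October 1 and April 1 falling inside the treatment interval. The helper name split_by_season, the data frame exposures, and the SegStart/SegEnd columns are made up for this example; the three rows are IDs 1, 2 and 5 from the question.

```r
# Hypothetical helper: split one [start, end] interval at every April 1
# and October 1 that falls inside it, then flag each piece.
split_by_season <- function(start, end) {
  years <- as.integer(format(start, "%Y")):as.integer(format(end, "%Y"))
  cuts <- sort(as.Date(c(paste0(years, "-04-01"), paste0(years, "-10-01"))))
  cuts <- cuts[cuts > start & cuts <= end]   # keep boundaries inside the interval
  seg_start <- c(start, cuts)                # each piece begins at a boundary
  seg_end <- c(cuts - 1, end)                # and ends the day before the next one
  m <- as.POSIXlt(seg_start)$mon             # 0-based month of the piece's first day
  data.frame(SegStart = seg_start, SegEnd = seg_end,
             Season = as.integer(m >= 9 | m <= 2))
}

# Rows for IDs 1, 2 and 5 from the question, with dates already parsed
exposures <- data.frame(
  ID = c(1, 2, 5),
  Start = as.Date(c("2009-01-01", "2010-07-10", "2012-03-01")),
  End = as.Date(c("2009-09-10", "2011-04-20", "2013-04-01"))
)

# Build the pieces for each row and stack them
pieces <- lapply(seq_len(nrow(exposures)), function(i) {
  data.frame(ID = exposures$ID[i],
             split_by_season(exposures$Start[i], exposures$End[i]))
})
result <- do.call(rbind, pieces)
print(result)
```

For these three IDs the Season column comes out as 1, 0 for ID 1, then 0, 1, 0 for ID 2, then 1, 0, 1, 0 for ID 5, which matches the breakdown in the question. Unlike the question's intervals, the pieces here do not share endpoints: an in-season piece ends on March 31 and the next one starts on April 1.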