2013-12-18

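Comparing different rows of different columns in R

I would like to check that an individual does not have any gaps in their eligibility status. I define a gap as a date_of_claim that occurs more than 30 days after the last elig_end_date. So what I want to do is check that each date_of_claim is no later than the elig_end_date + 30 days of the immediately preceding row. Ideally, I would like an indicator per person that is 0 where there is no gap and 1 where a gap occurs, so I can see where the gap is. Below is a sample df with the solution built in as 'gaps'.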

names date_of_claim elig_end_date obs gaps 
1 tom 2010-01-01 2010-07-01 1 NA 
2 tom 2010-05-04 2010-07-01 1 0 
3 tom 2010-06-01 2014-01-01 2 0 
4 tom 2010-10-10 2014-01-01 2 0 
5 mary 2010-03-01 2014-06-14 1 NA 
6 mary 2010-05-01 2014-06-14 1 0 
7 mary 2010-08-01 2014-06-14 1 0 
8 mary 2010-11-01 2014-06-14 1 0 
9 mary 2011-01-01 2014-06-14 1 0 
10 john 2010-03-27 2011-03-01 1 NA 
11 john 2010-07-01 2011-03-01 1 0 
12 john 2010-11-01 2011-03-01 1 0 
13 john 2011-02-01 2011-03-01 1 0 
14 sue 2010-02-01 2010-04-30 1 NA 
15 sue 2010-02-27 2010-04-30 1 0 
16 sue 2010-03-13 2010-05-31 2 0 
17 sue 2010-04-27 2010-06-30 3 0 
18 sue 2010-04-27 2010-06-30 3 0 
19 sue 2010-05-06 2010-08-31 4 0 
20 sue 2010-06-08 2010-09-30 5 0 
21 mike 2010-05-01 2010-07-30 1 NA 
22 mike 2010-06-01 2010-07-30 1 0 
23 mike 2010-11-12 2011-07-30 2 1 

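I found this post very useful: "How can I compare a value in a column to the previous one using R?". However, I don't think I can use a loop, because my df has 4 million rows and I have already had a lot of difficulty trying to run a loop on it.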

To that end, I think the code I need is something like this:

df$gaps<-ifelse(df$date_of_claim>=df$elig_end_date+30,1,0) ##this doesn't use the preceding row. 

I made a clumsy attempt with this:

df$gaps<-df$date_of_claim>=df$elig_end_date[-1,] 

However, I get an error saying there is an incorrect number of dimensions.
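For context, the dimension error most likely comes from the comma in `[-1,]`: a data frame column is a plain vector with a single dimension, so it cannot be indexed with a row and a column position. The following is a minimal sketch on a made-up toy vector `x` (not the real data) showing the error and the comma-free indexing that shifts values by one position:

```r
# Toy vector of dates (made-up values, not the real data)
x <- as.Date(c("2010-07-01", "2010-07-01", "2014-01-01"))

# A vector has no second dimension, so a comma inside [ ] raises an error;
# tryCatch() captures the message instead of stopping the script
res <- tryCatch(x[-1, ], error = function(e) conditionMessage(e))
print(res)

# Without the comma, negative indexing simply drops the first element
length(x[-1])

# head(x, -1) drops the last element; padding with NA in front
# lines each value up with the row that follows it ("previous row" values)
prev <- c(as.Date(NA), head(x, -1))
print(prev)
```

Note that this sketch ignores the grouping by person, so on the real data the first row of each person would be compared with the last row of the previous person.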

Thanks very much!


What are those NAs? –

Answers


With four million observations, I would use data.table:

DF <- read.table(text="names date_of_claim elig_end_date obs gaps 
1 tom 2010-01-01 2010-07-01 1 NA 
2 tom 2010-05-04 2010-07-01 1 0 
3 tom 2010-06-01 2014-01-01 2 0 
4 tom 2010-10-10 2014-01-01 2 0 
5 mary 2010-03-01 2014-06-14 1 NA 
6 mary 2010-05-01 2014-06-14 1 0 
7 mary 2010-08-01 2014-06-14 1 0 
8 mary 2010-11-01 2014-06-14 1 0 
9 mary 2011-01-01 2014-06-14 1 0 
10 john 2010-03-27 2011-03-01 1 NA 
11 john 2010-07-01 2011-03-01 1 0 
12 john 2010-11-01 2011-03-01 1 0 
13 john 2011-02-01 2011-03-01 1 0 
14 sue 2010-02-01 2010-04-30 1 NA 
15 sue 2010-02-27 2010-04-30 1 0 
16 sue 2010-03-13 2010-05-31 2 0 
17 sue 2010-04-27 2010-06-30 3 0 
18 sue 2010-04-27 2010-06-30 3 0 
19 sue 2010-05-06 2010-08-31 4 0 
20 sue 2010-06-08 2010-09-30 5 0 
21 mike 2010-05-01 2010-07-30 1 NA 
22 mike 2010-06-01 2010-07-30 1 0 
23 mike 2010-11-12 2011-07-30 2 1", header=TRUE) 

library(data.table) 
DT <- data.table(DF) 

DT[, c("date_of_claim", "elig_end_date") := list(as.Date(date_of_claim), as.Date(elig_end_date))] 

DT[, gaps2:= c(NA, date_of_claim[-1] > head(elig_end_date, -1)+30), by=names] 

# names date_of_claim elig_end_date obs gaps gaps2 
# 1: tom 2010-01-01 2010-07-01 1 NA NA 
# 2: tom 2010-05-04 2010-07-01 1 0 FALSE 
# 3: tom 2010-06-01 2014-01-01 2 0 FALSE 
# 4: tom 2010-10-10 2014-01-01 2 0 FALSE 
# 5: mary 2010-03-01 2014-06-14 1 NA NA 
# 6: mary 2010-05-01 2014-06-14 1 0 FALSE 
# 7: mary 2010-08-01 2014-06-14 1 0 FALSE 
# 8: mary 2010-11-01 2014-06-14 1 0 FALSE 
# 9: mary 2011-01-01 2014-06-14 1 0 FALSE 
# 10: john 2010-03-27 2011-03-01 1 NA NA 
# 11: john 2010-07-01 2011-03-01 1 0 FALSE 
# 12: john 2010-11-01 2011-03-01 1 0 FALSE 
# 13: john 2011-02-01 2011-03-01 1 0 FALSE 
# 14: sue 2010-02-01 2010-04-30 1 NA NA 
# 15: sue 2010-02-27 2010-04-30 1 0 FALSE 
# 16: sue 2010-03-13 2010-05-31 2 0 FALSE 
# 17: sue 2010-04-27 2010-06-30 3 0 FALSE 
# 18: sue 2010-04-27 2010-06-30 3 0 FALSE 
# 19: sue 2010-05-06 2010-08-31 4 0 FALSE 
# 20: sue 2010-06-08 2010-09-30 5 0 FALSE 
# 21: mike 2010-05-01 2010-07-30 1 NA NA 
# 22: mike 2010-06-01 2010-07-30 1 0 FALSE 
# 23: mike 2010-11-12 2011-07-30 2 1 TRUE 
#  names date_of_claim elig_end_date obs gaps gaps2 
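The question asks for a 0/1 indicator rather than TRUE/FALSE, and for a flag per person. A possible variant is sketched below, assuming a data.table version that provides `shift()` and assuming the rows are already sorted by claim date within each person. The small table and the column name `any_gap` are made up for illustration.

```r
library(data.table)

# Small made-up subset in the same shape as the example data
DT <- data.table(
  names = c("tom", "tom", "mike", "mike", "mike"),
  date_of_claim = as.Date(c("2010-01-01", "2010-05-04",
                            "2010-05-01", "2010-06-01", "2010-11-12")),
  elig_end_date = as.Date(c("2010-07-01", "2010-07-01",
                            "2010-07-30", "2010-07-30", "2011-07-30"))
)

# shift() lags elig_end_date by one row within each person;
# as.integer() turns TRUE/FALSE into 1/0 and keeps NA for each person's first row
DT[, gaps2 := as.integer(date_of_claim > shift(elig_end_date) + 30), by = "names"]

# Per-person flag (hypothetical column name): 1 if any row for that person
# shows a gap, otherwise 0
DT[, any_gap := as.integer(any(gaps2 == 1L, na.rm = TRUE)), by = "names"]

print(DT)
```

Filtering with `DT[gaps2 == 1L]` then lists the rows where a gap occurs.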

Thanks @Roland. This works really well and is fast. – user2363642