2017-08-01 83 views
2

我有一個大型數據集,開始日期和結束日期有時在一個月內,但更常見的是跨越一個月或一年以上。最終,我想計算每個ID每個月的入住天數。當範圍變化時,從df中提取開始日期和結束日期

這裏是樣本數據:

ID = c(50:55) 
ENTRY = as.Date(c("11/6/2011", "04/08/2012", "10/9/2012", 
       "23/10/2012", "15/11/2012", "23/11/2012"), "%d/%m/%Y") 
EXIT = as.Date(c("11/7/2011", "06/09/2012", "24/9/2012", 
       "31/12/2012", "18/11/2012", "04/01/2013"), "%d/%m/%Y") 
Occupancy <- data.frame(ID, ENTRY, EXIT) 

ID  ENTRY  EXIT 
50 2011-06-11 2011-07-11 
51 2012-08-04 2012-09-06 
52 2012-09-10 2012-09-24 
53 2012-10-23 2012-12-31 
54 2012-11-15 2012-11-18 
55 2012-11-23 2013-01-04 

這是我想創建:

ID ENTRY EXIT 
50 6/11/2011 6/30/2011 
50 7/1/2011 7/11/2011 
51 8/4/2012 8/31/2012 
51 9/1/2012 9/6/2012 
: 
55 11/23/2012 11/30/2012 
55 12/1/2012 12/31/2012 
55 1/1/2013 1/4/2013 

任何建議,將不勝感激!

回答

2

希望這會有所幫助!
它會爲您提供最終結果 - 即每個ID在每個月的入住天數。

ID = c(50:55) 
ENTRY = as.Date(c("11/6/2011", "04/08/2012", "10/9/2012", 
        "23/10/2012", "15/11/2012", "23/11/2012"), "%d/%m/%Y") 
EXIT = as.Date(c("11/7/2011", "06/09/2012", "24/9/2012", 
       "31/12/2012", "18/11/2012", "04/01/2013"), "%d/%m/%Y") 
Occupancy <- data.frame(ID, ENTRY, EXIT) 

library(zoo) 
library(dplyr) 
monthList <- mapply(function(x,y) as.yearmon(seq(x,y, "day")), ENTRY, EXIT) 
OccupancyDf <- monthList %>% lapply(table) %>% lapply(as.list) %>% lapply(data.frame) %>% rbind_all() 
OccupancyDf$ID <- Occupancy$ID 
OccupancyDf[is.na(OccupancyDf)] <- 0 
OccupancyDf 

輸出是:

Jun.2011 Jul.2011 Aug.2012 Sep.2012 Oct.2012 Nov.2012 Dec.2012 Jan.2013 ID 
     20  11  0  0  0  0  0  0 50 
     0  0  28  6  0  0  0  0 51 
     0  0  0  15  0  0  0  0 52 
     0  0  0  0  9  30  31  0 53 
     0  0  0  0  0  4  0  0 54 
     0  0  0  0  0  8  31  4 55 


不要忘了讓我們知道是否能解決你的問題:)

+0

這個完美!你是個天才!非常感謝! – Borderlands54

+0

很高興它幫助!如果你喜歡這個解決方案,你可以選擇正確的答案或者標記:) – Prem

0

這裏的方式來獲得你顯示

輸出

以下函數將接受一行數據幀(ENTRYEXIT)並返回每月分解的數據幀。

custom.dates <- function(a,ts) { 
       if (ts > 0) { 
        newdates <- lapply(1:ts, function(x) a$ENTRY + period(x,"month")) 
        new.entry <- lapply(1:ts, function(x) { ymd(paste(year(newdates[[x]]), month(newdates[[x]]), "01", sep="-")) }) 
        newdates <- lapply((ts-1):0, function(x) a$ENTRY + period(x,"month")) 
        new.exit <- lapply(ts:1, function(x) { ymd(paste(year(newdates[[x]]), month(newdates[[x]]), days_in_month(month(newdates[[x]])), sep="-")) }) 
        df <- data.frame(ENTRY=sort(c(a$ENTRY,new.entry)), EXIT=sort(c(a$EXIT,new.exit))) 
       return(df) 
       } else { 
        return(a) 
       } 
      } 

使用tidyverse

library(tidyverse) 
result <- Occupancy %>% 
     mutate(monthspan = (year(EXIT)*12 + month(EXIT)) - (year(ENTRY)*12 + month(ENTRY))) %>% 
     nest(monthspan, ENTRY, EXIT) %>% 
     mutate(data = map(data, ~custom.dates(select(.x, -monthspan), .x$monthspan))) %>% 
     unnest(data) 

輸出

 ID  ENTRY  EXIT 
1 50 2011-06-11 2011-06-30 
2 50 2011-07-01 2011-07-11 
3 51 2012-08-04 2012-08-31 
4 51 2012-09-01 2012-09-06 
5 52 2012-09-10 2012-09-24 
6 53 2012-10-23 2012-10-31 
7 53 2012-11-01 2012-11-30 
8 53 2012-12-01 2012-12-31 
9 54 2012-11-15 2012-11-18 
10 55 2012-11-23 2012-11-30 
11 55 2012-12-01 2012-12-31 
12 55 2013-01-01 2013-01-04 
相關問題