2017-04-08
0

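Converting date ranges to Date type in R

This vector of date ranges is stored in my data frame as class 'character'. The format depends on whether the date range spans into a different month: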

dput(pollingdata$dates) 
c("Nov. 1-7", "Nov. 1-7", "Oct. 24-Nov. 6", "Oct. 4-Nov. 6", 
"Oct. 30-Nov. 6", "Oct. 25-31", "Oct. 7-27", "Oct. 21-Nov. 3", 
"Oct. 20-24", "Jul. 19", "Oct. 29-Nov. 4", "Oct. 28-Nov. 3", 
"Oct. 27-Nov. 2", "Oct. 20-28", "Sep. 30-Oct. 20", "Oct. 15-19", 
"Oct. 26-Nov. 1", "Oct. 25-31", "Oct. 24-30", "Oct. 18-26", 
"Oct. 10-14", "Oct. 4-9", "Sep. 23-Oct. 6", "Sep. 16-29", "Sep. 2-22", 
"Oct. 21-Nov. 2", "Oct. 17-25", "Sep. 30-Oct. 13", "Sep. 27-Oct. 3", 
"Sep. 21-26", "Sep. 14-20", "Aug. 26-Sep. 15", "Sep. 7-13", 
"Aug. 19-Sep. 8", "Aug. 31-Sep. 6", "Aug. 12-Sep. 1", "Aug. 9-Sep. 1", 
"Aug. 24-30", "Aug. 5-25", "Aug. 17-23", "Jul. 29-Aug. 18", 
"Aug. 10-16", "Jan. 12") 

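I would like to convert this vector into two separate columns in my data frame: (1) the start date and (2) the end date of each range. Both columns should be stored as class 'Date', which will make the data easier to work with in my project. Does anyone know a simple way to do this? I have been struggling with it.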

Thanks in advance,

Answers

2

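We can split the vector by - into a list, replace the elements that have only digits at the end by paste-ing the month substring onto them, append NA to the elements that have fewer than 2 pieces (using length<-), and convert to a data.frame (with do.call(rbind.data.frame, ...)):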

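v1 <- pollingdata$dates  # assumed: the character vector from the question
lst <- lapply(strsplit(v1, "-"), function(x) { 
     # TRUE when the last piece starts with a digit, e.g. "7" from "Nov. 1-7"
     i1 <- grepl("^[0-9]+", x[length(x)]) 
     if(i1) { 
      # prefix the bare day with the month of the first piece ("Nov.")
      x[length(x)] <- paste(substr(x[1], 1, 4), x[length(x)]) 
      x} else x}) 
# pad single dates with NA to length 2, then bind the pieces row-wise
d1 <- do.call(rbind.data.frame, lapply(lst, `length<-`, max(lengths(lst)))) 
colnames(d1) <- c("Start_Date", "End_Date") 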
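To make the split-and-prefix step concrete, here is a minimal sketch on three representative strings (a same-month range, a cross-month range, and a single date). The helper name split_range is hypothetical and not part of the answer above, and the sketch assumes every month is written as three letters plus a dot, as in the question's data.

```r
# Hypothetical helper: split one range string into c(start, end),
# copying the month onto a bare end day and using NA when there is no end.
split_range <- function(s) {
  x <- strsplit(s, "-", fixed = TRUE)[[1]]
  if (length(x) == 2 && grepl("^[0-9]+$", x[2])) {
    # "Nov. 1-7": the end piece is only a day, so reuse the start month
    x[2] <- paste(substr(x[1], 1, 4), x[2])
  }
  length(x) <- 2  # pads a single date such as "Jul. 19" with NA
  x
}

examples <- c("Nov. 1-7", "Oct. 24-Nov. 6", "Jul. 19")
parts <- t(vapply(examples, split_range, character(2)))
colnames(parts) <- c("Start_Date", "End_Date")
parts
```

Each row of parts then holds a start and an end string that still carry their own month, which is what the later as.Date step relies on.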

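Per the OP's post, the columns need to be converted to Date class, but the Date class follows the format %Y-%m-%d. The vector contains no year, and I am not sure whether we may paste in the current year and convert to Date class. If that is allowed, then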

d1[] <- lapply(d1, function(x) as.Date(paste(x, 2017), "%b. %d %Y")) 
head(d1) 
# Start_Date End_Date 
#1 2017-11-01 2017-11-07 
#2 2017-11-01 2017-11-07 
#3 2017-10-24 2017-11-06 
#4 2017-10-04 2017-11-06 
#5 2017-10-30 2017-11-06 
#6 2017-10-25 2017-10-31 
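One caveat with the %b format is that it matches month abbreviations in the current locale, so the conversion can return NA on a non-English system. The following is a sketch of a locale-independent alternative built on R's month.abb constant. The helper name parse_md is hypothetical, and the year is an assumption passed in by the caller, since the data itself has none.

```r
# Hypothetical helper: parse strings such as "Nov. 7" into Date for a given
# year, using the built-in month.abb constant instead of the locale's %b.
parse_md <- function(x, year) {
  # first three letters -> month number (NA if not an English abbreviation)
  mon <- match(sub("^([A-Za-z]{3}).*$", "\\1", x), month.abb)
  # drop the month, its optional dot and any spaces, leaving the day
  day <- as.integer(sub("^[A-Za-z]{3}\\.? *", "", x))
  as.Date(sprintf("%d-%02d-%02d", as.integer(year), mon, day),
          format = "%Y-%m-%d")
}

parse_md(c("Nov. 7", "Oct. 24", NA), year = 2016)
```

With the d1 from this answer, something like d1[] <- lapply(d1, parse_md, year = 2016) should give the same kind of result as the as.Date call above, with missing end dates staying NA.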
+0

This works great and got me started. The columns are not in Date format yet, but I may be able to get there – Canovice

+0

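@Canovice Date needs year information, which is not present in your dataset. If you are free to paste in a year, the columns will convert to 'Date' (shown in the update) – akrun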

1

You can use the stringr function 'str_split_fixed' to split the field and then process the pieces. Load the stringr library and proceed as follows:

library(stringr) 
dat <- data.frame(date=c("Nov. 1-7", "Nov. 1-7", "Oct. 24-Nov. 6", "Oct. 4-Nov. 6", 
       "Oct. 30-Nov. 6", "Oct. 25-31", "Oct. 7-27", "Oct. 21-Nov. 3", 
       "Oct. 20-24", "Jul. 19", "Oct. 29-Nov. 4", "Oct. 28-Nov. 3", 
       "Oct. 27-Nov. 2", "Oct. 20-28", "Sep. 30-Oct. 20", "Oct. 15-19", 
       "Oct. 26-Nov. 1", "Oct. 25-31", "Oct. 24-30", "Oct. 18-26", 
       "Oct. 10-14", "Oct. 4-9", "Sep. 23-Oct. 6", "Sep. 16-29", "Sep. 2-22", 
       "Oct. 21-Nov. 2", "Oct. 17-25", "Sep. 30-Oct. 13", "Sep. 27-Oct. 3", 
       "Sep. 21-26", "Sep. 14-20", "Aug. 26-Sep. 15", "Sep. 7-13", 
       "Aug. 19-Sep. 8", "Aug. 31-Sep. 6", "Aug. 12-Sep. 1", "Aug. 9-Sep. 1", 
       "Aug. 24-30", "Aug. 5-25", "Aug. 17-23", "Jul. 29-Aug. 18", 
       "Aug. 10-16", "Jan. 12")) 

Processing:

# Split on dash or whitespace into at most 4 pieces
dt <- data.frame(str_split_fixed(dat$date, "[-]|\\s",4)) 
names(dt) <- c("stdt1","stdt2","endt1","endt2") 
# Remove the dot (.) after each month abbreviation
dt1 <- data.frame(sapply(dt,function(x)gsub("[.]","",x))) 
# Start date: day + month + assumed year
dt1$stdt <- as.Date(paste0(dt1$stdt2,dt1$stdt1,"2016"),format="%d%b%Y") 
# End date: reuse the start month when the range stays within one month
dt1$endt <- ifelse(dt1$endt2=="",paste0(dt1$endt1,dt1$stdt1,"2016"), 
       paste0(dt1$endt2,dt1$endt1,"2016")) 

# Single dates give a 7-character string with no day (e.g. "Jul2016"); prepend the start day
dt1$endt <-as.Date(ifelse(nchar(dt1$endt)==7,paste0(dt1$stdt2,dt1$endt),dt1$endt),"%d%b%Y") 

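Assumptions:

1) No year is provided, so I have assumed the year 2016.

2) Rows 10 and 43 have no end 'day', so I have assumed the end date is the same as the start date.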
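The two assumptions can also be expressed directly on Date vectors. This is only a sketch on toy values standing in for the parsed columns (the names start and end are illustrative): it fills a missing end date from the start date, then checks that a single fixed year never makes a range run backwards, which would happen for a range crossing New Year.

```r
# Toy dates standing in for the parsed start and end columns
start <- as.Date(c("2016-07-19", "2016-11-01", "2016-10-24"))
end   <- as.Date(c(NA,           "2016-11-07", "2016-11-06"))

# Assumption 2: a single-day entry ends on the day it starts
missing_end <- is.na(end)
end[missing_end] <- start[missing_end]

# Sanity check for assumption 1: with one fixed year, no range may run backwards
stopifnot(all(end >= start))
data.frame(start, end)
```

None of the ranges in the question cross a year boundary, so the check passes for this data; a range such as December into January would need the end year bumped by one.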

Result:

> dt1 
   stdt1 stdt2 endt1 endt2       stdt       endt
1    Nov     1     7       2016-11-01 2016-11-07
2    Nov     1     7       2016-11-01 2016-11-07
3    Oct    24   Nov     6 2016-10-24 2016-11-06
4    Oct     4   Nov     6 2016-10-04 2016-11-06
5    Oct    30   Nov     6 2016-10-30 2016-11-06
6    Oct    25    31       2016-10-25 2016-10-31
7    Oct     7    27       2016-10-07 2016-10-27
8    Oct    21   Nov     3 2016-10-21 2016-11-03
9    Oct    20    24       2016-10-20 2016-10-24
10   Jul    19             2016-07-19 2016-07-19