2016-06-09

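Lubridate mdy function

I am trying to convert the following dates, and one of them (element [1]) does not parse correctly: "4/2/10" becomes "0010-04-02".

Is there a way to correct this?

Thanks, Vivek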

library(lubridate)

data <- data.frame(initialDiagnose = c("4/2/10","14.01.2009", "9/22/2005", 
     "4/21/2010", "28.01.2010", "09.01.2009", "3/28/2005", 
     "04.01.2005", "04.01.2005", "Created on 9/17/2010", "03 01 2010")) 

mdy <- mdy(data$initialDiagnose) 
dmy <- dmy(data$initialDiagnose) 
# some dates are ambiguous; here we give mdy precedence over dmy
mdy[is.na(mdy)] <- dmy[is.na(mdy)] 
data$initialDiagnose <- mdy 
data 

    initialDiagnose 
1  0010-04-02 
2  2009-01-14 
3  2005-09-22 
4  2010-04-21 
5  2010-01-28 
6  2009-09-01 
7  2005-03-28 
8  2005-04-01 
9  2005-04-01 
10  2010-09-17 
11  2010-03-01 
Is it just the first value, or do you need a more general solution for a larger dataset?

Lots of useful information on converting 2-digit years to 4-digit years: http://stackoverflow.com/questions/9508747/add-correct-century-to-dates-with-year-provided-as-year-without-century-y – jalapic

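If you parse it individually, it works fine; it is only the variety of formats that stretches `parse_date_time`'s format guessing too wide. Assuming it is not a huge vector, just loop over it and it will work fine: `do.call(c, lapply(data$initialDiagnose, lubridate::parse_date_time, orders = c('mdy', 'dmy')))` – alistaire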

Answer

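I think this is happening because the mdy() function prefers to match the year with %Y (the full year) over %y (the 2-digit abbreviation for the year, which defaults to 19XX or 20XX).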
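To see the effect in isolation, here is a small sketch that parses the problem value on its own and then together with the rest of the vector from the question. It assumes lubridate is installed; the variable names `x`, `alone`, and `together` are only for illustration.

```r
library(lubridate)

# The vector from the question
x <- c("4/2/10", "14.01.2009", "9/22/2005", "4/21/2010", "28.01.2010",
       "09.01.2009", "3/28/2005", "04.01.2005", "04.01.2005",
       "Created on 9/17/2010", "03 01 2010")

# On its own, only a two-digit-year format is in play for "4/2/10"
alone <- mdy("4/2/10")
format(alone)

# Parsed with the other values, the first element is the one the
# question reports as coming back with year 0010
# (warnings about entries that fail to parse as mdy are suppressed)
together <- suppressWarnings(mdy(x))
together[1]
```

Comparing `alone` with `together[1]` shows that the problem comes from mixing 2-digit and 4-digit years in one call, not from the value itself.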

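There is a workaround, though. I took a look at the help file for lubridate::parse_date_time (?parse_date_time), and near the bottom there is an example of adding an argument that prefers matching the %y format over the %Y format for the year. The relevant bit of code from the help file: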

## ** how to use `select_formats` argument ** 
## By default %Y has precedence: 
parse_date_time(c("27-09-13", "27-09-2013"), "dmy") 
## [1] "13-09-27 UTC" "2013-09-27 UTC" 

## to give priority to %y format, define your own select_format function: 

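# `drop` and `...` are accepted (and ignored) because newer lubridate
# releases pass extra arguments to the select_formats function
my_select <- function(trained, drop = FALSE, ...){ 
    n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained))*1.5 
    names(trained[ which.max(n_fmts) ]) 
} 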

parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select) 
## [1] "2013-09-27 UTC" "2013-09-27 UTC" 

So, for your example, you can adapt this code and replace the mdy <- mdy(data$initialDiagnose) line with this:

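# Define a select function that prefers %y over %Y. This is adapted 
# from the help file; `drop` and `...` are accepted (and ignored) so it 
# also works with newer lubridate releases that pass extra arguments 
my_select <- function(trained, drop = FALSE, ...){ 
    n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained))*1.5 
    names(trained[ which.max(n_fmts) ]) 
} 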

# Parse as mdy dates 
mdy <- parse_date_time(data$initialDiagnose, "mdy", select_formats = my_select) 
# [1] "2010-04-02 UTC" NA    "2005-09-22 UTC" "2010-04-21 UTC" NA    
# [6] "2009-09-01 UTC" "2005-03-28 UTC" "2005-04-01 UTC" "2005-04-01 UTC" "2010-09-17 UTC" 
#[11] "2010-03-01 UTC" 

After running the remaining lines of code from your question, I get this data frame as the result:

initialDiagnose 
1  2010-04-02 
2  2009-01-14 
3  2005-09-22 
4  2010-04-21 
5  2010-01-28 
6  2009-09-01 
7  2005-03-28 
8  2005-04-01 
9  2005-04-01 
10  2010-09-17 
11  2010-03-01 
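If you would rather keep the plain mdy()/dmy() calls, another option is to repair the affected dates after parsing, in the spirit of the 2-digit-year question linked in the comments. The following is only a sketch in base R: `fix_two_digit_year` and its `pivot` argument are hypothetical names, not part of lubridate, and the default cutoff of 68 is an assumption that mirrors the convention R's `%y` uses (00-68 become 20xx, 69-99 become 19xx).

```r
# Hypothetical helper (not part of lubridate): shift dates whose year was
# read literally as 0-99 into a four-digit year.
# Years <= pivot are moved to 20xx, the rest to 19xx (assumed cutoff).
fix_two_digit_year <- function(d, pivot = 68) {
  lt <- as.POSIXlt(d)
  yr <- lt$year + 1900                       # POSIXlt stores years since 1900
  short <- !is.na(yr) & yr >= 0 & yr < 100   # NA dates are left untouched
  lt$year[short] <- lt$year[short] + ifelse(yr[short] <= pivot, 2000, 1900)
  as.Date(lt)
}

# Example input: one mis-parsed date, one correct date, one missing value
parsed <- as.Date(c("0010-04-02", "2005-09-22", NA), format = "%Y-%m-%d")
fixed <- fix_two_digit_year(parsed)
format(fixed)
```

This only makes sense when no genuine dates from the first century are expected in the data; the select_formats approach above avoids the mis-parse in the first place.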
This is perfect. Thank you.