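I think this is happening because the `mdy()` function prefers to match the year with `%Y` (the full year) over `%y` (a 2-digit abbreviation of the year, which defaults to 19XX or 20XX).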
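As a rough illustration of the difference between the two format codes, here is a minimal base R sketch. It uses `as.Date` rather than lubridate's own parser, so it only shows what `%y` and `%Y` mean and is not a reproduction of what `mdy()` does internally; the input string is a made-up example.

```r
# Base R only: the same string read with a 2-digit and a 4-digit year format.
x <- "04/02/10"  # hypothetical input in month/day/year order

two_digit  <- as.Date(x, format = "%m/%d/%y")  # %y: "10" is read as 2010
four_digit <- as.Date(x, format = "%m/%d/%Y")  # %Y: "10" is taken as the year itself

format(two_digit, "%Y")
as.integer(format(four_digit, "%Y"))
```

If a parser settles on `%Y` for a value like this, the year comes out nearly two thousand years too early, which is the kind of result the question describes.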
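There is a workaround, though. I looked at the help file for `lubridate::parse_date_time` (`?parse_date_time`), and near the bottom there is an example of adding an argument that prefers matching the year with the `%y` format over the `%Y` format. The relevant bit of code from the help file: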
## ** how to use `select_formats` argument **
## By default %Y has precedence:
parse_date_time(c("27-09-13", "27-09-2013"), "dmy")
## [1] "13-09-27 UTC" "2013-09-27 UTC"
## to give priority to %y format, define your own select_format function:
my_select <- function(trained){
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained))*1.5
  names(trained[ which.max(n_fmts) ])
}
parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select)
## [1] "2013-09-27 UTC" "2013-09-27 UTC"
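To see why that function ends up preferring `%y`, it can be run on its own against a hand-made input. This is only a sketch: the `trained` vector below is hypothetical (format strings as names, made-up match counts as values), and the formats lubridate actually passes in may look different.

```r
# Hypothetical input: two candidate formats that differ only in the year code.
trained <- c("%d-%m-%Y" = 2, "%d-%m-%y" = 2)

my_select <- function(trained) {
  # One point per "%" token in the format name, plus a 1.5 bonus
  # when the format contains the 2-digit year code %y.
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) +
    grepl("%y", names(trained)) * 1.5
  # Return the name of the highest-scoring format.
  names(trained[which.max(n_fmts)])
}

my_select(trained)
```

Both formats have three `%` tokens, so the score is 3 for the `%Y` format and 4.5 for the `%y` format, and the `%y` format is returned. `grepl` is case-sensitive, so `%Y` does not earn the bonus.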
So, for your example, you can adapt this code and replace the `mdy <- mdy(data$initialDiagnose)` line with:
# Define a select function that prefers %y over %Y. This is copied
# directly from the help files
my_select <- function(trained){
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained))*1.5
  names(trained[ which.max(n_fmts) ])
}
# Parse as mdy dates
mdy <- parse_date_time(data$initialDiagnose, "mdy", select_formats = my_select)
# [1] "2010-04-02 UTC" NA "2005-09-22 UTC" "2010-04-21 UTC" NA
# [6] "2009-09-01 UTC" "2005-03-28 UTC" "2005-04-01 UTC" "2005-04-01 UTC" "2010-09-17 UTC"
#[11] "2010-03-01 UTC"
After running the remaining lines of code from your question, I get this data frame as the result:
initialDiagnose
1 2010-04-02
2 2009-01-14
3 2005-09-22
4 2010-04-21
5 2010-01-28
6 2009-09-01
7 2005-03-28
8 2005-04-01
9 2005-04-01
10 2010-09-17
11 2010-03-01
Is it just the first value, or do you need a more general solution for a larger dataset? –
Lots of useful information on converting 2-digit years to 4-digit years: http://stackoverflow.com/questions/9508747/add-correct-century-to-dates-with-year-provided-as-year-without-century-y – jalapic
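If you parse them individually, it works fine; it's just that the variety of formats stretches `parse_date_time`'s format guessing too far. Assuming it's not a huge vector, just loop over it and it will work fine: `do.call(c, lapply(data$initialDiagnose, lubridate::parse_date_time, orders = c('mdy', 'dmy')))` – alistaire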