Lubridate mdy function

I am trying to convert the dates below, but one of them, element [1], does not convert correctly: "4/2/10" becomes "0010-04-02".

Is there a way to fix this?

Thanks, Vivek

library(lubridate)

data <- data.frame(initialDiagnose = c("4/2/10","14.01.2009", "9/22/2005", 
     "4/21/2010", "28.01.2010", "09.01.2009", "3/28/2005", 
     "04.01.2005", "04.01.2005", "Created on 9/17/2010", "03 01 2010")) 

mdy <- mdy(data$initialDiagnose) 
dmy <- dmy(data$initialDiagnose) 
# some dates are ambiguous; here we give mdy precedence over dmy
mdy[is.na(mdy)] <- dmy[is.na(mdy)] 
data$initialDiagnose <- mdy 
data 

    initialDiagnose 
1  0010-04-02 
2  2009-01-14 
3  2005-09-22 
4  2010-04-21 
5  2010-01-28 
6  2009-09-01 
7  2005-03-28 
8  2005-04-01 
9  2005-04-01 
10  2010-09-17 
11  2010-03-01


2016-06-09 Vivek Kumar

Is it only the first value, or do you need a more general solution for a larger dataset?

Lots of useful information on converting 2-digit years to 4-digit years: http://stackoverflow.com/questions/9508747/add-correct-century-to-dates-with-year-provided-as-year-without-century-y – jalapic

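If you parse it on its own, it works fine; the variety of formats just stretches `parse_date_time`'s format guessing too wide. Assuming it is not a huge vector, just loop over it and it will work fine: `do.call(c, lapply(data$initialDiagnose, lubridate::parse_date_time, orders = c('mdy', 'dmy')))` – alistaire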

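I think this happens because the mdy() function prefers to match the year with %Y (the full year) over %y (the 2-digit abbreviated year, which defaults to 19XX or 20XX).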
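To make the %Y versus %y distinction concrete, here is a minimal sketch that uses base R's as.Date() rather than lubridate, so it only illustrates the two format codes and not lubridate's own format guessing. The helper year_of() is a hypothetical name introduced for this illustration.

```r
# Base R illustration (not lubridate): the same string parsed with %Y and %y.
x <- "4/2/10"

# %Y reads "10" as the literal year 10.
as_full_year <- as.Date(x, format = "%m/%d/%Y")

# %y reads "10" as a two-digit year; values 00-68 are mapped to 20xx.
as_short_year <- as.Date(x, format = "%m/%d/%y")

# Hypothetical helper: extract the numeric year from a Date.
year_of <- function(d) as.POSIXlt(d)$year + 1900

year_of(as_full_year)   # year 10
year_of(as_short_year)  # year 2010
```

When a vector mixes 2-digit and 4-digit years, a single guessed format that favours %Y produces the first result for the short entries.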

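There is a workaround, though. I looked at the help file for lubridate::parse_date_time (?parse_date_time), and near the bottom there is an example of adding an argument that prefers matching the year with the %y format over the %Y format. The relevant bit of code from the help file: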

## ** how to use `select_formats` argument ** 
## By default %Y has precedence: 
parse_date_time(c("27-09-13", "27-09-2013"), "dmy") 
## [1] "13-09-27 UTC" "2013-09-27 UTC" 

## to give priority to %y format, define your own select_format function: 

my_select <- function(trained){ 
    n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained))*1.5 
    names(trained[ which.max(n_fmts) ]) 
} 

parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select) 
## [1] "2013-09-27 UTC" "2013-09-27 UTC"

So, for your example, you can adapt this code and replace the line mdy <- mdy(data$initialDiagnose) with:

# Define a select function that prefers %y over %Y. This is copied 
# directly from the help files 
my_select <- function(trained){ 
    n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained))*1.5 
    names(trained[ which.max(n_fmts) ]) 
} 

# Parse as mdy dates 
mdy <- parse_date_time(data$initialDiagnose, "mdy", select_formats = my_select) 
# [1] "2010-04-02 UTC" NA    "2005-09-22 UTC" "2010-04-21 UTC" NA    
# [6] "2009-09-01 UTC" "2005-03-28 UTC" "2005-04-01 UTC" "2005-04-01 UTC" "2010-09-17 UTC" 
#[11] "2010-03-01 UTC"

Running the remaining lines of code from your question then gave me this data frame as the result:

initialDiagnose 
1  2010-04-02 
2  2009-01-14 
3  2005-09-22 
4  2010-04-21 
5  2010-01-28 
6  2009-09-01 
7  2005-03-28 
8  2005-04-01 
9  2005-04-01 
10  2010-09-17 
11  2010-03-01
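As a quick sanity check after parsing, one could flag any dates whose year falls outside a plausible range. This is only a sketch: the cutoff min_year is an assumption chosen for illustration, and the parsed vector below is a small hand-built sample rather than the full data frame.

```r
# Hypothetical check: flag parsed dates whose year looks implausible.
# The sample mimics the question's output, including the mis-parsed first entry.
parsed <- as.Date(c("0010-04-02", "2009-01-14", "2005-09-22"))

# Numeric year of each date (base R, no lubridate needed).
years <- as.POSIXlt(parsed)$year + 1900

# Assumed lower bound for a believable diagnosis year.
min_year <- 1900

# Positions of the dates that fall below the bound.
suspicious <- which(years < min_year)
suspicious
```

Any position reported here points at an entry where a 2-digit year was probably read as a full year.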


2016-06-09 21:27:02 ialm

This is perfect. Thank you.
