Find the indices where as.Date fails / non-standard format

I have a character column of dates that I want to coerce to class Date:

df$x <- as.Date(df$x) 

# Error in charToDate(x) 
# character string is not in a standard unambiguous format

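Fine, I'm familiar with this error: my column contains entries such as "" or "90-Smarch-13". The trouble is that head(df$x) looks fine, with normal dates like 2013-11-04, so this is not a problem with the whole column, only with a few rows.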

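My questions are:

Can I find out how many rows are not in the standard unambiguous format?
Can I find their indices (to inspect them or remove them)?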

My thoughts:

Use try:

for (i in 1:nrow(df)) try(as.Date(df$x[i])) # very slow, doesn't finish for 1M rows

Try to guess what the problem is using nchar
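The nchar idea can be sketched roughly as follows. The vector x is a made-up stand-in for df$x, and the assumption that a well-formed entry is exactly 10 characters long (%Y-%m-%d with no time part) is mine, not something stated in the question.

```r
# Hypothetical sample standing in for df$x (not the asker's real data).
x <- c("2013-11-04", "", "90-Smarch-13", "2012-05-04", NA)

# String length of each entry; NA entries give NA.
len <- nchar(x)

# Overview of the lengths that occur, including NA.
length_counts <- table(len, useNA = "ifany")

# Assumption: a well-formed "%Y-%m-%d" date has exactly 10 characters.
# Flag non-NA entries with any other length.
suspect <- which(!is.na(x) & len != 10)
x[suspect]
```

This only catches entries of the wrong length; a 10-character string that is not a valid date would slip through, so it is a quick screen rather than a real check.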

Is there a more systematic approach?

Source

2013-11-03 Hugh

Could you show us `head(df$x)`? `as.Date.character` throws an error when the first non-NA entry is not in the format `"%Y-%m-%d"` or `"%Y/%m/%d"`. –

`head(df$x)` (`NA`s removed) is just `"2013-11-04 00:00:00" "2013-11-04 00:00:00" "2013-11-04 00:00:00" "2013-11-04 00:00:00" "2013-11-04 00:00:00" "2013-11-04 00:00:00"`. Most of the dates are unambiguous, but finding the ones that aren't is the challenge. – Hugh

That seems odd. `as.Date(c("2013-11-04 00:00:00", "2013-12-42 1100:22", "ddd"))` returns `"2013-11-04" NA NA` as expected. I suspect something else is going on, but I can't reproduce it with the data you've provided. –
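Following the point in the comments that the error depends on the first non-NA entry, one possible workaround is to pass an explicit format to as.Date, so that entries which do not match come back as NA instead of raising an error. This is only a sketch: the vector x is invented for illustration, and "%Y-%m-%d" is assumed to be the intended format.

```r
# Hypothetical sample; the first entry is deliberately a malformed one.
x <- c("90-Smarch-13", "2013-11-04 00:00:00", "", "2013-12-42 1100:22", "2012-05-04")

# With an explicit format nothing is guessed from the first entry;
# anything that does not match the format becomes NA.
parsed <- as.Date(x, format = "%Y-%m-%d")

# How many entries failed, and where they are.
n_bad <- sum(is.na(parsed))
bad   <- which(is.na(parsed))
x[bad]
```

The call is vectorised, so it avoids the row-by-row try loop from the question.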

I would use parse_date_time from the lubridate package, for example:

library(lubridate)

dates.toparse <- c("2013-11-04", "", "90-Smarch-13", "2012-05-04")
## parse the dates; the expected format here is %Y-%m-%d
(dates.parsed <- parse_date_time(dates.toparse, orders = "Y-m-d"))
# [1] "2013-11-04 UTC" NA               NA               "2012-05-04 UTC"
## locate the badly formatted elements
dates.toparse[is.na(dates.parsed)]
# [1] ""             "90-Smarch-13"
## or by index
which(is.na(dates.parsed))
# [1] 2 3

Source

2013-11-03 23:26:01 agstudy

Yes, this works. Since most of this column is `NA`, I had to modify your code slightly. (I had to remove the NA entries from the results.) – Hugh
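The adjustment for a column that is mostly NA might look something like the sketch below. It uses base as.Date with an explicit format instead of lubridate so that it runs on its own, and the data is invented; the same logical mask should apply equally to the result of parse_date_time.

```r
# Hypothetical column in which many entries are already NA.
x <- c(NA, "2013-11-04", NA, "", "90-Smarch-13", "2012-05-04")

# Assumed target format; entries that do not match become NA.
parsed <- as.Date(x, format = "%Y-%m-%d")

# Genuine parse failures: NA after parsing but not NA in the input.
failed <- which(is.na(parsed) & !is.na(x))
x[failed]

# Number of failures, ignoring entries that were NA to begin with.
n_failed <- length(failed)
```

Comparing against `is.na(x)` keeps the missing values that were already in the column from being reported as badly formatted dates.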
