2012-12-05
9

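strptime, as.POSIXct and as.Date return unexpected NA

When I try to parse a timestamp in the following format: "Thu Nov 8 15:41:45 2012", only NA is returned.

I am using Mac OS X, R 2.15.2 and RStudio 0.97.237. The language of my operating system is Dutch; I presume this has something to do with it.

When I try strptime, NA is returned: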

var <- "Thu Nov 8 15:41:45 2012" 
strptime(var, "%a %b %d %H:%M:%S %Y") 
# [1] NA 

as.POSIXct does not work either:

as.POSIXct(var, "%a %b %d %H:%M:%S %Y") 
# [1] NA 

I also tried as.Date on the string above, but without the %H:%M:%S component:

as.Date("Thu Nov 8 2012", "%a %b %d %Y") 
# [1] NA 

Any ideas what I might be doing wrong?

+1

I cannot reproduce your error on Ubuntu and R. Also, for me 'strptime' creates a 'POSIXlt' rather than a 'POSIXct' time object. Finally, try 'as.POSIXct(var, format = ...)' and see whether you have more luck. – Justin
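To illustrate the comment above: the documented signature is as.POSIXct(x, tz = "", ...), so a format string passed as the second positional argument is matched to tz rather than to format. The sketch below uses a purely numeric timestamp (an assumption made here so that the result does not depend on the locale) to show the named format argument and the two classes involved. The variable names are illustrative only.

```r
# Numeric-only timestamp, so no locale-specific names are involved
stamp <- "2012-11-08 15:41:45"
fmt <- "%Y-%m-%d %H:%M:%S"

# strptime() returns a POSIXlt object (a list of date-time fields)
lt <- strptime(stamp, fmt, tz = "UTC")

# as.POSIXct() needs the format passed by name; the second
# positional parameter is tz, not format
ct <- as.POSIXct(stamp, format = fmt, tz = "UTC")

print(class(lt))  # "POSIXlt" "POSIXt"
print(class(ct))  # "POSIXct" "POSIXt"
```

Converting between the two is possible with as.POSIXct(lt) or as.POSIXlt(ct).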

Answers

17

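I think it is exactly as you guessed: strptime fails to parse your date-time string because of your locale. Your string contains both an abbreviated weekday name (%a) and an abbreviated month name (%b). These conversion specifications are described in ?strptime: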

Details

%a : Abbreviated weekday name in the current locale on this platform

%b : Abbreviated month name in the current locale on this platform.

"Note that abbreviated names are platform-specific (although the standards specify that in the C locale they must be the first three letters of the capitalized English name:"

"Knowing what the abbreviations are is essential if you wish to use %a , %b or %h as part of an input format: see the examples for how to check."

See also

[...] locales to query or set a locale.

The locale issue is also relevant for as.POSIXct, as.POSIXlt and as.Date:

?as.POSIXct

Details

If format is specified, remember that some of the format specifications are locale-specific, and you may need to set the LC_TIME category appropriately via Sys.setlocale . This most often affects the use of %b , %B (month names) and %p (AM/PM).

?as.Date

Details

Locale-specific conversions to and from character strings are used where appropriate and available. This affects the names of the days and months.


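So, if the weekday and month names in the string differ from those in the current locale, strptime, as.POSIXct and as.Date fail to parse the string correctly and return NA.

However, you can solve this by changing the locale: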
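One way to see which names the current locale expects is to format known dates with %b and %a. This is only a sketch: the variable names are made up for illustration, and the printed values depend on your platform and locale.

```r
# Abbreviated month names as the current LC_TIME locale spells them
month_abbr <- format(ISOdate(2012, 1:12, 1), "%b")

# 2012-11-04 was a Sunday, so these seven dates run Sunday through Saturday
weekday_abbr <- format(as.Date("2012-11-04") + 0:6, "%a")

print(Sys.getlocale("LC_TIME"))
print(month_abbr)
print(weekday_abbr)
```

If "Nov" or "Thu" does not appear in this output, a format containing %b or %a will not match the English string from the question.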

# First save your current locale 
loc <- Sys.getlocale("LC_TIME") 

# Set correct locale for the strings to be parsed 
# (in this particular case: English) 
# so that weekdays (e.g "Thu") and abbreviated month (e.g "Nov") are recognized 
Sys.setlocale("LC_TIME", "en_GB.UTF-8") 
# or 
Sys.setlocale("LC_TIME", "C") 

# Then proceed as you intended 
x <- "Thu Nov 8 15:41:45 2012" 
strptime(x, "%a %b %d %H:%M:%S %Y") 
# [1] "2012-11-08 15:41:45" 

# Then set back to your old locale 
Sys.setlocale("LC_TIME", loc) 
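The save, set and restore steps above can also be wrapped in a small function. The following is a sketch, not part of base R: parse_with_locale is a hypothetical helper name, and the "C" locale and "UTC" time zone defaults are assumptions chosen for this example.

```r
# Hypothetical helper: parse `x` with `format` under a temporary LC_TIME locale
parse_with_locale <- function(x, format, locale = "C", tz = "UTC") {
  old <- Sys.getlocale("LC_TIME")
  # Restore the previous locale even if parsing fails
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", locale)
  as.POSIXct(strptime(x, format, tz = tz))
}

parsed <- parse_with_locale("Thu Nov 8 15:41:45 2012", "%a %b %d %H:%M:%S %Y")
print(format(parsed, "%Y-%m-%d %H:%M:%S"))
# [1] "2012-11-08 15:41:45"
```

Using on.exit means the caller's locale is put back automatically, so the manual restore step cannot be forgotten.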

With my own locale, I can reproduce your error:

Sys.setlocale("LC_TIME", loc) 
# [1] "fr_FR.UTF-8" 

strptime(var,"%a %b %d %H:%M:%S %Y") 
# [1] NA 
0

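I just ran into the same problem and found this solution to be much cleaner, because there is no need to change any system settings manually: the lubridate package has wrapper functions that do this job, and you only need to set the locale argument.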

library(lubridate) 
date <- c("23. juni 2014", "1. november 2014", "8. marts 2014", "16. juni 2014", "12. december 2014", "13. august 2014") 
# "Danish" is a Windows locale name; on Linux or macOS the equivalent is typically "da_DK.UTF-8" 
dmy(date, locale = "Danish") 
# [1] "2014-06-23" "2014-11-01" "2014-03-08" "2014-06-16" "2014-12-12" "2014-08-13" 
+2

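Regarding "_no need to change any system settings_": note that the 'locale' argument in the 'lubridate' functions is just a convenience wrapper for the steps outlined in the answer above: (1) save the current locale, (2) change the locale, (3) restore the original locale. Check the code [here](https://github.com/hadley/lubridate/blob/master/R/parse.r): 'orig_locale <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", locale); on.exit(Sys.setlocale("LC_TIME", orig_locale))' – Henrik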