2016-01-10 86 views
0

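Append a list to a data frame as a single column in R

I am using the following R code to scrape a list of match information from a web page: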

library(rvest)
crickbuzz <- read_html(httr::GET("http://www.cricbuzz.com/cricket-match/live-scores"))
matches_date <- crickbuzz %>%
  html_nodes(".schedule-date:nth-child(1)") %>%
  html_attr("timestamp")

matches_date
[1] "1452268800000" "1452132000000" "1452247200000" "1452242400000" "1452327000000" "1452290400000" "1452310200000" "1452310200000" "1452310200000" 
[10] "1452310200000" "1452324600000" "1452324600000" "1452324600000" "1452324600000" "1452324600000" "1452150000000" "1452153600000" "1452153600000" 

Now I convert these to a proper date and time format:

dates <- lapply(X = matches_date, function(timestamp_match) {
  as.POSIXct(as.numeric(timestamp_match) / 1000, origin = "1970-01-01")
})

Now I have the dates in the following form:

 dates 
[[1]] 
[1] "2016-01-10 07:30:00 IST" 

[[2]] 
[1] "2016-01-10 21:30:00 IST" 

[[3]] 
[1] "2016-01-09 12:00:00 IST" 

[[4]] 
[1] "2016-01-10 13:55:00 IST" 

[[5]] 
[1] "2016-01-10 10:50:00 IST" 

[[6]] 
[1] "2016-01-07 12:30:00 IST" 

[[7]] 
[1] "2016-01-07 13:30:00 IST" 

[[8]] 
[1] "2016-01-10 09:00:00 IST" 

[[9]] 
[1] "2016-01-10 09:00:00 IST" 

[[10]] 
[1] "2016-01-10 09:00:00 IST" 

[[11]] 
[1] "2016-01-10 09:00:00 IST" 

[[12]] 
[1] "2016-01-10 09:00:00 IST" 

[[13]] 
[1] "2016-01-10 13:00:00 IST" 

[[14]] 
[1] "2016-01-10 13:00:00 IST" 

[[15]] 
[1] "2016-01-10 13:00:00 IST" 

[[16]] 
[1] "2016-01-10 13:00:00 IST" 

[[17]] 
[1] "2016-01-10 03:30:00 IST" 

[[18]] 
[1] "2016-01-10 03:30:00 IST" 

Now I append this to a data frame as a column:

matches_info[,"Date And Time"] <- dates

But only the first date gets replicated down the whole column, and I get the following warning:

Warning message: 
In `[<-.data.frame`(`*tmp*`, , "Date And Time", value = list(1452391200, : 
provided 18 variables to replace 1 variables 

If I use unlist(dates), it gives me the numeric timestamps again. How can I append the dates and times to the data frame?

+0

Eventually you could use `dates <- sapply(X = matches_date, ...)` instead of `lapply` – jogo

+2

Is there any need for lapply? Isn't `as.POSIXct(as.numeric(matches_date)/1000, origin="1970-01-01")` enough? All the functions involved are vectorized. – nicola

Answers

1

Try do.call(c, dates) instead of unlist(dates). This prevents R from converting the list elements to numeric and keeps them as POSIXct:

matches_date <- c("1452268800000", "1452132000000") 
dates <- lapply(X = matches_date , function(timestamp_match){ 
(as.POSIXct(as.numeric(timestamp_match)/1000, origin="1970-01-01")) }) 
do.call(c, dates) 
# [1] "2016-01-08 17:00:00 CET" "2016-01-07 03:00:00 CET" 

matches_info[,"Date And Time"] <- do.call(c, dates) 

Or simply:

matches_date <- c("1452268800000", "1452132000000") 
matches_info[,"Date And Time"] <- as.POSIXct(as.numeric(matches_date)/1000, origin="1970-01-01") 
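
To see side by side why unlist() returns plain numbers while do.call(c, ...) and the vectorized call keep the date-times, here is a minimal sketch. It makes two assumptions that are not in the question: matches_info is a hypothetical two-row stand-in for the real data frame, and tz = "UTC" is added so the printed times do not depend on the local time zone.

```r
# Two of the millisecond timestamps from the question.
matches_date <- c("1452268800000", "1452132000000")

# Element-wise conversion, as in the question: a list of length-one POSIXct values.
# tz = "UTC" is an assumption added here for reproducible output.
dates <- lapply(matches_date, function(ts) {
  as.POSIXct(as.numeric(ts) / 1000, origin = "1970-01-01", tz = "UTC")
})

# unlist() drops attributes, including the class, so only the seconds remain.
flattened <- unlist(dates)
class(flattened)

# c() has a POSIXct method, so do.call(c, ...) keeps the date-time class.
combined <- do.call(c, dates)
class(combined)

# The vectorized call gives the same instants without building a list at all.
vectorized <- as.POSIXct(as.numeric(matches_date) / 1000,
                         origin = "1970-01-01", tz = "UTC")

# Hypothetical stand-in for the real matches_info data frame.
matches_info <- data.frame(match = c("A", "B"))
matches_info[, "Date And Time"] <- vectorized
str(matches_info)
```

Assigning the list itself fails because a data frame treats a list on the right-hand side as one column per element, which is what the "provided 18 variables to replace 1 variables" warning reports.
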
+2

lapply is not necessary, because all the functions used inside it are already vectorized – hadley