How to extract review-period information from a journal using R

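I want to survey the review period of the journal Scientific Reports, http://www.nature.com/srep/articles. For each article in a time window (or for the latest 100 articles), I'd like to extract the submission (received) date and the acceptance date. Any suggestions on how to do this in R? The solution can be simple, but I have never used R for web scraping before, so a few hints would be very helpful.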

2016-07-06 yliueagle

Here is something you can try.

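Compile your links into a CSV file. The only part of the URL that changes from one article to the next is the srep ID at the end, so the file looks like this: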

> head(links) 
            links 
1 http://www.nature.com/articles/srep20000 
2 http://www.nature.com/articles/srep20001 
3 http://www.nature.com/articles/srep20002 
4 http://www.nature.com/articles/srep20003 
5 http://www.nature.com/articles/srep20004 
6 http://www.nature.com/articles/srep20005
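Because only the numeric ID changes, you can also generate the links instead of typing them. A minimal sketch, assuming consecutive IDs; the range 20000:20099 (100 articles) is a hypothetical choice, and consecutive IDs are not guaranteed to be the most recent articles:

```r
# Hypothetical ID range: 100 consecutive Scientific Reports article IDs
ids <- 20000:20099

# Build one URL per ID; only the numeric suffix changes
links <- data.frame(
  links = sprintf("http://www.nature.com/articles/srep%d", ids),
  stringsAsFactors = FALSE
)

# Save to link.csv so the scraping code can read it back
write.csv(links, "link.csv", row.names = FALSE)

head(links)
```

The column is named `links` so that it matches the table shown above.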

Then run the following code:

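library(rvest)  # also provides the %>% pipe

# Read the list of article URLs (one column named "links")
links <- read.csv("link.csv", header = TRUE, stringsAsFactors = FALSE)

# Pre-allocate the result columns
links$Received <- NA_character_
links$Accepted <- NA_character_

for (i in seq_len(nrow(links))) {
  page <- read_html(links$links[i])

  # Received date: the <time> inside the 2nd child <dd> of the article-history list.
  # These selectors match the page layout at the time of writing and may need
  # updating if the site changes its markup.
  links$Received[i] <- page %>%
    html_node("dd:nth-child(2) time") %>%
    html_text()

  # Accepted date: the <time> inside the 4th child <dd>
  links$Accepted[i] <- page %>%
    html_node("dd:nth-child(4) time") %>%
    html_text()

  # Pause between requests so the server is not flooded
  Sys.sleep(1)
}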

You will get the following result:

> head(links) 
            links   Received   Accepted 
1 http://www.nature.com/articles/srep20000 15 October 2015 22 December 2015 
2 http://www.nature.com/articles/srep20001 21 October 2015 22 December 2015 
3 http://www.nature.com/articles/srep20002 20 October 2015 22 December 2015 
4 http://www.nature.com/articles/srep20003 10 November 2015 22 December 2015 
5 http://www.nature.com/articles/srep20004 15 November 2015 22 December 2015 
6 http://www.nature.com/articles/srep20005 09 November 2015 22 December 2015
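To turn these strings into a review period, parse both columns as dates and subtract. A minimal sketch using the six rows shown above; the locale switch is there because `%B` matches month names in the current locale, and the window bounds in the last step are hypothetical:

```r
# The six rows from the result table above
links <- data.frame(
  Received = c("15 October 2015", "21 October 2015", "20 October 2015",
               "10 November 2015", "15 November 2015", "09 November 2015"),
  Accepted = rep("22 December 2015", 6),
  stringsAsFactors = FALSE
)

# %B matches month names in the current locale, so parse in the C locale
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
links$ReceivedDate <- as.Date(links$Received, format = "%d %B %Y")
links$AcceptedDate <- as.Date(links$Accepted, format = "%d %B %Y")
invisible(Sys.setlocale("LC_TIME", old_locale))

# Review period in days (accepted minus received)
links$ReviewDays <- as.numeric(
  difftime(links$AcceptedDate, links$ReceivedDate, units = "days")
)
print(links[, c("Received", "Accepted", "ReviewDays")])
summary(links$ReviewDays)

# Keep only articles accepted inside a hypothetical time window
in_window <- links[links$AcceptedDate >= as.Date("2015-12-01") &
                   links$AcceptedDate <= as.Date("2015-12-31"), ]
nrow(in_window)
```

If a page is missing one of the dates, `html_text()` returns `NA`, so the corresponding `ReviewDays` is `NA` too; `summary()` counts those as `NA's`, and `mean()` or `median()` need `na.rm = TRUE`.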

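Note: the more URLs you have, the longer the code takes to finish. Also, the site does not allow automated (bot) access to its pages, so there is no alternative way to give you all the information at once.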

2016-07-06 13:16:23

Works fine. Thanks – yliueagle
