How can I use a loop to scrape website data from multiple web pages in R?

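I want to use a loop to scrape data from multiple web pages in R. I can scrape the data for a single page, but when I try to use a loop for multiple pages I get a frustrating error. I have spent hours tinkering with it to no avail. Any help would be greatly appreciated!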

This works:

########################### 
# GET COUNTRY DATA 
########################### 

library("rvest") 

site <- paste("http://www.countryreports.org/country/","Norway",".htm", sep="") 
site <- html(site) 

stats<- 
    data.frame(names =site %>% html_nodes(xpath="//*/td[1]") %>% html_text() , 
     facts =site %>% html_nodes(xpath="//*/td[2]") %>% html_text() , 
     stringsAsFactors=FALSE) 

stats$country <- "Norway" 
stats$names <- gsub('[\r\n\t]', '', stats$names) 
stats$facts <- gsub('[\r\n\t]', '', stats$facts) 
View(stats)
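
A note for readers on newer rvest releases: `html()` was later deprecated in favour of `read_html()`. The sketch below repeats the same two-column extraction with `read_html()`. The inline HTML table and its values are made up for illustration and stand in for a downloaded page, so no network access is needed.

```r
library(rvest)

# Inline HTML stands in for a downloaded page (hypothetical data).
page <- read_html("<table>
  <tr><td>Capital</td><td>Oslo</td></tr>
  <tr><td>Currency</td><td>Norwegian krone</td></tr>
</table>")

# Same XPath expressions as in the question: first and second cell of each row.
stats <- data.frame(
  names = page %>% html_nodes(xpath = "//*/td[1]") %>% html_text(),
  facts = page %>% html_nodes(xpath = "//*/td[2]") %>% html_text(),
  stringsAsFactors = FALSE
)
stats$country <- "Norway"
```

For a real page, passing the URL string to `read_html()` instead of the inline HTML should behave the same way.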

However, when I try to write this as a loop, I get an error:

########################### 
# ATTEMPT IN A LOOP 
########################### 

country<-c("Norway","Sweden","Finland","France","Greece","Italy","Spain") 

for(i in country){ 

site <- paste("http://www.countryreports.org/country/",country,".htm", sep="") 
site <- html(site) 

stats<- 
data.frame(names =site %>% html_nodes(xpath="//*/td[1]") %>% html_text() , 
     facts =site %>% html_nodes(xpath="//*/td[2]") %>% html_text() , 
     stringsAsFactors=FALSE) 

stats$country <- country 
stats$names <- gsub('[\r\n\t]', '', stats$names) 
stats$facts <- gsub('[\r\n\t]', '', stats$facts) 

stats<-rbind(stats,stats) 
stats<-stats[!duplicated(stats),] 
}

The error:

Error: length(url) == 1 is not TRUE 
In addition: Warning message: 
In if (grepl("^http", x)) { : 
    the condition has length > 1 and only the first element will be used

Source

2015-01-08 Chris L

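Same result here. I tried this code and got the same error message even for the version that works without a loop! > length(site) [1] 7 > stopifnot(length(site) == 1) Error: length(site) == 1 is not TRUE – lawyeR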

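On this line: 'site <- paste("http://www.countryreports.org/country/",country,".htm", sep="")' you are using 'country', which in the loop version is the character vector holding all of your countries. You probably want 'i', which is a single element of your country vector. – zelite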
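
To illustrate the comment above, here is a minimal sketch with a shortened country vector. `paste()` is vectorized, so passing `country` yields one URL per country, while passing the loop variable `i` yields exactly one URL per iteration. The names `all_urls` and `one_url` are only for illustration.

```r
country <- c("Norway", "Sweden", "Finland")

# Passing the whole vector: paste() returns one URL per element.
all_urls <- paste("http://www.countryreports.org/country/", country, ".htm", sep = "")
length(all_urls)

for (i in country) {
  # Passing the loop variable: a single URL, which is what html() expects.
  one_url <- paste("http://www.countryreports.org/country/", i, ".htm", sep = "")
  stopifnot(length(one_url) == 1)
}
```

This is why the loop version fails with "length(url) == 1 is not TRUE": the URL argument is a vector of several strings rather than a single string.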

zelite - that got me much closer - thank you.

The code that finally worked:

########################### 
# THIS WORKS!!!! 
########################### 

country<-c("Norway","Sweden","Finland","France","Greece","Italy","Spain") 

for(i in country){ 

site <- paste("http://www.countryreports.org/country/",i,".htm", sep="") 
site <- html(site) 

stats<- 
data.frame(names =site %>% html_nodes(xpath="//*/td[1]") %>% html_text() , 
    facts =site %>% html_nodes(xpath="//*/td[2]") %>% html_text() , 
     stringsAsFactors=FALSE) 

stats$nm <- i 
stats$names <- gsub('[\r\n\t]', '', stats$names) 
stats$facts <- gsub('[\r\n\t]', '', stats$facts) 
#stats<-stats[!duplicated(stats),] 
all<-rbind(all,stats) 

} 
View(all)

Source

2015-01-09 03:15:46

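Does this really work for you? I wanted to do something similar, so I ran your code and got the following error: Error in rep(xi, length.out = nvar) : attempt to replicate an object of type 'builtin'. Did you initialize 'all' somewhere earlier?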
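
The error in this comment is consistent with `all` still referring to the base R function of that name when `rbind(all, stats)` first runs. A minimal sketch of one way around it, assuming the accumulator is created before the loop: `all_stats` is a hypothetical name chosen to avoid masking `base::all`, and the placeholder rows stand in for scraped data.

```r
# `all` is a base R function, so rbind(all, stats) receives a function
# unless a variable named `all` was created beforehand.
is.function(all)

country <- c("Norway", "Sweden")
all_stats <- NULL  # hypothetical accumulator, initialized before the loop

for (i in country) {
  # Placeholder rows instead of scraped data.
  stats <- data.frame(names = "Capital", facts = NA_character_, nm = i,
                      stringsAsFactors = FALSE)
  all_stats <- rbind(all_stats, stats)  # rbind() ignores the initial NULL
}
```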

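Here is what I did. It is not the best solution, but you will get an output. It is also only a workaround: in general I would not recommend writing table output to a file while the loop is running. Here you go. After the output is generated from stats,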

output<-rbind(stats,i)

Then write the table,

write.table(output, file = "D:\\Documents\\HTML\\Test of loop.csv", row.names = FALSE, append = TRUE, sep = ",") 

#then close the loop 
}

Good luck
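
One caveat with appending inside the loop: `write.table()` writes the column names by default, so with `append = TRUE` the header can end up repeated in the file. A minimal sketch of one way to write the header only once, assuming a temporary file as the output path and placeholder rows instead of scraped data (`out_file` and `first_write` are hypothetical names):

```r
out_file <- tempfile(fileext = ".csv")  # hypothetical output path
country <- c("Norway", "Sweden")

for (i in country) {
  # Placeholder rows instead of scraped data.
  stats <- data.frame(names = "Capital", facts = "placeholder", nm = i,
                      stringsAsFactors = FALSE)

  # Write the header only when the file does not exist yet, then append.
  first_write <- !file.exists(out_file)
  write.table(stats, file = out_file, sep = ",", row.names = FALSE,
              col.names = first_write, append = !first_write)
}

result <- read.csv(out_file, stringsAsFactors = FALSE)
```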

Source

2016-09-20 12:58:59

Just initialize an empty data frame before the loop. I worked through this problem, and the code below works for me.

country<-c("Norway","Sweden","Finland","France","Greece","Italy","Spain") 
df <- data.frame(names = character(0),facts = character(0),nm = character(0)) 

for(i in country){ 

    site <- paste("http://www.countryreports.org/country/",i,".htm", sep="") 
    site <- html(site) 

    stats<- 
    data.frame(names =site %>% html_nodes(xpath="//*/td[1]") %>% html_text() , 
       facts =site %>% html_nodes(xpath="//*/td[2]") %>% html_text() , 
       stringsAsFactors=FALSE) 

    stats$nm <- i 
    stats$names <- gsub('[\r\n\t]', '', stats$names) 
    stats$facts <- gsub('[\r\n\t]', '', stats$facts) 
    #stats<-stats[!duplicated(stats),] 
    #all<-rbind(all,stats) 
    df <- rbind(df, stats) 
    #all <- merge(Output,stats) 

} 
View(df)
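
As an alternative to growing `df` inside the loop, a common R idiom is to collect one data frame per country in a list and bind them once at the end. The sketch below assumes a hypothetical helper, `fetch_stats()`, that stands in for the scraping step and returns placeholder rows so it runs without network access.

```r
country <- c("Norway", "Sweden", "Finland")

# Hypothetical stand-in for the scraping step; returns placeholder rows.
fetch_stats <- function(name) {
  data.frame(names = c("Capital", "Currency"),
             facts = c("placeholder", "placeholder"),
             nm = name,
             stringsAsFactors = FALSE)
}

# One data frame per country, then a single rbind over the whole list.
pages <- lapply(country, fetch_stats)
df <- do.call(rbind, pages)
```

In real use the body of `fetch_stats()` would contain the page download and the two `html_nodes()` calls from the loop above.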

Source

2018-01-08 05:44:18 Premal
