For loop does not go through all iterations when using RSelenium

Hello, I am working with this page: http://www.uefa.com/uefachampionsleague/season=2016/statistics/round=2000634/players/index.html

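I am using RSelenium to click each player's name (each name is a link), scrape that player's individual page, then go back and continue with the next player. This is my attempt: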

# packages
library(RSelenium)
library(XML)


# navigate to the site
remDr <- remoteDriver$new()
remDr$open()
remDr$navigate("http://www.uefa.com/uefachampionsleague/season=2016/statistics/round=2000634/players/index.html")

# this will find all needed links
player <- remDr$findElements(using = 'xpath', value = "//span/a")

# this confirms that there are 20 links
length(player)


# this is the loop which is supposed to click through to all 20 pages, scrape some info and go back
for (i in 1:20) {

    player <- remDr$findElements(using = 'xpath', value = "//span/a")
    player[[i]]$clickElement()
    Sys.sleep(5)
    urlplayer <- remDr$getCurrentUrl()
    urlplayer2 <- htmlParse(urlplayer)
    hraci <- xpathSApply(urlplayer2, path = "//ul[@class='innerText']/li", fun = xmlValue)
    print(hraci)
    remDr$goBack()
}

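I have run this code several times, but after some number of iterations I always get the error: Error in player[[i]] : subscript out of bounds.

When I check the value of the iterator after the most recent failed run, it is 7; on other runs it has been 12 or some other number.

I don't know why I get this error, so I would appreciate any help!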
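
The fact that the failing index changes from run to run hints at a timing problem. One possible explanation (an assumption, not something confirmed in this thread) is that findElements() is called right after goBack(), before the list of links has been rebuilt, so it returns fewer than i elements. A minimal sketch of a polling helper follows; wait_for_items and fake_find are hypothetical names, and fake_find only stands in for the real remDr$findElements() call so the snippet runs without a Selenium server.

```r
# Call `fetch_fn` repeatedly until it returns at least `n` items,
# or give up with an error after `timeout` seconds.
wait_for_items <- function(fetch_fn, n, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    items <- fetch_fn()
    if (length(items) >= n) return(items)
    if (Sys.time() >= deadline) {
      stop(sprintf("only %d item(s) found, expected at least %s",
                   length(items), n))
    }
    Sys.sleep(interval)
  }
}

# Hypothetical stand-in for remDr$findElements(): it returns one more
# item on every call, imitating a list that fills in gradually.
calls <- 0
fake_find <- function() {
  calls <<- calls + 1
  as.list(seq_len(calls))
}

items <- wait_for_items(fake_find, n = 3, interval = 0.01)
print(length(items))
```

Inside the loop, the same helper could wrap the real lookup, for example wait_for_items(function() remDr$findElements(using = 'xpath', value = "//span/a"), n = i), so that player[[i]] is only indexed once at least i links are present.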

Source

2016-04-03 Tomas H

I suggest a different approach that does not require Selenium:

library(XML)
# parse the page that lists the players, reading it as UTF-8
doc <- htmlParse("http://www.uefa.com/statistics/uefachampionsleague/season=2016/statistics/round=2000634/players/_loadRemaining.html", encoding = "UTF-8")
n <- 3  # number of players to fetch
# link and name of the first n players
hrefs <- head(xpathSApply(doc, "//tr/td[1]/span/a", xmlGetAttr, "href"), n)
players <- head(xpathSApply(doc, "//tr/td[1]/span/a", xmlValue), n)
# save each player's page to the temp directory, named after the player
for (x in seq_along(hrefs))
    download.file(paste0("http://www.uefa.com", hrefs[x]), file.path(tempdir(), paste0(players[x], ".html")))

# read the tables from the first downloaded page
x <- 1
readHTMLTable(file.path(tempdir(), paste0(players[x], ".html")))
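
If the goal is the list items targeted by the XPath in the question rather than the tables, the same query can be run on a saved page. This is only a sketch: the HTML string below is a made-up stand-in for a downloaded player file, and its contents are illustrative, not real UEFA data.

```r
library(XML)

# Made-up stand-in for a downloaded player page; a real file would be
# one of the .html files saved by download.file().
page <- "<html><body><ul class='innerText'><li>Goals: 3</li><li>Assists: 1</li></ul></body></html>"

# asText = TRUE parses the string itself instead of treating it as a path
doc <- htmlParse(page, asText = TRUE, encoding = "UTF-8")

# same XPath as in the question: every <li> inside <ul class="innerText">
facts <- xpathSApply(doc, "//ul[@class='innerText']/li", xmlValue)
print(facts)
```

For a real file, dropping asText = TRUE and passing the file path to htmlParse() should work the same way.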

Source

2016-04-03 21:04:08 lukeA

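Actually, in the meantime I had downloaded all the pages using XML, but my characters came out garbled. I saw that you added the argument encoding = "UTF-8". I tried getting the player names from loadremaining.html and now they are correct. Thank you very much.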
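
The effect of the encoding argument can be tried offline. The sketch below writes a small UTF-8 file with no charset declaration and parses it with encoding = "UTF-8"; the player name and the href are invented for illustration.

```r
library(XML)

# Made-up one-row fragment with a non-ASCII name (\u00e9 is "e" with an
# acute accent), written to disk as raw UTF-8 bytes.
html <- "<html><body><table><tr><td><span><a href='/x'>Jos\u00e9 Example</a></span></td></tr></table></body></html>"
path <- tempfile(fileext = ".html")
con <- file(path, open = "wb")
writeBin(charToRaw(enc2utf8(html)), con)
close(con)

# encoding = "UTF-8" tells the parser how to decode the bytes in the file
doc <- htmlParse(path, encoding = "UTF-8")
players <- xpathSApply(doc, "//tr/td[1]/span/a", xmlValue)
print(players)
```

Parsing the same file without the encoding argument is a quick way to check whether the garbling described above shows up on a given system.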
