2016-04-03

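For loop does not run through all iterations when used with RSelenium

Hello, I am working with this web page: http://www.uefa.com/uefachampionsleague/season=2016/statistics/round=2000634/players/index.html

I am using RSelenium to click on each player's name (the names are links), scrape that player's individual page, go back, and continue with the next player. This is my attempt: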

# packages
library(RSelenium)
library(XML)

# navigate to the site
remDr <- remoteDriver$new()
remDr$open()
remDr$navigate("http://www.uefa.com/uefachampionsleague/season=2016/statistics/round=2000634/players/index.html")

# find all the needed links
player <- remDr$findElements(using = 'xpath', value = "//span/a")

# confirm that there are 20 links
length(player)

# loop that is supposed to open each of the 20 player pages, scrape some info, and go back
for (i in 1:20) {
    player <- remDr$findElements(using = 'xpath', value = "//span/a")
    player[[i]]$clickElement()
    Sys.sleep(5)
    urlplayer <- remDr$getCurrentUrl()
    urlplayer2 <- htmlParse(urlplayer)
    hraci <- xpathSApply(urlplayer2, path = "//ul[@class='innerText']/li", fun = xmlValue)
    print(hraci)
    remDr$goBack()
}

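I have run this code several times, but after some number of iterations I always get the error: Error in player[[i]] : subscript out of bounds

If I check the value of the iterator after a failed run, it is 7 on the last attempt, sometimes 12, and sometimes another number.

I don't know why I get this error, so I would appreciate any help!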
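
The error message itself says that `player` held fewer than `i` elements when `player[[i]]` was evaluated. One possible guard, sketched below, is to poll until enough links are present before indexing. This assumes (it is not confirmed in the thread) that the list is short because the page had not finished loading after `goBack()`. The helper `wait_for_links` and the stand-in `fake_find` are hypothetical names introduced only for this illustration; with a real browser, `find_links` would be something like `function() remDr$findElements(using = "xpath", value = "//span/a")`.

```r
# Hypothetical helper: call `find_links()` repeatedly until it returns at
# least `n` elements, or stop with an error once `timeout` seconds have passed.
wait_for_links <- function(find_links, n, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    links <- find_links()
    if (length(links) >= n) return(links)
    if (Sys.time() >= deadline) {
      stop("only ", length(links), " links found, needed ", n)
    }
    Sys.sleep(interval)
  }
}

# Stand-in for the browser (assumption for the demo): the first call
# returns 3 links, every later call returns 20.
calls <- 0
fake_find <- function() {
  calls <<- calls + 1
  as.list(seq_len(if (calls < 2) 3 else 20))
}

# Ask for at least 7 links; the first poll is too short, the second succeeds.
links <- wait_for_links(fake_find, n = 7, timeout = 5, interval = 0.1)
length(links)
```

Inside the loop, the call would replace the bare `findElements()` line, so that `player[[i]]` is only reached once at least `i` links exist.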

Answer

I suggest a different approach, one that does not require Selenium:

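library(XML)

# parse the player table as UTF-8 so that player names are decoded correctly
doc <- htmlParse("http://www.uefa.com/statistics/uefachampionsleague/season=2016/statistics/round=2000634/players/_loadRemaining.html", encoding = "UTF-8")
n <- 3  # number of players to fetch
# link targets and display names from the first table column
hrefs <- head(xpathSApply(doc, "//tr/td[1]/span/a", xmlGetAttr, "href"), n)
players <- head(xpathSApply(doc, "//tr/td[1]/span/a", xmlValue), n)
# save each player's page into the session's temporary directory
for (x in seq_along(hrefs))
    download.file(paste0("http://www.uefa.com", hrefs[x]), file.path(tempdir(), paste0(players[x], ".html")))

# read the tables from the first downloaded page
x <- 1
readHTMLTable(file.path(tempdir(), paste0(players[x], ".html")))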
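
To see what the two XPath calls above return without touching the network, here is a small offline sketch. The HTML string is made up to imitate the table layout that the XPath expression assumes (a link inside a `span` in the first cell of each row); the player names and paths are placeholders, not real UEFA data.

```r
library(XML)

# Made-up HTML imitating the assumed table layout (placeholder data only)
html <- '<html><body><table>
  <tr><td><span><a href="/players/player=1/index.html">Player One</a></span></td><td>5</td></tr>
  <tr><td><span><a href="/players/player=2/index.html">Player Two</a></span></td><td>3</td></tr>
</table></body></html>'

# asText = TRUE tells htmlParse that the argument is HTML content, not a file or URL
doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")

# Same XPath as in the answer: the link in the first cell of every row
hrefs <- xpathSApply(doc, "//tr/td[1]/span/a", xmlGetAttr, "href")
players <- xpathSApply(doc, "//tr/td[1]/span/a", xmlValue)

# The hrefs are site-relative, so the host is prepended before downloading
urls <- paste0("http://www.uefa.com", hrefs)
print(data.frame(player = players, url = urls))
```

Both vectors come out in document order, which is why `hrefs[x]` and `players[x]` refer to the same player in the download loop.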

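Actually, in the meantime I had downloaded all the pages using XML, but the characters came out garbled. I saw that you added the argument encoding = "UTF-8". I tried getting the player names from _loadRemaining.html and now they are correct. Thank you very much.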