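How do I wait for a web page to load before reading its lines in R?

I'm using R to scrape some web pages. One of these pages redirects to a new page. When I call readLines on that page like this: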

test <- readLines('http://zfin.org/cgi-bin/webdriver?MIval=aa-markerselect.apg&marker_type=GENE&query_results=t&input_name=anxa5b&compare=contains&WINSIZE=25')

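I still get the redirect page rather than the final page, http://zfin.org/ZDB-GENE-030131-9076. I want to use the redirect page because its URL contains input_name=anxa5b, which makes it easy to fetch the pages for different input names.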
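Because only the input_name parameter changes between searches, building the redirect URL for several names could be sketched as follows. The helper name zfin_search_url and the second gene name are illustrative assumptions; every other query parameter is copied unchanged from the URL above.

```r
# Hypothetical helper: build the ZFIN search URL for one input name.
# Every parameter except input_name is copied from the URL in the question.
zfin_search_url <- function(input_name) {
  paste0(
    "http://zfin.org/cgi-bin/webdriver?MIval=aa-markerselect.apg",
    "&marker_type=GENE&query_results=t",
    "&input_name=", URLencode(input_name, reserved = TRUE),
    "&compare=contains&WINSIZE=25"
  )
}

# "anxa1a" is only an illustrative second name.
urls <- vapply(c("anxa5b", "anxa1a"), zfin_search_url, character(1))
```

Each element of urls can then be passed to whatever code fetches the redirect page.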

How can I get the HTML of the final page?

Redirect page: http://zfin.org/cgi-bin/webdriver?MIval=aa-markerselect.apg&marker_type=GENE&query_results=t&input_name=anxa5b&compare=contains&WINSIZE=25

Final page: http://zfin.org/ZDB-GENE-030131-9076

Source

2014-02-26 Niek de Klein

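I don't know how to wait until the redirect has happened, but in the source code of the redirect page you can see (inside a script tag) a call to a JavaScript function, replaceLocation, that contains the redirect path: replaceLocation("/ZDB-GENE-030131-9076").

So I suggest parsing the code and extracting that path. Here is my solution: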

library(RCurl)
library(XML)

url <- "http://zfin.org/cgi-bin/webdriver?MIval=aa-markerselect.apg&marker_type=GENE&query_results=t&input_name=anxa5b&compare=contains&WINSIZE=25"
domain <- "http://zfin.org"

# Download the redirect page and parse it as HTML
doc <- htmlParse(getURL(url, useragent='R'))

# Extract the text of every <script> element
scripts <- xpathSApply(doc, "//script", xmlValue)

# Keep the script that calls replaceLocation() with a string argument
script <- scripts[which(lapply(lapply(scripts, grep, pattern = "replaceLocation\\([^url]"), length) > 0)]

# > script
# [1] "\n   \n\t \n\t  replaceLocation(\"/ZDB-GENE-030131-9076\")\n   \n   \n\t"

# Pull out the quoted path and prepend the domain
new.url <- paste0(domain, gsub('.*\\"(.*)\\".*', '\\1', script))

# Read the final page
readLines(new.url)

xpathSApply(doc, "//script", xmlValue) gets all the scripts in the page source.

The script <- scripts[...] line then selects the script that contains the function call with the redirect path.

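(The pattern "replaceLocation\\([^url]" has to exclude "url" because replaceLocation appears twice in the source: once with the object url and once with the evaluated object, a string. Note that [^url] is a character class: it matches any single character other than u, r, or l.)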
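The filtering step can be tried without network access on a few sample strings. This is a minimal sketch: the first and third strings are made up for illustration, the second mirrors the output shown in the answer, and grepl is used here as a shorter way to select the same elements as the nested lapply expression.

```r
# Sample script bodies standing in for xpathSApply(doc, "//script", xmlValue).
# The first and third strings are invented for illustration; the second
# mirrors the output shown in the answer.
scripts <- c(
  "function replaceLocation(url) { window.location.replace(url); }",
  "\n   replaceLocation(\"/ZDB-GENE-030131-9076\")\n   ",
  "var x = 1;"
)

# [^url] matches one character that is not 'u', 'r' or 'l', so
# "replaceLocation(url" is rejected while "replaceLocation(\"" is accepted.
keep <- grepl("replaceLocation\\([^url]", scripts)
script <- scripts[keep]
```

Only the element with the quoted path is kept in script.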

And finally, gsub('.*\\"(.*)\\".*', '\\1', script) extracts only the part of the script you need: the function's argument, which is the path.

Hope this helps!
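The extraction step can be checked on the script text shown above. The sketch below first applies the answer's gsub call, then a narrower variant based on regexec and regmatches; the narrower variant is an assumption of this example rather than part of the answer, and it anchors on the function name so that other quoted strings in a script cannot interfere.

```r
# Script text as printed in the answer.
script <- "\n   \n\t \n\t  replaceLocation(\"/ZDB-GENE-030131-9076\")\n   \n   \n\t"
domain <- "http://zfin.org"

# The answer's approach: keep only what sits between the double quotes.
path <- gsub('.*\\"(.*)\\".*', '\\1', script)

# Narrower alternative (an assumption, not from the answer): capture the
# quoted argument of replaceLocation(...) specifically.
m <- regmatches(script, regexec('replaceLocation\\("([^"]+)"\\)', script))
path2 <- m[[1]][2]

# Prepend the domain to get the final URL.
new.url <- paste0(domain, path)
```

Both path and path2 should hold the same redirect path for this input.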

Source

2014-02-26 16:09:49
