How can I scrape data from a website inside a frame using R?

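The following link contains the results of the Paris Marathon: http://www.schneiderelectricparismarathon.com/us/the-race/results/results-marathon. I want to scrape these results, but the information is inside a frame. I know the basics of scraping with rvest and RSelenium, but I have no idea how to retrieve data from within such a frame. To give an idea, one of the things I tried was: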

library(rvest)

url = "http://www.schneiderelectricparismarathon.com/us/the-race/results/results-marathon"
site = read_html(url)
# Attempt: select the iframe node and parse it as a table
ParisResults = site %>% html_node("iframe") %>% html_table()
ParisResults = as.data.frame(ParisResults)

Any help with this problem would be much appreciated!

Source

2016-05-23 Merijn

The results are loaded via AJAX from the following URL:

library(rvest)

url <- "http://www.aso.fr/massevents/resultats/ajax.php?v=1460995792&course=mar16&langue=us&version=3&action=search"
# Read the AJAX response and parse every table with class "footable"
table <- url %>%
  read_html(encoding = "UTF-8") %>%
  html_nodes(xpath = '//table[@class="footable"]') %>%
  html_table()
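Note that html_table() applied to a set of nodes returns a list with one element per matched table, not a single data frame. A minimal sketch of pulling out the first table, using made-up placeholder markup in place of the real AJAX response (the column names and rows here are assumptions for illustration only):

```r
library(rvest)

# Placeholder markup standing in for the AJAX response; the rows are made up.
sample_html <- '<table class="footable">
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td>1</td><td>Runner A</td></tr>
  <tr><td>2</td><td>Runner B</td></tr>
</table>'

tables <- read_html(sample_html) %>%
  html_nodes(xpath = '//table[@class="footable"]') %>%
  html_table()

# html_table() on a node set returns a list with one element per table,
# so take the first element to get the results as a data frame.
results <- as.data.frame(tables[[1]])
print(results)
```

With the real response, the same `tables[[1]]` step should apply as long as the page contains a single table with that class.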

PS: I don't know exactly what AJAX is; I only know the basics of rvest.

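Edit: To answer the question in the comments: I don't have much experience with web scraping. If you only use very basic techniques with rvest or XML, you have to understand more about the website, and every site has its own structure. For this one, here is how I did it: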

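As you can see, the page source doesn't show any results because they are inside an iframe. When you inspect the code, you can see this after "Results 2016 edition":

class="iframe-xdm iframe-resultats" data-href="http://www.aso.fr/massevents/resultats/index.php?langue=us&course=mar16&version=3"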
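The same attribute can probably be read programmatically instead of by hand. A minimal sketch, assuming the iframe markup looks like the fragment quoted above; the inline HTML string is a stand-in for the real page, which you would load with read_html(url):

```r
library(rvest)

# Stand-in for the page source: a minimal snippet mirroring the iframe
# markup quoted above. On the live site, use read_html(url) instead.
page_html <- '<html><body>
  <iframe class="iframe-xdm iframe-resultats"
          data-href="http://www.aso.fr/massevents/resultats/index.php?langue=us&amp;course=mar16&amp;version=3"></iframe>
</body></html>'

page <- read_html(page_html)

# Select the iframe by its class and read the data-href attribute,
# which holds the URL of the framed results page.
iframe_url <- page %>%
  html_node("iframe.iframe-resultats") %>%
  html_attr("data-href")

print(iframe_url)
```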
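Now you can use this URL directly: http://www.aso.fr/massevents/resultats/index.php?langue=us&course=mar16&version=2
But you still can't get the results this way. You can then use Chrome developer tools > Network > XHR. When you refresh the page, you can see that the data is loaded from this URL (when you choose the Men category): http://www.aso.fr/massevents/resultats/ajax.php?course=mar16&langue=us&version=2&action=search&fields%5Bsex%5D=F&limiter=&order=
Now you can get the results!
If you want the second page and so on, you can click the page number and use the developer tools to see what happens!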
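Putting the XHR URL to use could look something like the sketch below. build_results_url and fetch_results are hypothetical helper names, not part of rvest. The only filter value taken from the URL above is "F"; any other value is an assumption you would need to confirm in the Network tab. No paging parameter is included because it has to be discovered the same way.

```r
library(rvest)

# Hypothetical helper: rebuild the XHR URL seen in the Network tab,
# with the sex filter as a parameter ("F" is the value in the URL above).
build_results_url <- function(sex = "F") {
  paste0(
    "http://www.aso.fr/massevents/resultats/ajax.php",
    "?course=mar16&langue=us&version=2&action=search",
    "&fields%5Bsex%5D=", sex,
    "&limiter=&order="
  )
}

# Hypothetical helper: download the response and parse the results tables,
# using the same XPath as in the code at the top of this answer.
fetch_results <- function(url) {
  url %>%
    read_html(encoding = "UTF-8") %>%
    html_nodes(xpath = '//table[@class="footable"]') %>%
    html_table()
}

results_url <- build_results_url("F")
print(results_url)

# Needs network access, and the site may have changed since 2016:
# results <- fetch_results(results_url)
```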

Source

2016-05-23 17:59:03

Thanks, this solved my problem! For future problems, could you tell me how you managed to get this URL? I couldn't find it in the source code. – Merijn

I edited my answer; I hope it helps –
