Scraping web data: finding the right selector for rvest (I think)

I'm trying to scrape the CrossFit Games Open leaderboard. I have a version that worked in past years, but the website changed and I can't seem to update my code to work with the new site.

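My problem is that I can't seem to get the right CSS selectors to pull the athlete names and the links to their profiles.

My old code does something similar to this: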

library(rvest) 

# old site 
old_url <- "https://games.crossfit.com/scores/leaderboard.php?stage=1&sort=1&page=1&division=1&region=0&numberperpage=100&competition=0&frontpage=0&expanded=0&year=16&scaled=0&full=1&showtoggles=0&hidedropdowns=1&showathleteac=1&is_mobile=1" 
old_page <- read_html(old_url) 

# get the athletes profile url 
athlete_link <- html_attr(html_nodes(old_page, "td.name a"), "href") 
athlete_name <- html_text(html_nodes(old_page, "td.name a")) 

head(athlete_link) 
# [1] "http://games.crossfit.com/athlete/124483" "http://games.crossfit.com/athlete/2725" "http://games.crossfit.com/athlete/199938" 
# [4] "http://games.crossfit.com/athlete/173837" "http://games.crossfit.com/athlete/2476" "http://games.crossfit.com/athlete/499296" 

head(athlete_name) 
# [1] "Josh Bridges" "Noah Ohlsen"  "Jacob Heppner" "Jonne Koski"  "Luke Schafer" "Andrew Kuechler" 

# new site 
new_url <- "https://games.crossfit.com/leaderboard?page=1&competition=1&year=2017&division=2&scaled=0&sort=0&fittest=1&fittest1=0&occupation=0" 
new_page <- read_html(new_url) 

# get the athletes profile url 
# I would have thought something like this would get it. 
# It doesn't seem to pull anything
html_attr(html_nodes(new_page, "td.name a.profile-link"), "href") 
# character(0) 

html_text(html_nodes(new_page, "td.name div.full-name")) 
# character(0)

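I've tried various other CSS selectors, SelectorGadget, and a few other things. I'm experienced in R, but this is the only real web scraping project I've ever done, so I'm probably missing something very basic.

Which selector should I use to grab this data?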

Source

2017-02-22 BrianDavisStats

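"You may not use any data mining, robots, scraping or similar data gathering or extraction methods to obtain Website content [...]" – GGamba

It looks like the content of this page is generated dynamically with some JavaScript. If you inspect the page source, you will see something like this: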

<div class="modal-body"> 
    <!-- dynamically generated content goes here --> 
</div>

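where the table should go. In these cases rvest alone is not enough, because read_html() only sees the HTML the server sends and never runs the page's JavaScript. A recent blog post has some useful pointers: https://rud.is/b/2017/02/09/diving-into-dynamic-website-content-with-splashr/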
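
A quick way to see the effect from R, as a minimal sketch. The two HTML strings below are made-up stand-ins (not the real leaderboard markup) for what the server sends and for what the browser holds once the JavaScript has run. The same selector finds nothing in the first and works on the second, so the likely fix is to render the page first (for example with splashr or RSelenium) and only then hand the resulting HTML to rvest.

```r
library(rvest)

# ASSUMPTION: made-up stand-in for the raw HTML that read_html() receives;
# the container is empty because the table is filled in later by JavaScript.
raw_html <- '<html><body><div class="modal-body"><!-- dynamically generated content goes here --></div></body></html>'

# ASSUMPTION: made-up stand-in for the rendered DOM; the real markup may differ.
rendered_html <- paste0(
  '<html><body><table><tr><td class="name">',
  '<a class="profile-link" href="http://games.crossfit.com/athlete/2725">Noah Ohlsen</a>',
  '</td></tr></table></body></html>'
)

selector <- "td.name a.profile-link"

# Selector against the unrendered page: no nodes match
raw_links <- html_attr(html_nodes(read_html(raw_html), selector), "href")

# Same selector against the rendered page: link and name are both found
rendered_nodes <- html_nodes(read_html(rendered_html), selector)
rendered_links <- html_attr(rendered_nodes, "href")
rendered_names <- html_text(rendered_nodes)

length(raw_links)
rendered_links
rendered_names
```

If the selector returns character(0) on the page from read_html() but matches in the browser's element inspector, the content is almost certainly being added client-side and the selector itself may be fine.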

Source

2017-02-22 21:28:44 sinQueso

Thanks. I suspected I would need something like RSelenium. I'll give it a try. – BrianDavisStats
