您可以通過右鍵單擊網絡部分中的xhr請求來嘗試執行copy as Curl
命令。
這是我得到
curl "https://ssl.bbc.co.uk/modules/comments/ajax/comments/?siteId=newscommentsmodule^&forumId=__CPS__37750489^&filter=none^&sortOrder=Descending^&sortBy=Created^&mock=0^&mockUser=^&parentUri=http^%^3A^%^2F^%^2Fwww.bbc.com^%^2Fnews^%^2Feducation-37750489^%^2Fcomments^%^3Fcomments_page^%^3D1^%^26initial_page_size^%^3D10^%^26filter^%^3Dnone^%^26sortBy^%^3DCreated^%^26sortOrder^%^3DDescending^&loc=en-GB^&preset=responsive^&initial_page_size=10^&transTags=0^&comments_page=4" -H "Origin: http://www.bbc.com" -H "Accept-Encoding: gzip, deflate, sdch, br" -H "Accept-Language: en-US,en;q=0.8" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.44 Safari/537.36" -H "Accept: application/json, text/javascript, */*; q=0.01" -H "Referer: http://www.bbc.com/news/education-37750489/comments?comments_page=1^&initial_page_size=10^&filter=none^&sortBy=Created^&sortOrder=Descending" -H "Cookie: BBC-UID=f5b82387c6a5f59cfb1e4e702165956d3c4a59fc40f0a0cc2289331f59e23c7f0Mozilla^%^2f5^%^2e0^%^20^%^28Windows^%^20NT^%^2010^%^2e0^%^3b^%^20Win64^%^3b^%^20x64^%^29^%^20AppleWebKit^%^2f537^%^2e36^%^20^%^28KHTML^%^2c^%^20like^%^20Gecko^%^29^%^20Chrome^%^2f55^%^2e0^%^2e2883^%^2e44^%^20Safari^%^2f537^%^2e36; BGUID=e50803c74685961b76a3bae761e263da9bbf269019d8d4abbed18707f72c1098; s1=208.5.385837657400859000FDD0E52985" -H "Connection: keep-alive" --compressed
或者您可以使用硒點擊分頁按鈕來直接
driver.find_element_by_css_selector('li.comments-pagination-page.comments-pagination-page-{} a'.format(pageNumber)).click()
這裏li.comments-pagination-page.comments-pagination-page-3
是在頁面第3分頁按鈕LI標記。
「我想過使用Selenium/Dryscape,但我不知道如何才能進入每個頁面來運行它們。」你能解釋一下嗎?我不明白這個問題。 –
我需要找到一種方法來獲取每個評論頁面的URL以運行刮擦。我正在考慮運行一個像Selenium或Dryscrape這樣的屏幕抓取工具,但我仍然需要獲得一個URL來訪問每個頁面,對嗎? – cstaff91
分享您迄今試過的代碼 – thebadguy