Web scraping - Python - Links not found in the HTML

I am trying to scrape all the links from https://www.udemy.com/courses/search/?q=sql&src=ukw&lang=en, but even without selecting any particular element, my code does not retrieve the links. Please see the code below.

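import bs4
import requests as rq

# Fetch the search page and parse the HTML that the server returns
Link = 'https://www.udemy.com/courses/search/?q=sql&src=ukw&lang=en'
RQOBJ = rq.get(Link)
# Name the parser explicitly so the result does not depend on which parsers are installed
BS4OBJ = bs4.BeautifulSoup(RQOBJ.text, 'html.parser')
print(BS4OBJ)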

2016-11-19 Krishn

Which links are you hoping to get? Can you give an example? I tried your code and it returned a lot of HTML, which looks fine. – linpingta

Apologies, I was not very clear. I would like the course links, e.g. https://www.udemy.com/the-complete-sql-bootcamp/ etc. – Krishn

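Yes, I see... Unfortunately the site is generated dynamically: part of the HTML content is produced via AJAX after the page loads, which defeats simple web scraping. You could try CasperJS, which simulates a user visit rather than a crawler. I have an example for a Facebook fan page: https://github.com/linpingta/facebook-related/tree/master/facebook-fan-page-fetcher – linpingta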
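One way to confirm the point made in the comment above is to count the anchors in the HTML that the server actually returns and compare that with what the browser shows. The sketch below uses only the standard library. The two HTML strings are made-up stand-ins (not Udemy's real markup), and `AnchorCollector` and `collect_hrefs` are hypothetical helper names.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect href values from <a> tags, optionally filtered by CSS class."""

    def __init__(self, class_name=None):
        super().__init__()
        self.class_name = class_name
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        # A valueless class attribute is reported as None, so fall back to "".
        classes = (attr_map.get("class") or "").split()
        if href and (self.class_name is None or self.class_name in classes):
            self.hrefs.append(href)


def collect_hrefs(html, class_name=None):
    """Return every matching href found in the given HTML string."""
    parser = AnchorCollector(class_name)
    parser.feed(html)
    return parser.hrefs


# Hypothetical markup: what a plain HTTP request might get back (an empty shell
# that JavaScript fills in later) versus what the browser holds after rendering.
STATIC_SHELL = (
    '<html><body><div id="app"></div>'
    '<script src="/bundle.js"></script></body></html>'
)
RENDERED = (
    '<html><body><div id="app">'
    '<a class="card__title" href="/the-complete-sql-bootcamp/">SQL</a>'
    '</div></body></html>'
)

print(collect_hrefs(STATIC_SHELL, "card__title"))  # []
print(collect_hrefs(RENDERED, "card__title"))      # ['/the-complete-sql-bootcamp/']
```

If the same check on `RQOBJ.text` from the question comes back empty while the links are visible in the browser, the content is most likely being added by JavaScript, and a plain HTTP request will not see it.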

The site uses JavaScript to fetch its data; you should use Selenium.

2016-11-19 10:21:28

Assuming you want the links to the courses on the page, this code should help:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

baseurl = 'https://www.udemy.com'
url = "https://www.udemy.com/courses/search/?q=sql&src=ukw&lang=en"
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)

# Give the page's JavaScript time to render the course cards
time.sleep(5)
content = driver.page_source
soup = BeautifulSoup(content, "html.parser")
# Course titles are anchors with class "card__title" that carry an href
courseLink = soup.find_all("a", {"class": "card__title", 'href': True})

# The hrefs are relative, so prefix them with the site root
for link in courseLink:
    print(baseurl + link['href'])

driver.quit()

It prints:

https://www.udemy.com/the-complete-sql-bootcamp/ 
https://www.udemy.com/the-complete-oracle-sql-certification-course/ 
https://www.udemy.com/introduction-to-sql23/ 
https://www.udemy.com/oracle-sql-12c-become-an-sql-developer-with-subtitle/ 
https://www.udemy.com/sql-advanced/ 
https://www.udemy.com/sql-for-newbs/ 
https://www.udemy.com/sql-for-marketers-data-analytics-data-science-big-data/ 
https://www.udemy.com/sql-for-punk-analytics/ 
https://www.udemy.com/sql-basics-for-beginners/ 
https://www.udemy.com/oracle-sql-step-by-step-approach/ 
https://www.udemy.com/microsoft-sql-for-beginners/ 
https://www.udemy.com/sql-tutorial-learn-sql-with-mysql-database-beginner2expert/
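
The fixed `time.sleep(5)` in the answer above can be too short on a slow connection and wasteful on a fast one. A possible variation, sketched here under the assumption that the course anchors still match the CSS selector `a.card__title`, is to let Selenium wait until those elements exist. `absolute_links` and `fetch_course_hrefs` are hypothetical helper names, and Selenium is imported inside the function only so that the URL helper can be used without a browser installed.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.udemy.com"


def absolute_links(hrefs, base_url=BASE_URL):
    """Resolve hrefs against base_url, dropping duplicates but keeping order."""
    seen = set()
    result = []
    for href in hrefs:
        full = urljoin(base_url, href)  # absolute hrefs pass through unchanged
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


def fetch_course_hrefs(url, css_selector="a.card__title", timeout=10):
    """Open url in Chrome and return the hrefs of anchors matching css_selector.

    The default selector is an assumption taken from the answer's class name;
    the site's markup may have changed since.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one matching anchor is present, or raise
        # TimeoutException after `timeout` seconds.
        anchors = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))
        )
        hrefs = [a.get_attribute("href") for a in anchors]
        return [h for h in hrefs if h]
    finally:
        # Close the browser even if the wait times out.
        driver.quit()
```

Calling `absolute_links(fetch_course_hrefs(url))` with the search URL from the question would then give a list of course URLs, provided the selector still matches the page.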

2016-11-19 12:27:50 thebadguy
