The page is rendered with JavaScript. There are several ways to render and scrape it.
I can scrape it with Selenium. First install Selenium:
sudo pip3 install selenium
Then get the driver from https://sites.google.com/a/chromium.org/chromedriver/downloads. If you are on Windows or Mac, you can use the headless version of Chrome ("Chrome Canary").
from bs4 import BeautifulSoup
from selenium import webdriver

# Let a real browser execute the page's JavaScript, then grab the rendered HTML.
browser = webdriver.Chrome()
url = 'https://www.takealot.com/computers/laptops-10130'
browser.get(url)
respData = browser.page_source
browser.quit()

# Parse the rendered HTML and print each listing followed by its price.
soup = BeautifulSoup(respData, 'html.parser')
containers = soup.find_all("div", {"class": "p-data left"})
for container in containers:
    print(container.text)
    print(container.find("span", {"class": "amount"}).text)
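For the headless mode mentioned above, here is a minimal sketch of the same fetch without a visible browser window. It assumes a reasonably recent Selenium release in which `webdriver.Chrome` accepts an `options=` argument, and a chromedriver that is on the PATH. The function name `fetch_page_source_headless` is made up for illustration.

```python
def fetch_page_source_headless(url):
    """Load url in headless Chrome and return the rendered HTML."""
    # Imported inside the function so that merely defining it
    # does not require Selenium to be installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")  # run Chrome without a visible window
    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)
        return browser.page_source
    finally:
        browser.quit()  # always release the browser process
```

The returned string can be passed to `BeautifulSoup(..., 'html.parser')` exactly like `respData` in the snippet above.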
Or use PyQt5:
import sys

from bs4 import BeautifulSoup
# Requires a PyQt5 build that includes the QtWebKit modules.
from PyQt5.QtCore import QUrl
from PyQt5.QtWebKitWidgets import QWebPage
from PyQt5.QtWidgets import QApplication


class Render(QWebPage):
    """Load a URL in QtWebKit and block until the page has finished loading."""

    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()  # run the event loop until _loadFinished quits it

    def _loadFinished(self, result):
        # Keep a reference to the loaded frame, then stop the event loop.
        self.frame = self.mainFrame()
        self.app.quit()


url = 'https://www.takealot.com/computers/laptops-10130'
r = Render(url)
respData = r.frame.toHtml()
soup = BeautifulSoup(respData, 'html.parser')
containers = soup.find_all("div", {"class": "p-data left"})
for container in containers:
    print(container.text)
    print(container.find("span", {"class": "amount"}).text)
Or use dryscrape:
from bs4 import BeautifulSoup
import dryscrape

url = 'https://www.takealot.com/computers/laptops-10130'

# dryscrape renders the page in a headless WebKit session.
session = dryscrape.Session()
session.visit(url)
respData = session.body()

soup = BeautifulSoup(respData, 'html.parser')
containers = soup.find_all("div", {"class": "p-data left"})
for container in containers:
    print(container.text)
    print(container.find("span", {"class": "amount"}).text)
Output in all cases:
Dell Inspiron 3162 Intel Celeron 11.6" Wifi Notebook (Various Colours)11.6 Inch Display; Wifi Only (Red; White & Blue Available)R 3,999R 4,999i20% OffeB 39,990Discovery Miles 39,990On Credit: R 372/monthi
3,999
HP 250 G5 Celeron N3060 Notebook - Dark ash silverNBHPW4M70EAR 4,499R 4,999ieB 44,990Discovery Miles 44,990On Credit: R 419/monthiIn StockShippingThis item is in stock in our CPT warehouse and can be shipped from there. You can also collect it yourself from our warehouse during the week or over weekends.CPT | ShippingThis item is in stock in our JHB warehouse and can be shipped from there. No collection facilities available, sorry!JHBWhen do I get it?
4,499
Asus Vivobook ...
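However, when testing with your URL, I observed that the results were not reproducible every time: occasionally "containers" was empty after the page had rendered.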
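One possible cause is that the listings are filled in by JavaScript some time after the initial load, so the HTML is occasionally captured too early. A minimal sketch, assuming that is what happens here, is to re-read the rendered HTML until the expected markup shows up or a timeout expires. The helper `poll_until` and its parameters are hypothetical names, not part of any of the libraries above.

```python
import time
from typing import Callable, Optional


def poll_until(fetch: Callable[[], str],
               is_ready: Callable[[str], bool],
               timeout: float = 10.0,
               interval: float = 0.5) -> Optional[str]:
    """Call fetch() repeatedly until is_ready(html) is true or time runs out.

    Returns the first HTML snapshot that passes the check, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = fetch()
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)  # give the page's scripts time to finish
```

With the Selenium example, this could be called before `browser.quit()`, for instance as `respData = poll_until(lambda: browser.page_source, lambda html: 'p-data' in html)`. The substring check is only an assumption about how the class appears in the page source, and a `None` result should be handled as "listings never appeared". Selenium also provides its own `WebDriverWait` for this kind of explicit wait.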
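Thanks. I tried it and it seems to give me the whole soup object. I just tried soup.prettify and went through all of it - I did not find a reference to the listings anywhere in the output, and it also looks like the page is using JavaScript. This confuses me - shouldn't the soup contain the listings? –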
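If your site uses asynchronous JS to retrieve the list and populate the page with the results, your crawler probably does not know it has to wait for that to finish. Look at Protractor and Jasmine for these kinds of sites. – BoboDarph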