You are trying to open a page by its URL. open() won't do that, because it only opens local files. Use urlopen() instead:
import bs4
from urllib.request import urlopen  # Python 3
# from urllib2 import urlopen  # Python 2

url = "your target url here"
soup = bs4.BeautifulSoup(urlopen(url), "html.parser")
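To see the difference, here is a minimal sketch. open() treats its argument as a local file path, while urlopen() dispatches on the URL scheme and returns a file-like response. BeautifulSoup accepts such a file-like object, which is why urlopen(url) can be passed straight in. The helper names read_with_open and read_with_urlopen are hypothetical, and a data: URL stands in for a real page so the snippet runs without network access.

```python
from urllib.request import urlopen


def read_with_open(target):
    """Hypothetical helper: try to read `target` with the built-in open()."""
    try:
        with open(target, "rb") as f:
            return f.read()
    except OSError as exc:  # open() looks for a local file and fails
        return exc


def read_with_urlopen(target):
    """Hypothetical helper: read `target` as a URL; urlopen picks a handler by scheme."""
    with urlopen(target) as response:
        return response.read()


# A data: URL carries its own payload, so no network access is needed.
demo_url = "data:text/html,<title>Demo</title><p>Hello</p>"

print(type(read_with_open(demo_url)).__name__)  # an OSError subclass
print(read_with_urlopen(demo_url))  # b'<title>Demo</title><p>Hello</p>'
```

With a real http:// or https:// URL, read_with_urlopen behaves the same way, but it needs network access.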
Or use requests, the "HTTP for Humans" library:
import bs4
import requests

response = requests.get(url)
response.raise_for_status()  # raise an error for 4xx/5xx responses
soup = bs4.BeautifulSoup(response.content, "html.parser")
Also note that it is strongly recommended to specify a parser explicitly. I am using html.parser here, but other parsers, such as lxml and html5lib, are available.
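The "html.parser" name refers to the parser in Python's standard library html.parser module, so it needs no extra installation. As a rough illustration of what that module does on its own, without BeautifulSoup, here is a sketch that collects the page title from its start-tag, end-tag and data events. TitleParser is a hypothetical name used only for this example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical example: collect the text inside <title> with the stdlib parser."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> also matches.
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


parser = TitleParser()
parser.feed("<html><head><title>Example page</title></head><body><p>Hi</p></body></html>")
parser.close()
print(parser.title)  # Example page
```

BeautifulSoup builds a navigable tree from events like these, which is far more convenient than handling them by hand.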
> I want to use the exact same page (same instance)
A common way to do this is to get driver.page_source and pass it to BeautifulSoup for further parsing:
from bs4 import BeautifulSoup
from selenium import webdriver

url = "your target url here"

driver = webdriver.Firefox()
driver.get(url)
# wait here for the page to finish loading (e.g. JavaScript-rendered content)
source = driver.page_source
driver.quit()  # remove this line to leave the browser open
soup = BeautifulSoup(source, "html.parser")
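The "wait for the page to load" step is usually a poll-until-ready loop. Below is a minimal stdlib sketch of that idea under stated assumptions: wait_until and page_ready are hypothetical names, and the page is simulated by a counter. In real Selenium code, WebDriverWait plays this role, and the condition could be something like `lambda: driver.execute_script("return document.readyState") == "complete"`.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Hypothetical helper: poll condition() until it is truthy or timeout seconds pass."""
    deadline = time.monotonic() + timeout  # monotonic clock ignores wall-clock changes
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page that finishes "loading" on the third check.
checks = {"count": 0}


def page_ready():
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_until(page_ready, timeout=5, poll_interval=0.01))  # True
```

Only after the condition holds is driver.page_source read, so BeautifulSoup sees the fully rendered HTML of the same browser instance.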
See [How to get innerHTML of whole page in selenium driver](https://stackoverflow.com/questions/35905517/how-to-get-innerhtml-of-whole-page-in-selenium-driver) – robyschek