Using an already-open web page (with Selenium) in BeautifulSoup?

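I have a web page open and logged in using webdriver code. I'm using webdriver for this because the page requires a login and various other actions before I'm set to scrape.

The aim is to scrape data from this open page. I need to find links and open them, so there will be a lot of back and forth between Selenium webdriver and BeautifulSoup.

I looked at the bs4 documentation, and BeautifulSoup(open("ccc.html")) throws an error: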

soup = bs4.BeautifulSoup(open("https://m/search.mp?ss=Pr+Dn+Ts"))

OSError: [Errno 22] Invalid argument: ' https://m/search.mp?ss=Pr+Dn+Ts '

I assume this is because it's not a .html file?

2017-01-23 Sid

See "How to get innerHTML of whole page in selenium driver" (https://stackoverflow.com/questions/35905517/how-to-get-innerhtml-of-whole-page-in-selenium-driver) – robyschek

You are trying to open a page by its URL. open() expects a local file path and will not do that; use urlopen() instead:

import bs4
from urllib.request import urlopen  # Python 3
# from urllib2 import urlopen  # Python 2

url = "your target url here"
soup = bs4.BeautifulSoup(urlopen(url), "html.parser")

Or use the "HTTP for Humans" requests library:

import bs4
import requests

url = "your target url here"
response = requests.get(url)
soup = bs4.BeautifulSoup(response.content, "html.parser")
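The OSError in the question comes from the built-in open(), which treats its argument as a local file path, so the missing .html extension is not the cause. A minimal sketch of telling the two cases apart with only the standard library follows; the helper name looks_like_url is hypothetical and is not part of bs4 or urllib:

```python
from urllib.parse import urlparse


def looks_like_url(target):
    """Return True if target has an http(s) scheme and a host part."""
    parts = urlparse(target.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The string from the question is a URL, so it has to be fetched over HTTP.
print(looks_like_url("https://m/search.mp?ss=Pr+Dn+Ts"))  # True
# A plain file name is a local path, which is what open() expects.
print(looks_like_url("ccc.html"))  # False
```

A check like this could decide whether to hand the string to open() or to urlopen()/requests.get().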

Also note that it is strongly recommended to specify a parser explicitly - I'm using html.parser in this case, but there are other parsers available.
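As a small illustration of passing the parser explicitly, the sketch below parses an in-memory HTML string with the built-in html.parser. The sample markup and variable names are made up for the example:

```python
from bs4 import BeautifulSoup

html = "<html><head><title>Search results</title></head><body><p>Hello</p></body></html>"

# "html.parser" ships with Python; "lxml" and "html5lib" are alternatives
# that have to be installed separately.
soup = BeautifulSoup(html, "html.parser")

page_title = soup.title.string
first_paragraph = soup.p.get_text()
print(page_title, first_paragraph)
```

Naming the parser keeps the result consistent across machines, since bs4 otherwise picks whichever parser happens to be installed.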

I want to use the exact same page (same instance)

A common way to do this is to get driver.page_source and pass it to BeautifulSoup for further parsing:

from bs4 import BeautifulSoup 
from selenium import webdriver 

driver = webdriver.Firefox() 
driver.get(url) 

# wait for page to load.. 

source = driver.page_source 
driver.quit() # remove this line to leave the browser open 

soup = BeautifulSoup(source, "html.parser")
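Since the goal is to find links on the already-open page and then open them, the step after page_source might look like the sketch below. It assumes the driver is still running (driver.quit() has not been called yet). The function name links_from_driver and the FakeDriver class are hypothetical; the stand-in only exists so the example runs without a browser.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def links_from_driver(driver):
    """Return absolute URLs of all links on the page the driver currently shows."""
    soup = BeautifulSoup(driver.page_source, "html.parser")
    base = driver.current_url
    # urljoin resolves relative hrefs against the page's own URL.
    return [urljoin(base, a["href"]) for a in soup.find_all("a", href=True)]


class FakeDriver:
    """Stand-in exposing the two WebDriver attributes used above."""
    current_url = "https://example.com/search?q=x"
    page_source = '<a href="/item/1">One</a> <a href="detail.html">Two</a>'


links = links_from_driver(FakeDriver())
print(links)
```

With a real session, pass the existing driver instead of FakeDriver() and call driver.get(link) on each result to open it in the same logged-in browser.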


2017-01-23 17:17:38 alecxe

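I don't think I explained it correctly: the page is already open. :( I want to use the exact same page (same instance) that Selenium opened. In both examples, I assume a new URL-based request is opening/fetching the data. – Sid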

@Sid OK, I've updated the answer - please see if this is what you meant. Thanks. – alecxe

The third one is exactly what I was looking for. :) Thanks – Sid
