Wait before returning the page source in Selenium (not timeout())

I am trying to scrape this website. As you can see, when it opens it first shows a first, wrong page for a few seconds, and only then loads the actual page I am interested in.

For clarity: first/wrong page and second/right page.

As expected, with BeautifulSoup or Requests I only get the HTML of the first page, not the right one.

I tried using Selenium with set_page_load_timeout(), but it still returns only the first/wrong page instead of the actual page.

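```python
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
# Only caps how long driver.get() may block; it does not add a delay before returning
driver.set_page_load_timeout(7)
url = 'https://images.nga.gov/en/search/do_quick_search.html?q=%221949.7.1%22'
driver.get(url)  # returns once the first (wrong) page has finished loading
source = BeautifulSoup(driver.page_source, 'html.parser')
print(source)
```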

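I tried searching for related questions, but they are all about setting a timeout, which does not seem to be the problem here: the page does load, it just is not the page I want.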

Is there a way to get the source after 7 seconds?


2017-06-06 Mitchell van Zuylen

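You can use the title_is() expected condition to wait for the exact moment the required page opens, i.e. get the source as soon as it is ready instead of always waiting 7 seconds. The page title changes from "Just a moment..." to "National Gallery of Art | NGA Images":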

```python
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait

driver = webdriver.Chrome()
url = 'https://images.nga.gov/en/search/do_quick_search.html?q=%221949.7.1%22'
title = "National Gallery of Art | NGA Images"
driver.get(url)
# Poll (up to 10 s) until the title switches from "Just a moment..." to the target title
wait(driver, 10).until(EC.title_is(title))
source = BeautifulSoup(driver.page_source, 'html.parser')
print(source)
```
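The point of the explicit wait is that it returns as soon as the condition holds rather than after a fixed delay. Below is a minimal pure-Python sketch of that idea that runs without a browser. `FakeDriver`, `title_is` and `wait_until` are hypothetical re-creations for illustration, not Selenium's actual source; the only shared idea is that an expected condition such as `EC.title_is(title)` is a callable that takes the driver and compares `driver.title`, and the wait calls it repeatedly.

```python
import time


def title_is(title):
    """Hypothetical re-creation of an expected condition: returns a predicate over a driver."""
    def _predicate(driver):
        return driver.title == title
    return _predicate


class FakeDriver:
    """Hypothetical stand-in for a WebDriver: shows an interstitial title first,
    then switches to the real title once `delay` seconds have passed."""

    def __init__(self, final_title, delay):
        self._final_title = final_title
        self._ready_at = time.monotonic() + delay

    @property
    def title(self):
        if time.monotonic() >= self._ready_at:
            return self._final_title
        return "Just a moment..."


def wait_until(driver, timeout, condition, poll=0.05):
    """Poll condition(driver) until it returns a truthy value or `timeout` seconds pass."""
    end = time.monotonic() + timeout
    while True:
        value = condition(driver)
        if value:
            return value
        time.sleep(poll)
        if time.monotonic() > end:
            raise TimeoutError(f"condition not met within {timeout} s")


target = "National Gallery of Art | NGA Images"
driver = FakeDriver(target, delay=0.3)
start = time.monotonic()
wait_until(driver, 2, title_is(target))
print(f"title ready after ~{time.monotonic() - start:.1f} s: {driver.title}")
```

Here the wait finishes shortly after the simulated 0.3 s switch, well inside the 2 s limit, which is the behaviour the answer relies on instead of a fixed 7-second pause.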


2017-06-06 19:49:26 Andersson

Is the 10 in wait() here the timeout? I.e. if the EC becomes true after x seconds, it carries on as long as x < 10? –

Yes, it is a 10-second timeout. If the title is still unchanged after 10 seconds, you will get a 'TimeoutException'. – Andersson
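To make the semantics in this exchange concrete (proceed as soon as the condition holds at some x within the timeout, otherwise raise), here is a hypothetical, deterministic simulation that uses a fake clock so it runs instantly. `FakeClock`, `wait_until` and `title_changes_at` are illustrative names, not Selenium APIs; `TimeoutException` is a local stand-in for `selenium.common.exceptions.TimeoutException`. The check-then-sleep-0.5 s loop is an assumption modeled on WebDriverWait's default poll frequency of 0.5 s.

```python
class TimeoutException(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.TimeoutException."""


class FakeClock:
    """Simulated clock so the example runs instantly instead of really waiting."""

    def __init__(self):
        self.now = 0.0

    def sleep(self, seconds):
        self.now += seconds


def wait_until(condition, timeout, clock, poll=0.5):
    """Poll condition() every `poll` simulated seconds until it is truthy or `timeout` passes."""
    end = clock.now + timeout
    while True:
        if condition():
            return clock.now  # simulated time at which the condition first held
        clock.sleep(poll)
        if clock.now > end:
            raise TimeoutException(f"condition not met within {timeout} s")


def title_changes_at(x, clock):
    """Hypothetical page whose title becomes the target after x simulated seconds."""
    return lambda: clock.now >= x


# Title switches after 3 s: the wait returns at 3 s, long before the 10 s timeout.
clock = FakeClock()
print(wait_until(title_changes_at(3, clock), timeout=10, clock=clock))  # 3.0

# Title would only switch after 12 s: the 10 s timeout is exceeded first.
clock = FakeClock()
try:
    wait_until(title_changes_at(12, clock), timeout=10, clock=clock)
except TimeoutException as exc:
    print("TimeoutException:", exc)
```

With a real driver, the equivalent is wrapping `wait(driver, 10).until(...)` in a `try`/`except TimeoutException` if you want to handle a page that never reaches the expected title.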
