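Selenium: scrolling a page with Python
I am trying to parse a page on ozon.ru and have run into a problem. I need to scroll the page and then get all of the HTML code. When I scroll, the page height changes, but the parsing result is wrong because it returns only the results from the first page. I don't understand: do I need to refresh the page's HTML after scrolling, and if so, how?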
import os
import time

from bs4 import BeautifulSoup
from selenium import webdriver


def get_link_product_ozon(url):
    chromedriver = "chromedriver"
    os.environ["webdriver.chrome.driver"] = chromedriver
    # Selenium 3 style call; Selenium 4 expects a Service object instead.
    driver = webdriver.Chrome(chromedriver)
    driver.get(url)
    last_height = driver.execute_script("return document.body.scrollHeight")
    # Scroll at most 80 times, stopping early once the height stops growing.
    for _ in range(80):
        try:
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(3)
            new_height = driver.execute_script("return document.body.scrollHeight")
        except Exception:
            time.sleep(3)
            continue
        if new_height == last_height:
            break
        last_height = new_height
    # Read the HTML only after scrolling has finished.
    soup = BeautifulSoup(driver.page_source, "lxml")
    all_links = soup.find_all('div', class_='bOneTile inline jsUpdateLink mRuble ')
    for link in all_links:
        print(link.attrs['data-href'])
    driver.quit()
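For reference, here is a minimal sketch of the "scroll until the height stops changing" loop on its own. The name `scroll_to_bottom` and its parameters (`pause`, `max_rounds`, `sleep`) are hypothetical, made up for illustration. It assumes only that `driver` has an `execute_script` method, as a Selenium WebDriver does, and it does not touch ozon.ru itself, so it may still need adjusting if the site loads products some other way.

```python
import time


def scroll_to_bottom(driver, pause=3.0, max_rounds=80, sleep=time.sleep):
    """Scroll until document.body.scrollHeight stops growing.

    `driver` is any object with an execute_script(js) method.
    Returns the number of scrolls performed.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)  # give lazily loaded content time to arrive
        rounds += 1
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # the last scroll added nothing new
        last_height = new_height
    return rounds
```

Once it returns, `driver.page_source` can be read a single time and handed to BeautifulSoup. Passing `sleep` as a parameter is only there so the loop can be tried out with a stand-in driver without real waiting.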