看到更多的,直到最後我需要從這個鏈接http://www.biography.com/people點擊使用硒蟒蛇
所以我用硒與蟒蛇按「看多」提取的文章,直到下載所有的人piography所以這是我的代碼
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
chrome_path = r"./chromedriver"
driver = webdriver.Chrome(chrome_path)
driver.get("http://www.biography.com/people")
while(True):
try:
element = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[contains(@class, 'm-component-footer--loader') and contains(@class, 'm-button')]")))
element.click()
except TimeoutException:
break
但問題此代碼有時工作,並爭取一次。之後給我這個例外。
Traceback (most recent call last):
File "sel.py", line 17, in <module>
element.click()
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 77, in click
self._execute(Command.CLICK_ELEMENT)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 493, in _execute
return self._parent.execute(command, params)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 252, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Element <button class="m-component-footer--loader m-button" ng-show="properties.collection.hasMoreItems || loading" ng-click="buttonPressed()" phx-track-event="" phx-track-id="load more" ng-class="{'is-inverted': properties.background.inverted}" tabindex="0" aria-hidden="false">...</button> is not clickable at point (459, 883). Other element would receive the click: <div class="kskdDiv" style="position: fixed; overflow: hidden; height: 90px !important; z-index: 2000000; width: 728px !important;-webkit-transform-origin:0 100%;-moz-transform-origin:0 100%;-ms-transform-origin:0 100%;-o-transform-origin:0 100%;transform-origin:0 100%;left:50%;bottom:0px;-webkit-transform:scale(1) translateX(-50%);-moz-transform:scale(1) translateX(-50%);-ms-transform:scale(1) translateX(-50%);-o-transform:scale(1) translateX(-50%);transform:scale(1) translateX(-50%)" data-kiosked-role="boundary">...</div>
(Session info: chrome=58.0.3029.81)
(Driver info: chromedriver=2.29.461571 (8a88bbe0775e2a23afda0ceaf2ef7ee74e822cc5),platform=Linux 4.8.0-49-generic x86_64)
編輯1: 當我改變從20時迫不及待地9999999999999.這是下載4頁,之後拋出同樣的錯誤。
爲什麼你不能嘗試Web抓取?更多細節:http://www.pythonforbeginners.com/python-on-the-web/web-scraping-with-beautifulsoup – rcubefather