Am I being blocked by a website because of some missing parameters? (Scraping with Selenium)

The page I am trying to scrape (for practice) is at the URL in the code below.

import time

from PIL import Image
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

browser = webdriver.PhantomJS()
browser.implicitly_wait(12)  # wait up to 12 seconds when locating elements
url = 'https://seekingalpha.com/symbol/OPK/financials/income-statement'

browser.get(url)
time.sleep(9)  # give the page's JavaScript time to render
# x = browser.find_element_by_class_name('content')
y = browser.find_element_by_xpath("//*[@id='industrial-income-statement']")

browser.quit()  # close the browser when finished

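This code was working just a little while ago, scraping the income statement (chart) at the bottom of the page. Now I'm getting a "no such element" error on this line: y = browser.find_element_by_xpath("//*[@id='industrial-income-statement']")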

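If I type browser.page_source, part of the output says that access was denied, but I don't know why. I just want to scrape one chart; I am using Selenium, so I assumed the appropriate headers were being sent. The relevant part of the output: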

'0px 25px; padding: 0px; resize: none; "></textarea></div></div></div>\n <p>\n Access to this page has been denied because we believe you are using automation tools to browse the website.\n </p>\n <p>\n This may happen as a result of the following:\n </p>\n <ul>\n <li>\n Javascript is disabled or blocked by an extension (ad blockers for example)\n </li>\n <li>\n Your browser does not support cookies\n </li>\n </ul>\n <p>\n Please make sure that Javascript and cookies are enabled on your browser and that you are not blocking them from loading.\n </p>\n <p>\n Reference ID: #a2a7fe90-4a2a-11e7-be16-a994e7f2d3b8\n </p>\n </div>\n </div>\n <div class="page-footer-wrapper">\n <div class="page-foote
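A minimal sketch of how the block page could be detected before searching for the element, so the failure shows up as "access denied" rather than "no such element". It assumes the denial text and the "Reference ID" line keep the wording shown in the output above; `describe_block`, `DENIAL_MARKER`, and `REFERENCE_ID` are hypothetical names introduced only for illustration.

```python
import re

# Assumption: this phrase is copied from the block page in the output above
# and may change if the site rewords its message.
DENIAL_MARKER = "Access to this page has been denied"

# Assumption: the reference ID keeps the "#<hex digits and dashes>" shape seen above.
REFERENCE_ID = re.compile(r"Reference ID:\s*#([0-9a-f-]+)")


def describe_block(page_source):
    """Return None if the page does not look blocked, else a short description."""
    if DENIAL_MARKER not in page_source:
        return None
    match = REFERENCE_ID.search(page_source)
    reference = match.group(1) if match else "unknown"
    return "access denied (reference ID: {})".format(reference)
```

It could be called as describe_block(browser.page_source) right after browser.get(url); a non-None result suggests the site served its block page instead of the financials.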

As far as I know, PhantomJS does not block JavaScript or cookies.

Is there a way around this?

2017-06-05 Moondra

You should hide the fact that you are using PhantomJS to avoid being detected:

capabilities = dict(webdriver.DesiredCapabilities.PHANTOMJS) 
capabilities["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36" 

browser = webdriver.PhantomJS(desired_capabilities=capabilities)
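The same idea can be sketched as a small helper. The override is applied to a copy of the capabilities, so the shared `DesiredCapabilities.PHANTOMJS` dictionary is not modified. `with_user_agent` and `make_browser` are hypothetical names, and whether a different user agent is enough to get past the site's detection is an assumption, not something this sketch guarantees.

```python
PHANTOMJS_UA_KEY = "phantomjs.page.settings.userAgent"


def with_user_agent(base_capabilities, user_agent):
    """Return a copy of base_capabilities with the PhantomJS user agent overridden."""
    capabilities = dict(base_capabilities)  # copy, so the original stays untouched
    capabilities[PHANTOMJS_UA_KEY] = user_agent
    return capabilities


def make_browser(user_agent):
    """Start PhantomJS with the given user agent (requires Selenium and PhantomJS)."""
    # Imported here so with_user_agent can be used without Selenium installed.
    from selenium import webdriver

    capabilities = with_user_agent(webdriver.DesiredCapabilities.PHANTOMJS, user_agent)
    return webdriver.PhantomJS(desired_capabilities=capabilities)
```

To check that the setting took effect, browser.execute_script("return navigator.userAgent") can be compared with the string that was passed in.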

I would, though, be careful about scraping this resource without explicit consent: check the "User Conduct" section of the Terms of Use.

2017-06-05 20:39:48 alecxe

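Thanks =). Who would I usually contact to get consent? I'm really just trying to scrape this one page for practice (it poses a lot of challenges because of the way the JavaScript(?) is implemented). – Moondra

@moondra, they have multiple options in the "Contact Us" section. But since you are only doing this for a single page, I don't think you need to, unless you are doing heavy web scraping or using the extracted data in an illegal way :) – alecxe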
