
Disabling SSL certificate verification in Scrapy

I'm currently working through an issue with Scrapy. Whenever I use Scrapy to scrape an HTTPS site whose certificate's CN value matches the server's domain name, Scrapy works great! On the other hand, whenever I try to scrape a site whose certificate's CN value does not match the server's domain name, I get the following:

Traceback (most recent call last): 
    File "/usr/local/lib/python2.7/dist-packages/twisted/protocols/tls.py", line 415, in dataReceived 
    self._write(bytes) 
    File "/usr/local/lib/python2.7/dist-packages/twisted/protocols/tls.py", line 554, in _write 
    sent = self._tlsConnection.send(toSend) 
    File "/usr/local/lib/python2.7/dist-packages/OpenSSL/SSL.py", line 1270, in send 
    result = _lib.SSL_write(self._ssl, buf, len(buf)) 
    File "/usr/local/lib/python2.7/dist-packages/OpenSSL/SSL.py", line 926, in wrapper 
    callback(Connection._reverse_mapping[ssl], where, return_code) 
--- <exception caught here> --- 
    File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_sslverify.py", line 1055, in infoCallback 
    return wrapped(connection, where, ret) 
    File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_sslverify.py", line 1154, in _identityVerifyingInfoCallback 
    verifyHostname(connection, self._hostnameASCII) 
    File "/usr/local/lib/python2.7/dist-packages/service_identity/pyopenssl.py", line 30, in verify_hostname 
    obligatory_ids=[DNS_ID(hostname)], 
    File "/usr/local/lib/python2.7/dist-packages/service_identity/_common.py", line 235, in __init__ 
    raise ValueError("Invalid DNS-ID.") 
exceptions.ValueError: Invalid DNS-ID. 

I've looked through as much documentation as I can, and as far as I can tell Scrapy has no way to disable SSL certificate verification. Even the documentation for the Scrapy Request object (which is where I would have assumed this functionality would live) makes no reference to it:

http://doc.scrapy.org/en/1.0/topics/request-response.html#scrapy.http.Request
https://github.com/scrapy/scrapy/blob/master/scrapy/http/request/__init__.py

There is also no Scrapy setting that addresses the problem:

http://doc.scrapy.org/en/1.0/topics/settings.html

Barring taking Scrapy as-is and modifying its source as needed, does anyone have any ideas on how SSL certificate verification could be disabled?

Thanks!


Looking at the docs, it seems you can modify the DOWNLOAD_HANDLERS or DOWNLOAD_HANDLERS_BASE settings to change how scrapy handles https. From there you would probably have to create your own modified HttpDownloadHandler that can get past the error you're receiving. – Monkpit


I've been racking my brain at my desk over this, and that certainly looks promising. Could you write it up as an answer so I can accept it, and then I'll add the code I used for future reference for others? – MoarCodePlz

Answer


From the documentation on the settings that you linked to, it looks like you can modify the DOWNLOAD_HANDLERS setting.

From the docs:

""" 
    A dict containing the request download handlers enabled by default in 
    Scrapy. You should never modify this setting in your project, modify 
    DOWNLOAD_HANDLERS instead. 
""" 

DOWNLOAD_HANDLERS_BASE = { 
    'file': 'scrapy.core.downloader.handlers.file.FileDownloadHandler', 
    'http': 'scrapy.core.downloader.handlers.http.HttpDownloadHandler', 
    'https': 'scrapy.core.downloader.handlers.http.HttpDownloadHandler', 
    's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler', 
} 

Then in your settings, something like this:

""" 
    Configure your download handlers with something custom to override 
    the default https handler 
""" 
DOWNLOAD_HANDLERS = { 
    'https': 'my.custom.downloader.handler.https.HttpsDownloaderIgnoreCNError', 
} 

So by defining a custom handler for the https protocol, you should be able to handle the error you're getting and let scrapy go about its business.
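
For reference, a rough sketch of what such a handler might look like is below. It assumes Scrapy 1.0, where the https handler is HTTP11DownloadHandler and the context factory it loads is kept on a private _contextFactory attribute; the module path and the class name HttpsDownloaderIgnoreCNError simply mirror the hypothetical ones from the settings snippet above.

# my/custom/downloader/handler/https.py -- illustrative sketch only; the
# _contextFactory override relies on a Scrapy 1.0 internal and may need
# adjusting for other versions.
from OpenSSL import SSL
from twisted.internet.ssl import ClientContextFactory
from scrapy.core.downloader.handlers.http11 import HTTP11DownloadHandler


class IgnoreCertContextFactory(ClientContextFactory):
    """TLS context factory that skips certificate (and therefore CN) checks."""

    def getContext(self, hostname=None, port=None):
        ctx = SSL.Context(SSL.SSLv23_METHOD)
        # VERIFY_NONE: never validate the peer certificate, so a CN that
        # does not match the server's domain no longer aborts the request.
        ctx.set_verify(SSL.VERIFY_NONE, lambda conn, cert, errno, depth, ok: True)
        return ctx


class HttpsDownloaderIgnoreCNError(HTTP11DownloadHandler):
    def __init__(self, settings):
        super(HttpsDownloaderIgnoreCNError, self).__init__(settings)
        # Swap the context factory the handler loaded from the
        # DOWNLOADER_CLIENTCONTEXTFACTORY setting for the permissive one above.
        self._contextFactory = IgnoreCertContextFactory()

Keep in mind this turns certificate validation off entirely, so it is best limited to hosts you trust; depending on your setup, pointing the DOWNLOADER_CLIENTCONTEXTFACTORY setting at a permissive context factory may achieve the same thing without replacing the handler.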


This is absolutely awesome and seems aimed at exactly the problem I'm running into. I'm going to play around with the code to see if I can pull this off, and I'll post my solution here! Thanks! – MoarCodePlz


@MoarCodePlz Did you ever find a solution? Any links worth posting? – Dawson