2017-09-18

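SSL: CERTIFICATE_VERIFY_FAILED error on Windows

I am building a simple program that goes through a list of URLs and extracts their content with Beautiful Soup. For the moment I am only trying to iterate over the list and retrieve the HTML, but I keep getting the following error: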

Traceback (most recent call last): 
    File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 1318, in do_open 
    encode_chunked=req.has_header('Transfer-encoding')) 
    File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1239, in request 
    self._send_request(method, url, body, headers, encode_chunked) 
    File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1285, in _send_request 
    self.endheaders(body, encode_chunked=encode_chunked) 
    File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1234, in endheaders 
    self._send_output(message_body, encode_chunked=encode_chunked) 
    File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1026, in _send_output 
    self.send(msg) 
    File "C:\ProgramData\Anaconda3\lib\http\client.py", line 964, in send 
    self.connect() 
    File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1400, in connect 
    server_hostname=server_hostname) 
    File "C:\ProgramData\Anaconda3\lib\ssl.py", line 401, in wrap_socket 
    _context=self, _session=session) 
    File "C:\ProgramData\Anaconda3\lib\ssl.py", line 808, in __init__ 
    self.do_handshake() 
    File "C:\ProgramData\Anaconda3\lib\ssl.py", line 1061, in do_handshake 
    self._sslobj.do_handshake() 
    File "C:\ProgramData\Anaconda3\lib\ssl.py", line 683, in do_handshake 
    self._sslobj.do_handshake() 
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749) 

During handling of the above exception, another exception occurred: 

Traceback (most recent call last): 
    File "C:/Users/thoma/PycharmProjects/fyp/urls_and_prep/parsing_html.py", line 17, in <module> 
    response = urllib.request.urlopen(req) 
    File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 223, in urlopen 
    return opener.open(url, data, timeout) 
    File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 526, in open 
    response = self._open(req, data) 
    File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 544, in _open 
    '_open', req) 
    File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 504, in _call_chain 
    result = func(*args) 
    File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 1361, in https_open 
    context=self._context, check_hostname=self._check_hostname) 
    File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 1320, in do_open 
    raise URLError(err) 
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)> 

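My program is simple, but I don't understand what exactly is going wrong or how to deal with it, and I haven't found any good resources that explain it. I know it has to do with SSL certificates, but I'm not sure how to use them or where to install them. I'm a bit lost at this point, because I have never really worked with SSL. Any guidance or help is much appreciated. The code is below: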

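import urllib.request
from bs4 import BeautifulSoup

# Read one URL per line and print the text content of each page.
with open("all_urls.txt", "r") as file:
    for line in file:
        url = line.strip()  # drop the trailing newline
        print(url)

        try:
            response = urllib.request.urlopen(url)
            html = response.read()
        except ValueError as err:
            # Raised for malformed URLs; report the error and move on.
            print(err)
            continue
        soup = BeautifulSoup(html, 'lxml')
        print(soup.get_text())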

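There are [many questions on this topic](https://stackoverflow.com/search?q=is%3Aquestion合作+蟒蛇+證書+失敗). If these do not help and you want help with your specific problem, please provide enough detail to reproduce it. In particular, that means the URL for which the code fails.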

Answers


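Are you using Windows or Linux? The problem does not seem to be in Python itself, but in Anaconda or the operating system. You could try a few simple things, such as:
1 - Run the script with another Python installation instead of the one that ships with Anaconda.
2 - Use virtualenv to isolate the script from the operating system's components.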
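
One way to check whether the installation, rather than the script, is at fault is to ask the interpreter which OpenSSL build and CA certificate locations it uses. This is a minimal sketch using only the standard library. The helper name `describe_ssl_setup` is made up for illustration, and a count of zero loaded CA certificates does not by itself prove a misconfiguration, since some platforms load certificates lazily.

```python
import ssl


def describe_ssl_setup():
    """Collect facts about this interpreter's SSL setup (hypothetical helper)."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()
    return {
        "openssl": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,    # None if no CA bundle file is configured
        "capath": paths.capath,    # None if no CA directory is configured
        "loaded_ca_certs": context.cert_store_stats()["x509_ca"],
    }


if __name__ == "__main__":
    for key, value in describe_ssl_setup().items():
        print(f"{key}: {value}")
```

Running this under both the Anaconda interpreter and any other Python installation and comparing the output may show whether the two see different certificate stores.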


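I'm using Windows with Anaconda, but I think I installed Python and some libraries before I installed Anaconda. Do you think reinstalling Python/Anaconda would help? Thanks for the reply.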


The standalone Python installation and Anaconda's Python live in different places. Try passing the full path to Python when you run the script. For example: 'C:\Program Files\Python34\Python xxxxxxxx.py'
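
To confirm which installation actually runs the script, the interpreter can report its own location. A small sketch, assuming nothing beyond the standard library; `interpreter_info` is a hypothetical helper name.

```python
import sys


def interpreter_info():
    """Return where the running interpreter lives (hypothetical helper)."""
    return {
        "executable": sys.executable,       # full path to the python binary
        "prefix": sys.prefix,               # root of the installation or environment
        "version": sys.version.split()[0],  # e.g. "3.6.1"
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```

If the printed executable sits under the Anaconda directory, the script is still running on Anaconda's Python regardless of what else is installed.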


The following will work around the problem. Be sure not to use it in production, though, because it does not verify SSL certificates:

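import ssl
import urllib.request  # "import urllib" alone does not load the request submodule
from bs4 import BeautifulSoup

# This is a temporary fix. Be careful of malicious links:
# certificates are NOT verified with this context.
context = ssl._create_unverified_context()

with open("all_urls.txt", "r") as file:
    for line in file:
        url = line.strip()  # drop the trailing newline
        print(url)

        try:
            response = urllib.request.urlopen(url, context=context)
            html = response.read()
        except ValueError as err:
            # Raised for malformed URLs; report the error and move on.
            print(err)
            continue
        soup = BeautifulSoup(html, 'lxml')
        print(soup.get_text())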

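OK, that's great. I know that none of the links in my list are malicious, so this should work. Later on I will use this code in a crawler, though, and in that case I won't know in advance which links I'll be checking. What would you recommend then? Thanks a lot for the answer, I really appreciate it.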
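
For a crawler that visits unknown links, one possible approach is to keep certificate verification switched on and skip any URL that fails, instead of disabling verification globally. The sketch below is only an illustration under that assumption. `fetch_html`, `VERIFIED_CONTEXT`, and the 10-second timeout are hypothetical choices, not part of the original code, and the sketch does not fix the underlying certificate store problem on the machine.

```python
import ssl
import urllib.error
import urllib.request

# Default context: verifies the certificate chain and the hostname.
VERIFIED_CONTEXT = ssl.create_default_context()


def fetch_html(url, timeout=10):
    """Return the page body as bytes, or None if the URL cannot be fetched.

    Hypothetical helper: failures are reported and skipped, never bypassed.
    """
    try:
        with urllib.request.urlopen(
            url, context=VERIFIED_CONTEXT, timeout=timeout
        ) as response:
            return response.read()
    except urllib.error.URLError as err:
        # Covers certificate verification failures, DNS errors, HTTP errors.
        print(f"Skipping {url!r}: {err.reason}")
    except ValueError as err:
        # Raised, for example, when the URL has no scheme.
        print(f"Skipping {url!r}: {err}")
    return None
```

In the loop from the answer, `html = fetch_html(url)` followed by `if html is None: continue` would replace the `try`/`except` block, so pages with untrusted certificates are left out rather than fetched without verification.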