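urlopen does not return a None object when the URL is mistyped

I am currently working through Ryan Mitchell's Web Scraping with Python. In Chapter 1, when he talks about handling errors, he says: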
If the server is not found at all (if, say, the website is down, or the URL is mistyped), urlopen returns a None object.
So to test this, I wrote the following snippet:
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup as bs


def getTitle(url):
    try:
        html = urlopen(url).read()
    except HTTPError:
        return None
    try:
        bsObj = bs(html)
    except AttributeError:
        return None
    return bsObj


title = getTitle('http://www.wunderlst.com')
print(title)
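In the second-to-last line of this code, I deliberately mistyped the URL (the actual URL is http://www.wunderlist.com). I expected None to be printed on the screen. Instead, I got a long list of errors. Here is the last part of the error message: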
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ex4.py", line 18, in <module>
    title = getTitle('http://www.wunderlst.com')
  File "ex4.py", line 8, in getTitle
    html = urlopen(url).read()
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 463, in open
    response = self._open(req, data)
  File "/usr/lib/python3.4/urllib/request.py", line 481, in _open
    '_open', req)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 1210, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.4/urllib/request.py", line 1184, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
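The last line of the traceback shows that urlopen raised urllib.error.URLError, but the snippet's except clause only handles HTTPError. Below is a minimal sketch of a fix, assuming that returning None for any failed request is the desired behaviour. The helper name fetch_html is hypothetical, and the BeautifulSoup step is left out so the example only needs the standard library. The usage line assumes nothing is listening on 127.0.0.1 port 1, so the connection is refused.

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError


def fetch_html(url, timeout=10):
    """Return the raw page bytes, or None if urlopen raises HTTPError/URLError.

    Hypothetical stdlib-only stand-in for the question's getTitle.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as e:
        # Must come before URLError, because HTTPError is a subclass of it.
        # The server was reached but answered with an error status (e.g. 404).
        print("HTTP error:", e.code)
        return None
    except URLError as e:
        # No usable response at all: DNS failure, refused connection, etc.
        print("URL error:", e.reason)
        return None


# Assumption: port 1 on localhost is closed, so this stands in for a bad host.
print(fetch_html("http://127.0.0.1:1/"))
```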
Now, if I use the correct URL but append a page that does not exist on the site, for example:
title = getTitle('http://www.wunderlist.com/something')
then None is printed on the screen. I am very confused by this. Can anyone kindly explain what exactly is happening? Thanks in advance.
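The two cases fail at different stages. With the mistyped host, no server is ever reached (name resolution fails, as the "[Errno -2] Name or service not known" message shows), so urlopen raises URLError. With /something, the real server is reached and replies with an error status such as 404, so urlopen raises HTTPError, which the snippet catches and turns into None. The sketch below reproduces both cases offline, assuming a throwaway local http.server instance (the hypothetical NotFoundHandler) stands in for the real site, and that a refused connection to 127.0.0.1 port 1 stands in for the unresolvable host name.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: answers every GET with 404 Not Found."""

    def do_GET(self):
        self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def classify(url):
    """Report which exception, if any, urlopen raises for the given URL."""
    try:
        urlopen(url, timeout=5).close()
        return "ok"
    except HTTPError as e:  # must come first: HTTPError subclasses URLError
        return "HTTPError %d" % e.code
    except URLError:
        return "URLError"


server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Server reached, page missing: the server answers 404, so HTTPError.
missing_page = classify("http://127.0.0.1:%d/something" % port)
# No server reached (connection refused): URLError.
bad_host = classify("http://127.0.0.1:1/")

server.shutdown()
server.server_close()

print(missing_page)  # expected: HTTPError 404
print(bad_host)      # expected: URLError
```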
This is helpful. I don't understand why the book doesn't mention URLError. – Peaceful
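For reference, urllib.error defines HTTPError as a subclass of URLError, so a single "except URLError" clause catches both failure modes, and an HTTPError additionally carries the status code. The small offline check below raises an HTTPError by hand instead of making a request; the URL in it is only a placeholder.

```python
from urllib.error import HTTPError, URLError

# HTTPError inherits from URLError, so "except URLError" also catches it.
is_subclass = issubclass(HTTPError, URLError)
print(is_subclass)  # True

try:
    # Placeholder URL; no request is made, the error is raised directly.
    raise HTTPError("http://example.com/something", 404, "Not Found", None, None)
except URLError as e:
    caught = type(e).__name__
    code = getattr(e, "code", None)  # only HTTPError instances carry .code

print(caught, code)  # HTTPError 404
```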