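HTTPError when using urllib2 read()
I am trying to scrape web pages using urllib2 and BeautifulSoup. It was working fine, and then, after I put an input() in a different part of my code to try to debug something, I got an HTTPError. Now when I try to run my program again, I get an HTTPError on the line that calls read(). The error stack is as follows: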
[2013-07-17 16:47:07,415: ERROR/MainProcess] Task program.tasks.testTask[460db7cf-ff58-4a51-9c0f-749affc66abb] raised exception: IOError()
16:47:07 celeryd.1 | Traceback (most recent call last):
16:47:07 celeryd.1 | File "/Users/username/folder/server2/venv/lib/python2.7/site-packages/celery/execute/trace.py", line 181, in trace_task
16:47:07 celeryd.1 | R = retval = fun(*args, **kwargs)
16:47:07 celeryd.1 | File "/Users/username/folder/server2/program/tasks.py", line 193, in run
16:47:07 celeryd.1 | self.get_top_itunes_game_by_genre(genre)
16:47:07 celeryd.1 | File "/Users/username/folder/server2/program/tasks.py", line 244, in get_top_itunes_game_by_genre
16:47:07 celeryd.1 | game_page = BeautifulSoup(urllib2.urlopen(game_url).read())
16:47:07 celeryd.1 | File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
16:47:07 celeryd.1 | return _opener.open(url, data, timeout)
16:47:07 celeryd.1 | File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
16:47:07 celeryd.1 | response = meth(req, response)
16:47:07 celeryd.1 | File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
16:47:07 celeryd.1 | 'http', request, response, code, msg, hdrs)
16:47:07 celeryd.1 | File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
16:47:07 celeryd.1 | return self._call_chain(*args)
16:47:07 celeryd.1 | File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
16:47:07 celeryd.1 | result = func(*args)
16:47:07 celeryd.1 | File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
16:47:07 celeryd.1 | raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
16:47:07 celeryd.1 | HTTPError
Here is the code:
for game_url in urls:
    game_page = BeautifulSoup(urllib2.urlopen(game_url).read())
    # code to process page
Does anyone know why I started getting this error? Thanks!
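An HTTPError is usually raised when something fails at the network level or on the server side. Try opening whichever URL you think you are opening in a browser and see whether it works there. – Amber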
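The last frames of the traceback (http_response, then http_error_default) show that the exception is raised inside urlopen itself when the server answers with an error status such as 403, 404 or 503, so read() is never reached. An HTTPError instance is also a response object, so the status code and error page the server sent can still be inspected from code, much like opening the URL in a browser. A minimal sketch; the helper name fetch_status and the 10-second timeout are illustrative assumptions:

```python
try:  # Python 2.7, as used in the question
    from urllib2 import urlopen, HTTPError
except ImportError:  # Python 3 moved these into urllib.request / urllib.error
    from urllib.request import urlopen
    from urllib.error import HTTPError


def fetch_status(url, timeout=10):
    """Hypothetical helper: return (status_code, body) for url.

    urlopen raises HTTPError for error statuses before .read() is ever
    called, but the exception doubles as a response, so its code and
    body (the server's error page) can still be read.
    """
    try:
        response = urlopen(url, timeout=timeout)
    except HTTPError as e:
        try:
            return e.code, e.read()
        finally:
            e.close()
    try:
        return response.getcode(), response.read()
    finally:
        response.close()
```

Calling fetch_status(game_url) for a URL that fails in the scraper shows which status the server returned, which usually narrows the cause down (e.g. 404 for a page that no longer exists, 403 or 503 when the server is refusing or throttling requests).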
urllib2 is going to throw errors sooner or later (you probably got a 40x response), and you should be prepared to catch them. – roippi
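A minimal sketch of roippi's suggestion applied to the loop from the question: each urlopen call is wrapped in try/except so that one failing page is recorded and skipped instead of aborting the whole task. The helper name fetch_pages and the 10-second timeout are illustrative assumptions, not part of the original code:

```python
try:  # Python 2.7, as used in the question
    from urllib2 import urlopen, HTTPError, URLError
except ImportError:  # Python 3 equivalents
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError


def fetch_pages(urls, timeout=10):
    """Hypothetical helper: fetch each URL, skipping failures.

    Returns (pages, failures): pages maps url -> raw HTML bytes,
    failures maps url -> a short description of what went wrong.
    """
    pages = {}
    failures = {}
    for url in urls:
        try:
            response = urlopen(url, timeout=timeout)
            try:
                pages[url] = response.read()
            finally:
                response.close()
        except HTTPError as e:
            # The server answered with an error status (e.g. 403, 404, 503).
            failures[url] = "HTTP %d" % e.code
        except URLError as e:
            # No usable HTTP response at all: DNS failure, refused connection, etc.
            failures[url] = "URL error: %s" % e.reason
    return pages, failures
```

HTTPError is caught before URLError because it is a subclass of URLError. The HTML in pages can then be parsed one page at a time, e.g. game_page = BeautifulSoup(html) for each value, as in the original loop, while failures records which URLs to look at or retry later.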
It looks like some pages just don't load for some reason, and that was the problem. – user1998511