BeautifulSoup timing out on instantiation?
I'm doing some web scraping with BeautifulSoup and I've run into a strange error. The code:
print "Running urllib2"
g = urllib2.urlopen(link + "about", timeout=5)
print "Finished urllib2"
about_soup = BeautifulSoup(g, 'lxml')
Here is the output:
Running urllib2
Finished urllib2
Error
Traceback (most recent call last):
  File "/Users/pspieker/Documents/projects/ThePyStrikesBack/tests/TestSpringerOpenScraper.py", line 10, in test_strip_chars
    for row in self.instance.get_entries():
  File "/Users/pspieker/Documents/projects/ThePyStrikesBack/src/JournalScrapers.py", line 304, in get_entries
    about_soup = BeautifulSoup(g, 'lxml')
  File "/Users/pspieker/.virtualenvs/thepystrikesback/lib/python2.7/site-packages/bs4/__init__.py", line 175, in __init__
    markup = markup.read()
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 355, in read
    data = self._sock.recv(rbufsize)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 588, in read
    return self._read_chunked(amt)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 648, in _read_chunked
    value.append(self._safe_read(amt))
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 703, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 384, in read
    data = self._sock.recv(left)
timeout: timed out
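I understand that urllib2.urlopen could cause problems, but the exception is raised on the line that instantiates BeautifulSoup. I did some Googling but couldn't find anything about BeautifulSoup timeout issues.
Any ideas about what is going on?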
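Huh, so the `urllib2.urlopen` object doesn't throw an exception when it is instantiated? – user2740614
@user2740614 nope, it is triggered when `read()` is called. – alecxe
OK, that explains why it wasn't working. If you don't mind my asking, why would `urllib2.urlopen` be designed like this? Wouldn't you want it to fail faster (e.g. on instantiation)? Just curious :) – user2740614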