Handling IncompleteRead, URLError

This is a web mining script.

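import httplib
import urllib2

def printer(q, missing):
    while 1:
        tmpurl = q.get()  # blocks until a URL is available
        try:
            image = urllib2.urlopen(tmpurl).read()
        except httplib.HTTPException:
            missing.put(tmpurl)  # remember the URL that failed
            continue
        wf = open(tmpurl[-35:] + ".jpg", "wb")
        wf.write(image)
        wf.close()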

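q is a Queue() of URLs, and missing is an empty queue that collects the URLs that raise errors.

It runs in parallel in 10 threads.

Every time I run this, I get the following: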

  File "C:\Python27\lib\socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "C:\Python27\lib\httplib.py", line 541, in read
    return self._read_chunked(amt)
  File "C:\Python27\lib\httplib.py", line 592, in _read_chunked
    value.append(self._safe_read(amt))
  File "C:\Python27\lib\httplib.py", line 649, in _safe_read
    raise IncompleteRead(''.join(s), amt)
IncompleteRead: IncompleteRead(5274 bytes read, 2918 more expected)

But I am already using except ... I also tried other things, like

httplib.IncompleteRead 
urllib2.URLError

and even

image=urllib2.urlopen(tmpurl,timeout=999999).read()

but none of this works.

How can I catch IncompleteRead and URLError?

2012-08-13 from __future__

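A bit late, but this is the first hit on Google. Sooo, http://stackoverflow.com/a/14206036/1444854 should solve your problem. By the way, if you want to catch multiple exceptions, you usually put them in a tuple: except (httplib.IncompleteRead, urllib2.URLError) – 2014-02-12 16:28:33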
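
If a truncated download is still better than nothing, one option is to keep the bytes that did arrive: IncompleteRead stores them in its partial attribute. This is only a minimal sketch under that assumption. read_all and FakeTruncatedResponse are hypothetical names made up for illustration, and the import shim is there so the snippet also runs on Python 3, where httplib is called http.client.

```python
try:
    import httplib                     # Python 2, as used in the question
except ImportError:
    import http.client as httplib      # Python 3 name of the same module


def read_all(response):
    """Return the full body, or whatever arrived before the read broke off."""
    try:
        return response.read()
    except httplib.IncompleteRead as e:
        # e.partial holds the bytes received before the read was cut short
        return e.partial


class FakeTruncatedResponse(object):
    """Hypothetical stand-in for the object urlopen() returns (demo only)."""

    def read(self):
        # Simulate a body that stopped after 3 bytes with 5 more expected
        raise httplib.IncompleteRead(b"abc", 5)


data = read_all(FakeTruncatedResponse())
```

Whether a partial image is usable depends on the format, so you may prefer to put the URL on the missing queue and retry instead.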

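I think the correct answer to this question depends on what you consider an "error-raising URL".

Methods of catching multiple exceptions

If you think any URL that raises an exception should be added to the missing queue, then you can do: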

try: 
    image=urllib2.urlopen(tmpurl).read() 
except (httplib.HTTPException, httplib.IncompleteRead, urllib2.URLError): 
    missing.put(tmpurl) 
    continue

This will catch any of those three exceptions and add the URL to the missing queue. More simply, you could do:

try: 
    image=urllib2.urlopen(tmpurl).read() 
except: 
    missing.put(tmpurl) 
    continue

to catch any exception at all, but this is not considered Pythonic and could hide other errors in your code.
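
To make the "could hide other errors" point concrete, here is a small sketch of how a bare except differs from except Exception. The function names are hypothetical and nothing here touches the network; the point is only that a bare except also swallows SystemExit (and KeyboardInterrupt), which can make a worker thread or script hard to stop.

```python
def swallow_everything(func):
    """Bare except: catches every exception, including SystemExit."""
    try:
        func()
    except:
        return "swallowed"
    return "ok"


def swallow_errors_only(func):
    """except Exception: lets SystemExit and KeyboardInterrupt propagate."""
    try:
        func()
    except Exception:
        return "swallowed"
    return "ok"


def fail():
    raise ValueError("boom")


def request_exit():
    raise SystemExit(1)


bare_on_exit = swallow_everything(request_exit)   # the exit request is hidden
narrow_on_error = swallow_errors_only(fail)       # ordinary errors still caught

try:
    swallow_errors_only(request_exit)
    exit_escaped = False
except SystemExit:
    exit_escaped = True                           # the exit request got through
```

If you do want a catch-all in the download loop, except Exception is usually the safer spelling.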

If by "error-raising URL" you mean any URL that raises an httplib.HTTPException, but you would still like to keep processing when other errors occur, then you can do:

try: 
    image=urllib2.urlopen(tmpurl).read() 
except httplib.HTTPException: 
    missing.put(tmpurl) 
    continue 
except (httplib.IncompleteRead, urllib2.URLError): 
    continue

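This will add the URL to the missing queue only if it raises an httplib.HTTPException, but it will still catch httplib.IncompleteRead and urllib2.URLError and keep the script from crashing. One caveat: httplib.IncompleteRead is a subclass of httplib.HTTPException, so an incomplete read is actually handled by the first clause and its URL does end up in missing. Put the more specific except clause first if you want to treat it differently.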
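
Because httplib.IncompleteRead derives from httplib.HTTPException, the order of the except clauses decides which one runs. The sketch below shows that ordering with the most specific class first. classify is a hypothetical helper written for this illustration, the exceptions are constructed by hand instead of coming from a real request, and the import shim is an assumption so the snippet also runs on Python 3.

```python
try:
    import httplib                          # Python 2
    from urllib2 import URLError
except ImportError:
    import http.client as httplib           # Python 3 equivalents
    from urllib.error import URLError


def classify(exc):
    """Report which except clause handles exc."""
    try:
        raise exc
    except httplib.IncompleteRead:          # most specific class first
        return "incomplete"
    except httplib.HTTPException:           # every other httplib error
        return "http"
    except URLError:                        # not an HTTPException
        return "url"


kinds = [
    classify(httplib.IncompleteRead(b"", 10)),
    classify(httplib.BadStatusLine("junk")),
    classify(URLError("no route")),
]
```

With the clauses in the opposite order, the IncompleteRead case would be reported as "http", because the first matching clause wins.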

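Iterating over a queue

As an aside, while 1 loops always make me a little uneasy. You should be able to loop over the queue contents using the following pattern, though you are free to keep doing it your way: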

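# The two-argument form of iter() needs a callable, so pass q.get, not q.
# The loop ends when "STOP" is pulled off the queue.
for tmpurl in iter(q.get, "STOP"):
    # rest of your code goes here
    pass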
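
For the sentinel pattern to work with several threads, each worker needs its own "STOP" item, otherwise only one of them leaves its loop. A rough sketch, assuming three workers instead of the question's ten and a placeholder "processing" step (upper-casing a string) in place of the download. worker, done and NUM_WORKERS are names invented for this example.

```python
import threading

try:
    import Queue as queue      # Python 2 module name
except ImportError:
    import queue               # Python 3 module name

STOP = "STOP"
NUM_WORKERS = 3                # the question uses 10 threads


def worker(q, done):
    # iter(callable, sentinel) keeps calling q.get() until it returns STOP
    for item in iter(q.get, STOP):
        done.put(item.upper())  # placeholder for the real download work


q = queue.Queue()
done = queue.Queue()
for url in ["a", "b", "c", "d"]:
    q.put(url)
for _ in range(NUM_WORKERS):
    q.put(STOP)                # one sentinel per worker so every loop ends

threads = [threading.Thread(target=worker, args=(q, done))
           for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All workers have finished, so qsize() is stable here
processed = sorted(done.get() for _ in range(done.qsize()))
```

The sentinels are queued after the real URLs, so every URL is consumed before any worker can run out of work and exit.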

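Safely working with files

As another aside, unless it is absolutely necessary to do otherwise, you should use context managers to open and modify files. So your three file-operation lines would become: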

with open(tmpurl[-35:]+".jpg","wb") as wf:
    wf.write(image)

The context manager takes care of closing the file, and will do so even if an exception occurs while writing to it.
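
A quick way to convince yourself of that last point is to raise an error inside the with block and check the file object afterwards. This is just a demonstration with a temporary file and a simulated failure; the path and the RuntimeError are made up for the example.

```python
import os
import tempfile

# Hypothetical output path inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.jpg")

try:
    with open(path, "wb") as wf:
        wf.write(b"partial data")
        raise RuntimeError("simulated failure mid-write")
except RuntimeError:
    pass

# The with block closed (and flushed) the file despite the exception
closed_after_error = wf.closed

with open(path, "rb") as rf:
    contents = rf.read()
```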

2015-10-21 20:34:28
