Scrapy retry middleware fails on a non-standard HTTP status code

I am using Scrapy's default RetryMiddleware to re-download failed URLs. I want pages that respond with a 429 status code ("Too Many Requests") to be retried the same way.
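For context, a minimal sketch of the kind of project settings that would send 429 responses through RetryMiddleware. The question does not show its configuration, so the values below are assumptions; `RETRY_ENABLED`, `RETRY_TIMES` and `RETRY_HTTP_CODES` are standard Scrapy settings.

```python
# settings.py (assumed configuration, not taken from the question)
RETRY_ENABLED = True   # RetryMiddleware is enabled by default
RETRY_TIMES = 2        # extra attempts per request after the first failure

# Status codes that RetryMiddleware should retry; 429 is added to the list here.
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]
```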

But I get this error:

Traceback (most recent call last):
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 46, in process_response
    response = method(request=request, response=response, spider=spider)
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/downloadermiddlewares/retry.py", line 58, in process_response
    reason = response_status_message(response.status)
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/utils/response.py", line 58, in response_status_message
    reason = http.RESPONSES.get(int(status)).decode('utf8', errors='replace')
AttributeError: 'NoneType' object has no attribute 'decode'

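I tried to debug the problem and found that, before retrying the download, Scrapy's RetryMiddleware tries to determine why the previous attempt failed. To do that, the response_status_message function builds a string from the status code and the status text, for example: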

>>> response_status_message(404) 
    '404 Not Found'

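To get the status text it calls Twisted's http.RESPONSES.get(int(status)). But get() is called without a default argument, so for a custom HTTP status code that is not in the table it returns None instead of a string.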

As a result, Scrapy tries to call decode('utf8', errors='replace') on None.
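The failure can be reproduced without Scrapy or Twisted. In the sketch below, `RESPONSES` is a hypothetical two-entry stand-in for Twisted's `http.RESPONSES` table, and the function only imitates the line shown in the traceback; it is not Scrapy's actual source.

```python
# Hypothetical stand-in for twisted.web.http.RESPONSES (status code -> bytes).
RESPONSES = {200: b"OK", 404: b"Not Found"}


def response_status_message(status):
    # Imitates the failing line: get() has no default, so an unknown
    # code yields None and .decode() raises AttributeError.
    reason = RESPONSES.get(int(status)).decode("utf8", errors="replace")
    return "%s %s" % (status, reason)


known = response_status_message(404)  # works: the code is in the table

try:
    response_status_message(429)      # 429 is missing from the table
    error = None
except AttributeError as exc:
    error = str(exc)

print(known)
print(error)
```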

Is there any way to avoid this?

Source

2016-04-26 s_mart

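This is actually a bug in the Scrapy library. It has already been fixed (in the commit linked from the original answer) and the fix is included in the 1.1 RC1 release; see the changelog.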
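On a version without the fix, the general idea is to give the lookup a default so that an unknown code still yields something decodable. The sketch below illustrates that idea only: `RESPONSES`, `safe_status_message` and the `b"Unknown Status"` fallback are illustrative names and values, not a copy of the upstream patch.

```python
# Hypothetical stand-in for twisted.web.http.RESPONSES (status code -> bytes).
RESPONSES = {200: b"OK", 404: b"Not Found"}


def safe_status_message(status, default=b"Unknown Status"):
    # Passing a default to get() guarantees a bytes value, so decode()
    # is never called on None, even for codes such as 429.
    text = RESPONSES.get(int(status), default)
    return "%s %s" % (status, text.decode("utf8", errors="replace"))


print(safe_status_message(404))
print(safe_status_message(429))
```

In a real project this helper would have to replace the call made inside the retry middleware, for example through a custom subclass; upgrading Scrapy is the simpler route.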

Source

2016-04-26 04:01:44

That's right. Here is the issue: https://github.com/scrapy/scrapy/pull/1857
