2016-04-26 74 views

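Scrapy retry middleware fails with a non-standard HTTP status code

I am using Scrapy's default RetryMiddleware to re-download failed URLs. I want pages that respond with a 429 status code ("Too Many Requests") to be retried in the same way.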

But I get this error:

Traceback (most recent call last): 
    File "/home/vagrant/parse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 588, in _runCallbacks 
    current.result = callback(current.result, *args, **kw) 
    File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 46, in process_response 
    response = method(request=request, response=response, spider=spider) 
    File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/downloadermiddlewares/retry.py", line 58, in process_response 
    reason = response_status_message(response.status) 
    File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/utils/response.py", line 58, in response_status_message 
    reason = http.RESPONSES.get(int(status)).decode('utf8', errors='replace') 
AttributeError: 'NoneType' object has no attribute 'decode' 

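I tried to debug the problem and found that, before retrying the download, Scrapy's RetryMiddleware tries to determine the reason for the previous failure. To do so, the response_status_message function builds a string from the status code and the status text, for example: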

>>> response_status_message(404) 
    '404 Not Found' 

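To get the status text, it looks the code up in Twisted's response table with http.RESPONSES.get(int(status)). But get() is called without a default argument, so for a status code that is not in the table it returns None instead of a string.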

As a result, Scrapy ends up calling decode('utf8', errors='replace') on None.

Is there any way to avoid this?

Answers


This is actually a bug in the Scrapy library. It has already been fixed in this commit, and the fix is included in RC1.1. See the changelogs.
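For anyone stuck on an affected version, here is a minimal sketch of the failure and of a defensive lookup. It is not the code from the actual commit. RESPONSES below is a small stand-in for twisted.web.http.RESPONSES, and the function names and the "Unknown Status" fallback text are made up for illustration.

```python
# Stand-in for twisted.web.http.RESPONSES (assumption: a tiny subset with
# bytes values, deliberately missing 429 as in the question).
RESPONSES = {200: b"OK", 404: b"Not Found"}


def status_message_unsafe(status):
    # Mirrors the failing line from the traceback: .get() without a default
    # returns None for an unknown code, and None has no .decode().
    text = RESPONSES.get(int(status)).decode("utf8", errors="replace")
    return "%s %s" % (status, text)


def status_message_safe(status, default=b"Unknown Status"):
    # Hypothetical fix: supply a default so there is always bytes to decode.
    text = RESPONSES.get(int(status), default).decode("utf8", errors="replace")
    return "%s %s" % (status, text)


if __name__ == "__main__":
    print(status_message_safe(404))
    print(status_message_safe(429))
    try:
        status_message_unsafe(429)
    except AttributeError as exc:
        print("unsafe lookup failed:", exc)
```

The only difference between the two functions is the second argument to get(). Upgrading Scrapy is still the cleaner fix, since the error is raised inside the library itself.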

+1

That's right. Here is the issue: https://github.com/scrapy/scrapy/pull/1857