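Scrapy: unable to change the url of the request object in process_request of a DownloaderMiddleware

I am trying to build a Scrapy downloader middleware that changes the URL of the request object. However, I can't get it to work with process_request: the downloaded page is still the one from the original URL. My code is as follows: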
#middlewares.py
class UrlModifyMiddleware(object):
    def process_request(self, request, spider):
        original_url = request.url
        m_url = 'http://whatsmyuseragent.com/'
        request.url = m_url
        #request = request.replace(url=relay_url)
The spider code:
#spider/test_spider.py
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import Request

class TestSpider(CrawlSpider):
    name = "urltest"
    start_url = "http://www.icanhazip.com/"

    def start_requests(self):
        yield Request(self.start_url, callback=self.parse_start)

    def parse_start(self, response):
        # Save the downloaded page so the fetched URL can be checked by hand
        html_page = response.body
        open('test.html', 'wb').write(html_page)
In settings.py I set:
DOWNLOADER_MIDDLEWARES = {
    'official_index.middlewares.UrlModifyMiddleware': 100,
}
What Scrapy version are you using? I made a test project with your exact code and it works as expected: the start request.url is whatsmyuseragent.com. – Rolando
@Rho Thank you for your answer. I am using 0.14.4 – Arnold
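One alternative that may be worth trying, sketched here under some assumptions: instead of assigning to request.url in place, return a new request built with request.replace() from process_request. Current Scrapy documentation describes that a Request returned from process_request is rescheduled in place of the original; whether 0.14.4 behaves the same way is not confirmed in this thread. The `url_modified` meta key and the `TARGET_URL` constant are hypothetical names chosen for illustration. The guard is needed because the replacement request passes through the same middleware again.

```python
# middlewares.py
# Sketch: redirect a request by returning a modified copy from
# process_request, rather than mutating request.url in place.

TARGET_URL = 'http://whatsmyuseragent.com/'  # same value as m_url in the question


class UrlModifyMiddleware(object):
    def process_request(self, request, spider):
        # The replacement request comes back through this middleware,
        # so let already-rewritten requests pass to avoid an endless loop.
        # 'url_modified' is a hypothetical marker key, not a Scrapy built-in.
        if request.meta.get('url_modified'):
            return None  # None means: continue processing this request

        # Copy the existing meta and add the marker, then build a copy of
        # the request that differs only in url and meta.
        new_meta = dict(request.meta, url_modified=True)
        return request.replace(url=TARGET_URL, meta=new_meta)
```

With this version the spider and the DOWNLOADER_MIDDLEWARES entry can stay as they are; the callback set on the original request is carried over to the copy by replace().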