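How can I get the URL of the first request after it has been redirected (302, 301)?
I am using Scrapy (version 1.1.1) to scrape some data from the internet. This is what I am facing: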
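import codecs
import scrapy

class Link_Spider(scrapy.Spider):
    name = 'GetLink'
    allowed_domains = ['example_0.com']
    # Read the start URLs from link.txt, one URL per line.
    with codecs.open('link.txt', 'r', 'utf-8') as f:
        start_urls = [url.strip() for url in f.readlines()]

    def parse(self, response):
        # Prints the final URL, i.e. the one reached after any redirects.
        print(response.url)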
In the code above, 'start_urls' is a list like this:
start_urls = [
    'http://example_0.com/?id=0',
    'http://example_0.com/?id=1',
    'http://example_0.com/?id=2',
]  # and so on
When Scrapy runs, the debug output tells me:
[scrapy] DEBUG: Redirecting (302) to (GET https://example_1.com/?subid=poison_apple) from (GET http://example_0.com/?id=0)
[scrapy] DEBUG: Redirecting (301) to (GET https://example_1/ture_a.html) from (GET https://example_1.com/?subid=poison_apple)
[scrapy] DEBUG: Crawled (200) (GET https://example_1/ture_a.html) (referer: None)
Now, how can I tell which URL in 'start_urls' (of the form 'http://example_0.com/?id=***') is paired with the final URL 'https://example_1/ture_a.html'? Can anyone help me?
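I tried, but printing response.request.url does not work; it just prints 'https://example_1/ture_a.html'. That is because the response belongs to the last debug line, 'Crawled (200)', and not to the first debug line, 'Redirecting (302)'. – xie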
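
One possible way to recover the pairing, as a rough sketch rather than a definitive answer: Scrapy's built-in RedirectMiddleware records the URLs a request passed through in the 'redirect_urls' key of the request's meta, and that meta is reachable from the response as response.meta. Assuming that middleware is enabled (it is by default), the first entry of the list should be the URL from 'start_urls'. The helper name first_requested_url and the stand-in response object below are hypothetical and only there so the snippet runs without a crawl.

```python
from types import SimpleNamespace


def first_requested_url(response):
    """Return the URL that was originally requested for this response.

    Assumes Scrapy's RedirectMiddleware is enabled, which stores the URLs
    visited before the final one under meta['redirect_urls'].
    """
    redirect_urls = response.meta.get('redirect_urls')
    # No redirect happened: the response URL is the requested URL.
    if not redirect_urls:
        return response.url
    # The first entry is the URL the redirect chain started from.
    return redirect_urls[0]


# Stand-in for a scrapy Response (hypothetical, for illustration only),
# mirroring the redirect chain shown in the debug output.
fake_response = SimpleNamespace(
    url='https://example_1/ture_a.html',
    meta={
        'redirect_urls': [
            'http://example_0.com/?id=0',
            'https://example_1.com/?subid=poison_apple',
        ],
    },
)

print(first_requested_url(fake_response), '->', fake_response.url)
```

Inside the spider, parse could call first_requested_url(response) in the same way and print or store it next to response.url, which gives the start URL and the final URL side by side.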