I want to stop the spider if a certain condition is met. I have tried this:
raise CloseSpider('Some Text')
and this (from How to stop scrapy crawler):
sys.exit("SHUT DOWN EVERYTHING!")
But it does not stop. Here is the code. Raising the exception instead of returning does not work either; the spider keeps crawling:
import scrapy
from scrapy.http import Request
from tutorial.items import DmozItem
from scrapy.exceptions import CloseSpider
import sys


class DmozSpider(scrapy.Spider):
    name = "tutorial"
    # Scrapy reads "allowed_domains" (plural); "allowed_domain" is silently ignored
    allowed_domains = ["jabong.com"]
    start_urls = [
        "http://www.jabong.com/women/shoes/sandals/?page=1"
    ]
    page_index = 1

    def parse(self, response):
        products = response.xpath('//li')
        if products:
            for product in products:
                item = DmozItem()
                item_url = product.xpath('@data-url').extract()
                item_url = "http://www.jabong.com/" + item_url[0] if item_url else ''
                if item_url:
                    # the item is handed to parse_page2 through request.meta
                    yield Request(
                        url=item_url,
                        callback=self.parse_page2,
                        meta={"item": item},
                        headers={"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"},
                    )
        else:
            # replacing this with raise CloseSpider('Some Text') does not stop the spider either
            return
        self.page_index += 1
        if self.page_index:  # always true after the increment, so the next page is always requested
            yield Request(url="http://www.jabong.com/women/shoes/sandals/?page=%s" % (self.page_index),
                          headers={"Referer": "http://www.jabong.com/women/shoes/sandals/",
                                   "X-Requested-With": "XMLHttpRequest"},
                          callback=self.parse)

    def parse_page2(self, response):
        sizes = []
        item = response.meta['item']
        item['site_name'] = 'jabong'
        item['tags'] = ''
        yield item
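In parse(), the test `if self.page_index:` can never be false at that point: the index starts at 1 and is incremented first, and any non-zero integer is truthy. The only thing that ends the pagination is therefore the `else: return` branch. The sketch below isolates that check with plain functions. `next_page_bounded` and its `max_pages` limit are hypothetical, shown only as one possible explicit stop rule, not as Scrapy API.

```python
BASE_URL = "http://www.jabong.com/women/shoes/sandals/?page=%s"


def next_page_original(page_index):
    """Mirrors the spider: increment, then test the index for truthiness."""
    page_index += 1
    if page_index:  # 2, 3, 4, ... are all truthy, so this branch always runs
        return BASE_URL % page_index
    return None


def next_page_bounded(page_index, product_urls, max_pages=100):
    """Hypothetical stop rule: return None when the current page yielded
    no product URLs or when an assumed page limit has been reached."""
    if not product_urls or page_index >= max_pages:
        return None
    return BASE_URL % (page_index + 1)
```

For example, `next_page_original(1)` returns the `?page=2` URL and never returns `None` for a positive index, while `next_page_bounded(3, [])` returns `None` and `next_page_bounded(3, ["item-url"])` returns the `?page=4` URL.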
Update: Even when I raise CloseSpider instead of using return, my spider does not stop.
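Why `return` alone cannot work: parse() is a generator, so `return` only ends that one callback, and every request already yielded stays scheduled. An exception, by contrast, escapes the callback and can be acted on by the crawl loop, which is what Scrapy does with CloseSpider. The toy loop below is a hypothetical model, not Scrapy's engine; `StopCrawl`, `run` and the page names are made up for illustration. Real Scrapy closes the spider gracefully, so requests already being downloaded may still complete after CloseSpider is raised.

```python
from collections import deque


class StopCrawl(Exception):
    """Hypothetical stand-in for scrapy.exceptions.CloseSpider."""


def run(start):
    """Toy crawl loop (NOT Scrapy's engine): pops (url, callback) pairs,
    runs each callback and schedules every (url, callback) it yields."""
    queue = deque([start])
    visited = []
    while queue:
        url, callback = queue.popleft()
        visited.append(url)
        try:
            for request in callback(url):
                queue.append(request)
        except StopCrawl:
            break  # drop everything that is still scheduled
    return visited


def detail(url):
    return []  # a detail page schedules nothing further


def parse_with_return(url):
    if url == "p1":
        yield ("p2", parse_with_return)
        yield ("item-a", detail)
    else:
        return  # ends only this callback; "item-a" is still queued


def parse_with_raise(url):
    if url == "p1":
        yield ("p2", parse_with_raise)
        yield ("item-a", detail)
    else:
        raise StopCrawl("no products")  # escapes the callback, the loop stops
```

With `return`, `run(("p1", parse_with_return))` still visits `['p1', 'p2', 'item-a']`; with the exception, `run(("p1", parse_with_raise))` stops after `['p1', 'p2']`. This only helps if the raising branch is actually reached.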
See this: http://stackoverflow.com/questions/4448724/force-my-scrapy-spider-to-stop-crawling Hope it helps.
Raising CloseSpider in your code should do the trick. I bet the exception just isn't being raised for some reason. – alecxe
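One way to test that suspicion: `//li` matches every `<li>` on the page, not just product items. Assuming the listing pages also contain other `<li>` elements such as a navigation menu (not verified against jabong.com), `products` is never empty, so the `else` branch holding `return` or `raise CloseSpider(...)` is never reached. The sketch uses the standard library's ElementTree instead of Scrapy selectors, on hypothetical markup; `.//li[@data-url]` is the ElementTree counterpart of the narrower XPath `//li[@data-url]`, which could be tried in place of `//li` in the spider.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a listing page with no products left, but with a
# navigation menu that still contains <li> elements.
EMPTY_PAGE = """
<html><body>
  <ul class="nav"><li>Women</li><li>Shoes</li></ul>
  <ul class="products"></ul>
</body></html>
"""

# Hypothetical markup: a listing page with one product item.
PRODUCT_PAGE = """
<html><body>
  <ul class="nav"><li>Women</li><li>Shoes</li></ul>
  <ul class="products"><li data-url="product-1">Sandal</li></ul>
</body></html>
"""


def count_li(html, only_products):
    """Count <li> nodes: all of them, or only those carrying data-url."""
    root = ET.fromstring(html.strip())
    if only_products:
        return len(root.findall(".//li[@data-url]"))  # like //li[@data-url]
    return len(root.findall(".//li"))  # like //li in the spider
```

On `EMPTY_PAGE`, the broad query still finds 2 nodes (truthy, so the stop branch is skipped), while the narrow query finds 0; on `PRODUCT_PAGE` they find 3 and 1 respectively.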