Scrapy crawler not completing all loops of the parse function
I have this code in my crawler:
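# Import paths below are assumed from the old Scrapy API this post uses.
from scrapy import log
from scrapy.contrib.spiders.init import InitSpider
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
# StackItem comes from the project's items module (not shown in the post).


class StackSpider(InitSpider):
    name = 'stack'
    allowed_domains = ['sitepoint.com']
    start_urls = ["http://www.sitepoint.com"]
    start_page = "http://www.sitepoint.com"
    item = StackItem()

    def init_request(self):
        return Request(url=self.start_page, callback=self.parse)

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="headline_area"]')
        items = []
        ivar = 1
        # Only the first five matches are used, to keep the test small.
        for site in sites[:5]:
            item = StackItem()
            log.msg(' LOOP' + str(ivar) + '', level=log.ERROR)
            item['title'] = "yoo ma"
            # Every iteration requests the same URL.
            request = Request("http://www.sitepoint.com/getting-to-know-css3-selectors-structural-pseudo-classes/", callback=self.test1)
            request.meta['item'] = item
            ivar = ivar + 1
            yield request

    def test1(self, response):
        log.msg(' LOOP 2 \n', level=log.ERROR)
        # Pick up the item passed along from parse().
        item = response.meta['item']
        item['desc'] = "test4"
        return item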
I followed the documentation, but it only works for one loop. I mean, in the log on screen I can only see:
LOOP1
LOOP2
It should repeat 3 times.
I tried different combinations of return and yield:
return request and return item
gives the output LOOP1 LOOP2
yield request and return item
gives the output LOOP1 LOOP1 LOOP1 LOOP2
yield request and yield item
gives the output LOOP1 LOOP1 LOOP1 LOOP2
return request and yield item
gives the output LOOP1 LOOP2
How can I get LOOP1 LOOP2 LOOP1 LOOP2 and so on?
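Fix your indentation. –
Apparently sites = hxs.select('//div[@class="top"]') returns only two items.... Nobody can verify this, because you are missing important information needed to reproduce the problem. Hence -1. –
I can confirm from the scrapy shell that it returns many items. That is why I slice it for testing. – user19140477031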
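One possible reading of the outputs above, offered as a sketch rather than a confirmed diagnosis: a `return` inside the `for` loop leaves `parse` on the first iteration, while `yield` lets every iteration run. Since every request in the post points at the same URL, Scrapy's scheduler, which drops duplicate requests by default unless `dont_filter=True` is passed to `Request`, would let only one of them reach `test1`. The plain-Python simulation below does not use Scrapy at all; `parse_with_return`, `parse_with_yield` and `run` are hypothetical names, and a simple set stands in for the duplicate filter. It assumes all requests are collected before any callback fires, which matches the order seen in the logs.

```python
URL = "http://www.sitepoint.com/getting-to-know-css3-selectors-structural-pseudo-classes/"


def parse_with_return(log, n=3):
    """Mimics `return request` inside the loop: exits on the first iteration."""
    for _ in range(n):
        log.append("LOOP1")
        return [(URL, False)]  # (url, dont_filter)
    return []


def parse_with_yield(log, n=3, dont_filter=False):
    """Mimics `yield request`: the loop body runs once per item."""
    for _ in range(n):
        log.append("LOOP1")
        yield (URL, dont_filter)


def run(parse_output, log):
    """Hypothetical stand-in for the scheduler, not Scrapy's real code.

    Collects every request first, then skips URLs it has already seen
    unless the request carries dont_filter=True.
    """
    requests = list(parse_output)
    seen = set()
    for url, dont_filter in requests:
        if not dont_filter and url in seen:
            continue  # duplicate request dropped, callback never runs
        seen.add(url)
        log.append("LOOP2")  # stands for the test1 callback


if __name__ == "__main__":
    for label, make in [
        ("return request", lambda log: parse_with_return(log)),
        ("yield request", lambda log: parse_with_yield(log)),
        ("yield request, dont_filter=True",
         lambda log: parse_with_yield(log, dont_filter=True)),
    ]:
        log = []
        run(make(log), log)
        print(label + ": " + " ".join(log))
```

If this reading applies, the corresponding change in the spider would be to pass `dont_filter=True` when building the `Request`, or to request a distinct URL for each item. Even then, Scrapy handles responses asynchronously, so a strictly alternating LOOP1 LOOP2 order should not be expected.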