Scrapy: scraping nested URLs — in the code below, the first-level parse function runs about 32 times (a measured 32 hrefs in the for loop), and for each of those sub-links it should go scrape data with the parse_next function (32 individual URLs). But parse_next only executes once / is never called, and the output CSV file is empty. Can anyone help me find where I'm going wrong?
import scrapy
import logging

from ScrapyTestProject.items import ScrapytestprojectItem

logger = logging.getLogger('mycustomlogger')

class QuotesSpider(scrapy.Spider):
    name = "nestedurl"
    allowed_domains = ['www.grohe.in']
    start_urls = [
        'https://www.grohe.com/in/7780/bathroom/bathroom-faucets/essence/',
    ]

    def parse(self, response):
        logger.info("Parse function called on %s", response.url)
        for divs in response.css('div.viewport div.workspace div.float-box'):
            item = {'producturl': divs.css('a::attr(href)').extract_first(),
                    'imageurl': divs.css('a img::attr(src)').extract_first(),
                    'description': divs.css('a div.text::text').extract() + divs.css('a span.nowrap::text').extract()}
            next_page = response.urljoin(item['producturl'])
            #logger.info("This is an information %s", next_page)
            yield scrapy.Request(next_page, callback=self.parse_next, meta={'item': item})
            #yield item

    def parse_next(self, response):
        item = response.meta['item']
        logger.info("Parse function called on2 %s", response.url)
        item['headline'] = response.css('div#content a.headline::text').extract()
        return item
        #response.css('div#product-variants a::attr(href)').extract()
I checked your loop and it should work fine, so there is presumably some error visible in the logs. Have you tried running the spider with the DEBUG log level? That should give you some indication of where things go wrong. – Casper
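One thing DEBUG logging often surfaces in cases like this is offsite filtering: Scrapy's OffsiteMiddleware silently drops requests whose host is not covered by `allowed_domains` (logging a "Filtered offsite request" line at DEBUG level). Note that the spider above allows `www.grohe.in` while the URLs it follows are on `www.grohe.com`. As a rough illustration only (a simplified sketch, not Scrapy's actual implementation), the domain check works like this:

```python
from urllib.parse import urlparse

def is_offsite(url, allowed_domains):
    """Simplified version of the host check Scrapy's OffsiteMiddleware
    applies to each request (illustrative sketch, not the real code)."""
    host = urlparse(url).netloc.lower()
    for domain in allowed_domains:
        domain = domain.lower()
        # A host is on-site if it equals the domain or is a subdomain of it.
        if host == domain or host.endswith("." + domain):
            return False
    return True  # no allowed domain matched: the request would be dropped

# The product URLs are on www.grohe.com, but only www.grohe.in is allowed:
print(is_offsite("https://www.grohe.com/in/7780/bathroom/", ["www.grohe.in"]))
```

To see whether this (or something else) is happening, run the spider with `scrapy crawl nestedurl -L DEBUG` and look for "Filtered offsite request" lines.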