2014-06-26

Here are my Scrapy logging settings:

LOG_ENABLED = True 
STATS_ENABLED = True 
LOG_FILE = 'crawl.log' 

And here is my spider:

class AbcSpider(XMLFeedSpider):
    handle_httpstatus_list = [404, 500]
    name = 'abctv'
    allowed_domains = ['abctvnepal.com.np']
    start_urls = [
        'http://www.abctvnepal.com.np',
    ]

    def parse(self, response):

        mesg = "Spider {} is not working".format(name)

        if response.status in self.handle_httpstatus_list:
            return log.msg(mesg, level=log.ERROR)

        hxs = HtmlXPathSelector(response)  # The XPath selector
        sites = hxs.select('//div[@class="marlr respo-left"]/div/div/h3')
        items = []
        for site in sites:
            item = NewsItem()
            item['title'] = escape(''.join(site.select('a/text()').extract())).strip()
            item['link'] = escape(''.join(site.select('a/@href').extract())).strip()
            item['description'] = escape(''.join(site.select('p/text()').extract()))
            item = Request(item['link'], meta={'item': item}, callback=self.parse_detail)
            items.append(item)
        return items

    def parse_detail(self, response):
        item = response.meta['item']
        sel = HtmlXPathSelector(response)
        details = sel.select('//div[@class="entry"]/p/text()').extract()
        detail = ''
        for piece in details:
            detail = detail + piece
        item['details'] = detail
        item['location'] = detail.split(",", 1)[0]
        item['published_date'] = (detail.split(" ", 1)[1]).split(" ", 1)[0] + ' ' + ((detail.split(" ", 1)[1]).split(" ", 1)[1]).split(" ", 1)[0]
        return item

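Here I want to send a log message when the response status code is in handle_httpstatus_list = [404, 500]. Can anyone give me an example of how to do this? It would be a great help.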

1 Answer

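The Scrapy documentation is well written and contains plenty of example code. If you are developing your first Scrapy project, it is worth browsing through. :)

For example, a quick scan of the logging documentation turns up the following example code: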

from scrapy import log 
log.msg("This is a warning", level=log.WARNING) 

So adding the import and removing the return in front of log.msg should fix your code.

Also, the mesg line should use self.name:

mesg = "Spider {} is not working".format(self.name) 
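The check itself can be tried outside Scrapy. The following is a minimal sketch, not the asker's spider: it uses Python's standard logging module (which replaced scrapy.log from Scrapy 1.0 onward), and FakeResponse and report_if_broken are hypothetical names introduced only for illustration.

```python
import logging
from collections import namedtuple

# Hypothetical stand-in for a Scrapy response; only .url and .status are used.
FakeResponse = namedtuple("FakeResponse", ["url", "status"])

# Same values as handle_httpstatus_list in the question.
HANDLED_STATUSES = [404, 500]

logger = logging.getLogger("abctv")


def report_if_broken(spider_name, response, handled=HANDLED_STATUSES, log=logger):
    """Log an ERROR and return True only when the status is in the handled list."""
    if response.status in handled:
        log.error("Spider %s is not working (HTTP %d for %s)",
                  spider_name, response.status, response.url)
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.ERROR)
    # Logs one error line for the 404 response and nothing for the 200 response.
    report_if_broken("abctv", FakeResponse("http://www.abctvnepal.com.np", 404))
    report_if_broken("abctv", FakeResponse("http://www.abctvnepal.com.np", 200))
```

Inside the spider, the same idea would presumably read as: build mesg from self.name, call the logger when response.status is in self.handle_httpstatus_list, then return from parse without returning the logger call's result.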

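I did that, but it sends me all the logs. I mean it sends every log message for whatever it has crawled. What I want is to send only the log saying the spider is not working, and only when the status code is in the list handle_httpstatus_list = [404, 500]. – Aaeronn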
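One possible reading of the comment above is that lower-level records (DEBUG, INFO) end up in the same output as the error. Assuming that is the case, filtering by level is the usual remedy; in Scrapy the corresponding setting is LOG_LEVEL (for example LOG_LEVEL = 'ERROR'). The sketch below shows the same idea with the standard logging module only; ListHandler and build_logger are hypothetical names used for illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that collects the messages it receives in a list."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def build_logger(name="abctv-demo"):
    """Return a logger whose only handler accepts ERROR and above."""
    log = logging.getLogger(name)
    log.setLevel(logging.DEBUG)   # the logger itself lets everything through
    log.propagate = False         # keep the demo isolated from the root logger
    for old in list(log.handlers):
        log.removeHandler(old)
    handler = ListHandler(level=logging.ERROR)  # the handler filters by level
    log.addHandler(handler)
    return log, handler


log, handler = build_logger()
log.debug("Crawled (200) http://www.abctvnepal.com.np")
log.info("Scraped 10 items")
log.error("Spider abctv is not working")
print(handler.messages)
```

Only the error record reaches the handler; the debug and info records are dropped before they are stored.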