2014-11-20

I am trying to use spider.state as described at http://doc.scrapy.org/en/0.22/topics/jobs.html, but I get an error saying that my CrawlSpider-derived class object has no attribute 'state':

MyCrawlSpider has no attribute 'state' 

I am trying to use it in the __init__() method of a CrawlSpider-derived class. Could that be the problem?

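from datetime import datetime

from scrapy.contrib.spiders import CrawlSpider  # import path as of Scrapy 0.22


class MyCrawlSpider(CrawlSpider):
    # Default: the time this process started, as an ISO 8601 string
    crawl_start = datetime.utcnow().isoformat()

    def __init__(self, *args, **kwargs):
        super(MyCrawlSpider, self).__init__(*args, **kwargs)

        # Reuse the stored start time if present, otherwise store the new one
        if self.state.get('crawl_start'):
            self.crawl_start = self.state.get('crawl_start')
        else:
            self.state["crawl_start"] = self.crawl_start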

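My goal is for the crawl_start attribute to always hold the isoformat time string of when my crawler was first started, regardless of when the crawl is later resumed.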

Answer

According to the source code, the state attribute is set on the spider in the spider_opened() signal handler by the scrapy.contrib.spiderstate.SpiderState extension:

class SpiderState(object):
    """Store and load spider state during a scraping job"""

    ...

    def spider_closed(self, spider):
        if self.jobdir:
            with open(self.statefn, 'wb') as f:
                pickle.dump(spider.state, f, protocol=2)

    def spider_opened(self, spider):
        if self.jobdir and os.path.exists(self.statefn):
            with open(self.statefn, 'rb') as f:
                spider.state = pickle.load(f)
        else:
            spider.state = {}

The signal is sent after the __init__() method has been executed, so at that point the spider instance has no state attribute yet. That is why you get the error.
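
To make the ordering concrete, here is a minimal sketch in plain Python that only imitates the lifecycle; it does not import Scrapy. FakeSpiderState, MySpider, on_opened and run_once are hypothetical names invented for this illustration, and the state is kept in memory rather than pickled to a job directory. It shows that state is missing during __init__(), and that reading it after the opened step keeps the first crawl_start across a resumed run.

```python
from datetime import datetime, timezone


class FakeSpiderState:
    """Simplified stand-in for the SpiderState extension (not Scrapy code).

    Keeps the state in memory instead of pickling it to a job directory.
    """

    def __init__(self):
        self.saved = None  # plays the role of the pickled state file

    def spider_opened(self, spider):
        # Like the real handler: restore the saved state or start empty.
        spider.state = dict(self.saved) if self.saved is not None else {}

    def spider_closed(self, spider):
        self.saved = dict(spider.state)


class MySpider:
    """Plain class standing in for a CrawlSpider subclass."""

    def __init__(self):
        # self.state does not exist yet, so it is not touched here.
        self.crawl_start = None

    def on_opened(self):
        # Hypothetical hook, called only after state has been attached.
        # setdefault returns the stored value, or stores `now` if missing.
        now = datetime.now(timezone.utc).isoformat()
        self.crawl_start = self.state.setdefault("crawl_start", now)


def run_once(extension):
    """Mimic one crawl: __init__, then spider_opened, then spider_closed."""
    spider = MySpider()
    had_state_in_init = hasattr(spider, "state")
    extension.spider_opened(spider)
    spider.on_opened()
    extension.spider_closed(spider)
    return had_state_in_init, spider.crawl_start


extension = FakeSpiderState()
first = run_once(extension)    # first start of the job
resumed = run_once(extension)  # same job resumed later
```

In an actual spider, the same setdefault call would belong in code that runs after spider_opened has fired, for example a request callback, rather than in __init__(). Where exactly to put it depends on the spider and is left open here.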