2012-01-31 16 views
Possibly incorrect spider/exporter example code in the Scrapy documentation

Can someone check whether the code below is correct? It comes from http://readthedocs.org/docs/scrapy/en/0.14/topics/exporters.html

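The reason I think it is incorrect:

  • The class keeps track of several files open at the same time, one per spider (self.files), but:
  • The exporter (which depends on the file) is overwritten every time a new spider is opened.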

Thanks for any help.

class XmlExportPipeline(object):

    def __init__(self):
        dispatcher.connect(self.spider_opened, signals.spider_opened)
        dispatcher.connect(self.spider_closed, signals.spider_closed)
        self.files = {}

    def spider_opened(self, spider):
        file = open('%s_products.xml' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = XmlItemExporter(file)
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

Answer

I think this question would be better asked in the scrapy-users group.

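AFAIK, as of v0.14 Scrapy does not support running multiple spiders in one process (related discussion), so this code works fine as written. If multiple spiders were supported, the obvious fix would be to keep a dictionary of exporters keyed by spider: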

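# Import paths as used in Scrapy 0.14.
from scrapy import signals
from scrapy.xlib.pydispatch import dispatcher
from scrapy.contrib.exporter import XmlItemExporter


class XmlExportPipeline(object):

    def __init__(self):
        dispatcher.connect(self.spider_opened, signals.spider_opened)
        dispatcher.connect(self.spider_closed, signals.spider_closed)
        # One open file and one exporter per running spider.
        self.files = {}
        self.exporters = {}

    def spider_opened(self, spider):
        file = open('%s_products.xml' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporters[spider] = XmlItemExporter(file)
        self.exporters[spider].start_exporting()

    def spider_closed(self, spider):
        # Pop both entries so a closed spider does not linger in the dicts.
        self.exporters.pop(spider).finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        # Route each item to the exporter that belongs to its spider.
        self.exporters[spider].export_item(item)
        return item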
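
To make the difference concrete without a Scrapy install, here is a minimal sketch that compares the two designs. Everything in it is a stand-in and not Scrapy's API: FakeExporter is a hypothetical replacement for XmlItemExporter, the spiders are plain strings, the files are in-memory buffers, and the signal wiring is replaced by direct calls to spider_opened.

```python
import io


class FakeExporter:
    """Hypothetical stand-in for XmlItemExporter: writes each item as one line."""

    def __init__(self, file):
        self.file = file

    def export_item(self, item):
        self.file.write(("%s\n" % item).encode("utf-8"))


class SharedExporterPipeline:
    """Mirrors the documented example: a single exporter attribute."""

    def __init__(self):
        self.files = {}

    def spider_opened(self, spider):
        self.files[spider] = io.BytesIO()
        # Overwritten every time another spider is opened.
        self.exporter = FakeExporter(self.files[spider])

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item


class PerSpiderExporterPipeline:
    """Mirrors the answer: one exporter per spider, keyed by the spider."""

    def __init__(self):
        self.files = {}
        self.exporters = {}

    def spider_opened(self, spider):
        self.files[spider] = io.BytesIO()
        self.exporters[spider] = FakeExporter(self.files[spider])

    def process_item(self, item, spider):
        self.exporters[spider].export_item(item)
        return item


def run(pipeline):
    """Open two spiders, pass one item through each, return bytes per spider."""
    for spider in ("a", "b"):
        pipeline.spider_opened(spider)
    pipeline.process_item("item-from-a", "a")
    pipeline.process_item("item-from-b", "b")
    return {spider: f.getvalue() for spider, f in pipeline.files.items()}


shared = run(SharedExporterPipeline())
per_spider = run(PerSpiderExporterPipeline())
print(shared)      # both items land in the buffer of the last-opened spider
print(per_spider)  # each item lands in its own spider's buffer
```

With a single spider per process the two pipelines behave the same, which is why the documented example works under v0.14; the difference only shows once two spiders are open at once.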

Thanks, this is enlightening. – mskel 2012-02-05 05:55:15