2013-12-12

Scrapy: create a CSV file named after the spider

I'm currently trying to export scraped data to files whose names are based on the spider's name.

Here is my pipelines.py:

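from scrapy import signals
from scrapy.contrib.exporter import CsvItemExporter  # scrapy.exporters in Scrapy 1.0+
from mydatacrowd.models import Datacrowd


class CsvExportPipeline(object):

    def __init__(self):  # double underscores, otherwise this is never called
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy only calls a classmethod named exactly `from_crawler`; with any
        # other name it falls back to cls() and the signals are never connected.
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        print('Hello world!')
        print(spider.name)
        # One output file per spider, named after it, e.g. myspider.csv
        file = open('%s.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        item.save()                      # store through the Django model
        self.exporter.export_item(item)  # and write the row to the CSV file
        return item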
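The lifecycle the pipeline above relies on (open a file when a spider starts, write one row per item, close the file when the spider finishes) can be sketched without Scrapy. This is a minimal stdlib-only sketch, not Scrapy's API: the class name PerSpiderCsvPipeline and the output_dir parameter are hypothetical, csv.DictWriter stands in for CsvItemExporter, and a spider is assumed to be any hashable object with a name attribute.

```python
import csv
import os


class PerSpiderCsvPipeline(object):
    """Hypothetical stand-in for CsvExportPipeline: one CSV file per spider,
    named <spider.name>.csv, written with the stdlib csv module."""

    def __init__(self, output_dir='.'):
        self.output_dir = output_dir
        self.files = {}    # spider -> open file object
        self.writers = {}  # spider -> csv.DictWriter (created on first item)

    def spider_opened(self, spider):
        # Called once when the spider starts: open <name>.csv for it.
        path = os.path.join(self.output_dir, '%s.csv' % spider.name)
        self.files[spider] = open(path, 'w', newline='', encoding='utf-8')
        self.writers[spider] = None

    def process_item(self, item, spider):
        # Called for every scraped item: append one row to that spider's file.
        writer = self.writers[spider]
        if writer is None:
            # Header comes from the first item's keys; later items are
            # assumed to use the same keys.
            writer = csv.DictWriter(self.files[spider], fieldnames=list(item))
            writer.writeheader()
            self.writers[spider] = writer
        writer.writerow(item)
        return item

    def spider_closed(self, spider):
        # Called once when the spider finishes: flush and close its file.
        del self.writers[spider]
        self.files.pop(spider).close()
```

In Scrapy itself these three methods would be wired up by from_crawler through crawler.signals.connect, as in the pipeline above, and the writing would normally go through CsvItemExporter.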

Here is part of my settings.py:

... 
ITEM_PIPELINES = { 
    'datacrowdscrapy.pipelines.CsvExportPipeline': 1000, 
} 

FEED_FORMAT = 'csv' 

FEED_EXPORTERS = { 
    'csv': 'datacrowdscrapy.feedexport.CsvScrapperExporter' 
} 
... 

Here is my feedexport.py:

from scrapy.conf import settings
from scrapy.contrib.exporter import CsvItemExporter


class CsvScrapperExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):  # double underscores, as in the pipeline
        # Column list and output encoding are taken from settings.py
        kwargs['fields_to_export'] = settings.getlist('EXPORT_FIELDS') or None
        kwargs['encoding'] = settings.get('EXPORT_ENCODING', 'utf-8')

        super(CsvScrapperExporter, self).__init__(*args, **kwargs)
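To illustrate what the two keyword arguments set above control, here is a stdlib-only analog (a sketch under stated assumptions, not CsvItemExporter's code): fields_to_export picks and orders the columns, with None falling back to the first item's keys in this sketch, and encoding sets the byte encoding of the output. The function name export_csv is hypothetical.

```python
import csv
import io


def export_csv(items, fields_to_export=None, encoding='utf-8'):
    """Hypothetical analog of the exporter options: return CSV bytes for
    `items` (a list of dicts)."""
    if not items:
        return b''
    # fields_to_export selects and orders the columns; an empty list or None
    # falls back to the keys of the first item (this sketch's assumption).
    fields = fields_to_export or list(items[0].keys())
    buf = io.StringIO(newline='')
    # extrasaction='ignore' drops keys that are not in the column list;
    # missing keys are written as empty cells.
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    # encoding only matters when the text is turned into bytes for the file.
    return buf.getvalue().encode(encoding)
```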

No file is created, no error is shown, and 'Hello world!' never appears in the log. What am I missing?

Thanks!

Edit:

I don't have a FEED_URI setting in my settings.py. Could that matter?


You seem to be missing a ' after Hello World. –


Copy/paste error, sorry – Snite

Answer


看着scrapy爬行命令源看來,如果你與輸出選項像這樣提供它認爲scrapy將只讀取FEED_EXPORTERS設置:

scrapy crawl <spider_name> -o csv 

From scrapy/commands/crawl.py:

if opts.output: 
    ... 
    valid_output_formats = self.settings['FEED_EXPORTERS'].keys() + self.settings['FEED_EXPORTERS_BASE'].keys()
    .... 
    self.settings.overrides['FEED_FORMAT'] = opts.output_format 
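Relevant to the FEED_URI question in the edit: in the Scrapy releases this question targets, the built-in feed export extension stays disabled unless FEED_URI is set, and FEED_URI may contain %(name)s, which Scrapy replaces with the spider's name. A settings.py sketch along those lines, assuming the custom exporter from feedexport.py stays registered under 'csv':

```python
# settings.py (sketch): one CSV feed per spider via the built-in feed exports
FEED_URI = '%(name)s.csv'  # %(name)s is replaced with the spider's name
FEED_FORMAT = 'csv'        # must match a key in FEED_EXPORTERS or the built-ins

FEED_EXPORTERS = {
    'csv': 'datacrowdscrapy.feedexport.CsvScrapperExporter',
}
```

With this configuration, scrapy crawl <spider_name> should write <spider_name>.csv without -o, so the pipeline would not need to write the CSV itself. Scrapy 2.1 and later supersede FEED_URI and FEED_FORMAT with the FEEDS setting.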

This didn't solve my problem, but it works around it: scrapy crawl -o .csv – Snite