2017-03-06 61 views

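Specifying in Scrapy which file to export data to, depending on the scraped result. How can I create a set of files like these when my Scrapy spider starts: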

year1.csv 
year2.csv 
year3.csv 

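Each file should also be cleared if it already exists and has content in it.

Then, during parsing, I want to export to one of the files depending on the scraped result, like this: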

def parse(self, response):
    if response.css('#Contact1'):
        yield {
            'Name': response.css('#ContactName1 a::text').extract_first()
        }

    # Pseudocode: choose the output file from the scraped value
    if response.css('#Contact1').extract_first() == "1":
        export to year1.csv
    if response.css('#Contact1').extract_first() == "2":
        export to year2.csv
    if response.css('#Contact1').extract_first() == "3":
        export to year3.csv

Answer


You can do this with an item pipeline. Here is the official documentation: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
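A pipeline only runs once it is registered in the project settings. As a rough sketch, the entry could look like the following; the module path `myproject.pipelines` and the class name `YearCsvPipeline` are hypothetical placeholders, not names from the question.

```python
# settings.py (sketch): "myproject" and "YearCsvPipeline" are hypothetical names
ITEM_PIPELINES = {
    # The number (0-1000) sets the order: lower values run first
    "myproject.pipelines.YearCsvPipeline": 300,
}
```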

Here is how I would do it: I would create a different item class for each file.

item.py

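import scrapy

# One item class per output file, so the pipeline can route by type
class Year1Item(scrapy.Item):
    name = scrapy.Field()

class Year2Item(scrapy.Item):
    name = scrapy.Field()

class Year3Item(scrapy.Item):
    name = scrapy.Field()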

Then, in your spider file, you can do this:

# Assumes Year1Item, Year2Item and Year3Item are imported from your items module
def parse(self, response):
    if response.css('#Contact1'):
        year = response.css('#Contact1').extract_first()
        # Pick the item class that matches the scraped value
        if year == "1":
            item = Year1Item()
        elif year == "2":
            item = Year2Item()
        elif year == "3":
            item = Year3Item()
        else:
            return  # no matching file for this value
        item['name'] = response.css('#ContactName1 a::text').extract_first()
        return item

Then, in the pipeline.py file:

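def process_item(self, item, spider):
    # Route each item to its file based on its class
    if isinstance(item, Year1Item):
        pass  # export to year1.csv
    elif isinstance(item, Year2Item):
        pass  # export to year2.csv
    elif isinstance(item, Year3Item):
        pass  # export to year3.csv
    return item  # hand the item on to any later pipeline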
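To make the "export to" step concrete, here is one possible sketch that uses only the standard-library `csv` module. It is not the only way to do it (Scrapy also ships its own item exporters). The class name `YearCsvPipeline`, the `FILES` mapping and the `FIELDS` list are assumptions made for illustration, and routing is done by item class name instead of `isinstance` so the sketch stays self-contained.

```python
import csv


class YearCsvPipeline:
    """Hypothetical pipeline: writes each item to a CSV chosen by its class name."""

    # Assumed mapping from item class name to output file
    FILES = {
        "Year1Item": "year1.csv",
        "Year2Item": "year2.csv",
        "Year3Item": "year3.csv",
    }
    # Assumed column list; adjust to the fields your items define
    FIELDS = ["name"]

    def open_spider(self, spider):
        self.handles = {}
        self.writers = {}
        for cls_name, path in self.FILES.items():
            # Mode "w" creates the file, or truncates it if it already exists
            handle = open(path, "w", newline="", encoding="utf-8")
            writer = csv.DictWriter(
                handle, fieldnames=self.FIELDS, extrasaction="ignore"
            )
            writer.writeheader()
            self.handles[cls_name] = handle
            self.writers[cls_name] = writer

    def process_item(self, item, spider):
        # Look up the writer for this item's class; unknown classes are skipped
        writer = self.writers.get(type(item).__name__)
        if writer is not None:
            writer.writerow(dict(item))
        return item  # hand the item on to any later pipeline

    def close_spider(self, spider):
        for handle in self.handles.values():
            handle.close()
```

Because the files are opened in write mode when the spider starts, all three files exist after a run even if no item was routed to one of them, and any old content is gone.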

Inside your pipeline file you can also define a function that runs when your spider is opened:

def open_spider(self, spider):
    # Maybe here you could use Python to check whether the files already exist
    # and delete them if they do
    pass
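As a minimal sketch of that cleanup idea, using only the standard library: the helper name `remove_existing`, the `YEAR_FILES` list and the `CleanupPipeline` class are hypothetical, and the file names are taken from the question.

```python
import os

# File names from the question; adjust the paths to your project layout
YEAR_FILES = ["year1.csv", "year2.csv", "year3.csv"]


def remove_existing(paths):
    """Delete every path that exists as a file; return the ones removed."""
    removed = []
    for path in paths:
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed


class CleanupPipeline:
    """Hypothetical pipeline that clears old output when the spider opens."""

    def open_spider(self, spider):
        remove_existing(YEAR_FILES)

    def process_item(self, item, spider):
        return item
```

If the pipeline itself opens the files in write mode, this step is not strictly needed, since opening with `"w"` already truncates an existing file.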