2014-05-07

How can I use the Scrapy CSVFeedSpider to scrape a feed whose values contain commas?

I am trying to use the Scrapy CSVFeedSpider on a CSV link. Here is an example row:

number,"may contain commas","may contain commas","may contain commas",text,text,text,text,text,"may contain commas"

If a value contains a comma, it is enclosed in quotes. How can I handle this, given that the spider accepts only a single delimiter?

http://doc.scrapy.org/en/latest/topics/spiders.html#csvfeedspider

Answer


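If the columns are enclosed in double quotes, commas inside the values work fine. If they are enclosed in single quotes, the spider complains about a mismatched length.

Here is the spider code: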

# -*- coding: utf-8 -*-
# Scrapy 1.0 and later; older releases import CSVFeedSpider from
# scrapy.contrib.spiders and log with scrapy.log.msg instead.
from scrapy.spiders import CSVFeedSpider

from stackoverflow23429315.items import DemoItem


class DmozSpider(CSVFeedSpider):
    name = 'csvFeedTest'
    start_urls = ['file:////home/vagrant/labs/stackoverflow23429315/test.csv']
    delimiter = ','
    # Column names, in file order; each parsed row must have this many fields.
    headers = ['id', 'name', 'address1', 'address2', 'email']

    def parse_row(self, response, row):
        # row is a dict keyed by the names listed in headers.
        self.logger.info('Hi, this is a row!: %r', row)

        item = DemoItem()
        item['id'] = row['id']
        item['name'] = row['name']
        item['address1'] = row['address1']
        item['address2'] = row['address2']
        item['email'] = row['email']
        return item

The item class:

from scrapy.item import Item, Field 

class DemoItem(Item): 
    id = Field() 
    name = Field() 
    address1 = Field() 
    address2 = Field() 
    email = Field() 

The test CSV file:

1,"John, Doe","1234 Main Street, APT A","2nd Floor",[email protected] 
2,"John2, Doe","1234 Main Street, APT A","2nd Floor",[email protected] 
3,'John3, Doe','1234 Main Street, APT A','2nd Floor',[email protected] 
4,'John4, Doe','1234 Main Street, APT A','2nd Floor',[email protected] 
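
The difference between rows 1-2 and rows 3-4 can be reproduced without Scrapy. The following is a minimal sketch that uses only Python's standard `csv` module, on the assumption that the spider's CSV parsing behaves the same way with respect to quoting. The helper name `parse` is hypothetical, and the email field is copied as it appears in the test file above.

```python
import csv
import io

# One double-quoted row and one single-quoted row from the test file above.
DOUBLE_QUOTED = '1,"John, Doe","1234 Main Street, APT A","2nd Floor",[email protected]\n'
SINGLE_QUOTED = "3,'John3, Doe','1234 Main Street, APT A','2nd Floor',[email protected]\n"

HEADERS = ['id', 'name', 'address1', 'address2', 'email']


def parse(line, quotechar='"'):
    """Parse a single CSV line and return its fields as a list (hypothetical helper)."""
    reader = csv.reader(io.StringIO(line), delimiter=',', quotechar=quotechar)
    return next(reader)


# Default quote character: double quotes protect the embedded commas.
double_fields = parse(DOUBLE_QUOTED)

# Default quote character on a single-quoted row: every comma splits a field,
# so the field count no longer matches the number of headers.
single_fields = parse(SINGLE_QUOTED)

# Declaring the single quote as the quote character restores the expected fields.
single_fixed = parse(SINGLE_QUOTED, quotechar="'")

for label, fields in [('double', double_fields),
                      ('single', single_fields),
                      ('single, quotechar', single_fixed)]:
    print(label, len(fields) == len(HEADERS), fields)
```

Current Scrapy documentation lists a `quotechar` attribute on CSVFeedSpider next to `delimiter`. Assuming the installed version supports it, setting `quotechar = "'"` on the spider should be the equivalent fix for a feed that uses single quotes throughout; a file that mixes both styles, like the test file here, would still need to be normalized first.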