How to write scraped data to a CSV file in Scrapy?

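I am trying to scrape a website by extracting its sub-links and their titles, and then save the extracted titles and their associated links to a CSV file. When I run the code below, the CSV file is created, but it is empty. Any help?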

My Spider.py file looks like this:

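from scrapy import cmdline
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

# The item class lives in the project's items.py (project package: extractLinks)
from extractLinks.items import ExtractlinksItem


class HyperLinksSpider(CrawlSpider):
    name = "linksSpy"
    allowed_domains = ["some_website"]
    start_urls = ["some_website"]
    # Follow every link and pass each downloaded page to parse_obj
    rules = (Rule(LinkExtractor(allow=()), callback='parse_obj', follow=True),)

    # Must be indented inside the class, otherwise the 'parse_obj' callback is never found
    def parse_obj(self, response):
        for sel in response.xpath('//tr/td/a'):
            item = ExtractlinksItem()
            # Relative XPaths ('text()', '@href') are evaluated against each <a>;
            # a leading '/' would restart from the document root instead
            item['title'] = sel.xpath('text()').extract()
            item['link'] = sel.xpath('@href').extract()
            # Yield each item as it is built; a return inside the loop stops after the first
            yield item


# Only launch the crawl when this file is run directly, not when Scrapy imports it
if __name__ == "__main__":
    cmdline.execute("scrapy crawl linksSpy".split())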

My pipelines.py is:

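import csv


class ExtractlinksPipeline(object):

    def open_spider(self, spider):
        # Python 3: text mode with newline='' as the csv module expects ('wb' only works on Python 2)
        self.file = open('Links.csv', 'w', newline='')
        self.csvwriter = csv.writer(self.file)

    def close_spider(self, spider):
        # Close the file so buffered rows are flushed to disk
        self.file.close()

    def process_item(self, item, spider):
        # writerow() takes a single sequence: both fields go in one tuple
        self.csvwriter.writerow((item['title'][0], item['link'][0]))
        return item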
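One likely reason the file stays empty is the writerow() call itself, and that part can be checked by hand outside Scrapy. This is a sketch, not the question's code: it uses a variant of the pipeline that writes to an in-memory io.StringIO instead of Links.csv, a plain dict with hypothetical sample values in place of ExtractlinksItem, and spider=None because process_item never uses the spider.

```python
import csv
import io


class ExtractlinksPipeline(object):
    """Variant of the pipeline's process_item logic that writes to any file-like object."""

    def __init__(self, stream):
        self.csvwriter = csv.writer(stream)

    def process_item(self, item, spider):
        # writerow() takes ONE sequence, so both fields go inside one tuple
        self.csvwriter.writerow((item['title'][0], item['link'][0]))
        return item


# Hypothetical sample item; a dict supports the same item['key'] access
sample = {'title': ['Home'], 'link': ['/index.html']}

buf = io.StringIO()
ExtractlinksPipeline(buf).process_item(sample, spider=None)
print(repr(buf.getvalue()))  # 'Home,/index.html\r\n' (csv's default line ending)

# Passing the two fields as two separate arguments to writerow() fails
original_failed = False
try:
    csv.writer(io.StringIO()).writerow((sample['title'][0]), sample['link'][0])
except TypeError:
    original_failed = True
print("two-argument call raises TypeError:", original_failed)
```

Inside a real crawl, that TypeError is raised for every item, so nothing ever reaches the CSV file.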

My items.py is:

import scrapy


class ExtractlinksItem(scrapy.Item):
    # define the fields for your item here like:
    title = scrapy.Field()
    link = scrapy.Field()

I also changed my settings.py:

ITEM_PIPELINES = {'extractLinks.pipelines.ExtractlinksPipeline': 1}

Source: 2017-01-06 owise

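Scrapy already has a built-in feature for outputting all of its data, called Feed Exports.
In short, all you need are two settings in your settings.py file: FEED_FORMAT, the format in which the feed should be saved (csv in your case), and FEED_URI, the location where the feed should be saved.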
https://stackoverflow.com/a/41473241/3737009
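A minimal settings.py sketch of those two settings. The file name Links.csv is an assumption reused from the question's pipeline; any local path or file:// URI should do. Newer Scrapy releases (2.1 and later) deprecate these two settings in favour of a single FEEDS dict, shown commented out.

```python
# settings.py: a minimal Feed Exports configuration (no custom pipeline needed)
FEED_FORMAT = "csv"      # serialise the scraped items as CSV
FEED_URI = "Links.csv"   # where the feed is written (a local path, or a URI such as file:///tmp/Links.csv)

# Scrapy 2.1 and later: the same configuration through the FEEDS setting
# FEEDS = {"Links.csv": {"format": "csv"}}
```

With this in place, `scrapy crawl linksSpy` writes every item the spider yields to Links.csv. For a one-off run, `scrapy crawl linksSpy -o Links.csv` gives the same result without touching settings.py.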

Source: 2017-01-06 17:01:39 Granitosaurus

Could you please elaborate more on this: ~/my_feed.csv

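Could you elaborate on your answer and cover it in more detail for my use case? I tried the example you gave [here](http://stackoverflow.com/a/41473241/3737009), but nothing was written to the CSV file. Do you mean I just have to put these two settings in my settings.py without changing anything in my code? I suppose I should disable the item pipeline, right? – owise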

@owise Yes, try disabling your pipeline. As long as your spider returns any items, the feed exporter will write them to your feed. – Granitosaurus
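In settings.py, disabling the pipeline comes down to dropping the ITEM_PIPELINES entry from the question. A sketch: an empty dict behaves the same as leaving the setting out, since Scrapy's own default for ITEM_PIPELINES is also empty.

```python
# settings.py: no custom item pipeline, so only the feed exporter writes the CSV
# ITEM_PIPELINES = {'extractLinks.pipelines.ExtractlinksPipeline': 1}  # entry from the question, commented out
ITEM_PIPELINES = {}  # an empty dict enables no custom item pipelines
```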
