I am trying to scrape a website by extracting its sub-links and their titles, and then saving the extracted titles and their associated links to a CSV file. I ran the code below; the CSV file gets created, but it is empty. Any help? How do I write the scraped data to a CSV file in Scrapy?
My Spider.py file looks like this:
from scrapy import cmdline
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from extractLinks.items import ExtractlinksItem

class HyperLinksSpider(CrawlSpider):
    name = "linksSpy"
    allowed_domains = ["some_website"]
    start_urls = ["some_website"]
    rules = (Rule(LinkExtractor(allow=()), callback='parse_obj', follow=True),)

    def parse_obj(self, response):
        items = []
        for link in LinkExtractor(allow=(), deny=self.allowed_domains).extract_links(response):
            item = ExtractlinksItem()
            for sel in response.xpath('//tr/td/a'):
                item['title'] = sel.xpath('/text()').extract()
                item['link'] = sel.xpath('/@href').extract()
            items.append(item)
        return items

cmdline.execute("scrapy crawl linksSpy".split())
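Note: the XPath expressions inside parse_obj begin with a bare `/`, which makes them absolute paths from the document root rather than paths relative to each matched `<a>` node, so they extract nothing. A minimal sketch of a corrected callback, assuming the same ExtractlinksItem class, would look like this:

def parse_obj(self, response):
    for sel in response.xpath('//tr/td/a'):
        item = ExtractlinksItem()
        # a leading '.' makes the expression relative to the current <a> node
        item['title'] = sel.xpath('./text()').extract()
        item['link'] = sel.xpath('./@href').extract()
        yield item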
My pipelines.py is:
import csv

class ExtractlinksPipeline(object):

    def __init__(self):
        self.csvwriter = csv.writer(open('Links.csv', 'wb'))

    def process_item(self, item, spider):
        self.csvwriter.writerow((item['title'][0], item['link'][0]))
        return item
My items.py is:
import scrapy

class ExtractlinksItem(scrapy.Item):
    # define the fields for your item here like:
    title = scrapy.Field()
    link = scrapy.Field()
I have also changed my settings.py:
ITEM_PIPELINES = {'extractLinks.pipelines.ExtractlinksPipeline': 1}
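For reference, Scrapy can also write items straight to CSV through its built-in feed exporter, with no custom pipeline at all; a minimal sketch from the command line, assuming the spider name above, would be:

scrapy crawl linksSpy -o Links.csv -t csv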
Could you please elaborate more on this:
~/my_feed.csv
Could your answer cover my use case in more detail? I tried the example you gave here [link](http://stackoverflow.com/a/41473241/3737009), but nothing was written to the CSV file. Do you mean I have to put these two settings in my settings.py without changing anything in my code? I suppose I should disable the pipeline's process_item method, right? – owise
@owise yeah, try disabling your pipeline. As long as your spider returns any items, the feed exporter will write them to your feed. – Granitosaurus
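For context, the feed exporter discussed in these comments is driven entirely by settings; a minimal sketch in settings.py, assuming the feed file name from the comment above and with the ITEM_PIPELINES entry removed so the custom pipeline stays disabled:

FEED_FORMAT = 'csv'
FEED_URI = 'my_feed.csv'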