How to feed CSV file links collected by a SitemapSpider into a second spider, a CSVFeedSpider

I have a sitemap spider that collects links to CSV files, and I want to crawl those links with a CSV spider. How do I feed the output of one spider into another?

2017-04-06 tylerjw

from scrapy.spiders import CSVFeedSpider
from myproject.items import TestItem


class MySpider(CSVFeedSpider):
    name = 'example.com'
    allowed_domains = ['example.com']
    # CSV files to download and parse
    start_urls = ['http://www.example.com/feed.csv']
    delimiter = ';'
    quotechar = "'"
    # Column names, in file order
    headers = ['id', 'name', 'description']

    def parse_row(self, response, row):
        # Called once per CSV row; row is a dict keyed by the headers above
        self.logger.info('Hi, this is a row!: %r', row)

        item = TestItem()
        item['id'] = row['id']
        item['name'] = row['name']
        item['description'] = row['description']
        return item
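To get a feel for what `parse_row` receives with these `delimiter`, `quotechar`, and `headers` settings, here is a rough stand-in that uses only the standard `csv` module. It is not Scrapy's own parsing code, and the sample data and the `split_rows` helper are made up for illustration.

```python
import csv
import io

# Same column names as the spider's `headers` attribute
HEADERS = ['id', 'name', 'description']


def split_rows(text, delimiter=';', quotechar="'"):
    """Split CSV text into one dict per row, keyed by HEADERS."""
    reader = csv.DictReader(
        io.StringIO(text),
        fieldnames=HEADERS,
        delimiter=delimiter,
        quotechar=quotechar,
    )
    return [dict(row) for row in reader]


# Hypothetical sample feed: note the quoted field containing a ';'
sample = "1;'Widget';'Small; blue'\n2;'Gadget';'Large'\n"
rows = split_rows(sample)
print(rows[0])
```

Each dict has the same shape as the `row` argument used in `parse_row` above, so `row['description']` for the first line would be `Small; blue`.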

To use a local file instead, just use a file URL: file:///home/user/some.csv
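If the local path is held in a variable, the standard library can build the file URL for you. A small sketch, assuming a POSIX-style absolute path; the `to_file_url` helper is a hypothetical name, not part of Scrapy.

```python
from pathlib import PurePosixPath


def to_file_url(path):
    """Turn an absolute POSIX path into a file:// URL (raises ValueError if relative)."""
    return PurePosixPath(path).as_uri()


# The resulting list can be used as the spider's start_urls
start_urls = [to_file_url('/home/user/some.csv')]
print(start_urls)
```

Going through `as_uri()` also percent-encodes characters such as spaces, which is easy to get wrong when building the URL by hand.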


2017-04-07 06:26:06 Granitosaurus

What I have right now is the SitemapSpider populating a database with links to the CSV files, and the CSVFeedSpider reading those links from that database. – tylerjw
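The database handoff described in that comment could look roughly like the sketch below. It uses `sqlite3` purely as a stand-in so it runs without a server; the table name `csv_links` and the helper names are assumptions, and the Scrapy wiring is only indicated in comments.

```python
import sqlite3


def open_store(path=':memory:'):
    """Open the link store and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute('CREATE TABLE IF NOT EXISTS csv_links (url TEXT PRIMARY KEY)')
    return conn


def save_link(conn, url):
    """Would be called by the first spider for each CSV link it finds."""
    # INSERT OR IGNORE silently skips links that were already stored
    conn.execute('INSERT OR IGNORE INTO csv_links (url) VALUES (?)', (url,))
    conn.commit()


def load_links(conn):
    """Would be called by the second spider to build its list of start URLs."""
    cursor = conn.execute('SELECT url FROM csv_links ORDER BY url')
    return [row[0] for row in cursor]


conn = open_store()
for found in ['http://www.example.com/b.csv',
              'http://www.example.com/a.csv',
              'http://www.example.com/a.csv']:  # duplicate on purpose
    save_link(conn, found)

links = load_links(conn)
print(links)
```

In a real project the two spiders would point at the same database file (or server), and the list returned by `load_links` would be assigned to the CSV spider's `start_urls` before the crawl starts.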

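@tylerjw Why not store everything in the database and cut out the CSV middleman? Scrapy works great with document-oriented databases like mongo or couchdb, or, if you don't have much data, redis is a super simple solution! – Granitosaurus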

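I am using a mongo database to store the results. The problem is that the CSV files contain more of the specific data I want than is shown in the list view on the page. I eventually figured out the URI parameters for the CSV API, so I don't even need to load the links from the page. – tylerjw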
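The approach in that last comment, building the CSV URLs directly from API parameters instead of scraping links, might be sketched as follows. The endpoint `http://www.example.com/api/export` and the parameter names `format` and `id` are invented for illustration; the real API's parameters are not given in the thread.

```python
from urllib.parse import urlencode


def build_csv_url(base, **params):
    """Append query parameters to a base URL, sorted by name for stable output."""
    return base + '?' + urlencode(sorted(params.items()))


# Hypothetical endpoint and parameters
BASE = 'http://www.example.com/api/export'
csv_urls = [build_csv_url(BASE, format='csv', id=i) for i in (1, 2)]
print(csv_urls)
```

A list built this way can be assigned to the CSVFeedSpider's `start_urls`, which removes the need for the sitemap crawl altogether.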
