
I have defined a spider named "myspider" whose behaviour differs depending on its settings. I want to run the spider with different instances in different processes. Is that possible? Can I run scrapy spiders with different settings in different processes (in parallel)?

I checked the source code, and it looks like SpiderLoader just walks the spider modules, so I can only run one spider with a given name at a time.

The code that runs it looks like this:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

for item in items:
    settings = get_project_settings()
    settings.set('item', item)
    settings.set('DEFAULT_REQUEST_HEADERS', item.get('request_header'))
    process = CrawlerProcess(settings)
    process.crawl("myspider")
    process.start()  # blocks until the crawl finishes; fails on the second iteration

and, of course, it fails with this error:

Traceback (most recent call last):
  File "/home/xuanqi/workspace/github/foolcage/fospider/fospider/main.py", line 44, in <module>
    process.start() # the script will block here until the crawling is finished
  File "/usr/local/lib/python3.5/dist-packages/scrapy/crawler.py", line 280, in start
    reactor.run(installSignalHandlers=False) # blocking call
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 1194, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 1174, in startRunning
    ReactorBase.startRunning(self)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 684, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable

Thanks for your help!

Answer


Settings cannot be changed at runtime. I suggest you use spider arguments to pass the different values to the spider instead.

process = CrawlerProcess(settings) 
process.crawl("myspider", request_headers='specified headers...') 
process.start() 

To make this work, you have to override your spider's __init__ so that it accepts these variables, and pass request_headers on to every Request object you use in the spider.

import scrapy

class MySpider(scrapy.Spider):
    def __init__(self, **kw):
        super(MySpider, self).__init__(**kw)
        self.headers = kw.get('request_headers')

    def parse(self, response):
        ...
        yield scrapy.Request(url='http://www.example.com', headers=self.headers)
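
As a minimal sketch of how this fits the loop from the question (assuming the same `items` list, and that the spider reads the item from a spider argument instead of from settings, which the **kw-based __init__ above would pick up), all crawls can be scheduled on one CrawlerProcess and start() called only once, which avoids ReactorNotRestartable:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
for item in items:
    # each crawl() gets its own spider instance with its own arguments
    process.crawl("myspider", request_headers=item.get('request_header'), item=item)
process.start()  # start the reactor once; all scheduled crawls run concurrently

The spiders then run concurrently inside a single Scrapy process. If you really need separate OS processes, each CrawlerProcess has to live in its own process (for example via the multiprocessing module), because a Twisted reactor cannot be restarted once it has been stopped.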