
I have defined a spider named "myspider" whose behaviour differs depending on its settings. I want to run the spider with different instances in different processes. Is that possible? Can I run scrapy spiders with different settings in different processes (in parallel)?

I checked the source code, and it looks like SpiderLoader just walks the spider modules, so I can only run one spider with a given name at a time.

The code that runs it looks like this:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

for item in items:
    settings = get_project_settings()
    settings.set('item', item)
    settings.set('DEFAULT_REQUEST_HEADERS', item.get('request_header'))
    process = CrawlerProcess(settings)
    process.crawl("myspider")
    process.start()  # blocks until the crawl finishes; fails on the second iteration

and, of course, it fails with this error:

Traceback (most recent call last):
  File "/home/xuanqi/workspace/github/foolcage/fospider/fospider/main.py", line 44, in <module>
    process.start() # the script will block here until the crawling is finished
  File "/usr/local/lib/python3.5/dist-packages/scrapy/crawler.py", line 280, in start
    reactor.run(installSignalHandlers=False) # blocking call
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 1194, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 1174, in startRunning
    ReactorBase.startRunning(self)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 684, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable

Thanks for your help!

Answer


Settings cannot be changed at runtime. I suggest you use spider arguments to pass the different values to the spider instead.

process = CrawlerProcess(settings) 
process.crawl("myspider", request_headers='specified headers...') 
process.start() 

To make this work, you have to override your spider's __init__ so that it accepts these variables, and pass request_headers on to every Request object you use in the spider.

import scrapy

class MySpider(scrapy.Spider):
    def __init__(self, **kw):
        super(MySpider, self).__init__(**kw)
        self.headers = kw.get('request_headers')

    def parse(self, response):
        ...
        yield scrapy.Request(url='http://www.example.com', headers=self.headers)
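
As a minimal sketch of how this fits the loop from the question (assuming the same `items` list, and that the spider reads the item from a spider argument instead of from settings, which the **kw-based __init__ above would pick up), all crawls can be scheduled on one CrawlerProcess and start() called only once, which avoids ReactorNotRestartable:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
for item in items:
    # each crawl() gets its own spider instance with its own arguments
    process.crawl("myspider", request_headers=item.get('request_header'), item=item)
process.start()  # start the reactor once; all scheduled crawls run concurrently

The spiders then run concurrently inside a single Scrapy process. If you really need separate OS processes, each CrawlerProcess has to live in its own process (for example via the multiprocessing module), because a Twisted reactor cannot be restarted once it has been stopped.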