
I want to run Scrapy from a single script and change its settings. I want to take all the settings from settings.py, but be able to override one or two of them from the script:

from scrapy.crawler import CrawlerProcess 
from scrapy.utils.project import get_project_settings 

process = CrawlerProcess(get_project_settings()) 

### so what I'm missing here is being able to set or override one or two of the settings ###


# 'testspider' is the name of one of the spiders of the project.
process.crawl('testspider', domain='scrapinghub.com')
process.start() # the script will block here until the crawling is finished 

I wasn't able to use this. I tried the following:

from scrapy.settings import Settings

settings = Settings()
settings.set('RETRY_TIMES', 10)

But it didn't work.
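A likely reason this fails: a standalone Settings() object starts from Scrapy's built-in defaults and is never handed to the CrawlerProcess, so neither the project settings nor the override ever reach the crawl. A minimal sketch of one way around that, using the same get_project_settings helper as above: load settings.py first, then set the override on that same object before creating the process.

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# load settings.py, then override a single value on the same object
settings = get_project_settings()
settings.set('RETRY_TIMES', 10)

process = CrawlerProcess(settings)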

Note: I'm using the latest version of Scrapy.

Answers


So, to override some of the settings, one approach is to set/override the spider's class attribute custom_settings from our script.

So I import the spider class and then override custom_settings:

from testspiders.spiders.followall import FollowAllSpider

FollowAllSpider.custom_settings = {'RETRY_TIMES': 10}
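
Note that custom_settings is read when the crawler is created, so the assignment has to happen before process.crawl() is called. The same override can also live on the spider class itself, which is where custom_settings normally goes; a minimal sketch, assuming a spider named 'followall':

import scrapy

class FollowAllSpider(scrapy.Spider):
    name = 'followall'
    # spider-level settings, merged over the project settings at crawl time
    custom_settings = {'RETRY_TIMES': 10}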

So here is the whole script:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from testspiders.spiders.followall import FollowAllSpider

# override the spider's custom_settings before the crawler is created
FollowAllSpider.custom_settings = {'RETRY_TIMES': 10}

process = CrawlerProcess(get_project_settings())

# 'followall' is the name of the spider patched above
process.crawl('followall', domain='scrapinghub.com')
process.start()  # the script will block here until the crawling is finished
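
Since the spider class is already imported, process.crawl() also accepts the class directly instead of the name string, which removes any chance of a mismatch between the name and the patched class; a small variant of the last two lines:

process.crawl(FollowAllSpider, domain='scrapinghub.com')
process.start()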

For some reason the script above didn't work for me. Instead I wrote the following, which does work. Posting it in case anyone runs into the same problem.

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
# 'cmdline' is the highest built-in settings priority, so this
# override beats both settings.py and any spider custom_settings
process.settings.set('RETRY_TIMES', 10, priority='cmdline')

process.crawl('testspider', domain='scrapinghub.com')
process.start()
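
The priority argument is what makes this stick: every value in Scrapy's settings carries a priority, and 'cmdline' is the highest of the built-in levels, so it wins over settings.py ('project') and a spider's custom_settings ('spider'). A quick way to inspect the levels; the dict shown is from Scrapy 1.x and may differ in newer releases:

from scrapy.settings import SETTINGS_PRIORITIES

# built-in priority levels; higher numbers win
print(SETTINGS_PRIORITIES)
# {'default': 0, 'command': 10, 'project': 20, 'spider': 30, 'cmdline': 40}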