I have written a scrapy-redis crawler and now I want to make it a distributed, task-based crawler: one name per task. My plan is to rename the spider to the task's name and use that name to tell the tasks apart. While building the web management layer I ran into the problem of how to change a spider's name at runtime. Is there any way to change a Scrapy spider's name from a script?
Here is my code (it is still rough):
#-*- encoding: utf-8 -*-
import redis
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy_redis.spiders import RedisSpider
import pymongo

client = pymongo.MongoClient('mongodb://localhost:27017')
db_name = 'news'
db = client[db_name]

class NewsSpider(RedisSpider):
    """Spider that reads urls from redis queue (myspider:start_urls)."""
    name = 'news'
    redis_key = 'news:start_urls'
    start_urls = ["http://www.bbc.com/news"]

    def parse(self, response):
        pass

    # I added these two methods, setname and getname
    def setname(self, name):
        self.name = name

    def getname(self):
        return self.name

def start():
    # Rename the spider instance, push a start URL, then run the crawl
    news_spider = NewsSpider()
    news_spider.setname('test_spider_name')
    print news_spider.getname()
    r = redis.Redis(host='127.0.0.1', port=6379, db=0)
    r.lpush('news:start_urls', 'http://news.sohu.com/')
    process = CrawlerProcess(get_project_settings())
    process.crawl('test_spider_name')
    process.start()  # the script will block here until the crawling is finished

if __name__ == '__main__':
    start()
And this is the error:
test_spider_name
2017-05-26 20:14:05 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: scrapybot)
2017-05-26 20:14:05 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'geospider.spiders', 'SPIDER_MODULES': ['geospider.spiders'], 'COOKIES_ENABLED': False, 'SCHEDULER': 'scrapy_redis.scheduler.Scheduler', 'DUPEFILTER_CLASS': 'scrapy_redis.dupefilter.RFPDupeFilter'}
Traceback (most recent call last):
  File "/home/kui/work/python/project/bigcrawler/geospider/control/command.py", line 29, in <module>
    start()
  File "/home/kui/work/python/project/bigcrawler/geospider/control/command.py", line 23, in start
    process.crawl('test_spider_name')
  File "/home/kui/work/python/env/lib/python2.7/site-packages/scrapy/crawler.py", line 162, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/home/kui/work/python/env/lib/python2.7/site-packages/scrapy/crawler.py", line 190, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/home/kui/work/python/env/lib/python2.7/site-packages/scrapy/crawler.py", line 194, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "/home/kui/work/python/env/lib/python2.7/site-packages/scrapy/spiderloader.py", line 55, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: test_spider_name'
I know this is a clumsy approach. It looks like process.crawl('test_spider_name') asks Scrapy's spider loader to look the spider up by its class-level name attribute, so renaming the instance has no effect. I have searched the net for a long time but found nothing useful. Please help me, or suggest some ideas on how to achieve this.
Thanks in advance.
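Update: one idea I am trying, based on CrawlerProcess.crawl() also accepting a spider class (not only a name string), and on Scrapy's base Spider.__init__ accepting a name keyword that overrides the class attribute. A minimal sketch (the start_task helper is just for illustration):

#-*- encoding: utf-8 -*-
import redis
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def start_task(task_name):
    # Push a start URL for the task into the redis queue
    r = redis.Redis(host='127.0.0.1', port=6379, db=0)
    r.lpush('news:start_urls', 'http://news.sohu.com/')
    process = CrawlerProcess(get_project_settings())
    # Pass the class itself instead of a name string, so the spider
    # loader lookup is bypassed; extra crawl() kwargs are forwarded to
    # the spider constructor, and Spider.__init__ sets self.name from
    # the `name` keyword when it is given.
    process.crawl(NewsSpider, name=task_name)
    process.start()  # blocks until the crawl finishes

start_task('test_spider_name')

Since the class is passed directly, the spider loader lookup from the traceback should never happen, but I am not sure this is the right way to do per-task naming.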
Thank you, but it did not work. – haomao