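Scrapy - unable to write to the log from a spider's __init__ method
I'm trying to write to the log from my spider's __init__ method, but I can't get it to work, even though logging from the parse method works fine.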
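The self.log call in __init__ is made through the method 'get_urls_from_file'. I know that method is being called because I see the output of its print statement on stdout, so I was wondering if anyone could point me in the right direction. I'm using Scrapy v0.18. Thanks!
My code is below: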
from scrapy.spider import BaseSpider
from scrapy_redis import connection
from importlib import import_module
from scrapy import log
from scrapy.settings import CrawlerSettings


class StressS(BaseSpider):
    name = 'stress_s_spider'
    allowed_domains = ['www.example.com']

    def __init__(self, url_file=None, *args, **kwargs):
        super(StressS, self).__init__(*args, **kwargs)
        settings = CrawlerSettings(import_module('stress_test.settings'))
        if url_file:
            self.url_file = url_file
        else:
            self.url_file = settings.get('URL_FILE')
        self.start_urls = self.get_urls_from_file(self.url_file)
        self.server = connection.from_settings(settings)
        self.count_key = settings.get('ITEM_COUNT')

    def parse(self, response):
        self.log('Processed: %s, status code: %s' % (response.url, response.status), level=log.INFO)
        self.server.incr(self.count_key)

    def get_urls_from_file(self, fn):
        urls = []
        if fn:
            try:
                with open(fn, 'r') as f:
                    urls = [line.strip() for line in f.readlines()]
            except IOError:
                msg = 'File %s could not be opened' % fn
                print msg
                self.log(msg, level=log.ERROR)
        return urls
Where are you using 'self.log' in your '__init__' method? –
Just edited the question to reflect this: in __init__, I call self.log inside the get_urls_from_file method. – user2871292
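One possible explanation, which nothing in this thread confirms, is that the log has not been started yet when __init__ runs, so messages sent at that point are dropped. Assuming that is the cause, a minimal sketch of a workaround is to collect messages during __init__ and emit them later, once logging works (for example at the start of parse). The class below is a hypothetical stand-in and not a Scrapy class; read_urls, DeferredLogSpider, pending and flush_pending are names made up for illustration.

```python
import logging


def read_urls(path):
    """Return (urls, error_message); error_message is None on success."""
    if not path:
        return [], None
    try:
        with open(path, 'r') as f:
            # Skip blank lines so they do not become empty start URLs.
            return [line.strip() for line in f if line.strip()], None
    except IOError:
        return [], 'File %s could not be opened' % path


class DeferredLogSpider(object):
    """Hypothetical stand-in for the spider; it does not subclass BaseSpider."""

    def __init__(self, url_file=None):
        # Messages collected before logging is assumed to be ready.
        self.pending = []
        self.start_urls, error = read_urls(url_file)
        if error:
            self.pending.append((logging.ERROR, error))

    def flush_pending(self, log):
        """Emit buffered messages through log(msg, level), then clear them."""
        for level, msg in self.pending:
            log(msg, level)
        self.pending = []
```

In the real spider, the same idea would amount to calling something like flush_pending(self.log) from a method that runs after the crawler has started, and storing Scrapy's own level constants instead of the ones from the standard logging module. Whether this actually resolves the issue on v0.18 is untested here.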