2016-12-09

How to set parameters from a scrapy spider

Following up on How to pass parameter to a scrapy pipeline object, I want to pass a database table name from a scrapy spider's settings to a pipeline object. Based on the answer to that question, I have:

@classmethod 
def from_crawler(cls, crawler): 
    # Here, you get whatever value was passed through the "table" parameter 
    settings = crawler.settings 
    table = settings.get('table') 

    # Instantiate the pipeline with your table 
    return cls(table) 

def __init__(self, table): 
    _engine = create_engine("sqlite:///data.db") 
    _connection = _engine.connect() 
    _metadata = MetaData() 
    _stack_items = Table(table, _metadata, 
         Column("id", Integer, primary_key=True), 
         Column("detail_url", Text), 
    ) 
    _metadata.create_all(_engine) 
    self.connection = _connection 
    self.stack_items = _stack_items 

My spider looks like:

class my_Spider(Spider): 

    name = "my" 

    def from_crawler(self, crawler, table='test'): 
     pass 


    def start_requests(self): 

     ..... 

I added the from_crawler lines based on https://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.from_crawler, but now I'm getting:

File "C:\ENVS\virtalenvs\contact\lib\site-packages\twisted\internet\defer.py", line 1128, in _inlineCallbacks 
    result = g.send(result) 
    File "C:\ENVS\virtalenvs\contact\lib\site-packages\scrapy\crawler.py", line 90, in crawl 
    six.reraise(*exc_info) 
    File "C:\ENVS\virtalenvs\contact\lib\site-packages\scrapy\crawler.py", line 71, in crawl 
    self.spider = self._create_spider(*args, **kwargs) 
    File "C:\ENVS\virtalenvs\contact\lib\site-packages\scrapy\crawler.py", line 94, in _create_spider 
    return self.spidercls.from_crawler(self, *args, **kwargs) 
TypeError: unbound method from_crawler() must be called with My_Spider instance as first argument (got Crawler instance instead) 

How can I get this to work?

EDIT:

After changing it to a classmethod, I'm getting:

exceptions.TypeError: __init__() takes exactly 1 argument (2 given) 
2016-12-09 15:47:37 [twisted] CRITICAL: 
Traceback (most recent call last): 
    File "C:\ENVS\virtalenvs\contact\lib\site-packages\twisted\internet\defer.py", line 1128, in _inlineCallbacks 
    result = g.send(result) 
    File "C:\ENVS\virtalenvs\contact\lib\site-packages\scrapy\crawler.py", line 90, in crawl 
    six.reraise(*exc_info) 
    File "C:\ENVS\virtalenvs\contact\lib\site-packages\scrapy\crawler.py", line 72, in crawl 
    self.engine = self._create_engine() 
    File "C:\ENVS\virtalenvs\contact\lib\site-packages\scrapy\crawler.py", line 97, in _create_engine 
    return ExecutionEngine(self, lambda _: self.stop()) 
    File "C:\ENVS\virtalenvs\contact\lib\site-packages\scrapy\core\engine.py", line 69, in __init__ 
    self.scraper = Scraper(crawler) 
    File "C:\ENVS\virtalenvs\contact\lib\site-packages\scrapy\core\scraper.py", line 71, in __init__ 
    self.itemproc = itemproc_cls.from_crawler(crawler) 
    File "C:\ENVS\virtalenvs\contact\lib\site-packages\scrapy\middleware.py", line 58, in from_crawler 
    return cls.from_settings(crawler.settings, crawler) 
    File "C:\ENVS\virtalenvs\contact\lib\site-packages\scrapy\middleware.py", line 36, in from_settings 
    mw = mwcls.from_crawler(crawler) 
    File "C:\ENVS\r2\my\my\pipelines.py", line 30, in from_crawler 
    return cls(table_name) 
TypeError: __init__() takes exactly 1 argument (2 given) 

Answer


To pass an argument to a spider when you run it (when you call scrapy crawl myspider), you just need to specify it with the -a argument in the shell:

scrapy crawl myspider -a arg1=value1 

So if you have a spider class:

class MySpider(Spider): 
    name = "myspider" 

the arg1 argument will be set as an instance attribute on the spider, which means you can use it anywhere in that class:

class MySpider(Spider): 

    name = "myspider" 

    ... 

    def some_callback_method(self, response): 
     print self.arg1 
     ... 
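Under the hood, Scrapy's base Spider.__init__ roughly copies every -a key=value pair into the instance's attribute dict. A minimal plain-Python mimic of that behavior (FakeSpider is a hypothetical stand-in, not the real scrapy class):

```python
# Mimic of how Scrapy's base Spider turns -a arguments into attributes:
# Spider.__init__ does roughly self.__dict__.update(kwargs).
class FakeSpider(object):
    def __init__(self, name=None, **kwargs):
        self.name = name
        self.__dict__.update(kwargs)  # each -a key=value lands here

spider = FakeSpider(name="myspider", arg1="value1")
print(spider.arg1)  # prints: value1
```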

There is no need to define from_crawler in the actual spider.

The pipeline also receives the spider instance, and you are already using it there.
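For instance, a pipeline can read the attribute straight off the spider passed to process_item. A sketch in plain Python (pipelines need no scrapy base class; TablePipeline, DummySpider, and the "_table" item key are hypothetical names):

```python
class TablePipeline(object):
    def process_item(self, item, spider):
        # "table" was set on the spider by -a table=...; fall back to a
        # default when the argument was omitted on the command line
        table = getattr(spider, "table", "test")
        item["_table"] = table
        return item

class DummySpider(object):  # stand-in for a spider run with -a table=stack_items
    table = "stack_items"

item = TablePipeline().process_item({}, DummySpider())
print(item["_table"])  # prints: stack_items
```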

UPDATE:

Now, in your pipeline you aren't really using a spider attribute but a variable from the scrapy settings. If you want to pass the table name as a spider argument (so you can use -a on the command line), you have to change your pipeline to:

... 
@classmethod 
def from_crawler(cls, crawler): 
    table_name = getattr(crawler.spider, "table") 
    return cls(table_name) 
... 
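Putting the pieces together: from_crawler and __init__ have to agree on the argument, and the second traceback above (__init__() takes exactly 1 argument (2 given)) is exactly what happens when __init__ doesn't accept the value that from_crawler passes in. A runnable sketch of the shape, with hypothetical FakeCrawler/FakeSpider stand-ins so no scrapy or sqlalchemy install is needed:

```python
class MyPipeline(object):
    @classmethod
    def from_crawler(cls, crawler):
        # crawler.spider carries the -a arguments as attributes
        table_name = getattr(crawler.spider, "table", "test")
        return cls(table_name)

    def __init__(self, table):  # must accept what from_crawler passes in
        self.table = table

# Stand-ins for the objects Scrapy would supply:
class FakeSpider(object):
    table = "stack_items"  # as if run with: scrapy crawl myspider -a table=stack_items

class FakeCrawler(object):
    spider = FakeSpider()

pipeline = MyPipeline.from_crawler(FakeCrawler())
print(pipeline.table)  # prints: stack_items
```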

Thanks for looking at this. So if I want to call the pipeline with the table 'test' as written above, what would I pass as the argument? – user61629


OK, sorry, you aren't using a spider argument in your pipeline, you're using the settings; please check my updated answer. – eLRuLL


See my edit. – user61629