2017-06-17

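Creating a generic Scrapy spider and multiple specific ones

I'm trying to create a generic spider that handles the most common tasks, plus specific spiders that inherit from the generic one and declare site-specific variables.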
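One common way to get this split, sketched here in plain Python so it runs without Scrapy: keep the shared logic in a base class and let each site-specific subclass override only class attributes, so no subclass needs its own `__init__`. The attribute names `finditemprop` and `keywords` come from the question; `describe()` and the value `"price"` are hypothetical placeholders, and with Scrapy the base class would subclass `scrapy.Spider`.

```python
class GenericProductSpider:
    """Shared logic; subclasses only override the class attributes below.

    Plain-Python stand-in: with Scrapy this would subclass scrapy.Spider.
    """
    name = None
    start_urls = []      # overridden per site; never mutated here
    finditemprop = ""
    keywords = ""

    def describe(self):
        # Hypothetical shared method: reads whatever the subclass configured.
        return (f"{self.name}: {len(self.start_urls)} start URL(s), "
                f"itemprop={self.finditemprop!r}")


class SpecificSpider(GenericProductSpider):
    # Site-specific configuration only; no __init__ needed.
    name = "specific1"
    start_urls = ["http://www.specificdomian.com"]
    finditemprop = "price"  # hypothetical value


spider = SpecificSpider()
print(spider.describe())
```

Because the values are plain class attributes, a change to the shared method in the base class reaches every site-specific subclass without editing them.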

Here is genericspider.py:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import Spider, CrawlSpider

class GenericProductSpider(scrapy.Spider):
    def __init__(self, start_urls=[], finditemprop='', keywords='', **kwargs):
        CrawlSpider.__init__(self, **kwargs)
        print("\n\n Init Generic \n")
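One concrete problem in this class: it subclasses `scrapy.Spider` but calls `CrawlSpider.__init__`. As far as I know, Scrapy's `CrawlSpider.__init__` itself chains to its parent through `super()`, and `super()` requires `self` to be an instance of `CrawlSpider`, so the call raises `TypeError`. The sketch below reproduces this with simplified, hypothetical stand-ins named `Spider` and `CrawlSpider` (not the real Scrapy classes) and shows the usual fix of calling `super().__init__(**kwargs)`.

```python
class Spider:
    """Simplified stand-in for scrapy.Spider: extra kwargs become attributes."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


class CrawlSpider(Spider):
    """Stand-in for CrawlSpider, which also chains to its parent via super()."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


class BrokenGeneric(Spider):
    def __init__(self, **kwargs):
        # Calls the initializer of a class that is not in this class's MRO.
        CrawlSpider.__init__(self, **kwargs)


class FixedGeneric(Spider):
    def __init__(self, finditemprop="", keywords="", **kwargs):
        super().__init__(**kwargs)  # delegate to the actual parent (Spider)
        self.finditemprop = finditemprop
        self.keywords = keywords


try:
    BrokenGeneric()
except TypeError as exc:
    print("BrokenGeneric:", exc)

ok = FixedGeneric(keywords="shoes", name="generic")
print(ok.name, ok.keywords)
```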

Then I have specificspider.py in the same directory as the generic one:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import Spider, CrawlSpider
from .genericfabric import GenericFabricsSpider

class SpecificSpider(GenericProductSpider):

    def __init__(self, **kwargs):
        print("\n init specific \n")
        name = "specific1"
        start_urls = ['http://www.specificdomian.com',]

        super(SpecificSpider, self).__init__(name, start_urls, **kwargs)

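I seem to have trouble understanding how to call the superclass's initializer correctly. I get various error messages, but the generic spider's methods never get executed.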
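Part of the trouble is argument binding. `SpecificSpider` passes `name` and `start_urls` positionally, but the generic signature is `(self, start_urls=[], finditemprop='', keywords='', **kwargs)`, so the string `"specific1"` lands in `start_urls` and the URL list lands in `finditemprop`. Also, `name = ...` and `start_urls = ...` inside `__init__` only create local variables, not spider attributes. (The import line names `GenericFabricsSpider` from `.genericfabric`, while the class used is `GenericProductSpider`, which on its own would raise `NameError`.) In the plain-Python sketch below, `Generic` is a hypothetical stand-in whose parameter order mirrors the question's; it shows the mis-binding and a keyword-argument call that avoids it.

```python
class Generic:
    # Same parameter order as the question's GenericProductSpider.__init__,
    # but with None instead of a shared mutable [] default.
    def __init__(self, start_urls=None, finditemprop="", keywords="", **kwargs):
        self.start_urls = start_urls if start_urls is not None else []
        self.finditemprop = finditemprop
        self.keywords = keywords
        self.extra = kwargs  # e.g. name; scrapy.Spider would store it as an attribute


class PositionalSpecific(Generic):
    def __init__(self, **kwargs):
        # Mirrors the question: arguments bind by position, not by meaning.
        super().__init__("specific1", ["http://www.specificdomian.com"], **kwargs)


class KeywordSpecific(Generic):
    def __init__(self, **kwargs):
        # Keywords make each value land in the intended parameter.
        super().__init__(name="specific1",
                         start_urls=["http://www.specificdomian.com"],
                         **kwargs)


bad = PositionalSpecific()
print(bad.start_urls, bad.finditemprop)
good = KeywordSpecific()
print(good.start_urls, good.extra)
```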


Look into the cookiecutter Python module, then look for a cookiecutter Scrapy template. – scriptso


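@scriptso That looks interesting. As far as I understand, it would be a kind of template where I plug in my variables and it creates the spiders for me? I assume that if I need to update the code, I would have to update it in all the spiders. – Chris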

Answer


Actually, it seems to work fine; the problem was probably just with the arguments.

Working code for the superclass:

# -*- coding: utf-8 -*-
from scrapy.spiders import Spider


class TestsuperSpider(Spider):
    name = "testsuper"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/search/npo"]
    supervar = "meine super var"

    def __init__(self, *args, **kwargs):
        print("super init")
        # Let scrapy.Spider do its own setup (name check, spider arguments).
        super(TestsuperSpider, self).__init__(*args, **kwargs)

    def parse(self, response):
        print("super Parse")

    def supermethod(self, subvar):
        print("\n\n Supermethod \n\n ")
        print(self.supervar + " - " + subvar)
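A constructor declared as `def __init__(self)` cannot accept spider arguments. As far as I know, Scrapy passes `scrapy crawl <spider> -a key=value` arguments to the spider's constructor as keyword arguments, so such a constructor raises `TypeError` as soon as one is given; the Scrapy docs' spider-arguments example forwards `*args, **kwargs` to the parent instead. The sketch below compares the two signatures using a simplified, hypothetical `Spider` stand-in; `category` is a made-up argument name.

```python
class Spider:
    """Simplified stand-in for scrapy.Spider: keyword arguments become attributes."""
    def __init__(self, *args, **kwargs):
        self.__dict__.update(kwargs)


class NoArgsSpider(Spider):
    def __init__(self):
        # Accepts nothing, so any spider argument is rejected.
        super().__init__()


class ForwardingSpider(Spider):
    def __init__(self, *args, **kwargs):
        # Forward everything so the parent can store spider arguments.
        super().__init__(*args, **kwargs)


spider_args = {"category": "npo"}  # roughly what `-a category=npo` would supply

try:
    NoArgsSpider(**spider_args)
except TypeError as exc:
    print("NoArgsSpider:", exc)

print(ForwardingSpider(**spider_args).category)
```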

And the subclass:

# -*- coding: utf-8 -*-
from test.spiders.testsuper import TestsuperSpider


class TestsubSpider(TestsuperSpider):
    name = "testsub"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/search/npo"]
    subvar = "subvar"

    def __init__(self, *args, **kwargs):
        print("sub init")
        super(TestsubSpider, self).__init__(*args, **kwargs)

    def parse(self, response):
        # supermethod is inherited, so self.supermethod is enough here.
        self.supermethod(self.subvar)
        print("sub Parse")
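The order of events in this subclass is worth spelling out: it prints "sub init" and then runs the parent initializer through `super(...)`, and because `supermethod` is inherited, `self.supermethod(self.subvar)` reaches it just as well as going through `super()`. `super()` is only needed when a subclass overrides a method and still wants the parent's version. A plain-Python sketch with hypothetical stand-in classes that record calls in a list instead of printing:

```python
class Super:
    supervar = "meine super var"

    def __init__(self, log):
        self.log = log
        self.log.append("super init")

    def parse(self):
        # Overridden in Sub, so it never runs for Sub instances.
        self.log.append("super parse")

    def supermethod(self, subvar):
        # Inherited helper: combines the parent's and the subclass's values.
        return self.supervar + " - " + subvar


class Sub(Super):
    subvar = "subvar"

    def __init__(self, log):
        log.append("sub init")
        super().__init__(log)  # parent initializer runs second

    def parse(self):
        # supermethod is inherited, so plain attribute lookup finds it.
        self.log.append(self.supermethod(self.subvar))
        self.log.append("sub parse")


calls = []
sub = Sub(calls)
sub.parse()
print(calls)
```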

It still needs cleaning up and adapting to its actual purpose, but at least the code runs correctly.