2016-12-14

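How do I use the -a option to pass an argument to Scrapy?

I read the documentation and found that the command line should look like this:

scrapy runspider getspecificimg.py -a ip='lizhe'

My spider's code looks like this: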

class GetImage(scrapy.Spider): 
    name = 'ImageSpider' 
    start_urls = ['https://www.pexels.com/'] 

# Get the input argument 
    # NameNeedSearch = InputPara 
    NameNeedSearch = ip 

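But when I run it, I get an error saying that ip isn't defined. Why?

- Update - I want to pass in a variable, then use it to build the full URL and use that as the start URL. My code is shown below, and it fails with the error "self is not defined". Why?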

class GetImage(scrapy.Spider): 
    name = 'ImageSpider' 
# Get the input argument 
    NameNeedSearch = self.ip 
    # startUrl = 'https://www.pexels.com/' + 
    start_urls = ['https://www.pexels.com/'] 

Answer

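You need to use self inside methods of your GetImage class, for example __init__ or start_requests, the latter being called when the crawl starts. The class body itself runs once, when the class is defined, before any spider instance exists, so neither ip nor self is available there.

When the framework calls these methods, they receive the class instance itself as their first argument, which is conventionally named self in the method signature (the name is only a convention):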

import scrapy

class GetImage(scrapy.Spider):
    name = 'ImageSpider'
    start_urls = ['https://www.pexels.com/']

    def start_requests(self):
        # self points to the spider instance
        # that was initialized by the scrapy framework when starting a crawl
        #
        # spider instances are "augmented" with crawl arguments
        # available as instance attributes,
        # self.ip has the (string) value passed on the command line
        # with `-a ip=somevalue`
        for url in self.start_urls:
            yield scrapy.Request(url + self.ip, dont_filter=True)
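
For the case in the update (building start_urls from the argument), overriding __init__ is another option. The sketch below is plain Python so it runs without Scrapy installed: MiniSpider is a hypothetical stand-in for scrapy.Spider, written on the assumption that the real base class copies keyword arguments onto the instance in the same way. The base_url attribute is likewise a name introduced only for this illustration. It also reproduces the "self is not defined" error from the question.

```python
class MiniSpider:
    """Hypothetical stand-in for scrapy.Spider (assumption: like Scrapy,
    it copies keyword arguments onto the instance as attributes)."""
    name = None

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        # e.g. `-a ip=lizhe` would arrive here as ip='lizhe'
        self.__dict__.update(kwargs)


# The class body runs at definition time, before any instance exists,
# so referring to self there raises NameError.
try:
    class Broken(MiniSpider):
        NameNeedSearch = self.ip
except NameError as exc:
    error_message = str(exc)


class GetImage(MiniSpider):
    name = 'ImageSpider'
    base_url = 'https://www.pexels.com/'  # illustrative attribute name

    def __init__(self, ip='', **kwargs):
        super().__init__(**kwargs)
        self.ip = ip
        # self exists inside a method, so the start URL can be built here
        self.start_urls = [self.base_url + ip]


spider = GetImage(ip='lizhe')
print(error_message)
print(spider.start_urls)
```

With the real framework, the same __init__ body should carry over by subclassing scrapy.Spider instead of MiniSpider. Giving ip a default value keeps the spider usable when no -a ip=... is passed.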