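Passing an XPath as a parameter to Scrapy

I'm trying to write a generic crawler that is called from a web page with the following parameters:

Allowed domain
URL to be crawled
XPath that extracts the price from the page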
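
For context, Scrapy accepts spider arguments on the command line as repeated `-a name=value` pairs. The following is only a sketch of how a calling page might assemble that command; the helper `build_crawl_command`, the URL, the domain, and the XPath are all hypothetical values chosen for illustration.

```python
import shlex


def build_crawl_command(spider_name, **spider_args):
    """Return the argv list for `scrapy crawl`, one -a pair per spider argument."""
    command = ["scrapy", "crawl", spider_name]
    # Sort the keys so the generated command is deterministic.
    for key, value in sorted(spider_args.items()):
        command += ["-a", "%s=%s" % (key, value)]
    return command


command = build_crawl_command(
    "generic",
    start_url="http://example.com/product/1",      # hypothetical URL
    allowed_domains="example.com",                  # hypothetical domain
    xpath_string="//span[@class='price']/text()",   # hypothetical XPath
)

# Quote each part so the XPath survives being pasted into a shell.
print(" ".join(shlex.quote(part) for part in command))
```

Passing the list form directly to something like `subprocess.run` avoids shell quoting altogether, which matters because XPath expressions often contain quotes and brackets.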

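The URL and allowed-domain parameters seem to work fine, but I can't get the XPath parameter to work.

I'm guessing I need to declare a variable to hold it properly, since the other two parameters are assigned to existing class attributes.

Here is my spider: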

import scrapy
from Spotlite.items import SpotliteItem

class GenericSpider(scrapy.Spider):
    name = "generic"

    def __init__(self, start_url=None, allowed_domains=None, xpath_string=None, *args, **kwargs):
        super(GenericSpider, self).__init__(*args, **kwargs)
        self.start_urls = ['%s' % start_url]
        self.allowed_domains = ['%s' % allowed_domains]
        xpath_string = ['%s' % xpath_string]

    def parse(self, response):
        self.logger.info('Hi, this is an item page! %s', response.url)
        item = SpotliteItem()
        item['url'] = response.url
        item['price'] = response.xpath(xpath_string).extract()
        return item

I get the following error:

Traceback (most recent call last): 
    File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks 
    current.result = callback(current.result, *args, **kw) 
    File "/home/ubuntu/spotlite/spotlite/spiders/generic.py", line 23, in parse 
    item['price'] = response.xpath(xpath_string).extract()

NameError: global name 'xpath_string' is not defined

Any help would be greatly appreciated!

Thanks,

Michael

2016-08-02 user6669314

Store xpath_string as an instance variable instead:

import scrapy
from Spotlite.items import SpotliteItem

class GenericSpider(scrapy.Spider):
    name = "generic"

    def __init__(self, start_url=None, allowed_domains=None, xpath_string=None, *args, **kwargs):
        super(GenericSpider, self).__init__(*args, **kwargs)
        self.start_urls = ['%s' % start_url]
        self.allowed_domains = ['%s' % allowed_domains]
        self.xpath_string = xpath_string

    def parse(self, response):
        self.logger.info('Hi, this is an item page! %s', response.url)
        item = SpotliteItem()
        item['url'] = response.url
        item['price'] = response.xpath(self.xpath_string).extract()
        return item

2016-08-02 19:31:32 alecxe
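
The NameError can be reproduced without Scrapy, which may make the cause easier to see. In this minimal sketch the class names `LocalOnly` and `OnInstance` and the XPath value are made up for illustration; only the scoping behaviour mirrors the spider above.

```python
class LocalOnly(object):
    def __init__(self, xpath_string=None):
        # Rebinds the local parameter only; nothing is stored on the instance,
        # and the name disappears when __init__ returns.
        xpath_string = ['%s' % xpath_string]

    def parse(self):
        # No local or global called xpath_string exists here, so this raises.
        return xpath_string


class OnInstance(object):
    def __init__(self, xpath_string=None):
        # Stored on the instance, so other methods can reach it through self.
        self.xpath_string = xpath_string

    def parse(self):
        return self.xpath_string


try:
    LocalOnly("//p/text()").parse()
    error = None
except NameError as exc:
    error = exc

stored = OnInstance("//p/text()").parse()
print(type(error).__name__, stored)
```

Note that the working version also keeps the value as a plain string rather than wrapping it in a one-element list, which is the form `response.xpath()` expects for its query.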

Adding the variable to the initial class declaration solved the problem.

import scrapy
from spotlite.items import SpotliteItem


class GenericSpider(scrapy.Spider):
    name = "generic"
    xpath_string = ""

    def __init__(self, start_url, allowed_domains, xpath_string, *args, **kwargs):
        super(GenericSpider, self).__init__(*args, **kwargs)
        self.start_urls = ['%s' % start_url]
        self.allowed_domains = ['%s' % allowed_domains]
        self.xpath_string = xpath_string

    def parse(self, response):
        self.logger.info('URL is %s', response.url)
        self.logger.info('xPath is %s', self.xpath_string)
        item = SpotliteItem()
        item['url'] = response.url
        item['price'] = response.xpath(self.xpath_string).extract()
        return item

2016-08-02 23:15:28 user6669314
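
It may be worth checking whether the class-level `xpath_string = ""` is doing any work here. As a rough sketch in plain Python (the class names and XPath value are hypothetical), an attribute assigned through `self` in `__init__` behaves the same with or without a class-level default, because the instance attribute shadows it:

```python
class WithClassDefault(object):
    xpath_string = ""  # class-level default, visible on every instance

    def __init__(self, xpath_string):
        # The instance attribute shadows the class-level default.
        self.xpath_string = xpath_string


class WithoutClassDefault(object):
    def __init__(self, xpath_string):
        # No class-level declaration; the assignment alone creates the attribute.
        self.xpath_string = xpath_string


a = WithClassDefault("//p/text()")
b = WithoutClassDefault("//p/text()")

print(a.xpath_string == b.xpath_string)   # both hold the value passed in
print(WithClassDefault.xpath_string)      # the class default is still ""
```

If that holds for the spider too, the change that likely mattered is the switch to `self.xpath_string` in both `__init__` and `parse`; the class-level default is mainly useful as a fallback when the argument is not supplied.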
