I am trying to write a generic web crawler in Scrapy that is called with the following parameters, including an XPath passed as an argument:
- Allowed domain
- URL to be crawled
- XPath to extract the price within the web page
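The URL and allowed domain arguments seem to work fine, but I cannot get the XPath argument to work.
I am guessing I need to declare a variable to hold it properly, since the other two arguments are assigned to existing class attributes.
Here is my spider: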
import scrapy
from Spotlite.items import SpotliteItem

class GenericSpider(scrapy.Spider):
    name = "generic"

    def __init__(self, start_url=None, allowed_domains=None, xpath_string=None, *args, **kwargs):
        super(GenericSpider, self).__init__(*args, **kwargs)
        self.start_urls = ['%s' % start_url]
        self.allowed_domains = ['%s' % allowed_domains]
        xpath_string = ['%s' % xpath_string]

    def parse(self, response):
        self.logger.info('Hi, this is an item page! %s', response.url)
        item = SpotliteItem()
        item['url'] = response.url
        item['price'] = response.xpath(xpath_string).extract()
        return item
I get the following error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/ubuntu/spotlite/spotlite/spiders/generic.py", line 23, in parse
    item['price'] = response.xpath(xpath_string).extract()
NameError: global name 'xpath_string' is not defined
Any assistance would be greatly appreciated!
Thanks,
Michael
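
One likely reading of the NameError: in the spider above, xpath_string is only a local name inside __init__, so it no longer exists when parse runs. A minimal sketch of a possible fix follows, storing the value on the instance and reading it back through self. To keep the sketch runnable without Scrapy installed, GenericSpiderSketch is a hypothetical stand-in class rather than a real scrapy.Spider subclass, and it returns a plain dict instead of a SpotliteItem. It also assumes the XPath should stay a plain string rather than a one-element list, since response.xpath takes a query string.

```python
class GenericSpiderSketch(object):
    """Hypothetical stand-in for the spider; not a real scrapy.Spider subclass."""

    name = "generic"

    def __init__(self, start_url=None, allowed_domains=None, xpath_string=None):
        self.start_urls = [start_url]
        self.allowed_domains = [allowed_domains]
        # Store the XPath on the instance so other methods can reach it.
        # Kept as a plain string (assumption: the query is a single expression).
        self.xpath_string = xpath_string

    def parse(self, response):
        # response is assumed to expose .url and .xpath(query).extract(),
        # as a Scrapy response does.
        return {
            'url': response.url,
            # Read the attribute through self instead of a bare name.
            'price': response.xpath(self.xpath_string).extract(),
        }
```

In the real spider, the same two changes would apply: assign self.xpath_string in __init__ and use self.xpath_string in parse. Spider arguments are normally supplied on the command line with the -a option of scrapy crawl, one -a per argument.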