Scrapy SitemapSpider not working
I am trying to scrape the website of a well-known UK retailer and get an AttributeError as follows:
nl_env/lib/python3.6/site-packages/scrapy/spiders/sitemap.py", line 52, in _parse_sitemap
    for r, c in self._cbs:
AttributeError: 'NlSMCrawlerSpider' object has no attribute '_cbs'
It may be that I have not fully understood how SitemapSpider works - see my code below:
class NlSMCrawlerSpider(SitemapSpider):
    name = 'nl_smcrawler'
    allowed_domains = ['newlook.com']
    sitemap_urls = ['http://www.newlook.com/uk/sitemap/maps/sitemap_uk_product_en_1.xml']
    sitemap_follow = ['/uk/womens/clothing/']
    # sitemap_rules = [
    #     ('/uk/womens/clothing/', 'parse_product'),
    # ]

    def __init__(self):
        self.driver = webdriver.Safari()
        self.driver.set_window_size(800,600)
        time.sleep(2)

    def parse_product(self, response):
        driver = self.driver
        driver.get(response.url)
        time.sleep(1)
        # Collect products
        itemDetails = driver.find_elements_by_class_name('product-details-page content')
        # Pull features
        desc = itemDetails[0].find_element_by_class_name('product-description__name').text
        href = driver.current_url
        # Generate a product identifier
        identifier = href.split('/p/')[1].split('?comp')[0]
        identifier = int(identifier)
        # datetime
        dt = date.today()
        dt = dt.isoformat()
        # Price Symbol removal and integer conversion
        try:
            priceString = itemDetails[0].find_element_by_class_name('price product-description__price').text
        except:
            priceString = itemDetails[0].find_element_by_class_name('price--previous-price product-description__price--previous-price ng-scope').text
        priceInt = priceString.split('£')[1]
        originalPrice = float(priceInt)
        # discountedPrice Logic
        try:
            discountedPriceString = itemDetails[0].find_element_by_class_name('price price--marked-down product-description__price').text
            discountedPriceInt = discountedPriceString.split('£')[1]
            discountedPrice = float(discountedPriceInt)
        except:
            discountedPrice = 'N/A'
        # NlScrapeItem
        item = NlScrapeItem()
        # Append product to NlScrapeItem
        item['identifier'] = identifier
        item['href'] = href
        item['description'] = desc
        item['originalPrice'] = originalPrice
        item['discountedPrice'] = discountedPrice
        item['firstSighted'] = dt
        item['lastSighted'] = dt
        yield item
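Also, do not hesitate to ask for any further details. See the link to the sitemap and the link to the actual file within the Scrapy package that throws the error (link - github). Your help would be sincerely appreciated.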
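Edit: One thought. Looking at the 2nd link (from the Scrapy package), I can see that _cbs is initialised in the def __init__(self, *a, **kw): function. Is the fact that I have my own __init__ logic throwing it off?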
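To illustrate the suspicion in the edit above, here is a minimal sketch in plain Python of what happens when a subclass overrides __init__ without calling the parent's. BaseSitemapSpider is a hypothetical stand-in for Scrapy's SitemapSpider, written only to mimic the fact that _cbs is created inside the base __init__; BrokenSpider, FixedSpider and the "driver placeholder" string are likewise assumptions for illustration, not the real Scrapy or Selenium code.

```python
class BaseSitemapSpider:
    """Hypothetical stand-in for scrapy.spiders.SitemapSpider."""

    # Pairs of (URL pattern, callback name), resolved in __init__.
    sitemap_rules = [("", "parse")]

    def __init__(self, *a, **kw):
        # The base class builds its callback list here. If this method
        # never runs, the attribute _cbs simply does not exist.
        self._cbs = [
            (pattern, getattr(self, name))
            for pattern, name in self.sitemap_rules
        ]

    def parse(self, response):
        return response

    def callback_patterns(self):
        # Mirrors the failing loop in the traceback: "for r, c in self._cbs".
        return [pattern for pattern, _callback in self._cbs]


class BrokenSpider(BaseSitemapSpider):
    def __init__(self):
        # Parent __init__ is never called, so _cbs is never set.
        self.driver = "driver placeholder"


class FixedSpider(BaseSitemapSpider):
    def __init__(self, *a, **kw):
        # Run the parent's initialisation first, then add custom state.
        super().__init__(*a, **kw)
        self.driver = "driver placeholder"


if __name__ == "__main__":
    try:
        BrokenSpider().callback_patterns()
    except AttributeError as exc:
        print("broken:", exc)
    print("fixed:", FixedSpider().callback_patterns())
```

If the same reasoning carries over to the spider in the question, accepting *a, **kw in __init__ and calling super().__init__(*a, **kw) before creating the webdriver should let the base class set up _cbs as usual.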
Thank you - it works! Awesome! – Philipp