I am using Scrapy to get some information from all the pages of a website. This is my dmoz_spider.py file. When I run it I get an IndentationError. Please help me out. IndentationError: unexpected indent at def parse_item(self, response) in a Scrapy spider
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item, Field
import string
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class EypItem(Item):
    title = Field()
    link = Field()
    price = Field()
    review = Field()

class eypSpider(CrawlSpider):
    name = "dmoz"
    allowed_domains =["http://www.walgreens.com"]
    start_urls =["http://www.walgreens.com/search/results.jsp?Ntt=allergy%20medicine"]
    rules = (Rule(SgmlLinkExtractor(allow=('/search/results\.jsp',)), callback='parse_item', follow= True),)

    def parse_item(self, response):
        self.log('Hi, this is an item page! %s' % response.url)
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@id="productGrid"]')
        items = []
        for site in sites:
            itemE = EypItem()
            itemE["title"] = site.select('//*[@class="image-container"]/a/img/@alt').extract()
            itemE["link"] = site.select('//*[@class="image-container"]/a/img/@src').extract()
            itemE["price"] = site.select('//*[@class="pricing"]/div/p/text()').extract()
            itemE["review"] = site.select('//*[@class="reviewSnippet"]/div/div/span/text()').extract()
            items.append(itemE)
        return items
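It looks like the formatting got messed up a bit when the code was pasted here, but try putting 'def parse_item(self, response):' at the same indentation level as the line before it. – flornquake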
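Assuming Ashwini's edit correctly represents your code, it looks like everything starting at the 'rules =' line needs to be indented (and the 'self.log' line indented one more level). – geoffspear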
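To illustrate the two comments above, here is a minimal sketch of how one stray indent inside a class body triggers this error. It does not use Scrapy at all: BROKEN and FIXED are hypothetical stand-ins for the spider source, and check is a helper made up for this example that simply asks Python to compile each string.

```python
import textwrap

# Hypothetical stand-in for the spider: "rules" is indented deeper than
# the "name" line above it, although no new block was opened.
BROKEN = textwrap.dedent('''\
    class EypSpider(object):
        name = "dmoz"
            rules = ()
        def parse_item(self, response):
            return []
''')

# Same stand-in with every class attribute and the method at one level,
# and the method body one level deeper.
FIXED = textwrap.dedent('''\
    class EypSpider(object):
        name = "dmoz"
        rules = ()

        def parse_item(self, response):
            return []
''')


def check(source):
    """Return None if the source compiles, else a short error description."""
    try:
        compile(source, "<spider>", "exec")
    except IndentationError as exc:
        return "line %d: %s" % (exc.lineno, exc.msg)
    return None


if __name__ == "__main__":
    print("broken:", check(BROKEN))
    print("fixed:", check(FIXED))
```

Running the same kind of check on the real file (for example with python -m py_compile dmoz_spider.py) should point at the first line whose indentation does not match its neighbours. Mixed tabs and spaces can cause the same symptom.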
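Hi, I tried the correct indentation and still no luck. I removed self.log and now it runs, but it does not scrape any data. I am getting this: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) – user2144217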
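One possible contributor to the empty crawl, not confirmed anywhere in this thread: Scrapy documents allowed_domains as a list of domain names, while the spider above passes a full URL, so links extracted by the rule may be filtered as off-site. The sketch below only shows how a URL could be reduced to a host name with the standard library; to_domain is a hypothetical helper, not part of Scrapy.

```python
from urllib.parse import urlparse


def to_domain(entry):
    """Return the host name of a URL, or the entry unchanged if it has no scheme."""
    host = urlparse(entry).hostname
    return host if host else entry


# Assumption for illustration: the value currently used in the spider.
allowed_domains = [to_domain("http://www.walgreens.com")]

if __name__ == "__main__":
    print(allowed_domains)
```

Writing the domain by hand in the spider works just as well; the helper is only there to make the difference between a URL and a domain explicit.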