
I am using Scrapy to get some information from all the pages of a website. This is my dmoz_spider.py file; when I run it I get "IndentationError: unexpected indent" at def parse_item(self, response). Please help me out.

from scrapy.spider import BaseSpider 
from scrapy.selector import HtmlXPathSelector 
from scrapy.item import Item, Field 
import string 
from scrapy.contrib.spiders import CrawlSpider, Rule 
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor 
class EypItem(Item): 
    title = Field() 
    link = Field() 
    price = Field() 
    review = Field() 
class eypSpider(CrawlSpider): 
    name = "dmoz" 
    allowed_domains =["http://www.walgreens.com"] 
    start_urls =["http://www.walgreens.com/search/results.jsp?Ntt=allergy%20medicine"] 
rules = (Rule(SgmlLinkExtractor(allow=('/search/results\.jsp',)), callback='parse_item', follow= True),) 
    def parse_item(self, response): 
    self.log('Hi, this is an item page! %s' % response.url) 
     hxs = HtmlXPathSelector(response) 
     sites = hxs.select('//div[@id="productGrid"]') 
     items = [] 
     for site in sites: 
      itemE = EypItem() 
      itemE["title"] = site.select('//*[@class="image-container"]/a/img/@alt').extract() 
      itemE["link"] = site.select('//*[@class="image-container"]/a/img/@src').extract() 
      itemE["price"] = site.select('//*[@class="pricing"]/div/p/text()').extract() 
      itemE["review"] = site.select('//*[@class="reviewSnippet"]/div/div/span/text()').extract() 
      items.append(itemE) 
     return items 

Looks like the formatting got messed up a bit coming over here, but try putting 'def parse_item(self, response):' at the same indentation level as the preceding lines. – flornquake


Assuming Ashwini's edit correctly represents your code, it looks like the 'rules =' line needs to be indented (and the 'self.log' line indented one level further). – geoffspear


Hi, I tried the correct indentation, still no luck. I removed self.log and it runs, but it doesn't scrape any data. I'm getting this: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min). – user2144217

Answers


Indent the following:

def parse_item(self, response): 
self.log('Hi, this is an item page! %s' % response.url) 

like this:

def parse_item(self, response):
    self.log('Hi, this is an item page! %s' % response.url)
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@id="productGrid"]')
    items = []
    for site in sites:
        itemE = EypItem()
        itemE["title"] = site.select('//*[@class="image-container"]/a/img/@alt').extract()
        itemE["link"] = site.select('//*[@class="image-container"]/a/img/@src').extract()
        itemE["price"] = site.select('//*[@class="pricing"]/div/p/text()').extract()
        itemE["review"] = site.select('//*[@class="reviewSnippet"]/div/div/span/text()').extract()
        items.append(itemE)
    return items

Apart from the indentation errors, your allowed_domains is specified incorrectly. Change it as follows (that is, remove the "http://" prefix from the URL):

allowed_domains =["www.walgreens.com"]
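
Putting both fixes together, the top of the class would look roughly like this. This is only a sketch based on the code in the question (the old SgmlLinkExtractor API, not re-tested against the site); parse_item stays as shown in the first answer.

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class eypSpider(CrawlSpider):
    name = "dmoz"
    # domain only, without the "http://" prefix
    allowed_domains = ["www.walgreens.com"]
    start_urls = ["http://www.walgreens.com/search/results.jsp?Ntt=allergy%20medicine"]

    # 'rules' is indented to class-body level, like 'name', so that
    # CrawlSpider actually picks it up as a class attribute
    rules = (
        Rule(SgmlLinkExtractor(allow=(r'/search/results\.jsp',)),
             callback='parse_item', follow=True),
    )

    # def parse_item(self, response): ... as in the first answer

This should also help with the "Crawled 0 pages" symptom mentioned in the comments: the offsite filter drops follow-up requests whose host does not match an entry in allowed_domains, and "http://www.walgreens.com" never matches any host.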