2016-10-05 60 views

TypeError: 'instancemethod' object has no attribute '__getitem__' when running a Scrapy spider

When I run my Scrapy project, I get this error. My spider.py code is:

import scrapy
import re
from tutorial.items import TutorialItem

class tutorialSpider(scrapy.Spider):
    name = "tutorial"
    allowed_domain = ['examble.com']
    start_urls = ["examble.com/something"]

    def parse(self, response):
        for sel in response.xpath('//*[@id="post-entry"]/div/article'):
            item = TutorialItem()
            item['Title'] = sel.xpath('div[2]/h2/a/text()').extract[0]
            item['MainPageUrl'] = sel.xpath('div[2]/h2/a/@href').extract[0]
            item['Author'] = sel.xpath('div[2]/div/span/a/text()').extract[0]
            request = scrapy.Request(item['MianPageUrl'], callback=self.parseContentDetails)
            request.meta['item'] = item
            yield request

    def parseContentDetails(self, response):
        item = response.meta['item']
        item['Content'] = response.xpath()
        item['Count'] = response.xpath()
        print type(item)
        return item

And my pipeline.py is:

class TutorialPipeline(object):
    def __init__(self):
        #self.setupDBCon()
        #self.createTables()
    def process_item(self, item, spider):
        for key, value in item.iteritems():
            if(isinstance(value, list)):
                if value:
                    templist = []
                    for obj in value:
                        temp = self.stripHTML(obj)
                        templist.append(temp)
                    item[key] = templist
                else:
                    item[key] = ""
            else:
                item[key] = self.stripHTML(value)

        print item.get('Title', '')
        return item

And my items.py is:

from scrapy.item import Item, Field 

class TutorialItem(Item): 
    Title = Field()
    Author = Field()
    MianPageUrl = Field()
    Content = Field()
    Count = Field()

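Please tell me how to fix this error. I have searched many sites, but they only cover the "instancemethod object has no attribute" error in Django, and I need a solution for Scrapy.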


Please post your traceback, including the lines leading up to the 'TypeError' line.

Answer


You are not calling extract properly. For each extract, you have to actually call the method and then index the result:

item['Title'] = sel.xpath('div[2]/h2/a/text()').extract()[0]
                                                       ^^
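
To see why the parentheses matter: without them, `extract` is the bound method itself rather than the list it returns, so `[0]` tries to index a method object. The sketch below uses a hypothetical stand-in class (`FakeSelectorList` is not part of Scrapy and only mimics the shape of the call) to reproduce the same kind of failure. On Python 2 the message is the one in the question; on Python 3 it reads "'method' object is not subscriptable".

```python
class FakeSelectorList(object):
    """Hypothetical stand-in for the object returned by sel.xpath(...)."""

    def __init__(self, values):
        self._values = list(values)

    def extract(self):
        # Return the matched strings as a plain list.
        return list(self._values)


titles = FakeSelectorList(["First post", "Second post"])

# Missing parentheses: this indexes the method object and raises TypeError.
try:
    titles.extract[0]
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Calling the method first yields a list, which can then be indexed.
first_title = titles.extract()[0]
```

The same applies to the `MainPageUrl` and `Author` lines in the question, which also index `extract` without calling it.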

If you only want the first element, you can use extract_first:

item['Title'] = sel.xpath('div[2]/h2/a/text()').extract_first() 
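
One practical difference between the two forms shows up when the XPath matches nothing: `extract()[0]` raises an IndexError on the empty list, while `extract_first()` returns None, or the value passed as `default`. The sketch below imitates that behaviour with a hypothetical stand-in class (`FakeSelectorList` is an illustration, not Scrapy's implementation).

```python
class FakeSelectorList(object):
    """Hypothetical stand-in that mimics extract() and extract_first()."""

    def __init__(self, values):
        self._values = list(values)

    def extract(self):
        # Return every match as a list (possibly empty).
        return list(self._values)

    def extract_first(self, default=None):
        # Return the first match, or the default when there is none.
        for value in self._values:
            return value
        return default


no_match = FakeSelectorList([])

# Indexing an empty result fails.
try:
    no_match.extract()[0]
    raised_index_error = False
except IndexError:
    raised_index_error = True

# extract_first degrades gracefully instead.
author = no_match.extract_first()
author_or_blank = no_match.extract_first(default="")
```

With `extract_first`, an article that lacks an author link would leave the field as None rather than aborting the callback.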

Thanks, it worked for me – Tharunkumar
