2015-07-10 38 views

Python, Scrapy, Pipeline: the `process_item` function is not being called

I have a very simple piece of code, shown below. The scraping works, and I can see that all the `print` statements produce the correct data. In the pipeline, the initialization runs fine, but the `process_item` function is never called: the `print` statement at the start of that function never executes.

Spider: comosham.py

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from activityadvisor.items import ComoShamLocation
from activityadvisor.items import ComoShamActivity
from activityadvisor.items import ComoShamRates
import re


class ComoSham(Spider):
    name = "comosham"
    allowed_domains = ["www.comoshambhala.com"]
    start_urls = [
        "http://www.comoshambhala.com/singapore/classes/schedules",
        "http://www.comoshambhala.com/singapore/about/location-contact",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes"
    ]

    def parse(self, response):
        category = response.url[39:44]
        print 'in parse'
        if category == 'class':
            pass  # self.gen_req_class(response)
        elif category == 'about':
            print 'about to call parse_location'
            self.parse_location(response)
        elif category == 'rates':
            pass  # self.parse_rates(response)
        else:
            print 'Cant find appropriate category! check check check!! Am raising Level 5 ALARM - You are a MORON :D'

    def parse_location(self, response):
        print 'in parse_location'
        item = ComoShamLocation()
        item['category'] = 'location'
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract()
        item['address'] = loc[2] + loc[3] + loc[4] + loc[5][1:11]
        item['pin'] = loc[5][11:18]
        item['phone'] = loc[9][6:20]
        item['fax'] = loc[10][6:20]
        item['email'] = loc[12]
        print item['address'], item['pin'], item['phone'], item['fax'], item['email']
        return item

Items file:

import scrapy 
from scrapy.item import Item, Field 

class ComoShamLocation(Item): 
    address = Field() 
    pin = Field() 
    phone = Field() 
    fax = Field() 
    email = Field() 
    category = Field() 

Pipeline file:

import csv

class ComoShamPipeline(object):
    def __init__(self):
        self.locationdump = csv.writer(open('./scraped data/ComoSham/ComoshamLocation.csv', 'wb'))
        self.locationdump.writerow(['Address', 'Pin', 'Phone', 'Fax', 'Email'])

    def process_item(self, item, spider):
        print 'processing item now'
        if item['category'] == 'location':
            print item['address'], item['pin'], item['phone'], item['fax'], item['email']
            self.locationdump.writerow([item['address'], item['pin'], item['phone'], item['fax'], item['email']])
        else:
            pass

Is an item generated at the end of the 'parse_location' function, and does it have its values? – GHajba


Yes, at the end of 'parse_location' I print it, and the output is as expected. –


I assume you have, but I have to ask: did you configure the item pipeline in 'settings.py'? – GHajba

Answers

Answer (9 votes):

Your problem is that you never actually yield the item. parse_location returns an item, but parse never yields that item.

The solution is to replace:

self.parse_location(response)

with:

yield self.parse_location(response)

More specifically, process_item is never called if no items are yielded.
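The difference is easy to see outside Scrapy: the engine only feeds the pipeline what the parse callback yields (or returns), and a bare method call discards the item. A minimal plain-Python sketch of this behaviour (no Scrapy required; `run_engine`, `fake_response`, and the dict item are illustrative stand-ins, not Scrapy APIs):

```python
processed = []

def process_item(item):
    # Stand-in for ComoShamPipeline.process_item.
    processed.append(item)
    return item

def parse_location(response):
    # Builds and returns an item, like the spider method in the question.
    return {'category': 'location', 'address': response['address']}

def parse_broken(response):
    # The return value is silently discarded; this callback returns None.
    parse_location(response)

def parse_fixed(response):
    # Yielding hands the item back to the "engine", and on to the pipeline.
    yield parse_location(response)

def run_engine(callback, response):
    # Rough model of Scrapy's behaviour: None means no output; otherwise
    # each element of the iterable is sent through the pipeline.
    result = callback(response)
    for item in (result or []):
        process_item(item)

fake_response = {'address': '30 Bideford Road'}
run_engine(parse_broken, fake_response)  # processed stays empty
run_engine(parse_fixed, fake_response)   # one item reaches the pipeline
print(len(processed))  # 1
```

With the broken callback, `process_item` is never reached, which matches the symptom in the question.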

Answer (1 vote):

Configure ITEM_PIPELINES in settings.py:

ITEM_PIPELINES = ['project_name.pipelines.pipeline_class'] 
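Note that the list form above is the older syntax; in newer Scrapy versions ITEM_PIPELINES is a dict mapping the pipeline's dotted path to an order number (0–1000, lower runs first). For this question's project it would look like this (class path assumed from the code in the question):

```python
# settings.py -- dict form used by newer Scrapy versions; the value
# sets the order when several pipelines are enabled, lower runs first.
ITEM_PIPELINES = {
    'activityadvisor.pipelines.ComoShamPipeline': 300,
}
```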
Answer (0 votes):

Adding to the answers above:

1. Remember to add this line to settings.py! ITEM_PIPELINES = {'[YOUR_PROJECT_NAME].pipelines.[YOUR_PIPELINE_CLASS]': 300}
2. Yield items as your spider runs!


Correct ['YOUR_PROJECT_NAME] to '[YOUR_PROJECT_NAME]' –

Answer (0 votes):

This solved my problem: another pipeline was dropping all items before my pipeline was called, so process_item() never ran, although open_spider and close_spider were being called. My solution was simply to change the order so that this pipeline runs before the pipeline that drops items.

Scrapy Pipeline Documentation.

Just remember that Scrapy calls Pipeline.process_item() only when there is an item to process!
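The ordering point can be sketched without running Scrapy: pipelines run in ascending priority order, and one that raises DropItem stops the item from reaching any later pipeline. A minimal model (the `DropEverything` class and the tiny `run_pipelines` driver are made up for illustration; in a real project you would only reorder the numbers in ITEM_PIPELINES):

```python
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem."""

class DropEverything(object):
    # Hypothetical pipeline that discards every item it sees.
    def process_item(self, item, spider):
        raise DropItem('dropped')

class CsvWriterPipeline(object):
    # Stand-in for the pipeline whose process_item never seemed to run.
    def __init__(self):
        self.seen = []
    def process_item(self, item, spider):
        self.seen.append(item)
        return item

def run_pipelines(pipelines, item, spider=None):
    # Model of Scrapy's chain: each pipeline receives the previous
    # pipeline's output; DropItem short-circuits the rest of the chain.
    for pipeline in pipelines:
        try:
            item = pipeline.process_item(item, spider)
        except DropItem:
            return None
    return item

writer = CsvWriterPipeline()
# Dropper first (lower priority number): writer.process_item never runs.
run_pipelines([DropEverything(), writer], {'category': 'location'})
# Writer first: it sees the item before the dropper discards it.
run_pipelines([writer, DropEverything()], {'category': 'location'})
print(len(writer.seen))  # 1
```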