2015-07-10 38 views

Python, Scrapy, Pipeline: the `process_item` function is not being called

I have a very simple piece of code, shown below. The scraping works, and I can see that all the `print` statements produce the correct data. In the pipeline, the initialization runs fine, but the `process_item` function is never called: the `print` statement at the start of that function never executes.

Spider: comosham.py

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from activityadvisor.items import ComoShamLocation
from activityadvisor.items import ComoShamActivity
from activityadvisor.items import ComoShamRates
import re


class ComoSham(Spider):
    name = "comosham"
    allowed_domains = ["www.comoshambhala.com"]
    start_urls = [
        "http://www.comoshambhala.com/singapore/classes/schedules",
        "http://www.comoshambhala.com/singapore/about/location-contact",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes"
    ]

    def parse(self, response):
        category = response.url[39:44]
        print 'in parse'
        if category == 'class':
            pass  # self.gen_req_class(response)
        elif category == 'about':
            print 'about to call parse_location'
            self.parse_location(response)
        elif category == 'rates':
            pass  # self.parse_rates(response)
        else:
            print 'Cant find appropriate category! check check check!! Am raising Level 5 ALARM - You are a MORON :D'

    def parse_location(self, response):
        print 'in parse_location'
        item = ComoShamLocation()
        item['category'] = 'location'
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract()
        item['address'] = loc[2] + loc[3] + loc[4] + loc[5][1:11]
        item['pin'] = loc[5][11:18]
        item['phone'] = loc[9][6:20]
        item['fax'] = loc[10][6:20]
        item['email'] = loc[12]
        print item['address'], item['pin'], item['phone'], item['fax'], item['email']
        return item

Items file:

import scrapy 
from scrapy.item import Item, Field 

class ComoShamLocation(Item): 
    address = Field() 
    pin = Field() 
    phone = Field() 
    fax = Field() 
    email = Field() 
    category = Field() 

Pipeline file:

import csv

class ComoShamPipeline(object):
    def __init__(self):
        self.locationdump = csv.writer(open('./scraped data/ComoSham/ComoshamLocation.csv', 'wb'))
        self.locationdump.writerow(['Address', 'Pin', 'Phone', 'Fax', 'Email'])

    def process_item(self, item, spider):
        print 'processing item now'
        if item['category'] == 'location':
            print item['address'], item['pin'], item['phone'], item['fax'], item['email']
            self.locationdump.writerow([item['address'], item['pin'], item['phone'], item['fax'], item['email']])
        else:
            pass

Is an item generated at the end of the 'parse_location' function, and does it have its values? – GHajba


Yes, at the end of 'parse_location' I print it, and the output is as expected. –


I assume you have, but I have to ask: did you configure the item pipeline in 'settings.py'? – GHajba

Answers

Answer (9 votes):

Your problem is that you never actually yield the item. parse_location returns an item, but parse never yields that item.

The solution is to replace:

self.parse_location(response)

with:

yield self.parse_location(response)

More specifically, process_item is never called if no items are yielded.
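The difference is easy to see outside Scrapy: the engine only feeds the pipeline what the parse callback yields (or returns), and a bare method call discards the item. A minimal plain-Python sketch of this behaviour (no Scrapy required; `run_engine`, `fake_response`, and the dict item are illustrative stand-ins, not Scrapy APIs):

```python
processed = []

def process_item(item):
    # Stand-in for ComoShamPipeline.process_item.
    processed.append(item)
    return item

def parse_location(response):
    # Builds and returns an item, like the spider method in the question.
    return {'category': 'location', 'address': response['address']}

def parse_broken(response):
    # The return value is silently discarded; this callback returns None.
    parse_location(response)

def parse_fixed(response):
    # Yielding hands the item back to the "engine", and on to the pipeline.
    yield parse_location(response)

def run_engine(callback, response):
    # Rough model of Scrapy's behaviour: None means no output; otherwise
    # each element of the iterable is sent through the pipeline.
    result = callback(response)
    for item in (result or []):
        process_item(item)

fake_response = {'address': '30 Bideford Road'}
run_engine(parse_broken, fake_response)  # processed stays empty
run_engine(parse_fixed, fake_response)   # one item reaches the pipeline
print(len(processed))  # 1
```

With the broken callback, `process_item` is never reached, which matches the symptom in the question.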

Answer (1 vote):

Configure ITEM_PIPELINES in settings.py:

ITEM_PIPELINES = ['project_name.pipelines.pipeline_class'] 
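Note that the list form above is the older syntax; in newer Scrapy versions ITEM_PIPELINES is a dict mapping the pipeline's dotted path to an order number (0–1000, lower runs first). For this question's project it would look like this (class path assumed from the code in the question):

```python
# settings.py -- dict form used by newer Scrapy versions; the value
# sets the order when several pipelines are enabled, lower runs first.
ITEM_PIPELINES = {
    'activityadvisor.pipelines.ComoShamPipeline': 300,
}
```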
Answer (0 votes):

Adding to the answers above:

1. Remember to add this line to settings.py! ITEM_PIPELINES = {'[YOUR_PROJECT_NAME].pipelines.[YOUR_PIPELINE_CLASS]': 300}
2. Yield items as your spider runs!


Correct ['YOUR_PROJECT_NAME] to '[YOUR_PROJECT_NAME]' –

Answer (0 votes):

This solved my problem: another pipeline was dropping all items before my pipeline was called, so process_item() never ran, although open_spider and close_spider were being called. My solution was simply to change the order so that this pipeline runs before the pipeline that drops items.

Scrapy Pipeline Documentation.

Just remember that Scrapy calls Pipeline.process_item() only when there is an item to process!
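The ordering point can be sketched without running Scrapy: pipelines run in ascending priority order, and one that raises DropItem stops the item from reaching any later pipeline. A minimal model (the `DropEverything` class and the tiny `run_pipelines` driver are made up for illustration; in a real project you would only reorder the numbers in ITEM_PIPELINES):

```python
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem."""

class DropEverything(object):
    # Hypothetical pipeline that discards every item it sees.
    def process_item(self, item, spider):
        raise DropItem('dropped')

class CsvWriterPipeline(object):
    # Stand-in for the pipeline whose process_item never seemed to run.
    def __init__(self):
        self.seen = []
    def process_item(self, item, spider):
        self.seen.append(item)
        return item

def run_pipelines(pipelines, item, spider=None):
    # Model of Scrapy's chain: each pipeline receives the previous
    # pipeline's output; DropItem short-circuits the rest of the chain.
    for pipeline in pipelines:
        try:
            item = pipeline.process_item(item, spider)
        except DropItem:
            return None
    return item

writer = CsvWriterPipeline()
# Dropper first (lower priority number): writer.process_item never runs.
run_pipelines([DropEverything(), writer], {'category': 'location'})
# Writer first: it sees the item before the dropper discards it.
run_pipelines([writer, DropEverything()], {'category': 'location'})
print(len(writer.seen))  # 1
```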