Scrapy - calling a spider from another script


I created this class with a parse() method:

import scrapy

# Assumes PitchforkItem is defined in the project's items.py (standard Scrapy layout)
from blogs.items import PitchforkItem


class PitchforkSpider(scrapy.Spider):
    name = "pitchfork_reissues"
    allowed_domains = ["pitchfork.com"]
    # Scrapy schedules a request for each URL listed here
    start_urls = [
        "http://pitchfork.com/reviews/best/reissues/?page=1",
        "http://pitchfork.com/reviews/best/reissues/?page=2",
        "http://pitchfork.com/reviews/best/reissues/?page=3",
    ]

    def parse(self, response):
        items = []
        for sel in response.xpath('//div[@class="album-artist"]'):
            item = PitchforkItem()
            # The leading "." keeps each query inside the current album's div;
            # a bare "//" would search the whole page for every album.
            item['artist'] = sel.xpath('.//ul[@class="artist-list"]/li/text()').extract()
            item['reissue'] = sel.xpath('.//h2[@class="title"]/text()').extract()
            items.append(item)
        return items

From another script, I import the module that the class above belongs to:

from blogs.spiders.pitchfork_reissues_feed import *

and, after instantiating the class, I try to call its parse() method:

def reissues(): 

    pitchfork_reissues = PitchforkSpider() 
    albums = pitchfork_reissues.parse(response) 
    print (albums)

but I get the following error:

reissues = pitchfork_reissues.parse(response) 
NameError: global name 'response' is not defined

Apparently, the parse() method needs an instance of scrapy.http.Response. How can I create such an instance inside reissues() in the second script?


2016-09-24 data_garden

How do you use the "PitchforkSpider" class in your first script? – njzk2

@njzk2 What do you mean? Could you be more specific? –

You said "from another script". I assume you have another script that uses this class successfully? – njzk2

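from scrapy.http import HtmlResponse

# url is a required argument. HtmlResponse (a TextResponse) accepts a str body
# when an encoding is given and, unlike the base Response, supports .xpath().
response = HtmlResponse(url='http://pitchfork.com/reviews/best/reissues/?page=1', body=u'html here', encoding='utf-8')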

Now, I don't think you will be able to crawl this way, because that's not how Scrapy works, but you can still create the response object.


2016-09-24 23:02:30 eLRuLL

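Please note my edit. Could you put your answer in the context of the code above? Otherwise it is of no use to me. I just want to get the crawled items back into my script. –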
