Scrapy: getting data from a callback

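I am trying to scrape a website of ads. Ad thumbnails are shown on paginated listing pages, starting from the first page. Clicking a thumbnail opens the details of that ad, which include the ad's posting date. I only want to scrape the ads posted within the last day.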

My Scrapy spider has the following structure:

import scrapy

# open the homepage
def start_requests(self):
    url = 'url_to_page'
    yield scrapy.Request(url=url, callback=self.parse)

# parse the page for ad links and follow each of them
def parse(self, response):
    # get all links from the current page; not shown here
    for link in ad_links:
        # the request must be yielded, otherwise Scrapy never schedules it
        yield scrapy.Request(link, callback=self.parse_single_ad)

    # follow the next page, only if today's date > posting date <---

def parse_single_ad(self, response):
    # get the posting date; not shown here
    return item

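The problem is that the posting date is only available in parse_single_ad(), but I have to stop the pagination in parse() based on the ads' posting dates. Is there a way to access, from parse(), the item retrieved in parse_single_ad()? More generally, can I access a callback's data from its parent function?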


2017-02-23 Botond

Whenever you want to close the spider manually, you can raise CloseSpider.

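If needed, you can do this in your Spider class or, in principle, from a Pipeline, although Scrapy only documents CloseSpider as an exception raised from spider callbacks.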

from scrapy.exceptions import CloseSpider

def parse(self, response):
    if stop_condition:  # write your condition here
        raise CloseSpider('All ads scraped, now closing spider.')
    else:
        # scrape the next page
        ...

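Edit:

The OP says he cannot get an ad's posting date until the ad's detail page has been scraped.

But look at this: the posting date of each ad is shown on the listing page.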
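If the listing page does expose each ad's posting date, the decision to stop can be made in parse() itself, without waiting for parse_single_ad(). Below is a minimal sketch of that decision as a plain function. The name `should_follow_next_page`, the one-day window, and the assumption that the dates are already parsed into `datetime.date` objects and that listings are sorted newest first are all illustrative, not taken from the site in question.

```python
from datetime import date, timedelta


def should_follow_next_page(posting_dates, today, max_age_days=1):
    """Return True if every ad on the page is recent enough to keep paginating.

    Assumes listings are sorted newest first, so a page that contains an
    older ad means the following pages hold only older ads.
    """
    cutoff = today - timedelta(days=max_age_days)
    # An empty page gives no reason to continue.
    return bool(posting_dates) and all(d >= cutoff for d in posting_dates)


# Inside parse(), after extracting the dates from the listing page
# (the extraction itself is site-specific and not shown):
#     if should_follow_next_page(dates, date.today()):
#         yield scrapy.Request(next_page_url, callback=self.parse)
```

With this approach parse() simply does not yield the next-page request once an old ad shows up, so the spider finishes on its own and CloseSpider is not needed.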


2017-02-23 17:05:46 Umair

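Will this only close the current branch of the recursion? What if I have many sites to start with? – Botond

It stops the execution of the whole spider. If you have many 'start_urls', you will have a problem, because 'CloseSpider' simply quits the spider. – Umair

Which website are you scraping? I have an idea: if every URL in 'start_urls' contains some specific string, we can ignore the URLs of a particular site in the 'process_request' method. Please share the website you are scraping and I will help you. – Umair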
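The idea in the last comment could look roughly like this: remember which sites have run out of recent ads, and drop any further request whose URL contains that site's marker string. This is only a sketch under assumptions. The class name `ExhaustedSites` and its methods are hypothetical. In a real project the check would sit in a downloader middleware's `process_request`, which can raise `scrapy.exceptions.IgnoreRequest` to drop a request.

```python
class ExhaustedSites:
    """Tracks sites, identified by a substring of their URLs, that need no more requests."""

    def __init__(self):
        self._markers = set()

    def mark_exhausted(self, marker):
        # Call this from a callback once a site's ads are older than the cutoff.
        self._markers.add(marker)

    def should_skip(self, url):
        # True if the URL belongs to any site already marked as exhausted.
        return any(marker in url for marker in self._markers)


# In a downloader middleware (sketch, not shown in full):
#     def process_request(self, request, spider):
#         if exhausted.should_skip(request.url):
#             raise IgnoreRequest()
```

Unlike CloseSpider, this stops the crawl for one site only, and requests for the other entries in 'start_urls' keep going.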
