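Scrapy output is empty, but data is being scraped

I am scraping a website and trying to save the output in MongoDB. The code appeared to work, but when I tried a simple export (scrapy crawl IR -o items.json -t json), the file came out empty, even though the spider's log shows that the data was scraped.

Here is my spider code: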

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from teste.items import IngressoRapidoItem

class IngressoRapidoSpider(BaseSpider):
    name = "IR"
    allowed_domains = ["ingressorapido.com.br"]
    start_urls = (
        'http://www.ingressorapido.com.br/eventos.aspx?genero=55',
    )

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        items = []
        item = IngressoRapidoItem()
        item['banda'] = hxs.select('normalize-space(//a[contains(@href,"Evento")] /text())').extract()
        item['local'] = hxs.select('normalize-space(//td/span[contains(@style,  "normal")]/text())').extract()
        items.append(item)
        return items

Can anyone guess why the output is empty even though the data is being scraped? Thanks in advance.

Source

2013-08-29 Eduardo Almeida

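What does the log look like? Can you upload its contents? – enginefree

What happens if you run scrapy runspider .py -o out.json? – alecxe

alecxe, with the command you gave me the output was perfect! Could you explain further, and also why scrapy crawl and the pipeline are not working? –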

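After running the code posted above, I can confirm that data is being scraped, but it is hard to say whether that data is actually useful: only one item is created, and it contains the venue but no event name.

I modified the XPath expressions and was able to return an item for each of the 10 events shown on the first page of http://www.ingressorapido.com.br/eventos.aspx?genero=55. I could then write the scraped data to a JSON file without any trouble.

Let me know if you have any questions, or if the XPath expressions do not return the data you need.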

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from teste.items import IngressoRapidoItem

class IngressoRapidoSpider(BaseSpider):
    name = "IR"
    allowed_domains = ["ingressorapido.com.br"]
    start_urls = (
        'http://www.ingressorapido.com.br/eventos.aspx?genero=55',
    )

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Select one node per event: the second cell of each inner table.
        events = hxs.select('//table[@id="ContentPlaceHolder1_dlEventos"]//table//td[2]')
        items = []
        for e in events:
            # Build a separate item per event, using XPaths relative to that event's cell.
            item = IngressoRapidoItem()
            item['banda'] = e.select('normalize-space(.//a//text())').extract()
            item['local'] = e.select('normalize-space(.//span//text())').extract()
            items.append(item)
        return items
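
The reason the original spider produced a single item is most likely that XPath 1.0 string functions such as normalize-space() only use the first node of a node-set, so an absolute expression yields one value for the whole page. The per-event loop above avoids this by selecting one node per event and extracting relative to it. The following is a minimal sketch of that same pattern using only the Python standard library instead of Scrapy selectors. The markup in SAMPLE is invented for illustration (it is not the real page), and the helpers normalize_space and extract_events are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed markup that only loosely imitates the listing page.
SAMPLE = """
<html><body>
<table id="ContentPlaceHolder1_dlEventos">
  <tr><td><table><tr>
    <td>img</td>
    <td><a href="Evento.aspx?id=1">  Band   One </a><span> Venue A </span></td>
  </tr></table></td></tr>
  <tr><td><table><tr>
    <td>img</td>
    <td><a href="Evento.aspx?id=2">Band Two</a><span>Venue
      B</span></td>
  </tr></table></td></tr>
</table>
</body></html>
"""


def normalize_space(text):
    """Trim and collapse whitespace, similar in spirit to XPath normalize-space()."""
    return " ".join((text or "").split())


def extract_events(markup):
    """Return one dict per event instead of a single dict for the whole page."""
    root = ET.fromstring(markup)
    outer = root.find('.//table[@id="ContentPlaceHolder1_dlEventos"]')
    events = []
    # Each inner table holds one event.
    for inner in outer.findall(".//table"):
        cells = inner.findall("./tr/td")
        cell = cells[1]  # index 1 corresponds to XPath td[2]
        # Extract relative to this event's cell, never from the document root.
        band = normalize_space("".join(cell.find("a").itertext()))
        venue = normalize_space("".join(cell.find("span").itertext()))
        events.append({"banda": band, "local": venue})
    return events


if __name__ == "__main__":
    for event in extract_events(SAMPLE):
        print(event)
```

The key design choice is the same as in the spider: iterate over a container node per event and keep every inner lookup relative to that node.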

Source

2013-08-30 08:39:27 Talvalin

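The code does work, but there is still no output as json, xml, or anything else... I thought it might be a write-permission problem, but the file simply contains no data... –

Try removing all pipeline references from your settings.py file and run the command again. – Talvalin

Still empty... even with the pipelines turned off –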
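
For reference, removing pipeline references from settings.py could look roughly like the sketch below. The project name "teste" comes from the spider's import line; the module path "teste.spiders" and the pipeline path "teste.pipelines.MongoDBPipeline" are assumptions made for illustration, since the thread never shows the actual settings file. The dict form of ITEM_PIPELINES is what current Scrapy documents; releases contemporary with this thread may have expected a list instead.

```python
# settings.py (sketch, not the asker's real file)

BOT_NAME = "teste"                  # project name taken from the spider's import line
SPIDER_MODULES = ["teste.spiders"]  # assumed module layout

# Hypothetical MongoDB pipeline entry, commented out while debugging the export:
# ITEM_PIPELINES = {"teste.pipelines.MongoDBPipeline": 300}

# With no pipelines enabled, scraped items go to the feed export unmodified.
ITEM_PIPELINES = {}
```

With this in place, running the same export command again helps separate a pipeline problem from an export problem. In this thread the file stayed empty, which suggests the cause lay elsewhere.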
