I can't get data using rules in Scrapy

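I'm building a spider with Scrapy. It works if I don't use any rules, but now I'm trying to implement a rule that follows the paginator and scrapes all the remaining pages, and I can't work out why it fails.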

Spider code:

allowed_domains = ['guia.bcn.cat']
start_urls = ['http://guia.bcn.cat/index.php?pg=search&q=*:*']

rules = (
    Rule(SgmlLinkExtractor(allow=("index.php?pg=search&from=10&q=*:*&nr=10"),
                           restrict_xpaths=("//div[@class='paginador']",)),
         callback="parse_item", follow=True),
)

def parse_item(self, response):
    ...

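I also tried setting "index.php" as the rule's allow parameter, but that doesn't work either.

I read in the Scrapy groups that I don't need to add "a/" or "a/@href" to the XPath, because SgmlLinkExtractor searches for the links automatically.

The console output looks fine, but nothing gets scraped.

Any ideas?

Thanks in advance

EDIT:

With this code it works: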

from scrapy.selector import Selector 
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor 
from scrapy.contrib.spiders import CrawlSpider, Rule 
from bcncat.items import BcncatItem 
import re 

class BcnSpider(CrawlSpider):
    name = 'bcn'
    allowed_domains = ['guia.bcn.cat']
    start_urls = ['http://guia.bcn.cat/index.php?pg=search&q=*:*']

    # Follow only the links found inside the paginator block.
    rules = (
        Rule(
            SgmlLinkExtractor(
                allow=(re.escape("index.php")),
                restrict_xpaths=("//div[@class='paginador']")),
            callback="parse_item",
            follow=True),
    )

    def parse_item(self, response):
        self.log("parse_item")
        sel = Selector(response)
        i = BcncatItem()
        #i['domain_id'] = sel.xpath('//input[@id="sid"]/@value').extract()
        #i['name'] = sel.xpath('//div[@id="name"]').extract()
        #i['description'] = sel.xpath('//div[@id="description"]').extract()
        return i

Source

2014-01-13 Carlos Espeleta

The allow parameter of SgmlLinkExtractor is a regular expression (or a list of them), so "?", "*" and "." are treated as special characters.

You can use allow=(re.escape("index.php?pg=search&from=10&q=*:*&nr=10")) (with import re somewhere at the beginning of your script).
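To illustrate the point about special characters, here is a minimal sketch using only Python's re module. The URL is an assumption: it combines the question's host with its pagination pattern and is not a link confirmed to exist on the page. The sketch only shows how the two patterns behave against a raw string; the URLs the link extractor actually compares inside a running spider may look different.

```python
import re

# Hypothetical pagination URL, assembled from the question's pattern.
url = "http://guia.bcn.cat/index.php?pg=search&from=10&q=*:*&nr=10"
raw_pattern = "index.php?pg=search&from=10&q=*:*&nr=10"

# Unescaped: "." matches any character, "p?" makes the "p" optional,
# and "=*" / ":*" mean "zero or more" of the preceding character.
raw_match = re.search(raw_pattern, url)

# Escaped: every special character is matched literally.
escaped_match = re.search(re.escape(raw_pattern), url)

print(raw_match is not None)
print(escaped_match is not None)
```

The unescaped pattern compiles without error, which is why the spider runs without complaint while matching no links.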

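EDIT: Actually, the re.escape(...) rule above does not work. But since you have already restricted the region you want to extract links from, you can simply use allow=('index.php').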
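A small sketch of why a short pattern such as 'index.php' can be enough, again with plain re and assuming the extractor searches for the pattern anywhere in the URL rather than requiring a full match. The sample URLs are made up for illustration. It also shows a Python detail visible in the question's code: parentheses around a single string do not create a tuple.

```python
import re

# ('index.php') is just a string; a one-element tuple needs a trailing comma.
allow_as_written = ('index.php')
allow_as_tuple = ('index.php',)
print(type(allow_as_written).__name__, type(allow_as_tuple).__name__)

# Hypothetical URLs, for illustration only.
urls = [
    "http://guia.bcn.cat/index.php?pg=search&from=10&q=*:*&nr=10",
    "http://guia.bcn.cat/about.html",
]

# re.search looks for the pattern anywhere in the URL.
matching = [u for u in urls if re.search(allow_as_written, u)]
print(matching)
```

Since allow is described as taking either one regular expression or a list of them, both spellings should be acceptable. The unescaped "." still matches any character, so re.escape("index.php"), as used in the working spider in the question, is slightly stricter.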

Source

2014-01-13 16:16:24

If I use allow=('index.php') it doesn't do anything.

I uploaded a sample CrawlSpider and the console log: https://gist.github.com/redapple/8405909

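Now it works! I don't know exactly how Python works, but if I uncomment one of the item lines, such as #i['domain_id'] = sel.xpath('//input[@id="sid"]/@value').extract(), the console sometimes shows an **indentation error**, and to fix it I have to remove the tab spacing. Is this normal? Is it a newbie mistake? Thank you very much for your replies and your work!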
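If the error in question is Python's IndentationError, one common cause is mixing tabs and spaces when uncommenting a line. A minimal sketch, using a made-up function body rather than the spider itself:

```python
# Made-up source: the first body line is indented with four spaces,
# the second with a tab.
source = "def f():\n    a = 1\n\tb = 2\n"

try:
    compile(source, "<example>", "exec")
    error_name = None
except IndentationError as exc:
    # TabError is a subclass of IndentationError, so it is caught here too.
    error_name = type(exc).__name__

print(error_name)
```

Setting the editor to insert spaces instead of tabs usually avoids this kind of error.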
